Postgres: auto vacuum during a huge query
In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.
An example of calculation is:
UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;
the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.
When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min
But for 15 000 000, it didn't finish after many hours...
I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);
But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?
postgresql autovacuum
add a comment |
In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.
An example of calculation is:
UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;
the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.
When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min
But for 15 000 000, it didn't finish after many hours...
I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);
But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?
postgresql autovacuum
1
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
2
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Autovacuum doesn't blockUPDATE
processing, so it can't be the problem.
– Laurenz Albe
Nov 14 '18 at 11:22
add a comment |
In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.
An example of calculation is:
UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;
the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.
When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min
But for 15 000 000, it didn't finish after many hours...
I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);
But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?
postgresql autovacuum
In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.
An example of calculation is:
UPDATE mytable
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;
the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.
When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min
But for 15 000 000, it didn't finish after many hours...
I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);
But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?
postgresql autovacuum
postgresql autovacuum
edited Nov 14 '18 at 11:00
a_horse_with_no_name
296k46451546
296k46451546
asked Nov 14 '18 at 10:12
RolintocourRolintocour
76511228
76511228
1
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
2
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Autovacuum doesn't blockUPDATE
processing, so it can't be the problem.
– Laurenz Albe
Nov 14 '18 at 11:22
add a comment |
1
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
2
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Autovacuum doesn't blockUPDATE
processing, so it can't be the problem.
– Laurenz Albe
Nov 14 '18 at 11:22
1
1
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
2
2
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Autovacuum doesn't block
UPDATE
processing, so it can't be the problem.– Laurenz Albe
Nov 14 '18 at 11:22
Autovacuum doesn't block
UPDATE
processing, so it can't be the problem.– Laurenz Albe
Nov 14 '18 at 11:22
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297692%2fpostgres-auto-vacuum-during-a-huge-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297692%2fpostgres-auto-vacuum-during-a-huge-query%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values
– a_horse_with_no_name
Nov 14 '18 at 11:01
There are more complicated calculations, using join, that would make the view too slow
– Rolintocour
Nov 14 '18 at 11:08
2
Then show us the real update that includes the JOINs
– a_horse_with_no_name
Nov 14 '18 at 11:11
Autovacuum doesn't block
UPDATE
processing, so it can't be the problem.– Laurenz Albe
Nov 14 '18 at 11:22