Postgres: auto vacuum during a huge query












0















In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.



An example of calculation is:



UPDATE mytable 
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;


the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.



When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min



But for 15 000 000, it didn't finish after many hours...



I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update



ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);


But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?










share|improve this question




















  • 1





    If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

    – a_horse_with_no_name
    Nov 14 '18 at 11:01











  • There are more complicated calculations, using join, that would make the view too slow

    – Rolintocour
    Nov 14 '18 at 11:08






  • 2





    Then show us the real update that includes the JOINs

    – a_horse_with_no_name
    Nov 14 '18 at 11:11











  • Autovacuum doesn't block UPDATE processing, so it can't be the problem.

    – Laurenz Albe
    Nov 14 '18 at 11:22
















0















In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.



An example of calculation is:



UPDATE mytable 
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;


the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.



When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min



But for 15 000 000, it didn't finish after many hours...



I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update



ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);


But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?










share|improve this question




















  • 1





    If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

    – a_horse_with_no_name
    Nov 14 '18 at 11:01











  • There are more complicated calculations, using join, that would make the view too slow

    – Rolintocour
    Nov 14 '18 at 11:08






  • 2





    Then show us the real update that includes the JOINs

    – a_horse_with_no_name
    Nov 14 '18 at 11:11











  • Autovacuum doesn't block UPDATE processing, so it can't be the problem.

    – Laurenz Albe
    Nov 14 '18 at 11:22














0












0








0








In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.



An example of calculation is:



UPDATE mytable 
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;


the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.



When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min



But for 15 000 000, it didn't finish after many hours...



I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update



ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);


But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?










share|improve this question
















In a Postgres DB (9.4), we have a table with 35 millions rows. The data is fast static, it is first inserted with a COPY, then few fields are calculated.



An example of calculation is:



UPDATE mytable 
SET the_name = the_rank || '_' || the_number || '_' || the_name
WHERE the_partition=5;


the_partition may have 5 different values, so the computations are not done on the whole table, but 5 requests are required to update the_name.



When 500 000 are updated => it takes about 30s
When 5 000 000 => it should take 300s, ia 5 min



But for 15 000 000, it didn't finish after many hours...



I suspect the auto vacuum for not doing its job well, so I tried to set it up more aggressive, expecting it should autovacuum after 500 000 rows update



ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.00);
ALTER TABLE mytable SET (autovacuum_vacuum_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_analyze_threshold = 500000);
ALTER TABLE mytable SET (autovacuum_vacuum_cost_limit = 10000);


But as I understand it, when I run a big update of 15 000 000, it is in the same transaction, so the auto vacuum won't be applied ad 15 000 000 of news rows will be stored until the request is finished (as postgresql makes a copy of updated rows)? So how to make the computation time kind of linear?







postgresql autovacuum






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 '18 at 11:00









a_horse_with_no_name

296k46451546




296k46451546










asked Nov 14 '18 at 10:12









RolintocourRolintocour

76511228




76511228








  • 1





    If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

    – a_horse_with_no_name
    Nov 14 '18 at 11:01











  • There are more complicated calculations, using join, that would make the view too slow

    – Rolintocour
    Nov 14 '18 at 11:08






  • 2





    Then show us the real update that includes the JOINs

    – a_horse_with_no_name
    Nov 14 '18 at 11:11











  • Autovacuum doesn't block UPDATE processing, so it can't be the problem.

    – Laurenz Albe
    Nov 14 '18 at 11:22














  • 1





    If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

    – a_horse_with_no_name
    Nov 14 '18 at 11:01











  • There are more complicated calculations, using join, that would make the view too slow

    – Rolintocour
    Nov 14 '18 at 11:08






  • 2





    Then show us the real update that includes the JOINs

    – a_horse_with_no_name
    Nov 14 '18 at 11:11











  • Autovacuum doesn't block UPDATE processing, so it can't be the problem.

    – Laurenz Albe
    Nov 14 '18 at 11:22








1




1





If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

– a_horse_with_no_name
Nov 14 '18 at 11:01





If the data is static I don't see the reason for storing redundant data to begin with. Why are you doing that update at all? You could simply create a view that returns the concatenated values

– a_horse_with_no_name
Nov 14 '18 at 11:01













There are more complicated calculations, using join, that would make the view too slow

– Rolintocour
Nov 14 '18 at 11:08





There are more complicated calculations, using join, that would make the view too slow

– Rolintocour
Nov 14 '18 at 11:08




2




2





Then show us the real update that includes the JOINs

– a_horse_with_no_name
Nov 14 '18 at 11:11





Then show us the real update that includes the JOINs

– a_horse_with_no_name
Nov 14 '18 at 11:11













Autovacuum doesn't block UPDATE processing, so it can't be the problem.

– Laurenz Albe
Nov 14 '18 at 11:22





Autovacuum doesn't block UPDATE processing, so it can't be the problem.

– Laurenz Albe
Nov 14 '18 at 11:22












0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297692%2fpostgres-auto-vacuum-during-a-huge-query%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53297692%2fpostgres-auto-vacuum-during-a-huge-query%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Xamarin.iOS Cant Deploy on Iphone

Glorious Revolution

Dulmage-Mendelsohn matrix decomposition in Python