Optimising a MySQL script with 1.7 billion records












0















I am trying to optimise a MySQL script that I have created. Here is a bit of background on the project.



We have 2 system which we are now merging into one system, I am currently writing the scripts to merge the databases together. I need this to run as fast as possible, while this script is running we have to shut down the systems and we don't want too much down time.



Both systems have a table called venue, zone and device_vendor which are identical they just have different data. When I merged the tables together obviously the IDs for the values in those tables have changed and they are foreign keys in a lot of other tables. The table I am currently working on has 1.7 billion records and all has the values venue_id, zone_id and device_vendor_id so I have to change these values to the new IDs.



Here is the script that I have written to change the IDs to the new values. All these values are also indexes which is why it is taking so long.



INSERT INTO `intelli_sense`.`tracking_daily_stats_zone_unique_device_uuids_per_hour` (day_epoch, day_of_week, hour, venue_id, zone_id, device_uuid, device_vendor_id, first_seen, last_seen, is_repeat)

SELECT day_epoch, day_of_week, hour, (CASE WHEN intelli_sense_venue.id!=0 THEN intelli_sense_venue.id ELSE 0 END), (CASE WHEN intelli_sense_zone.id!=0 THEN intelli_sense_zone.id ELSE 0 END), device_uuid, (CASE WHEN intelli_sense_device_vendor.id!=0 THEN intelli_sense_device_vendor.id ELSE 0 END), first_seen, last_seen, is_repeat
FROM geo_sense.daily_stats_zone_unique_device_uuids_per_hour AS tracking_daily_stats_zone_unique_device_uuids_per_hour
LEFT JOIN geo_sense.venue AS tracking_venue ON tracking_venue.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.venue_id
LEFT JOIN intelli_sense.venue AS intelli_sense_venue ON intelli_sense_venue.name = tracking_venue.name
LEFT JOIN geo_sense.zone AS tracking_zone ON tracking_zone.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.zone_id
LEFT JOIN intelli_sense.zone AS intelli_sense_zone ON intelli_sense_zone.name = tracking_zone.name AND intelli_sense_zone.lat = tracking_zone.lat AND intelli_sense_zone.lon = tracking_zone.lon
LEFT JOIN geo_sense.device_vendor AS tracking_device_vendor ON tracking_device_vendor.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.device_vendor_id
LEFT JOIN intelli_sense.device_vendor AS intelli_sense_device_vendor ON intelli_sense_device_vendor.name = tracking_device_vendor.name AND intelli_sense_device_vendor.description = tracking_device_vendor.description;


As you can see from the script above I have used LEFT JOINS to update these values. I have basically imported both databases into one server and now I am merging them together to create a new database called intelli_sense.



In the script above this isn't merging the 2 databases together it is just updating the values in the tracking_daily_stats_zone_unique_device_uuids_per_hour table which is from the geo_sense database because after merging the venue, zone and device_vendor tables the IDs have changed. I need to update all the tables which are related to those tables.



I have ran this script and it took 3 days to insert 50mil records which is way to long so I stopped it.



I need this script to finish in less then a day because I have 3 tables with this amount of data and I don't want too much system downtime.



I can't share the data since there is so much so I hope that script explains it enough, the actual script works its just a case of speeding it up. The data is also personal data so I wouldn't be able to share it anyway.



If you don't think this is impossible let me know and I will find a new way to do it while leaving both systems on so it can take as long as it wants.



Thanks in advance.



EDIT: Here is the explain statement for the above query.
Explain Statement










share|improve this question

























  • EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

    – Madhur Bhaiya
    Nov 16 '18 at 10:24











  • @MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

    – Luke Rayner
    Nov 16 '18 at 10:54











  • I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

    – Alon Eitan
    Nov 16 '18 at 10:58











  • @AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

    – Luke Rayner
    Nov 16 '18 at 11:02






  • 1





    @AlonEitan I will keep the question but I will also post one on that forum.

    – Luke Rayner
    Nov 16 '18 at 11:30
















0















I am trying to optimise a MySQL script that I have created. Here is a bit of background on the project.



We have 2 system which we are now merging into one system, I am currently writing the scripts to merge the databases together. I need this to run as fast as possible, while this script is running we have to shut down the systems and we don't want too much down time.



Both systems have a table called venue, zone and device_vendor which are identical they just have different data. When I merged the tables together obviously the IDs for the values in those tables have changed and they are foreign keys in a lot of other tables. The table I am currently working on has 1.7 billion records and all has the values venue_id, zone_id and device_vendor_id so I have to change these values to the new IDs.



Here is the script that I have written to change the IDs to the new values. All these values are also indexes which is why it is taking so long.



INSERT INTO `intelli_sense`.`tracking_daily_stats_zone_unique_device_uuids_per_hour` (day_epoch, day_of_week, hour, venue_id, zone_id, device_uuid, device_vendor_id, first_seen, last_seen, is_repeat)

SELECT day_epoch, day_of_week, hour, (CASE WHEN intelli_sense_venue.id!=0 THEN intelli_sense_venue.id ELSE 0 END), (CASE WHEN intelli_sense_zone.id!=0 THEN intelli_sense_zone.id ELSE 0 END), device_uuid, (CASE WHEN intelli_sense_device_vendor.id!=0 THEN intelli_sense_device_vendor.id ELSE 0 END), first_seen, last_seen, is_repeat
FROM geo_sense.daily_stats_zone_unique_device_uuids_per_hour AS tracking_daily_stats_zone_unique_device_uuids_per_hour
LEFT JOIN geo_sense.venue AS tracking_venue ON tracking_venue.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.venue_id
LEFT JOIN intelli_sense.venue AS intelli_sense_venue ON intelli_sense_venue.name = tracking_venue.name
LEFT JOIN geo_sense.zone AS tracking_zone ON tracking_zone.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.zone_id
LEFT JOIN intelli_sense.zone AS intelli_sense_zone ON intelli_sense_zone.name = tracking_zone.name AND intelli_sense_zone.lat = tracking_zone.lat AND intelli_sense_zone.lon = tracking_zone.lon
LEFT JOIN geo_sense.device_vendor AS tracking_device_vendor ON tracking_device_vendor.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.device_vendor_id
LEFT JOIN intelli_sense.device_vendor AS intelli_sense_device_vendor ON intelli_sense_device_vendor.name = tracking_device_vendor.name AND intelli_sense_device_vendor.description = tracking_device_vendor.description;


As you can see from the script above I have used LEFT JOINS to update these values. I have basically imported both databases into one server and now I am merging them together to create a new database called intelli_sense.



In the script above this isn't merging the 2 databases together it is just updating the values in the tracking_daily_stats_zone_unique_device_uuids_per_hour table which is from the geo_sense database because after merging the venue, zone and device_vendor tables the IDs have changed. I need to update all the tables which are related to those tables.



I have ran this script and it took 3 days to insert 50mil records which is way to long so I stopped it.



I need this script to finish in less then a day because I have 3 tables with this amount of data and I don't want too much system downtime.



I can't share the data since there is so much so I hope that script explains it enough, the actual script works its just a case of speeding it up. The data is also personal data so I wouldn't be able to share it anyway.



If you don't think this is impossible let me know and I will find a new way to do it while leaving both systems on so it can take as long as it wants.



Thanks in advance.



EDIT: Here is the explain statement for the above query.
Explain Statement










share|improve this question

























  • EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

    – Madhur Bhaiya
    Nov 16 '18 at 10:24











  • @MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

    – Luke Rayner
    Nov 16 '18 at 10:54











  • I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

    – Alon Eitan
    Nov 16 '18 at 10:58











  • @AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

    – Luke Rayner
    Nov 16 '18 at 11:02






  • 1





    @AlonEitan I will keep the question but I will also post one on that forum.

    – Luke Rayner
    Nov 16 '18 at 11:30














0












0








0


0






I am trying to optimise a MySQL script that I have created. Here is a bit of background on the project.



We have 2 system which we are now merging into one system, I am currently writing the scripts to merge the databases together. I need this to run as fast as possible, while this script is running we have to shut down the systems and we don't want too much down time.



Both systems have a table called venue, zone and device_vendor which are identical they just have different data. When I merged the tables together obviously the IDs for the values in those tables have changed and they are foreign keys in a lot of other tables. The table I am currently working on has 1.7 billion records and all has the values venue_id, zone_id and device_vendor_id so I have to change these values to the new IDs.



Here is the script that I have written to change the IDs to the new values. All these values are also indexes which is why it is taking so long.



INSERT INTO `intelli_sense`.`tracking_daily_stats_zone_unique_device_uuids_per_hour` (day_epoch, day_of_week, hour, venue_id, zone_id, device_uuid, device_vendor_id, first_seen, last_seen, is_repeat)

SELECT day_epoch, day_of_week, hour, (CASE WHEN intelli_sense_venue.id!=0 THEN intelli_sense_venue.id ELSE 0 END), (CASE WHEN intelli_sense_zone.id!=0 THEN intelli_sense_zone.id ELSE 0 END), device_uuid, (CASE WHEN intelli_sense_device_vendor.id!=0 THEN intelli_sense_device_vendor.id ELSE 0 END), first_seen, last_seen, is_repeat
FROM geo_sense.daily_stats_zone_unique_device_uuids_per_hour AS tracking_daily_stats_zone_unique_device_uuids_per_hour
LEFT JOIN geo_sense.venue AS tracking_venue ON tracking_venue.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.venue_id
LEFT JOIN intelli_sense.venue AS intelli_sense_venue ON intelli_sense_venue.name = tracking_venue.name
LEFT JOIN geo_sense.zone AS tracking_zone ON tracking_zone.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.zone_id
LEFT JOIN intelli_sense.zone AS intelli_sense_zone ON intelli_sense_zone.name = tracking_zone.name AND intelli_sense_zone.lat = tracking_zone.lat AND intelli_sense_zone.lon = tracking_zone.lon
LEFT JOIN geo_sense.device_vendor AS tracking_device_vendor ON tracking_device_vendor.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.device_vendor_id
LEFT JOIN intelli_sense.device_vendor AS intelli_sense_device_vendor ON intelli_sense_device_vendor.name = tracking_device_vendor.name AND intelli_sense_device_vendor.description = tracking_device_vendor.description;


As you can see from the script above I have used LEFT JOINS to update these values. I have basically imported both databases into one server and now I am merging them together to create a new database called intelli_sense.



In the script above this isn't merging the 2 databases together it is just updating the values in the tracking_daily_stats_zone_unique_device_uuids_per_hour table which is from the geo_sense database because after merging the venue, zone and device_vendor tables the IDs have changed. I need to update all the tables which are related to those tables.



I have ran this script and it took 3 days to insert 50mil records which is way to long so I stopped it.



I need this script to finish in less then a day because I have 3 tables with this amount of data and I don't want too much system downtime.



I can't share the data since there is so much so I hope that script explains it enough, the actual script works its just a case of speeding it up. The data is also personal data so I wouldn't be able to share it anyway.



If you don't think this is impossible let me know and I will find a new way to do it while leaving both systems on so it can take as long as it wants.



Thanks in advance.



EDIT: Here is the explain statement for the above query.
Explain Statement










share|improve this question
















I am trying to optimise a MySQL script that I have created. Here is a bit of background on the project.



We have 2 system which we are now merging into one system, I am currently writing the scripts to merge the databases together. I need this to run as fast as possible, while this script is running we have to shut down the systems and we don't want too much down time.



Both systems have a table called venue, zone and device_vendor which are identical they just have different data. When I merged the tables together obviously the IDs for the values in those tables have changed and they are foreign keys in a lot of other tables. The table I am currently working on has 1.7 billion records and all has the values venue_id, zone_id and device_vendor_id so I have to change these values to the new IDs.



Here is the script that I have written to change the IDs to the new values. All these values are also indexes which is why it is taking so long.



INSERT INTO `intelli_sense`.`tracking_daily_stats_zone_unique_device_uuids_per_hour` (day_epoch, day_of_week, hour, venue_id, zone_id, device_uuid, device_vendor_id, first_seen, last_seen, is_repeat)

SELECT day_epoch, day_of_week, hour, (CASE WHEN intelli_sense_venue.id!=0 THEN intelli_sense_venue.id ELSE 0 END), (CASE WHEN intelli_sense_zone.id!=0 THEN intelli_sense_zone.id ELSE 0 END), device_uuid, (CASE WHEN intelli_sense_device_vendor.id!=0 THEN intelli_sense_device_vendor.id ELSE 0 END), first_seen, last_seen, is_repeat
FROM geo_sense.daily_stats_zone_unique_device_uuids_per_hour AS tracking_daily_stats_zone_unique_device_uuids_per_hour
LEFT JOIN geo_sense.venue AS tracking_venue ON tracking_venue.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.venue_id
LEFT JOIN intelli_sense.venue AS intelli_sense_venue ON intelli_sense_venue.name = tracking_venue.name
LEFT JOIN geo_sense.zone AS tracking_zone ON tracking_zone.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.zone_id
LEFT JOIN intelli_sense.zone AS intelli_sense_zone ON intelli_sense_zone.name = tracking_zone.name AND intelli_sense_zone.lat = tracking_zone.lat AND intelli_sense_zone.lon = tracking_zone.lon
LEFT JOIN geo_sense.device_vendor AS tracking_device_vendor ON tracking_device_vendor.id = tracking_daily_stats_zone_unique_device_uuids_per_hour.device_vendor_id
LEFT JOIN intelli_sense.device_vendor AS intelli_sense_device_vendor ON intelli_sense_device_vendor.name = tracking_device_vendor.name AND intelli_sense_device_vendor.description = tracking_device_vendor.description;


As you can see from the script above I have used LEFT JOINS to update these values. I have basically imported both databases into one server and now I am merging them together to create a new database called intelli_sense.



In the script above this isn't merging the 2 databases together it is just updating the values in the tracking_daily_stats_zone_unique_device_uuids_per_hour table which is from the geo_sense database because after merging the venue, zone and device_vendor tables the IDs have changed. I need to update all the tables which are related to those tables.



I have ran this script and it took 3 days to insert 50mil records which is way to long so I stopped it.



I need this script to finish in less then a day because I have 3 tables with this amount of data and I don't want too much system downtime.



I can't share the data since there is so much so I hope that script explains it enough, the actual script works its just a case of speeding it up. The data is also personal data so I wouldn't be able to share it anyway.



If you don't think this is impossible let me know and I will find a new way to do it while leaving both systems on so it can take as long as it wants.



Thanks in advance.



EDIT: Here is the explain statement for the above query.
Explain Statement







mysql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 16 '18 at 10:52







Luke Rayner

















asked Nov 16 '18 at 9:47









Luke RaynerLuke Rayner

117115




117115













  • EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

    – Madhur Bhaiya
    Nov 16 '18 at 10:24











  • @MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

    – Luke Rayner
    Nov 16 '18 at 10:54











  • I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

    – Alon Eitan
    Nov 16 '18 at 10:58











  • @AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

    – Luke Rayner
    Nov 16 '18 at 11:02






  • 1





    @AlonEitan I will keep the question but I will also post one on that forum.

    – Luke Rayner
    Nov 16 '18 at 11:30



















  • EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

    – Madhur Bhaiya
    Nov 16 '18 at 10:24











  • @MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

    – Luke Rayner
    Nov 16 '18 at 10:54











  • I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

    – Alon Eitan
    Nov 16 '18 at 10:58











  • @AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

    – Luke Rayner
    Nov 16 '18 at 11:02






  • 1





    @AlonEitan I will keep the question but I will also post one on that forum.

    – Luke Rayner
    Nov 16 '18 at 11:30

















EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

– Madhur Bhaiya
Nov 16 '18 at 10:24





EXPLAIN statement and DESCRIBE TABLE statements results are generally handy, when trying to optimize a query.

– Madhur Bhaiya
Nov 16 '18 at 10:24













@MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

– Luke Rayner
Nov 16 '18 at 10:54





@MadhurBhaiya I have added the explain statement, I am no SQL guru so in all honesty I don't fully understand it. I am researching it right now.

– Luke Rayner
Nov 16 '18 at 10:54













I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

– Alon Eitan
Nov 16 '18 at 10:58





I'm not a SQL guru eigher, but I see lot's of ALL there, which is the worst join type and usually indicates the lack of appropriate indexes on the table (Reference), so some indexing is required

– Alon Eitan
Nov 16 '18 at 10:58













@AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

– Luke Rayner
Nov 16 '18 at 11:02





@AlonEitan I believe that is due to joining them on the name of the venue, zone or device_vendor. This is the only way I am able to do this because all the IDs have changes so I use the name to find the correct one. Does that sound right?

– Luke Rayner
Nov 16 '18 at 11:02




1




1





@AlonEitan I will keep the question but I will also post one on that forum.

– Luke Rayner
Nov 16 '18 at 11:30





@AlonEitan I will keep the question but I will also post one on that forum.

– Luke Rayner
Nov 16 '18 at 11:30












0






active

oldest

votes












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335210%2foptimising-a-mysql-script-with-1-7-billion-records%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335210%2foptimising-a-mysql-script-with-1-7-billion-records%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

List item for chat from Array inside array React Native

Thiostrepton

Caerphilly