How to find gaps in dates using pandas












1















I have a data set which shows the performance of a device per month. Here is how the data looks like



device, date, performance
1 , 06/16, 50.4
1 , 07/16, 54
1 , 08/16, 6
1 , 09/17, 67
1 , 10/17, 56
2 , 01/12, 34
....


A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like



device, start, end
1 , 06/16, 08/16
1 , 09/17, 10/17
....


How can I create that using pandas?










share|improve this question



























    1















    I have a data set which shows the performance of a device per month. Here is how the data looks like



    device, date, performance
    1 , 06/16, 50.4
    1 , 07/16, 54
    1 , 08/16, 6
    1 , 09/17, 67
    1 , 10/17, 56
    2 , 01/12, 34
    ....


    A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like



    device, start, end
    1 , 06/16, 08/16
    1 , 09/17, 10/17
    ....


    How can I create that using pandas?










    share|improve this question

























      1












      1








      1


      2






      I have a data set which shows the performance of a device per month. Here is how the data looks like



      device, date, performance
      1 , 06/16, 50.4
      1 , 07/16, 54
      1 , 08/16, 6
      1 , 09/17, 67
      1 , 10/17, 56
      2 , 01/12, 34
      ....


      A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like



      device, start, end
      1 , 06/16, 08/16
      1 , 09/17, 10/17
      ....


      How can I create that using pandas?










      share|improve this question














      I have a data set which shows the performance of a device per month. Here is how the data looks like



      device, date, performance
      1 , 06/16, 50.4
      1 , 07/16, 54
      1 , 08/16, 6
      1 , 09/17, 67
      1 , 10/17, 56
      2 , 01/12, 34
      ....


      A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like



      device, start, end
      1 , 06/16, 08/16
      1 , 09/17, 10/17
      ....


      How can I create that using pandas?







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 14 '18 at 17:07









      H.Z.H.Z.

      1,79854392




      1,79854392
























          2 Answers
          2






          active

          oldest

          votes


















          3














          You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:



          ser_date = pd.to_datetime(df['date'],format='%m/%y')
          ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
          (df.device.diff() != 0)) # different device
          .cumsum()) #increment value for each group


          Now you can use groupby on ser_group, join the first and the last of each group such as:



          g_df = df.groupby(ser_group) #grouped data
          df_new = (g_df['device','date'].first() #first of each group
          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group
          .rename(columns={'date':'start','date_':'end'})) #rename column start/end

          print (df_new)
          device start end
          1 1 06/16 08/16
          2 1 09/17 10/17
          3 2 01/12 01/12





          share|improve this answer































            1














            To be more instructive, I extended a little your source data, so that it
            contains full sets of 5 readings for 2 devices.
            I also decided to change date column to true dates.



            The concept is based on groupping by device and then generating
            start / end dates for each group.



            The script generating start / end periods can be as follows:



            import pandas as pd

            # Source data
            df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
            'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
            '03/16', '04/16', '11/16', '12/16', '01/17' ],
            'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
            # Convert date to true dates
            df.date = pd.to_datetime(df.date, format='%m/%y')
            grp = df.groupby(by='device')
            # Starts of periods
            st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
            != x.date.shift()]).reset_index(level=1,drop=True)
            # Ends of periods
            en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
            != x.date.shift(-1)]).reset_index(level=1,drop=True)
            # Result - concat start / end dates
            res = pd.concat([st, en], axis=1).reset_index()
            # Set column names
            res.columns = ['device', 'start', 'end']


            The result is:



               device      start        end
            0 1 2016-06-01 2016-08-01
            1 1 2017-09-01 2017-10-01
            2 2 2016-03-01 2016-04-01
            3 2 2016-11-01 2017-01-01





            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53305411%2fhow-to-find-gaps-in-dates-using-pandas%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              3














              You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:



              ser_date = pd.to_datetime(df['date'],format='%m/%y')
              ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
              (df.device.diff() != 0)) # different device
              .cumsum()) #increment value for each group


              Now you can use groupby on ser_group, join the first and the last of each group such as:



              g_df = df.groupby(ser_group) #grouped data
              df_new = (g_df['device','date'].first() #first of each group
              .join(g_df['date'].last(),rsuffix='_') #joined with last of each group
              .rename(columns={'date':'start','date_':'end'})) #rename column start/end

              print (df_new)
              device start end
              1 1 06/16 08/16
              2 1 09/17 10/17
              3 2 01/12 01/12





              share|improve this answer




























                3














                You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:



                ser_date = pd.to_datetime(df['date'],format='%m/%y')
                ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
                (df.device.diff() != 0)) # different device
                .cumsum()) #increment value for each group


                Now you can use groupby on ser_group, join the first and the last of each group such as:



                g_df = df.groupby(ser_group) #grouped data
                df_new = (g_df['device','date'].first() #first of each group
                .join(g_df['date'].last(),rsuffix='_') #joined with last of each group
                .rename(columns={'date':'start','date_':'end'})) #rename column start/end

                print (df_new)
                device start end
                1 1 06/16 08/16
                2 1 09/17 10/17
                3 2 01/12 01/12





                share|improve this answer


























                  3












                  3








                  3







                  You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:



                  ser_date = pd.to_datetime(df['date'],format='%m/%y')
                  ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
                  (df.device.diff() != 0)) # different device
                  .cumsum()) #increment value for each group


                  Now you can use groupby on ser_group, join the first and the last of each group such as:



                  g_df = df.groupby(ser_group) #grouped data
                  df_new = (g_df['device','date'].first() #first of each group
                  .join(g_df['date'].last(),rsuffix='_') #joined with last of each group
                  .rename(columns={'date':'start','date_':'end'})) #rename column start/end

                  print (df_new)
                  device start end
                  1 1 06/16 08/16
                  2 1 09/17 10/17
                  3 2 01/12 01/12





                  share|improve this answer













                  You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:



                  ser_date = pd.to_datetime(df['date'],format='%m/%y')
                  ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
                  (df.device.diff() != 0)) # different device
                  .cumsum()) #increment value for each group


                  Now you can use groupby on ser_group, join the first and the last of each group such as:



                  g_df = df.groupby(ser_group) #grouped data
                  df_new = (g_df['device','date'].first() #first of each group
                  .join(g_df['date'].last(),rsuffix='_') #joined with last of each group
                  .rename(columns={'date':'start','date_':'end'})) #rename column start/end

                  print (df_new)
                  device start end
                  1 1 06/16 08/16
                  2 1 09/17 10/17
                  3 2 01/12 01/12






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 14 '18 at 18:59









                  Ben.TBen.T

                  6,1802825




                  6,1802825

























                      1














                      To be more instructive, I extended a little your source data, so that it
                      contains full sets of 5 readings for 2 devices.
                      I also decided to change date column to true dates.



                      The concept is based on groupping by device and then generating
                      start / end dates for each group.



                      The script generating start / end periods can be as follows:



                      import pandas as pd

                      # Source data
                      df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
                      'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
                      '03/16', '04/16', '11/16', '12/16', '01/17' ],
                      'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
                      # Convert date to true dates
                      df.date = pd.to_datetime(df.date, format='%m/%y')
                      grp = df.groupby(by='device')
                      # Starts of periods
                      st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
                      != x.date.shift()]).reset_index(level=1,drop=True)
                      # Ends of periods
                      en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
                      != x.date.shift(-1)]).reset_index(level=1,drop=True)
                      # Result - concat start / end dates
                      res = pd.concat([st, en], axis=1).reset_index()
                      # Set column names
                      res.columns = ['device', 'start', 'end']


                      The result is:



                         device      start        end
                      0 1 2016-06-01 2016-08-01
                      1 1 2017-09-01 2017-10-01
                      2 2 2016-03-01 2016-04-01
                      3 2 2016-11-01 2017-01-01





                      share|improve this answer




























                        1














                        To be more instructive, I extended a little your source data, so that it
                        contains full sets of 5 readings for 2 devices.
                        I also decided to change date column to true dates.



                        The concept is based on groupping by device and then generating
                        start / end dates for each group.



                        The script generating start / end periods can be as follows:



                        import pandas as pd

                        # Source data
                        df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
                        'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
                        '03/16', '04/16', '11/16', '12/16', '01/17' ],
                        'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
                        # Convert date to true dates
                        df.date = pd.to_datetime(df.date, format='%m/%y')
                        grp = df.groupby(by='device')
                        # Starts of periods
                        st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
                        != x.date.shift()]).reset_index(level=1,drop=True)
                        # Ends of periods
                        en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
                        != x.date.shift(-1)]).reset_index(level=1,drop=True)
                        # Result - concat start / end dates
                        res = pd.concat([st, en], axis=1).reset_index()
                        # Set column names
                        res.columns = ['device', 'start', 'end']


                        The result is:



                           device      start        end
                        0 1 2016-06-01 2016-08-01
                        1 1 2017-09-01 2017-10-01
                        2 2 2016-03-01 2016-04-01
                        3 2 2016-11-01 2017-01-01





                        share|improve this answer


























                          1












                          1








                          1







                          To be more instructive, I extended a little your source data, so that it
                          contains full sets of 5 readings for 2 devices.
                          I also decided to change date column to true dates.



                          The concept is based on groupping by device and then generating
                          start / end dates for each group.



                          The script generating start / end periods can be as follows:



                          import pandas as pd

                          # Source data
                          df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
                          'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
                          '03/16', '04/16', '11/16', '12/16', '01/17' ],
                          'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
                          # Convert date to true dates
                          df.date = pd.to_datetime(df.date, format='%m/%y')
                          grp = df.groupby(by='device')
                          # Starts of periods
                          st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
                          != x.date.shift()]).reset_index(level=1,drop=True)
                          # Ends of periods
                          en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
                          != x.date.shift(-1)]).reset_index(level=1,drop=True)
                          # Result - concat start / end dates
                          res = pd.concat([st, en], axis=1).reset_index()
                          # Set column names
                          res.columns = ['device', 'start', 'end']


                          The result is:



                             device      start        end
                          0 1 2016-06-01 2016-08-01
                          1 1 2017-09-01 2017-10-01
                          2 2 2016-03-01 2016-04-01
                          3 2 2016-11-01 2017-01-01





                          share|improve this answer













                          To be more instructive, I extended a little your source data, so that it
                          contains full sets of 5 readings for 2 devices.
                          I also decided to change date column to true dates.



                          The concept is based on groupping by device and then generating
                          start / end dates for each group.



                          The script generating start / end periods can be as follows:



                          import pandas as pd

                          # Source data
                          df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
                          'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
                          '03/16', '04/16', '11/16', '12/16', '01/17' ],
                          'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
                          # Convert date to true dates
                          df.date = pd.to_datetime(df.date, format='%m/%y')
                          grp = df.groupby(by='device')
                          # Starts of periods
                          st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
                          != x.date.shift()]).reset_index(level=1,drop=True)
                          # Ends of periods
                          en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
                          != x.date.shift(-1)]).reset_index(level=1,drop=True)
                          # Result - concat start / end dates
                          res = pd.concat([st, en], axis=1).reset_index()
                          # Set column names
                          res.columns = ['device', 'start', 'end']


                          The result is:



                             device      start        end
                          0 1 2016-06-01 2016-08-01
                          1 1 2017-09-01 2017-10-01
                          2 2 2016-03-01 2016-04-01
                          3 2 2016-11-01 2017-01-01






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Nov 14 '18 at 21:03









                          Valdi_BoValdi_Bo

                          4,8752815




                          4,8752815






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53305411%2fhow-to-find-gaps-in-dates-using-pandas%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Xamarin.iOS Cant Deploy on Iphone

                              Glorious Revolution

                              Dulmage-Mendelsohn matrix decomposition in Python