How to find gaps in dates using pandas

I have a data set which shows the performance of a device per month. Here is how the data looks like

device, date, performance

   1  , 06/16, 50.4

   1  , 07/16, 54

   1  , 08/16, 6

   1  , 09/17, 67

   1  , 10/17, 56

   2  , 01/12, 34

       ....

A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like

device, start, end

 1    ,  06/16, 08/16

 1    ,  09/17, 10/17

     ....

How can I create that using pandas?

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

add a comment |

I have a data set which shows the performance of a device per month. Here is how the data looks like

device, date, performance

   1  , 06/16, 50.4

   1  , 07/16, 54

   1  , 08/16, 6

   1  , 09/17, 67

   1  , 10/17, 56

   2  , 01/12, 34

       ....

device, start, end

 1    ,  06/16, 08/16

 1    ,  09/17, 10/17

     ....

How can I create that using pandas?

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

add a comment |

I have a data set which shows the performance of a device per month. Here is how the data looks like

device, date, performance

   1  , 06/16, 50.4

   1  , 07/16, 54

   1  , 08/16, 6

   1  , 09/17, 67

   1  , 10/17, 56

   2  , 01/12, 34

       ....

device, start, end

 1    ,  06/16, 08/16

 1    ,  09/17, 10/17

     ....

How can I create that using pandas?

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

I have a data set which shows the performance of a device per month. Here is how the data looks like

device, date, performance

   1  , 06/16, 50.4

   1  , 07/16, 54

   1  , 08/16, 6

   1  , 09/17, 67

   1  , 10/17, 56

   2  , 01/12, 34

       ....

device, start, end

 1    ,  06/16, 08/16

 1    ,  09/17, 10/17

     ....

How can I create that using pandas?

python pandas

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

asked Nov 14 '18 at 17:07

H.Z.

1,79854392

add a comment |

2 Answers
2

active

oldest

votes

You can create series with the same number when consecutive months or same device. To do this, you can use shift and add a a month with DateOffset to check if the following row is a month apart. Use cumsum to increment the value for each new group. First you need the column 'date' as datetime in a series:

ser_date = pd.to_datetime(df['date'],format='%m/%y')

ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart

              (df.device.diff() != 0)) # different device

              .cumsum()) #increment value for each group

Now you can use groupby on ser_group, join the first and the last of each group such as:

g_df = df.groupby(ser_group) #grouped data

df_new = (g_df['device','date'].first() #first of each group

          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group

          .rename(columns={'date':'start','date_':'end'})) #rename column start/end



print (df_new)

   device   start     end

1       1   06/16   08/16

2       1   09/17   10/17

3       2   01/12   01/12

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

add a comment |

To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date column to true dates.

The concept is based on groupping by device and then generating
start / end dates for each group.

The script generating start / end periods can be as follows:

import pandas as pd



# Source data

df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],

    'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',

        '03/16', '04/16', '11/16', '12/16', '01/17' ],

    'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })

# Convert date to true dates

df.date = pd.to_datetime(df.date, format='%m/%y')

grp = df.groupby(by='device')

# Starts of periods

st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)

    != x.date.shift()]).reset_index(level=1,drop=True)

# Ends of periods

en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)

    != x.date.shift(-1)]).reset_index(level=1,drop=True)

# Result - concat start / end dates

res = pd.concat([st, en], axis=1).reset_index()

# Set column names

res.columns = ['device', 'start', 'end']

The result is:

   device      start        end

0       1 2016-06-01 2016-08-01

1       1 2017-09-01 2017-10-01

2       2 2016-03-01 2016-04-01

3       2 2016-11-01 2017-01-01

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53305411%2fhow-to-find-gaps-in-dates-using-pandas%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

ser_date = pd.to_datetime(df['date'],format='%m/%y')

ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart

              (df.device.diff() != 0)) # different device

              .cumsum()) #increment value for each group

Now you can use groupby on ser_group, join the first and the last of each group such as:

g_df = df.groupby(ser_group) #grouped data

df_new = (g_df['device','date'].first() #first of each group

          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group

          .rename(columns={'date':'start','date_':'end'})) #rename column start/end



print (df_new)

   device   start     end

1       1   06/16   08/16

2       1   09/17   10/17

3       2   01/12   01/12

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

add a comment |

ser_date = pd.to_datetime(df['date'],format='%m/%y')

ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart

              (df.device.diff() != 0)) # different device

              .cumsum()) #increment value for each group

Now you can use groupby on ser_group, join the first and the last of each group such as:

g_df = df.groupby(ser_group) #grouped data

df_new = (g_df['device','date'].first() #first of each group

          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group

          .rename(columns={'date':'start','date_':'end'})) #rename column start/end



print (df_new)

   device   start     end

1       1   06/16   08/16

2       1   09/17   10/17

3       2   01/12   01/12

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

add a comment |

ser_date = pd.to_datetime(df['date'],format='%m/%y')

ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart

              (df.device.diff() != 0)) # different device

              .cumsum()) #increment value for each group

Now you can use groupby on ser_group, join the first and the last of each group such as:

g_df = df.groupby(ser_group) #grouped data

df_new = (g_df['device','date'].first() #first of each group

          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group

          .rename(columns={'date':'start','date_':'end'})) #rename column start/end



print (df_new)

   device   start     end

1       1   06/16   08/16

2       1   09/17   10/17

3       2   01/12   01/12

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

ser_date = pd.to_datetime(df['date'],format='%m/%y')

ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart

              (df.device.diff() != 0)) # different device

              .cumsum()) #increment value for each group

Now you can use groupby on ser_group, join the first and the last of each group such as:

g_df = df.groupby(ser_group) #grouped data

df_new = (g_df['device','date'].first() #first of each group

          .join(g_df['date'].last(),rsuffix='_') #joined with last of each group

          .rename(columns={'date':'start','date_':'end'})) #rename column start/end



print (df_new)

   device   start     end

1       1   06/16   08/16

2       1   09/17   10/17

3       2   01/12   01/12

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

answered Nov 14 '18 at 18:59

Ben.T

6,1802825

add a comment |

To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date column to true dates.

The concept is based on groupping by device and then generating
start / end dates for each group.

The script generating start / end periods can be as follows:

import pandas as pd



# Source data

df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],

    'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',

        '03/16', '04/16', '11/16', '12/16', '01/17' ],

    'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })

# Convert date to true dates

df.date = pd.to_datetime(df.date, format='%m/%y')

grp = df.groupby(by='device')

# Starts of periods

st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)

    != x.date.shift()]).reset_index(level=1,drop=True)

# Ends of periods

en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)

    != x.date.shift(-1)]).reset_index(level=1,drop=True)

# Result - concat start / end dates

res = pd.concat([st, en], axis=1).reset_index()

# Set column names

res.columns = ['device', 'start', 'end']

The result is:

   device      start        end

0       1 2016-06-01 2016-08-01

1       1 2017-09-01 2017-10-01

2       2 2016-03-01 2016-04-01

3       2 2016-11-01 2017-01-01

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

add a comment |

To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date column to true dates.

The concept is based on groupping by device and then generating
start / end dates for each group.

The script generating start / end periods can be as follows:

import pandas as pd



# Source data

df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],

    'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',

        '03/16', '04/16', '11/16', '12/16', '01/17' ],

    'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })

# Convert date to true dates

df.date = pd.to_datetime(df.date, format='%m/%y')

grp = df.groupby(by='device')

# Starts of periods

st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)

    != x.date.shift()]).reset_index(level=1,drop=True)

# Ends of periods

en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)

    != x.date.shift(-1)]).reset_index(level=1,drop=True)

# Result - concat start / end dates

res = pd.concat([st, en], axis=1).reset_index()

# Set column names

res.columns = ['device', 'start', 'end']

The result is:

   device      start        end

0       1 2016-06-01 2016-08-01

1       1 2017-09-01 2017-10-01

2       2 2016-03-01 2016-04-01

3       2 2016-11-01 2017-01-01

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

add a comment |

To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date column to true dates.

The concept is based on groupping by device and then generating
start / end dates for each group.

The script generating start / end periods can be as follows:

import pandas as pd



# Source data

df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],

    'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',

        '03/16', '04/16', '11/16', '12/16', '01/17' ],

    'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })

# Convert date to true dates

df.date = pd.to_datetime(df.date, format='%m/%y')

grp = df.groupby(by='device')

# Starts of periods

st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)

    != x.date.shift()]).reset_index(level=1,drop=True)

# Ends of periods

en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)

    != x.date.shift(-1)]).reset_index(level=1,drop=True)

# Result - concat start / end dates

res = pd.concat([st, en], axis=1).reset_index()

# Set column names

res.columns = ['device', 'start', 'end']

The result is:

   device      start        end

0       1 2016-06-01 2016-08-01

1       1 2017-09-01 2017-10-01

2       2 2016-03-01 2016-04-01

3       2 2016-11-01 2017-01-01

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date column to true dates.

The concept is based on groupping by device and then generating
start / end dates for each group.

The script generating start / end periods can be as follows:

import pandas as pd



# Source data

df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],

    'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',

        '03/16', '04/16', '11/16', '12/16', '01/17' ],

    'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })

# Convert date to true dates

df.date = pd.to_datetime(df.date, format='%m/%y')

grp = df.groupby(by='device')

# Starts of periods

st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)

    != x.date.shift()]).reset_index(level=1,drop=True)

# Ends of periods

en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)

    != x.date.shift(-1)]).reset_index(level=1,drop=True)

# Result - concat start / end dates

res = pd.concat([st, en], axis=1).reset_index()

# Set column names

res.columns = ['device', 'start', 'end']

The result is:

   device      start        end

0       1 2016-06-01 2016-08-01

1       1 2017-09-01 2017-10-01

2       2 2016-03-01 2016-04-01

3       2 2016-11-01 2017-01-01

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

answered Nov 14 '18 at 21:03

Valdi_Bo

4,8752815

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vfrdtyky