How to find gaps in dates using pandas
I have a data set which shows the performance of a device per month. Here is how the data looks like
device, date, performance
1 , 06/16, 50.4
1 , 07/16, 54
1 , 08/16, 6
1 , 09/17, 67
1 , 10/17, 56
2 , 01/12, 34
....
A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like
device, start, end
1 , 06/16, 08/16
1 , 09/17, 10/17
....
How can I create that using pandas?
python pandas
add a comment |
I have a data set which shows the performance of a device per month. Here is how the data looks like
device, date, performance
1 , 06/16, 50.4
1 , 07/16, 54
1 , 08/16, 6
1 , 09/17, 67
1 , 10/17, 56
2 , 01/12, 34
....
A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like
device, start, end
1 , 06/16, 08/16
1 , 09/17, 10/17
....
How can I create that using pandas?
python pandas
add a comment |
I have a data set which shows the performance of a device per month. Here is how the data looks like
device, date, performance
1 , 06/16, 50.4
1 , 07/16, 54
1 , 08/16, 6
1 , 09/17, 67
1 , 10/17, 56
2 , 01/12, 34
....
A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like
device, start, end
1 , 06/16, 08/16
1 , 09/17, 10/17
....
How can I create that using pandas?
python pandas
I have a data set which shows the performance of a device per month. Here is how the data looks like
device, date, performance
1 , 06/16, 50.4
1 , 07/16, 54
1 , 08/16, 6
1 , 09/17, 67
1 , 10/17, 56
2 , 01/12, 34
....
A device may not be working in some months so there are some gaps in the data. For example, device one has some gaps in the above data. I'd like to create a data frame which shows the time ranges each device was working, here is how the resulting data set should look like
device, start, end
1 , 06/16, 08/16
1 , 09/17, 10/17
....
How can I create that using pandas?
python pandas
python pandas
asked Nov 14 '18 at 17:07
H.Z.H.Z.
1,79854392
1,79854392
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
You can create series with the same number when consecutive months or same device. To do this, you can use shift
and add a a month with DateOffset
to check if the following row is a month apart. Use cumsum
to increment the value for each new group. First you need the column 'date' as datetime
in a series:
ser_date = pd.to_datetime(df['date'],format='%m/%y')
ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
(df.device.diff() != 0)) # different device
.cumsum()) #increment value for each group
Now you can use groupby
on ser_group
, join
the first
and the last
of each group such as:
g_df = df.groupby(ser_group) #grouped data
df_new = (g_df['device','date'].first() #first of each group
.join(g_df['date'].last(),rsuffix='_') #joined with last of each group
.rename(columns={'date':'start','date_':'end'})) #rename column start/end
print (df_new)
device start end
1 1 06/16 08/16
2 1 09/17 10/17
3 2 01/12 01/12
add a comment |
To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date
column to true dates.
The concept is based on groupping by device
and then generating
start / end dates for each group.
The script generating start / end periods can be as follows:
import pandas as pd
# Source data
df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
'03/16', '04/16', '11/16', '12/16', '01/17' ],
'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
# Convert date to true dates
df.date = pd.to_datetime(df.date, format='%m/%y')
grp = df.groupby(by='device')
# Starts of periods
st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
!= x.date.shift()]).reset_index(level=1,drop=True)
# Ends of periods
en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
!= x.date.shift(-1)]).reset_index(level=1,drop=True)
# Result - concat start / end dates
res = pd.concat([st, en], axis=1).reset_index()
# Set column names
res.columns = ['device', 'start', 'end']
The result is:
device start end
0 1 2016-06-01 2016-08-01
1 1 2017-09-01 2017-10-01
2 2 2016-03-01 2016-04-01
3 2 2016-11-01 2017-01-01
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53305411%2fhow-to-find-gaps-in-dates-using-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can create series with the same number when consecutive months or same device. To do this, you can use shift
and add a a month with DateOffset
to check if the following row is a month apart. Use cumsum
to increment the value for each new group. First you need the column 'date' as datetime
in a series:
ser_date = pd.to_datetime(df['date'],format='%m/%y')
ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
(df.device.diff() != 0)) # different device
.cumsum()) #increment value for each group
Now you can use groupby
on ser_group
, join
the first
and the last
of each group such as:
g_df = df.groupby(ser_group) #grouped data
df_new = (g_df['device','date'].first() #first of each group
.join(g_df['date'].last(),rsuffix='_') #joined with last of each group
.rename(columns={'date':'start','date_':'end'})) #rename column start/end
print (df_new)
device start end
1 1 06/16 08/16
2 1 09/17 10/17
3 2 01/12 01/12
add a comment |
You can create series with the same number when consecutive months or same device. To do this, you can use shift
and add a a month with DateOffset
to check if the following row is a month apart. Use cumsum
to increment the value for each new group. First you need the column 'date' as datetime
in a series:
ser_date = pd.to_datetime(df['date'],format='%m/%y')
ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
(df.device.diff() != 0)) # different device
.cumsum()) #increment value for each group
Now you can use groupby
on ser_group
, join
the first
and the last
of each group such as:
g_df = df.groupby(ser_group) #grouped data
df_new = (g_df['device','date'].first() #first of each group
.join(g_df['date'].last(),rsuffix='_') #joined with last of each group
.rename(columns={'date':'start','date_':'end'})) #rename column start/end
print (df_new)
device start end
1 1 06/16 08/16
2 1 09/17 10/17
3 2 01/12 01/12
add a comment |
You can create series with the same number when consecutive months or same device. To do this, you can use shift
and add a a month with DateOffset
to check if the following row is a month apart. Use cumsum
to increment the value for each new group. First you need the column 'date' as datetime
in a series:
ser_date = pd.to_datetime(df['date'],format='%m/%y')
ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
(df.device.diff() != 0)) # different device
.cumsum()) #increment value for each group
Now you can use groupby
on ser_group
, join
the first
and the last
of each group such as:
g_df = df.groupby(ser_group) #grouped data
df_new = (g_df['device','date'].first() #first of each group
.join(g_df['date'].last(),rsuffix='_') #joined with last of each group
.rename(columns={'date':'start','date_':'end'})) #rename column start/end
print (df_new)
device start end
1 1 06/16 08/16
2 1 09/17 10/17
3 2 01/12 01/12
You can create series with the same number when consecutive months or same device. To do this, you can use shift
and add a a month with DateOffset
to check if the following row is a month apart. Use cumsum
to increment the value for each new group. First you need the column 'date' as datetime
in a series:
ser_date = pd.to_datetime(df['date'],format='%m/%y')
ser_group = ((((ser_date.shift() + pd.DateOffset(months=1)) != ser_date) | #month apart
(df.device.diff() != 0)) # different device
.cumsum()) #increment value for each group
Now you can use groupby
on ser_group
, join
the first
and the last
of each group such as:
g_df = df.groupby(ser_group) #grouped data
df_new = (g_df['device','date'].first() #first of each group
.join(g_df['date'].last(),rsuffix='_') #joined with last of each group
.rename(columns={'date':'start','date_':'end'})) #rename column start/end
print (df_new)
device start end
1 1 06/16 08/16
2 1 09/17 10/17
3 2 01/12 01/12
answered Nov 14 '18 at 18:59
Ben.TBen.T
6,1802825
6,1802825
add a comment |
add a comment |
To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date
column to true dates.
The concept is based on groupping by device
and then generating
start / end dates for each group.
The script generating start / end periods can be as follows:
import pandas as pd
# Source data
df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
'03/16', '04/16', '11/16', '12/16', '01/17' ],
'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
# Convert date to true dates
df.date = pd.to_datetime(df.date, format='%m/%y')
grp = df.groupby(by='device')
# Starts of periods
st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
!= x.date.shift()]).reset_index(level=1,drop=True)
# Ends of periods
en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
!= x.date.shift(-1)]).reset_index(level=1,drop=True)
# Result - concat start / end dates
res = pd.concat([st, en], axis=1).reset_index()
# Set column names
res.columns = ['device', 'start', 'end']
The result is:
device start end
0 1 2016-06-01 2016-08-01
1 1 2017-09-01 2017-10-01
2 2 2016-03-01 2016-04-01
3 2 2016-11-01 2017-01-01
add a comment |
To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date
column to true dates.
The concept is based on groupping by device
and then generating
start / end dates for each group.
The script generating start / end periods can be as follows:
import pandas as pd
# Source data
df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
'03/16', '04/16', '11/16', '12/16', '01/17' ],
'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
# Convert date to true dates
df.date = pd.to_datetime(df.date, format='%m/%y')
grp = df.groupby(by='device')
# Starts of periods
st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
!= x.date.shift()]).reset_index(level=1,drop=True)
# Ends of periods
en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
!= x.date.shift(-1)]).reset_index(level=1,drop=True)
# Result - concat start / end dates
res = pd.concat([st, en], axis=1).reset_index()
# Set column names
res.columns = ['device', 'start', 'end']
The result is:
device start end
0 1 2016-06-01 2016-08-01
1 1 2017-09-01 2017-10-01
2 2 2016-03-01 2016-04-01
3 2 2016-11-01 2017-01-01
add a comment |
To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date
column to true dates.
The concept is based on groupping by device
and then generating
start / end dates for each group.
The script generating start / end periods can be as follows:
import pandas as pd
# Source data
df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
'03/16', '04/16', '11/16', '12/16', '01/17' ],
'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
# Convert date to true dates
df.date = pd.to_datetime(df.date, format='%m/%y')
grp = df.groupby(by='device')
# Starts of periods
st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
!= x.date.shift()]).reset_index(level=1,drop=True)
# Ends of periods
en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
!= x.date.shift(-1)]).reset_index(level=1,drop=True)
# Result - concat start / end dates
res = pd.concat([st, en], axis=1).reset_index()
# Set column names
res.columns = ['device', 'start', 'end']
The result is:
device start end
0 1 2016-06-01 2016-08-01
1 1 2017-09-01 2017-10-01
2 2 2016-03-01 2016-04-01
3 2 2016-11-01 2017-01-01
To be more instructive, I extended a little your source data, so that it
contains full sets of 5 readings for 2 devices.
I also decided to change date
column to true dates.
The concept is based on groupping by device
and then generating
start / end dates for each group.
The script generating start / end periods can be as follows:
import pandas as pd
# Source data
df = pd.DataFrame(data={ 'device': [ 1, 1, 1, 1, 1, 2, 2, 2, 2, 2 ],
'date': [ '06/16', '07/16', '08/16', '09/17', '10/17',
'03/16', '04/16', '11/16', '12/16', '01/17' ],
'performance': [ 50.4, 54, 6, 67, 56, 50, 45, 30, 72.3, 42 ] })
# Convert date to true dates
df.date = pd.to_datetime(df.date, format='%m/%y')
grp = df.groupby(by='device')
# Starts of periods
st = grp.apply(lambda x: x.date[x.date - pd.DateOffset(months = 1)
!= x.date.shift()]).reset_index(level=1,drop=True)
# Ends of periods
en = grp.apply(lambda x: x.date[x.date + pd.DateOffset(months = 1)
!= x.date.shift(-1)]).reset_index(level=1,drop=True)
# Result - concat start / end dates
res = pd.concat([st, en], axis=1).reset_index()
# Set column names
res.columns = ['device', 'start', 'end']
The result is:
device start end
0 1 2016-06-01 2016-08-01
1 1 2017-09-01 2017-10-01
2 2 2016-03-01 2016-04-01
3 2 2016-11-01 2017-01-01
answered Nov 14 '18 at 21:03
Valdi_BoValdi_Bo
4,8752815
4,8752815
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53305411%2fhow-to-find-gaps-in-dates-using-pandas%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown