Filling in past and future data from partial data in Python
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
add a comment |
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 '18 at 17:36
add a comment |
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:
State Year Month Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0
The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.
I want the missing values to be filled with values preceding from the values before to get an output that looks like:
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0
python python-3.x pandas dataframe missing-data
python python-3.x pandas dataframe missing-data
asked Nov 13 '18 at 17:32
HelloToEarthHelloToEarth
435214
435214
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 '18 at 17:36
add a comment |
1
Have you tried any methods yet? Look atfillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)
– soundstripe
Nov 13 '18 at 17:36
1
1
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 '18 at 17:36
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 '18 at 17:36
add a comment |
2 Answers
2
active
oldest
votes
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
add a comment |
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
How about pandas.interpolate
?: interpolate values according to different methods
See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html
And some existing example previously posted: Pandas interpolate() backwards in dataframe
answered Nov 13 '18 at 18:00
kon_ukon_u
1966
1966
add a comment |
add a comment |
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
add a comment |
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
You can create a dataframe with the missing months and them merge your result with it:
dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
end='1/08/%d' %df['Year'].max(),
freq='MS', closed='left')
>> dates
DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
'1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
'1987-10-01', '1987-11-01',
...
'2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
'2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
'2015-12-01', '2016-01-01'],
dtype='datetime64[ns]', length=348, freq='MS')
Then you can create a dataframe with all the months:
all_months = pd.DataFrame.from_records((dates.year, dates.month),
index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])
And then merge it with the original dataframe and forward-fill it:
df.merge(all_months, how='right').ffill()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1996.0 1.0 24410.0
5 TN 1996.0 2.0 24410.0
6 TN 1996.0 3.0 24410.0
7 TN 1996.0 4.0 24410.0
8 TN 1996.0 5.0 37109.0
9 TN 1996.0 6.0 37109.0
10 TN 1996.0 7.0 37109.0
11 TN 1996.0 8.0 37109.0
12 TN 1996.0 9.0 37109.0
13 TN 1996.0 10.0 37109.0
14 TN 1996.0 11.0 37109.0
15 TN 1996.0 12.0 37109.0
16 TN 2016.0 1.0 49808.0
17 TN 1987.0 5.0 49808.0
18 TN 1987.0 6.0 49808.0
19 TN 1987.0 7.0 49808.0
20 TN 1987.0 8.0 49808.0
21 TN 1987.0 9.0 49808.0
22 TN 1987.0 10.0 49808.0
23 TN 1987.0 11.0 49808.0
24 TN 1987.0 12.0 49808.0
25 TN 1988.0 1.0 49808.0
26 TN 1988.0 2.0 49808.0
27 TN 1988.0 3.0 49808.0
28 TN 1988.0 4.0 49808.0
29 TN 1988.0 5.0 49808.0
.. ... ... ... ...
319 TN 2013.0 7.0 49808.0
320 TN 2013.0 8.0 49808.0
321 TN 2013.0 9.0 49808.0
322 TN 2013.0 10.0 49808.0
323 TN 2013.0 11.0 49808.0
324 TN 2013.0 12.0 49808.0
325 TN 2014.0 1.0 49808.0
326 TN 2014.0 2.0 49808.0
327 TN 2014.0 3.0 49808.0
328 TN 2014.0 4.0 49808.0
329 TN 2014.0 5.0 49808.0
330 TN 2014.0 6.0 49808.0
331 TN 2014.0 7.0 49808.0
332 TN 2014.0 8.0 49808.0
333 TN 2014.0 9.0 49808.0
334 TN 2014.0 10.0 49808.0
335 TN 2014.0 11.0 49808.0
336 TN 2014.0 12.0 49808.0
337 TN 2015.0 1.0 49808.0
338 TN 2015.0 2.0 49808.0
339 TN 2015.0 3.0 49808.0
340 TN 2015.0 4.0 49808.0
341 TN 2015.0 5.0 49808.0
342 TN 2015.0 6.0 49808.0
343 TN 2015.0 7.0 49808.0
344 TN 2015.0 8.0 49808.0
345 TN 2015.0 9.0 49808.0
346 TN 2015.0 10.0 49808.0
347 TN 2015.0 11.0 49808.0
348 TN 2015.0 12.0 49808.0
Using pandas.resample
Another solution is index by date and then resample there:
df['Day'] = 1
df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')
>> df1
State Year Month Value Day
date
1987-01-01 TN 1987.0 1.0 24410.0 1
1987-02-01 TN 1987.0 2.0 24410.0 1
1987-03-01 TN 1987.0 3.0 24410.0 1
1987-04-01 TN 1987.0 4.0 24410.0 1
1996-01-01 TN 1996.0 1.0 24410.0 1
1996-02-01 TN 1996.0 2.0 24410.0 1
1996-03-01 TN 1996.0 3.0 24410.0 1
1996-04-01 TN 1996.0 4.0 24410.0 1
1996-05-01 TN 1996.0 5.0 37109.0 1
1996-06-01 TN 1996.0 6.0 37109.0 1
1996-07-01 TN 1996.0 7.0 37109.0 1
1996-08-01 TN 1996.0 8.0 37109.0 1
1996-09-01 TN 1996.0 9.0 37109.0 1
1996-10-01 TN 1996.0 10.0 37109.0 1
1996-11-01 TN 1996.0 11.0 37109.0 1
1996-12-01 TN 1996.0 12.0 37109.0 1
2016-01-01 TN 2016.0 1.0 49808.0 1
2016-02-01 TN 2016.0 2.0 49808.0 1
Then you can resample it by month by doing:
res = df1.resample('M').first().ffill()
>> res
State Year Month Value Day
date
1987-01-31 TN 1987.0 1.0 24410.0 1.0
1987-02-28 TN 1987.0 2.0 24410.0 1.0
1987-03-31 TN 1987.0 3.0 24410.0 1.0
1987-04-30 TN 1987.0 4.0 24410.0 1.0
1987-05-31 TN 1987.0 4.0 24410.0 1.0
1987-06-30 TN 1987.0 4.0 24410.0 1.0
1987-07-31 TN 1987.0 4.0 24410.0 1.0
1987-08-31 TN 1987.0 4.0 24410.0 1.0
1987-09-30 TN 1987.0 4.0 24410.0 1.0
1987-10-31 TN 1987.0 4.0 24410.0 1.0
1987-11-30 TN 1987.0 4.0 24410.0 1.0
1987-12-31 TN 1987.0 4.0 24410.0 1.0
1988-01-31 TN 1987.0 4.0 24410.0 1.0
1988-02-29 TN 1987.0 4.0 24410.0 1.0
1988-03-31 TN 1987.0 4.0 24410.0 1.0
1988-04-30 TN 1987.0 4.0 24410.0 1.0
1988-05-31 TN 1987.0 4.0 24410.0 1.0
1988-06-30 TN 1987.0 4.0 24410.0 1.0
1988-07-31 TN 1987.0 4.0 24410.0 1.0
1988-08-31 TN 1987.0 4.0 24410.0 1.0
1988-09-30 TN 1987.0 4.0 24410.0 1.0
1988-10-31 TN 1987.0 4.0 24410.0 1.0
1988-11-30 TN 1987.0 4.0 24410.0 1.0
1988-12-31 TN 1987.0 4.0 24410.0 1.0
1989-01-31 TN 1987.0 4.0 24410.0 1.0
1989-02-28 TN 1987.0 4.0 24410.0 1.0
1989-03-31 TN 1987.0 4.0 24410.0 1.0
1989-04-30 TN 1987.0 4.0 24410.0 1.0
1989-05-31 TN 1987.0 4.0 24410.0 1.0
1989-06-30 TN 1987.0 4.0 24410.0 1.0
... ... ... ... ... ...
2013-09-30 TN 1996.0 12.0 37109.0 1.0
2013-10-31 TN 1996.0 12.0 37109.0 1.0
2013-11-30 TN 1996.0 12.0 37109.0 1.0
2013-12-31 TN 1996.0 12.0 37109.0 1.0
2014-01-31 TN 1996.0 12.0 37109.0 1.0
2014-02-28 TN 1996.0 12.0 37109.0 1.0
2014-03-31 TN 1996.0 12.0 37109.0 1.0
2014-04-30 TN 1996.0 12.0 37109.0 1.0
2014-05-31 TN 1996.0 12.0 37109.0 1.0
2014-06-30 TN 1996.0 12.0 37109.0 1.0
2014-07-31 TN 1996.0 12.0 37109.0 1.0
2014-08-31 TN 1996.0 12.0 37109.0 1.0
2014-09-30 TN 1996.0 12.0 37109.0 1.0
2014-10-31 TN 1996.0 12.0 37109.0 1.0
2014-11-30 TN 1996.0 12.0 37109.0 1.0
2014-12-31 TN 1996.0 12.0 37109.0 1.0
2015-01-31 TN 1996.0 12.0 37109.0 1.0
2015-02-28 TN 1996.0 12.0 37109.0 1.0
2015-03-31 TN 1996.0 12.0 37109.0 1.0
2015-04-30 TN 1996.0 12.0 37109.0 1.0
2015-05-31 TN 1996.0 12.0 37109.0 1.0
2015-06-30 TN 1996.0 12.0 37109.0 1.0
2015-07-31 TN 1996.0 12.0 37109.0 1.0
2015-08-31 TN 1996.0 12.0 37109.0 1.0
2015-09-30 TN 1996.0 12.0 37109.0 1.0
2015-10-31 TN 1996.0 12.0 37109.0 1.0
2015-11-30 TN 1996.0 12.0 37109.0 1.0
2015-12-31 TN 1996.0 12.0 37109.0 1.0
2016-01-31 TN 2016.0 1.0 49808.0 1.0
2016-02-29 TN 2016.0 2.0 49808.0 1.0
You can get the original structure by doing:
>> res.reset_index(drop=True).drop(['Day'], axis=1).head()
State Year Month Value
0 TN 1987.0 1.0 24410.0
1 TN 1987.0 2.0 24410.0
2 TN 1987.0 3.0 24410.0
3 TN 1987.0 4.0 24410.0
4 TN 1987.0 4.0 24410.0
5 TN 1987.0 4.0 24410.0
6 TN 1987.0 4.0 24410.0
7 TN 1987.0 4.0 24410.0
8 TN 1987.0 4.0 24410.0
edited Nov 13 '18 at 18:04
answered Nov 13 '18 at 17:51
Mabel VillalbaMabel Villalba
1,485214
1,485214
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Have you tried any methods yet? Look at
fillna()
(pandas.pydata.org/pandas-docs/stable/generated/…)– soundstripe
Nov 13 '18 at 17:36