Filling in past and future data from partial data in Python












0















I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question


















  • 1





    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

    – soundstripe
    Nov 13 '18 at 17:36


















0















I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question


















  • 1





    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

    – soundstripe
    Nov 13 '18 at 17:36
















0












0








0








I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0









share|improve this question














I have take a cumulative sum for data that goes up from 198x to 2016 and is now in the form:



State   Year    Month   Value
TN 1987 1 24410.0
TN 1987 2 24410.0
TN 1987 3 24410.0
TN 1987 4 24410.0
.
.
TN 1996 1 24410.0
TN 1996 2 24410.0
TN 1996 3 24410.0
TN 1996 4 24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
TN 1996 7 37109.0
TN 1996 8 37109.0
TN 1996 9 37109.0
TN 1996 10 37109.0
TN 1996 11 37109.0
TN 1996 12 37109.0
TN 2016 1 49808.0
TN 2016 2 49808.0


The data actually skips from 1996 to 2016 (for the case of TN but varies on State to State). I need to find a method to generally fill all the missing holes in the data because some years just don't exist (2010-2015) and I want to fill them so that the output goes all the way to 2018.



I want the missing values to be filled with values preceding from the values before to get an output that looks like:



TN      1996    4       24410.0
TN 1996 5 37109.0
TN 1996 6 37109.0
.
.
TN 2010 1 37109.0
TN 2010 2 37109.0
TN 2010 3 37109.0
.
.
TN 2016 1 37109.0
TN 2016 2 37109.0
.
.
TN 2016 11 49808.0
TN 2016 12 49808.0
.
.
TN 2017 1 49808.0
TN 2017 2 49808.0
TN 2017 3 49808.0
TN 2017 4 49808.0
.
.
TN 2018 1 49808.0
TN 2018 2 49808.0






python python-3.x pandas dataframe missing-data






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 13 '18 at 17:32









HelloToEarthHelloToEarth

435214




435214








  • 1





    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

    – soundstripe
    Nov 13 '18 at 17:36
















  • 1





    Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

    – soundstripe
    Nov 13 '18 at 17:36










1




1





Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

– soundstripe
Nov 13 '18 at 17:36







Have you tried any methods yet? Look at fillna() (pandas.pydata.org/pandas-docs/stable/generated/…)

– soundstripe
Nov 13 '18 at 17:36














2 Answers
2






active

oldest

votes


















0














How about pandas.interpolate?: interpolate values according to different methods



See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



And some existing example previously posted: Pandas interpolate() backwards in dataframe






share|improve this answer































    0














    You can create a dataframe with the missing months and them merge your result with it:



    dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
    end='1/08/%d' %df['Year'].max(),
    freq='MS', closed='left')

    >> dates

    DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
    '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
    '1987-10-01', '1987-11-01',
    ...
    '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
    '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
    '2015-12-01', '2016-01-01'],
    dtype='datetime64[ns]', length=348, freq='MS')


    Then you can create a dataframe with all the months:



    all_months = pd.DataFrame.from_records((dates.year, dates.month),
    index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


    And then merge it with the original dataframe and forward-fill it:



    df.merge(all_months, how='right').ffill()

    State Year Month Value
    0 TN 1987.0 1.0 24410.0
    1 TN 1987.0 2.0 24410.0
    2 TN 1987.0 3.0 24410.0
    3 TN 1987.0 4.0 24410.0
    4 TN 1996.0 1.0 24410.0
    5 TN 1996.0 2.0 24410.0
    6 TN 1996.0 3.0 24410.0
    7 TN 1996.0 4.0 24410.0
    8 TN 1996.0 5.0 37109.0
    9 TN 1996.0 6.0 37109.0
    10 TN 1996.0 7.0 37109.0
    11 TN 1996.0 8.0 37109.0
    12 TN 1996.0 9.0 37109.0
    13 TN 1996.0 10.0 37109.0
    14 TN 1996.0 11.0 37109.0
    15 TN 1996.0 12.0 37109.0
    16 TN 2016.0 1.0 49808.0
    17 TN 1987.0 5.0 49808.0
    18 TN 1987.0 6.0 49808.0
    19 TN 1987.0 7.0 49808.0
    20 TN 1987.0 8.0 49808.0
    21 TN 1987.0 9.0 49808.0
    22 TN 1987.0 10.0 49808.0
    23 TN 1987.0 11.0 49808.0
    24 TN 1987.0 12.0 49808.0
    25 TN 1988.0 1.0 49808.0
    26 TN 1988.0 2.0 49808.0
    27 TN 1988.0 3.0 49808.0
    28 TN 1988.0 4.0 49808.0
    29 TN 1988.0 5.0 49808.0
    .. ... ... ... ...
    319 TN 2013.0 7.0 49808.0
    320 TN 2013.0 8.0 49808.0
    321 TN 2013.0 9.0 49808.0
    322 TN 2013.0 10.0 49808.0
    323 TN 2013.0 11.0 49808.0
    324 TN 2013.0 12.0 49808.0
    325 TN 2014.0 1.0 49808.0
    326 TN 2014.0 2.0 49808.0
    327 TN 2014.0 3.0 49808.0
    328 TN 2014.0 4.0 49808.0
    329 TN 2014.0 5.0 49808.0
    330 TN 2014.0 6.0 49808.0
    331 TN 2014.0 7.0 49808.0
    332 TN 2014.0 8.0 49808.0
    333 TN 2014.0 9.0 49808.0
    334 TN 2014.0 10.0 49808.0
    335 TN 2014.0 11.0 49808.0
    336 TN 2014.0 12.0 49808.0
    337 TN 2015.0 1.0 49808.0
    338 TN 2015.0 2.0 49808.0
    339 TN 2015.0 3.0 49808.0
    340 TN 2015.0 4.0 49808.0
    341 TN 2015.0 5.0 49808.0
    342 TN 2015.0 6.0 49808.0
    343 TN 2015.0 7.0 49808.0
    344 TN 2015.0 8.0 49808.0
    345 TN 2015.0 9.0 49808.0
    346 TN 2015.0 10.0 49808.0
    347 TN 2015.0 11.0 49808.0
    348 TN 2015.0 12.0 49808.0


    Using pandas.resample



    Another solution is index by date and then resample there:



    df['Day'] = 1

    df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

    >> df1

    State Year Month Value Day
    date
    1987-01-01 TN 1987.0 1.0 24410.0 1
    1987-02-01 TN 1987.0 2.0 24410.0 1
    1987-03-01 TN 1987.0 3.0 24410.0 1
    1987-04-01 TN 1987.0 4.0 24410.0 1
    1996-01-01 TN 1996.0 1.0 24410.0 1
    1996-02-01 TN 1996.0 2.0 24410.0 1
    1996-03-01 TN 1996.0 3.0 24410.0 1
    1996-04-01 TN 1996.0 4.0 24410.0 1
    1996-05-01 TN 1996.0 5.0 37109.0 1
    1996-06-01 TN 1996.0 6.0 37109.0 1
    1996-07-01 TN 1996.0 7.0 37109.0 1
    1996-08-01 TN 1996.0 8.0 37109.0 1
    1996-09-01 TN 1996.0 9.0 37109.0 1
    1996-10-01 TN 1996.0 10.0 37109.0 1
    1996-11-01 TN 1996.0 11.0 37109.0 1
    1996-12-01 TN 1996.0 12.0 37109.0 1
    2016-01-01 TN 2016.0 1.0 49808.0 1
    2016-02-01 TN 2016.0 2.0 49808.0 1


    Then you can resample it by month by doing:



        res = df1.resample('M').first().ffill()

    >> res

    State Year Month Value Day
    date
    1987-01-31 TN 1987.0 1.0 24410.0 1.0
    1987-02-28 TN 1987.0 2.0 24410.0 1.0
    1987-03-31 TN 1987.0 3.0 24410.0 1.0
    1987-04-30 TN 1987.0 4.0 24410.0 1.0
    1987-05-31 TN 1987.0 4.0 24410.0 1.0
    1987-06-30 TN 1987.0 4.0 24410.0 1.0
    1987-07-31 TN 1987.0 4.0 24410.0 1.0
    1987-08-31 TN 1987.0 4.0 24410.0 1.0
    1987-09-30 TN 1987.0 4.0 24410.0 1.0
    1987-10-31 TN 1987.0 4.0 24410.0 1.0
    1987-11-30 TN 1987.0 4.0 24410.0 1.0
    1987-12-31 TN 1987.0 4.0 24410.0 1.0
    1988-01-31 TN 1987.0 4.0 24410.0 1.0
    1988-02-29 TN 1987.0 4.0 24410.0 1.0
    1988-03-31 TN 1987.0 4.0 24410.0 1.0
    1988-04-30 TN 1987.0 4.0 24410.0 1.0
    1988-05-31 TN 1987.0 4.0 24410.0 1.0
    1988-06-30 TN 1987.0 4.0 24410.0 1.0
    1988-07-31 TN 1987.0 4.0 24410.0 1.0
    1988-08-31 TN 1987.0 4.0 24410.0 1.0
    1988-09-30 TN 1987.0 4.0 24410.0 1.0
    1988-10-31 TN 1987.0 4.0 24410.0 1.0
    1988-11-30 TN 1987.0 4.0 24410.0 1.0
    1988-12-31 TN 1987.0 4.0 24410.0 1.0
    1989-01-31 TN 1987.0 4.0 24410.0 1.0
    1989-02-28 TN 1987.0 4.0 24410.0 1.0
    1989-03-31 TN 1987.0 4.0 24410.0 1.0
    1989-04-30 TN 1987.0 4.0 24410.0 1.0
    1989-05-31 TN 1987.0 4.0 24410.0 1.0
    1989-06-30 TN 1987.0 4.0 24410.0 1.0
    ... ... ... ... ... ...
    2013-09-30 TN 1996.0 12.0 37109.0 1.0
    2013-10-31 TN 1996.0 12.0 37109.0 1.0
    2013-11-30 TN 1996.0 12.0 37109.0 1.0
    2013-12-31 TN 1996.0 12.0 37109.0 1.0
    2014-01-31 TN 1996.0 12.0 37109.0 1.0
    2014-02-28 TN 1996.0 12.0 37109.0 1.0
    2014-03-31 TN 1996.0 12.0 37109.0 1.0
    2014-04-30 TN 1996.0 12.0 37109.0 1.0
    2014-05-31 TN 1996.0 12.0 37109.0 1.0
    2014-06-30 TN 1996.0 12.0 37109.0 1.0
    2014-07-31 TN 1996.0 12.0 37109.0 1.0
    2014-08-31 TN 1996.0 12.0 37109.0 1.0
    2014-09-30 TN 1996.0 12.0 37109.0 1.0
    2014-10-31 TN 1996.0 12.0 37109.0 1.0
    2014-11-30 TN 1996.0 12.0 37109.0 1.0
    2014-12-31 TN 1996.0 12.0 37109.0 1.0
    2015-01-31 TN 1996.0 12.0 37109.0 1.0
    2015-02-28 TN 1996.0 12.0 37109.0 1.0
    2015-03-31 TN 1996.0 12.0 37109.0 1.0
    2015-04-30 TN 1996.0 12.0 37109.0 1.0
    2015-05-31 TN 1996.0 12.0 37109.0 1.0
    2015-06-30 TN 1996.0 12.0 37109.0 1.0
    2015-07-31 TN 1996.0 12.0 37109.0 1.0
    2015-08-31 TN 1996.0 12.0 37109.0 1.0
    2015-09-30 TN 1996.0 12.0 37109.0 1.0
    2015-10-31 TN 1996.0 12.0 37109.0 1.0
    2015-11-30 TN 1996.0 12.0 37109.0 1.0
    2015-12-31 TN 1996.0 12.0 37109.0 1.0
    2016-01-31 TN 2016.0 1.0 49808.0 1.0
    2016-02-29 TN 2016.0 2.0 49808.0 1.0


    You can get the original structure by doing:



    >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

    State Year Month Value
    0 TN 1987.0 1.0 24410.0
    1 TN 1987.0 2.0 24410.0
    2 TN 1987.0 3.0 24410.0
    3 TN 1987.0 4.0 24410.0
    4 TN 1987.0 4.0 24410.0
    5 TN 1987.0 4.0 24410.0
    6 TN 1987.0 4.0 24410.0
    7 TN 1987.0 4.0 24410.0
    8 TN 1987.0 4.0 24410.0





    share|improve this answer

























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      How about pandas.interpolate?: interpolate values according to different methods



      See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



      And some existing example previously posted: Pandas interpolate() backwards in dataframe






      share|improve this answer




























        0














        How about pandas.interpolate?: interpolate values according to different methods



        See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



        And some existing example previously posted: Pandas interpolate() backwards in dataframe






        share|improve this answer


























          0












          0








          0







          How about pandas.interpolate?: interpolate values according to different methods



          See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



          And some existing example previously posted: Pandas interpolate() backwards in dataframe






          share|improve this answer













          How about pandas.interpolate?: interpolate values according to different methods



          See section 'interpolate' here: https://pandas.pydata.org/pandas-docs/stable/missing_data.html



          And some existing example previously posted: Pandas interpolate() backwards in dataframe







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 13 '18 at 18:00









          kon_ukon_u

          1966




          1966

























              0














              You can create a dataframe with the missing months and them merge your result with it:



              dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
              end='1/08/%d' %df['Year'].max(),
              freq='MS', closed='left')

              >> dates

              DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
              '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
              '1987-10-01', '1987-11-01',
              ...
              '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
              '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
              '2015-12-01', '2016-01-01'],
              dtype='datetime64[ns]', length=348, freq='MS')


              Then you can create a dataframe with all the months:



              all_months = pd.DataFrame.from_records((dates.year, dates.month),
              index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


              And then merge it with the original dataframe and forward-fill it:



              df.merge(all_months, how='right').ffill()

              State Year Month Value
              0 TN 1987.0 1.0 24410.0
              1 TN 1987.0 2.0 24410.0
              2 TN 1987.0 3.0 24410.0
              3 TN 1987.0 4.0 24410.0
              4 TN 1996.0 1.0 24410.0
              5 TN 1996.0 2.0 24410.0
              6 TN 1996.0 3.0 24410.0
              7 TN 1996.0 4.0 24410.0
              8 TN 1996.0 5.0 37109.0
              9 TN 1996.0 6.0 37109.0
              10 TN 1996.0 7.0 37109.0
              11 TN 1996.0 8.0 37109.0
              12 TN 1996.0 9.0 37109.0
              13 TN 1996.0 10.0 37109.0
              14 TN 1996.0 11.0 37109.0
              15 TN 1996.0 12.0 37109.0
              16 TN 2016.0 1.0 49808.0
              17 TN 1987.0 5.0 49808.0
              18 TN 1987.0 6.0 49808.0
              19 TN 1987.0 7.0 49808.0
              20 TN 1987.0 8.0 49808.0
              21 TN 1987.0 9.0 49808.0
              22 TN 1987.0 10.0 49808.0
              23 TN 1987.0 11.0 49808.0
              24 TN 1987.0 12.0 49808.0
              25 TN 1988.0 1.0 49808.0
              26 TN 1988.0 2.0 49808.0
              27 TN 1988.0 3.0 49808.0
              28 TN 1988.0 4.0 49808.0
              29 TN 1988.0 5.0 49808.0
              .. ... ... ... ...
              319 TN 2013.0 7.0 49808.0
              320 TN 2013.0 8.0 49808.0
              321 TN 2013.0 9.0 49808.0
              322 TN 2013.0 10.0 49808.0
              323 TN 2013.0 11.0 49808.0
              324 TN 2013.0 12.0 49808.0
              325 TN 2014.0 1.0 49808.0
              326 TN 2014.0 2.0 49808.0
              327 TN 2014.0 3.0 49808.0
              328 TN 2014.0 4.0 49808.0
              329 TN 2014.0 5.0 49808.0
              330 TN 2014.0 6.0 49808.0
              331 TN 2014.0 7.0 49808.0
              332 TN 2014.0 8.0 49808.0
              333 TN 2014.0 9.0 49808.0
              334 TN 2014.0 10.0 49808.0
              335 TN 2014.0 11.0 49808.0
              336 TN 2014.0 12.0 49808.0
              337 TN 2015.0 1.0 49808.0
              338 TN 2015.0 2.0 49808.0
              339 TN 2015.0 3.0 49808.0
              340 TN 2015.0 4.0 49808.0
              341 TN 2015.0 5.0 49808.0
              342 TN 2015.0 6.0 49808.0
              343 TN 2015.0 7.0 49808.0
              344 TN 2015.0 8.0 49808.0
              345 TN 2015.0 9.0 49808.0
              346 TN 2015.0 10.0 49808.0
              347 TN 2015.0 11.0 49808.0
              348 TN 2015.0 12.0 49808.0


              Using pandas.resample



              Another solution is index by date and then resample there:



              df['Day'] = 1

              df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

              >> df1

              State Year Month Value Day
              date
              1987-01-01 TN 1987.0 1.0 24410.0 1
              1987-02-01 TN 1987.0 2.0 24410.0 1
              1987-03-01 TN 1987.0 3.0 24410.0 1
              1987-04-01 TN 1987.0 4.0 24410.0 1
              1996-01-01 TN 1996.0 1.0 24410.0 1
              1996-02-01 TN 1996.0 2.0 24410.0 1
              1996-03-01 TN 1996.0 3.0 24410.0 1
              1996-04-01 TN 1996.0 4.0 24410.0 1
              1996-05-01 TN 1996.0 5.0 37109.0 1
              1996-06-01 TN 1996.0 6.0 37109.0 1
              1996-07-01 TN 1996.0 7.0 37109.0 1
              1996-08-01 TN 1996.0 8.0 37109.0 1
              1996-09-01 TN 1996.0 9.0 37109.0 1
              1996-10-01 TN 1996.0 10.0 37109.0 1
              1996-11-01 TN 1996.0 11.0 37109.0 1
              1996-12-01 TN 1996.0 12.0 37109.0 1
              2016-01-01 TN 2016.0 1.0 49808.0 1
              2016-02-01 TN 2016.0 2.0 49808.0 1


              Then you can resample it by month by doing:



                  res = df1.resample('M').first().ffill()

              >> res

              State Year Month Value Day
              date
              1987-01-31 TN 1987.0 1.0 24410.0 1.0
              1987-02-28 TN 1987.0 2.0 24410.0 1.0
              1987-03-31 TN 1987.0 3.0 24410.0 1.0
              1987-04-30 TN 1987.0 4.0 24410.0 1.0
              1987-05-31 TN 1987.0 4.0 24410.0 1.0
              1987-06-30 TN 1987.0 4.0 24410.0 1.0
              1987-07-31 TN 1987.0 4.0 24410.0 1.0
              1987-08-31 TN 1987.0 4.0 24410.0 1.0
              1987-09-30 TN 1987.0 4.0 24410.0 1.0
              1987-10-31 TN 1987.0 4.0 24410.0 1.0
              1987-11-30 TN 1987.0 4.0 24410.0 1.0
              1987-12-31 TN 1987.0 4.0 24410.0 1.0
              1988-01-31 TN 1987.0 4.0 24410.0 1.0
              1988-02-29 TN 1987.0 4.0 24410.0 1.0
              1988-03-31 TN 1987.0 4.0 24410.0 1.0
              1988-04-30 TN 1987.0 4.0 24410.0 1.0
              1988-05-31 TN 1987.0 4.0 24410.0 1.0
              1988-06-30 TN 1987.0 4.0 24410.0 1.0
              1988-07-31 TN 1987.0 4.0 24410.0 1.0
              1988-08-31 TN 1987.0 4.0 24410.0 1.0
              1988-09-30 TN 1987.0 4.0 24410.0 1.0
              1988-10-31 TN 1987.0 4.0 24410.0 1.0
              1988-11-30 TN 1987.0 4.0 24410.0 1.0
              1988-12-31 TN 1987.0 4.0 24410.0 1.0
              1989-01-31 TN 1987.0 4.0 24410.0 1.0
              1989-02-28 TN 1987.0 4.0 24410.0 1.0
              1989-03-31 TN 1987.0 4.0 24410.0 1.0
              1989-04-30 TN 1987.0 4.0 24410.0 1.0
              1989-05-31 TN 1987.0 4.0 24410.0 1.0
              1989-06-30 TN 1987.0 4.0 24410.0 1.0
              ... ... ... ... ... ...
              2013-09-30 TN 1996.0 12.0 37109.0 1.0
              2013-10-31 TN 1996.0 12.0 37109.0 1.0
              2013-11-30 TN 1996.0 12.0 37109.0 1.0
              2013-12-31 TN 1996.0 12.0 37109.0 1.0
              2014-01-31 TN 1996.0 12.0 37109.0 1.0
              2014-02-28 TN 1996.0 12.0 37109.0 1.0
              2014-03-31 TN 1996.0 12.0 37109.0 1.0
              2014-04-30 TN 1996.0 12.0 37109.0 1.0
              2014-05-31 TN 1996.0 12.0 37109.0 1.0
              2014-06-30 TN 1996.0 12.0 37109.0 1.0
              2014-07-31 TN 1996.0 12.0 37109.0 1.0
              2014-08-31 TN 1996.0 12.0 37109.0 1.0
              2014-09-30 TN 1996.0 12.0 37109.0 1.0
              2014-10-31 TN 1996.0 12.0 37109.0 1.0
              2014-11-30 TN 1996.0 12.0 37109.0 1.0
              2014-12-31 TN 1996.0 12.0 37109.0 1.0
              2015-01-31 TN 1996.0 12.0 37109.0 1.0
              2015-02-28 TN 1996.0 12.0 37109.0 1.0
              2015-03-31 TN 1996.0 12.0 37109.0 1.0
              2015-04-30 TN 1996.0 12.0 37109.0 1.0
              2015-05-31 TN 1996.0 12.0 37109.0 1.0
              2015-06-30 TN 1996.0 12.0 37109.0 1.0
              2015-07-31 TN 1996.0 12.0 37109.0 1.0
              2015-08-31 TN 1996.0 12.0 37109.0 1.0
              2015-09-30 TN 1996.0 12.0 37109.0 1.0
              2015-10-31 TN 1996.0 12.0 37109.0 1.0
              2015-11-30 TN 1996.0 12.0 37109.0 1.0
              2015-12-31 TN 1996.0 12.0 37109.0 1.0
              2016-01-31 TN 2016.0 1.0 49808.0 1.0
              2016-02-29 TN 2016.0 2.0 49808.0 1.0


              You can get the original structure by doing:



              >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

              State Year Month Value
              0 TN 1987.0 1.0 24410.0
              1 TN 1987.0 2.0 24410.0
              2 TN 1987.0 3.0 24410.0
              3 TN 1987.0 4.0 24410.0
              4 TN 1987.0 4.0 24410.0
              5 TN 1987.0 4.0 24410.0
              6 TN 1987.0 4.0 24410.0
              7 TN 1987.0 4.0 24410.0
              8 TN 1987.0 4.0 24410.0





              share|improve this answer






























                0














                You can create a dataframe with the missing months and them merge your result with it:



                dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                end='1/08/%d' %df['Year'].max(),
                freq='MS', closed='left')

                >> dates

                DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                '1987-10-01', '1987-11-01',
                ...
                '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                '2015-12-01', '2016-01-01'],
                dtype='datetime64[ns]', length=348, freq='MS')


                Then you can create a dataframe with all the months:



                all_months = pd.DataFrame.from_records((dates.year, dates.month),
                index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                And then merge it with the original dataframe and forward-fill it:



                df.merge(all_months, how='right').ffill()

                State Year Month Value
                0 TN 1987.0 1.0 24410.0
                1 TN 1987.0 2.0 24410.0
                2 TN 1987.0 3.0 24410.0
                3 TN 1987.0 4.0 24410.0
                4 TN 1996.0 1.0 24410.0
                5 TN 1996.0 2.0 24410.0
                6 TN 1996.0 3.0 24410.0
                7 TN 1996.0 4.0 24410.0
                8 TN 1996.0 5.0 37109.0
                9 TN 1996.0 6.0 37109.0
                10 TN 1996.0 7.0 37109.0
                11 TN 1996.0 8.0 37109.0
                12 TN 1996.0 9.0 37109.0
                13 TN 1996.0 10.0 37109.0
                14 TN 1996.0 11.0 37109.0
                15 TN 1996.0 12.0 37109.0
                16 TN 2016.0 1.0 49808.0
                17 TN 1987.0 5.0 49808.0
                18 TN 1987.0 6.0 49808.0
                19 TN 1987.0 7.0 49808.0
                20 TN 1987.0 8.0 49808.0
                21 TN 1987.0 9.0 49808.0
                22 TN 1987.0 10.0 49808.0
                23 TN 1987.0 11.0 49808.0
                24 TN 1987.0 12.0 49808.0
                25 TN 1988.0 1.0 49808.0
                26 TN 1988.0 2.0 49808.0
                27 TN 1988.0 3.0 49808.0
                28 TN 1988.0 4.0 49808.0
                29 TN 1988.0 5.0 49808.0
                .. ... ... ... ...
                319 TN 2013.0 7.0 49808.0
                320 TN 2013.0 8.0 49808.0
                321 TN 2013.0 9.0 49808.0
                322 TN 2013.0 10.0 49808.0
                323 TN 2013.0 11.0 49808.0
                324 TN 2013.0 12.0 49808.0
                325 TN 2014.0 1.0 49808.0
                326 TN 2014.0 2.0 49808.0
                327 TN 2014.0 3.0 49808.0
                328 TN 2014.0 4.0 49808.0
                329 TN 2014.0 5.0 49808.0
                330 TN 2014.0 6.0 49808.0
                331 TN 2014.0 7.0 49808.0
                332 TN 2014.0 8.0 49808.0
                333 TN 2014.0 9.0 49808.0
                334 TN 2014.0 10.0 49808.0
                335 TN 2014.0 11.0 49808.0
                336 TN 2014.0 12.0 49808.0
                337 TN 2015.0 1.0 49808.0
                338 TN 2015.0 2.0 49808.0
                339 TN 2015.0 3.0 49808.0
                340 TN 2015.0 4.0 49808.0
                341 TN 2015.0 5.0 49808.0
                342 TN 2015.0 6.0 49808.0
                343 TN 2015.0 7.0 49808.0
                344 TN 2015.0 8.0 49808.0
                345 TN 2015.0 9.0 49808.0
                346 TN 2015.0 10.0 49808.0
                347 TN 2015.0 11.0 49808.0
                348 TN 2015.0 12.0 49808.0


                Using pandas.resample



                Another solution is index by date and then resample there:



                df['Day'] = 1

                df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                >> df1

                State Year Month Value Day
                date
                1987-01-01 TN 1987.0 1.0 24410.0 1
                1987-02-01 TN 1987.0 2.0 24410.0 1
                1987-03-01 TN 1987.0 3.0 24410.0 1
                1987-04-01 TN 1987.0 4.0 24410.0 1
                1996-01-01 TN 1996.0 1.0 24410.0 1
                1996-02-01 TN 1996.0 2.0 24410.0 1
                1996-03-01 TN 1996.0 3.0 24410.0 1
                1996-04-01 TN 1996.0 4.0 24410.0 1
                1996-05-01 TN 1996.0 5.0 37109.0 1
                1996-06-01 TN 1996.0 6.0 37109.0 1
                1996-07-01 TN 1996.0 7.0 37109.0 1
                1996-08-01 TN 1996.0 8.0 37109.0 1
                1996-09-01 TN 1996.0 9.0 37109.0 1
                1996-10-01 TN 1996.0 10.0 37109.0 1
                1996-11-01 TN 1996.0 11.0 37109.0 1
                1996-12-01 TN 1996.0 12.0 37109.0 1
                2016-01-01 TN 2016.0 1.0 49808.0 1
                2016-02-01 TN 2016.0 2.0 49808.0 1


                Then you can resample it by month by doing:



                    res = df1.resample('M').first().ffill()

                >> res

                State Year Month Value Day
                date
                1987-01-31 TN 1987.0 1.0 24410.0 1.0
                1987-02-28 TN 1987.0 2.0 24410.0 1.0
                1987-03-31 TN 1987.0 3.0 24410.0 1.0
                1987-04-30 TN 1987.0 4.0 24410.0 1.0
                1987-05-31 TN 1987.0 4.0 24410.0 1.0
                1987-06-30 TN 1987.0 4.0 24410.0 1.0
                1987-07-31 TN 1987.0 4.0 24410.0 1.0
                1987-08-31 TN 1987.0 4.0 24410.0 1.0
                1987-09-30 TN 1987.0 4.0 24410.0 1.0
                1987-10-31 TN 1987.0 4.0 24410.0 1.0
                1987-11-30 TN 1987.0 4.0 24410.0 1.0
                1987-12-31 TN 1987.0 4.0 24410.0 1.0
                1988-01-31 TN 1987.0 4.0 24410.0 1.0
                1988-02-29 TN 1987.0 4.0 24410.0 1.0
                1988-03-31 TN 1987.0 4.0 24410.0 1.0
                1988-04-30 TN 1987.0 4.0 24410.0 1.0
                1988-05-31 TN 1987.0 4.0 24410.0 1.0
                1988-06-30 TN 1987.0 4.0 24410.0 1.0
                1988-07-31 TN 1987.0 4.0 24410.0 1.0
                1988-08-31 TN 1987.0 4.0 24410.0 1.0
                1988-09-30 TN 1987.0 4.0 24410.0 1.0
                1988-10-31 TN 1987.0 4.0 24410.0 1.0
                1988-11-30 TN 1987.0 4.0 24410.0 1.0
                1988-12-31 TN 1987.0 4.0 24410.0 1.0
                1989-01-31 TN 1987.0 4.0 24410.0 1.0
                1989-02-28 TN 1987.0 4.0 24410.0 1.0
                1989-03-31 TN 1987.0 4.0 24410.0 1.0
                1989-04-30 TN 1987.0 4.0 24410.0 1.0
                1989-05-31 TN 1987.0 4.0 24410.0 1.0
                1989-06-30 TN 1987.0 4.0 24410.0 1.0
                ... ... ... ... ... ...
                2013-09-30 TN 1996.0 12.0 37109.0 1.0
                2013-10-31 TN 1996.0 12.0 37109.0 1.0
                2013-11-30 TN 1996.0 12.0 37109.0 1.0
                2013-12-31 TN 1996.0 12.0 37109.0 1.0
                2014-01-31 TN 1996.0 12.0 37109.0 1.0
                2014-02-28 TN 1996.0 12.0 37109.0 1.0
                2014-03-31 TN 1996.0 12.0 37109.0 1.0
                2014-04-30 TN 1996.0 12.0 37109.0 1.0
                2014-05-31 TN 1996.0 12.0 37109.0 1.0
                2014-06-30 TN 1996.0 12.0 37109.0 1.0
                2014-07-31 TN 1996.0 12.0 37109.0 1.0
                2014-08-31 TN 1996.0 12.0 37109.0 1.0
                2014-09-30 TN 1996.0 12.0 37109.0 1.0
                2014-10-31 TN 1996.0 12.0 37109.0 1.0
                2014-11-30 TN 1996.0 12.0 37109.0 1.0
                2014-12-31 TN 1996.0 12.0 37109.0 1.0
                2015-01-31 TN 1996.0 12.0 37109.0 1.0
                2015-02-28 TN 1996.0 12.0 37109.0 1.0
                2015-03-31 TN 1996.0 12.0 37109.0 1.0
                2015-04-30 TN 1996.0 12.0 37109.0 1.0
                2015-05-31 TN 1996.0 12.0 37109.0 1.0
                2015-06-30 TN 1996.0 12.0 37109.0 1.0
                2015-07-31 TN 1996.0 12.0 37109.0 1.0
                2015-08-31 TN 1996.0 12.0 37109.0 1.0
                2015-09-30 TN 1996.0 12.0 37109.0 1.0
                2015-10-31 TN 1996.0 12.0 37109.0 1.0
                2015-11-30 TN 1996.0 12.0 37109.0 1.0
                2015-12-31 TN 1996.0 12.0 37109.0 1.0
                2016-01-31 TN 2016.0 1.0 49808.0 1.0
                2016-02-29 TN 2016.0 2.0 49808.0 1.0


                You can get the original structure by doing:



                >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                State Year Month Value
                0 TN 1987.0 1.0 24410.0
                1 TN 1987.0 2.0 24410.0
                2 TN 1987.0 3.0 24410.0
                3 TN 1987.0 4.0 24410.0
                4 TN 1987.0 4.0 24410.0
                5 TN 1987.0 4.0 24410.0
                6 TN 1987.0 4.0 24410.0
                7 TN 1987.0 4.0 24410.0
                8 TN 1987.0 4.0 24410.0





                share|improve this answer




























                  0












                  0








                  0







                  You can create a dataframe with the missing months and them merge your result with it:



                  dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                  end='1/08/%d' %df['Year'].max(),
                  freq='MS', closed='left')

                  >> dates

                  DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                  '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                  '1987-10-01', '1987-11-01',
                  ...
                  '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                  '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                  '2015-12-01', '2016-01-01'],
                  dtype='datetime64[ns]', length=348, freq='MS')


                  Then you can create a dataframe with all the months:



                  all_months = pd.DataFrame.from_records((dates.year, dates.month),
                  index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                  And then merge it with the original dataframe and forward-fill it:



                  df.merge(all_months, how='right').ffill()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1996.0 1.0 24410.0
                  5 TN 1996.0 2.0 24410.0
                  6 TN 1996.0 3.0 24410.0
                  7 TN 1996.0 4.0 24410.0
                  8 TN 1996.0 5.0 37109.0
                  9 TN 1996.0 6.0 37109.0
                  10 TN 1996.0 7.0 37109.0
                  11 TN 1996.0 8.0 37109.0
                  12 TN 1996.0 9.0 37109.0
                  13 TN 1996.0 10.0 37109.0
                  14 TN 1996.0 11.0 37109.0
                  15 TN 1996.0 12.0 37109.0
                  16 TN 2016.0 1.0 49808.0
                  17 TN 1987.0 5.0 49808.0
                  18 TN 1987.0 6.0 49808.0
                  19 TN 1987.0 7.0 49808.0
                  20 TN 1987.0 8.0 49808.0
                  21 TN 1987.0 9.0 49808.0
                  22 TN 1987.0 10.0 49808.0
                  23 TN 1987.0 11.0 49808.0
                  24 TN 1987.0 12.0 49808.0
                  25 TN 1988.0 1.0 49808.0
                  26 TN 1988.0 2.0 49808.0
                  27 TN 1988.0 3.0 49808.0
                  28 TN 1988.0 4.0 49808.0
                  29 TN 1988.0 5.0 49808.0
                  .. ... ... ... ...
                  319 TN 2013.0 7.0 49808.0
                  320 TN 2013.0 8.0 49808.0
                  321 TN 2013.0 9.0 49808.0
                  322 TN 2013.0 10.0 49808.0
                  323 TN 2013.0 11.0 49808.0
                  324 TN 2013.0 12.0 49808.0
                  325 TN 2014.0 1.0 49808.0
                  326 TN 2014.0 2.0 49808.0
                  327 TN 2014.0 3.0 49808.0
                  328 TN 2014.0 4.0 49808.0
                  329 TN 2014.0 5.0 49808.0
                  330 TN 2014.0 6.0 49808.0
                  331 TN 2014.0 7.0 49808.0
                  332 TN 2014.0 8.0 49808.0
                  333 TN 2014.0 9.0 49808.0
                  334 TN 2014.0 10.0 49808.0
                  335 TN 2014.0 11.0 49808.0
                  336 TN 2014.0 12.0 49808.0
                  337 TN 2015.0 1.0 49808.0
                  338 TN 2015.0 2.0 49808.0
                  339 TN 2015.0 3.0 49808.0
                  340 TN 2015.0 4.0 49808.0
                  341 TN 2015.0 5.0 49808.0
                  342 TN 2015.0 6.0 49808.0
                  343 TN 2015.0 7.0 49808.0
                  344 TN 2015.0 8.0 49808.0
                  345 TN 2015.0 9.0 49808.0
                  346 TN 2015.0 10.0 49808.0
                  347 TN 2015.0 11.0 49808.0
                  348 TN 2015.0 12.0 49808.0


                  Using pandas.resample



                  Another solution is index by date and then resample there:



                  df['Day'] = 1

                  df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                  >> df1

                  State Year Month Value Day
                  date
                  1987-01-01 TN 1987.0 1.0 24410.0 1
                  1987-02-01 TN 1987.0 2.0 24410.0 1
                  1987-03-01 TN 1987.0 3.0 24410.0 1
                  1987-04-01 TN 1987.0 4.0 24410.0 1
                  1996-01-01 TN 1996.0 1.0 24410.0 1
                  1996-02-01 TN 1996.0 2.0 24410.0 1
                  1996-03-01 TN 1996.0 3.0 24410.0 1
                  1996-04-01 TN 1996.0 4.0 24410.0 1
                  1996-05-01 TN 1996.0 5.0 37109.0 1
                  1996-06-01 TN 1996.0 6.0 37109.0 1
                  1996-07-01 TN 1996.0 7.0 37109.0 1
                  1996-08-01 TN 1996.0 8.0 37109.0 1
                  1996-09-01 TN 1996.0 9.0 37109.0 1
                  1996-10-01 TN 1996.0 10.0 37109.0 1
                  1996-11-01 TN 1996.0 11.0 37109.0 1
                  1996-12-01 TN 1996.0 12.0 37109.0 1
                  2016-01-01 TN 2016.0 1.0 49808.0 1
                  2016-02-01 TN 2016.0 2.0 49808.0 1


                  Then you can resample it by month by doing:



                      res = df1.resample('M').first().ffill()

                  >> res

                  State Year Month Value Day
                  date
                  1987-01-31 TN 1987.0 1.0 24410.0 1.0
                  1987-02-28 TN 1987.0 2.0 24410.0 1.0
                  1987-03-31 TN 1987.0 3.0 24410.0 1.0
                  1987-04-30 TN 1987.0 4.0 24410.0 1.0
                  1987-05-31 TN 1987.0 4.0 24410.0 1.0
                  1987-06-30 TN 1987.0 4.0 24410.0 1.0
                  1987-07-31 TN 1987.0 4.0 24410.0 1.0
                  1987-08-31 TN 1987.0 4.0 24410.0 1.0
                  1987-09-30 TN 1987.0 4.0 24410.0 1.0
                  1987-10-31 TN 1987.0 4.0 24410.0 1.0
                  1987-11-30 TN 1987.0 4.0 24410.0 1.0
                  1987-12-31 TN 1987.0 4.0 24410.0 1.0
                  1988-01-31 TN 1987.0 4.0 24410.0 1.0
                  1988-02-29 TN 1987.0 4.0 24410.0 1.0
                  1988-03-31 TN 1987.0 4.0 24410.0 1.0
                  1988-04-30 TN 1987.0 4.0 24410.0 1.0
                  1988-05-31 TN 1987.0 4.0 24410.0 1.0
                  1988-06-30 TN 1987.0 4.0 24410.0 1.0
                  1988-07-31 TN 1987.0 4.0 24410.0 1.0
                  1988-08-31 TN 1987.0 4.0 24410.0 1.0
                  1988-09-30 TN 1987.0 4.0 24410.0 1.0
                  1988-10-31 TN 1987.0 4.0 24410.0 1.0
                  1988-11-30 TN 1987.0 4.0 24410.0 1.0
                  1988-12-31 TN 1987.0 4.0 24410.0 1.0
                  1989-01-31 TN 1987.0 4.0 24410.0 1.0
                  1989-02-28 TN 1987.0 4.0 24410.0 1.0
                  1989-03-31 TN 1987.0 4.0 24410.0 1.0
                  1989-04-30 TN 1987.0 4.0 24410.0 1.0
                  1989-05-31 TN 1987.0 4.0 24410.0 1.0
                  1989-06-30 TN 1987.0 4.0 24410.0 1.0
                  ... ... ... ... ... ...
                  2013-09-30 TN 1996.0 12.0 37109.0 1.0
                  2013-10-31 TN 1996.0 12.0 37109.0 1.0
                  2013-11-30 TN 1996.0 12.0 37109.0 1.0
                  2013-12-31 TN 1996.0 12.0 37109.0 1.0
                  2014-01-31 TN 1996.0 12.0 37109.0 1.0
                  2014-02-28 TN 1996.0 12.0 37109.0 1.0
                  2014-03-31 TN 1996.0 12.0 37109.0 1.0
                  2014-04-30 TN 1996.0 12.0 37109.0 1.0
                  2014-05-31 TN 1996.0 12.0 37109.0 1.0
                  2014-06-30 TN 1996.0 12.0 37109.0 1.0
                  2014-07-31 TN 1996.0 12.0 37109.0 1.0
                  2014-08-31 TN 1996.0 12.0 37109.0 1.0
                  2014-09-30 TN 1996.0 12.0 37109.0 1.0
                  2014-10-31 TN 1996.0 12.0 37109.0 1.0
                  2014-11-30 TN 1996.0 12.0 37109.0 1.0
                  2014-12-31 TN 1996.0 12.0 37109.0 1.0
                  2015-01-31 TN 1996.0 12.0 37109.0 1.0
                  2015-02-28 TN 1996.0 12.0 37109.0 1.0
                  2015-03-31 TN 1996.0 12.0 37109.0 1.0
                  2015-04-30 TN 1996.0 12.0 37109.0 1.0
                  2015-05-31 TN 1996.0 12.0 37109.0 1.0
                  2015-06-30 TN 1996.0 12.0 37109.0 1.0
                  2015-07-31 TN 1996.0 12.0 37109.0 1.0
                  2015-08-31 TN 1996.0 12.0 37109.0 1.0
                  2015-09-30 TN 1996.0 12.0 37109.0 1.0
                  2015-10-31 TN 1996.0 12.0 37109.0 1.0
                  2015-11-30 TN 1996.0 12.0 37109.0 1.0
                  2015-12-31 TN 1996.0 12.0 37109.0 1.0
                  2016-01-31 TN 2016.0 1.0 49808.0 1.0
                  2016-02-29 TN 2016.0 2.0 49808.0 1.0


                  You can get the original structure by doing:



                  >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1987.0 4.0 24410.0
                  5 TN 1987.0 4.0 24410.0
                  6 TN 1987.0 4.0 24410.0
                  7 TN 1987.0 4.0 24410.0
                  8 TN 1987.0 4.0 24410.0





                  share|improve this answer















                  You can create a dataframe with the missing months and them merge your result with it:



                  dates = pd.date_range(start='1/1/%d' %df['Year'].min(),
                  end='1/08/%d' %df['Year'].max(),
                  freq='MS', closed='left')

                  >> dates

                  DatetimeIndex(['1987-02-01', '1987-03-01', '1987-04-01', '1987-05-01',
                  '1987-06-01', '1987-07-01', '1987-08-01', '1987-09-01',
                  '1987-10-01', '1987-11-01',
                  ...
                  '2015-04-01', '2015-05-01', '2015-06-01', '2015-07-01',
                  '2015-08-01', '2015-09-01', '2015-10-01', '2015-11-01',
                  '2015-12-01', '2016-01-01'],
                  dtype='datetime64[ns]', length=348, freq='MS')


                  Then you can create a dataframe with all the months:



                  all_months = pd.DataFrame.from_records((dates.year, dates.month),
                  index=['Year', 'Month']).T.sort_values(by=['Year', 'Month'])


                  And then merge it with the original dataframe and forward-fill it:



                  df.merge(all_months, how='right').ffill()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1996.0 1.0 24410.0
                  5 TN 1996.0 2.0 24410.0
                  6 TN 1996.0 3.0 24410.0
                  7 TN 1996.0 4.0 24410.0
                  8 TN 1996.0 5.0 37109.0
                  9 TN 1996.0 6.0 37109.0
                  10 TN 1996.0 7.0 37109.0
                  11 TN 1996.0 8.0 37109.0
                  12 TN 1996.0 9.0 37109.0
                  13 TN 1996.0 10.0 37109.0
                  14 TN 1996.0 11.0 37109.0
                  15 TN 1996.0 12.0 37109.0
                  16 TN 2016.0 1.0 49808.0
                  17 TN 1987.0 5.0 49808.0
                  18 TN 1987.0 6.0 49808.0
                  19 TN 1987.0 7.0 49808.0
                  20 TN 1987.0 8.0 49808.0
                  21 TN 1987.0 9.0 49808.0
                  22 TN 1987.0 10.0 49808.0
                  23 TN 1987.0 11.0 49808.0
                  24 TN 1987.0 12.0 49808.0
                  25 TN 1988.0 1.0 49808.0
                  26 TN 1988.0 2.0 49808.0
                  27 TN 1988.0 3.0 49808.0
                  28 TN 1988.0 4.0 49808.0
                  29 TN 1988.0 5.0 49808.0
                  .. ... ... ... ...
                  319 TN 2013.0 7.0 49808.0
                  320 TN 2013.0 8.0 49808.0
                  321 TN 2013.0 9.0 49808.0
                  322 TN 2013.0 10.0 49808.0
                  323 TN 2013.0 11.0 49808.0
                  324 TN 2013.0 12.0 49808.0
                  325 TN 2014.0 1.0 49808.0
                  326 TN 2014.0 2.0 49808.0
                  327 TN 2014.0 3.0 49808.0
                  328 TN 2014.0 4.0 49808.0
                  329 TN 2014.0 5.0 49808.0
                  330 TN 2014.0 6.0 49808.0
                  331 TN 2014.0 7.0 49808.0
                  332 TN 2014.0 8.0 49808.0
                  333 TN 2014.0 9.0 49808.0
                  334 TN 2014.0 10.0 49808.0
                  335 TN 2014.0 11.0 49808.0
                  336 TN 2014.0 12.0 49808.0
                  337 TN 2015.0 1.0 49808.0
                  338 TN 2015.0 2.0 49808.0
                  339 TN 2015.0 3.0 49808.0
                  340 TN 2015.0 4.0 49808.0
                  341 TN 2015.0 5.0 49808.0
                  342 TN 2015.0 6.0 49808.0
                  343 TN 2015.0 7.0 49808.0
                  344 TN 2015.0 8.0 49808.0
                  345 TN 2015.0 9.0 49808.0
                  346 TN 2015.0 10.0 49808.0
                  347 TN 2015.0 11.0 49808.0
                  348 TN 2015.0 12.0 49808.0


                  Using pandas.resample



                  Another solution is index by date and then resample there:



                  df['Day'] = 1

                  df1 = df.assign(date= lambda x:pd.to_datetime(x[['Year', 'Month', 'Day']])).set_index('date')

                  >> df1

                  State Year Month Value Day
                  date
                  1987-01-01 TN 1987.0 1.0 24410.0 1
                  1987-02-01 TN 1987.0 2.0 24410.0 1
                  1987-03-01 TN 1987.0 3.0 24410.0 1
                  1987-04-01 TN 1987.0 4.0 24410.0 1
                  1996-01-01 TN 1996.0 1.0 24410.0 1
                  1996-02-01 TN 1996.0 2.0 24410.0 1
                  1996-03-01 TN 1996.0 3.0 24410.0 1
                  1996-04-01 TN 1996.0 4.0 24410.0 1
                  1996-05-01 TN 1996.0 5.0 37109.0 1
                  1996-06-01 TN 1996.0 6.0 37109.0 1
                  1996-07-01 TN 1996.0 7.0 37109.0 1
                  1996-08-01 TN 1996.0 8.0 37109.0 1
                  1996-09-01 TN 1996.0 9.0 37109.0 1
                  1996-10-01 TN 1996.0 10.0 37109.0 1
                  1996-11-01 TN 1996.0 11.0 37109.0 1
                  1996-12-01 TN 1996.0 12.0 37109.0 1
                  2016-01-01 TN 2016.0 1.0 49808.0 1
                  2016-02-01 TN 2016.0 2.0 49808.0 1


                  Then you can resample it by month by doing:



                      res = df1.resample('M').first().ffill()

                  >> res

                  State Year Month Value Day
                  date
                  1987-01-31 TN 1987.0 1.0 24410.0 1.0
                  1987-02-28 TN 1987.0 2.0 24410.0 1.0
                  1987-03-31 TN 1987.0 3.0 24410.0 1.0
                  1987-04-30 TN 1987.0 4.0 24410.0 1.0
                  1987-05-31 TN 1987.0 4.0 24410.0 1.0
                  1987-06-30 TN 1987.0 4.0 24410.0 1.0
                  1987-07-31 TN 1987.0 4.0 24410.0 1.0
                  1987-08-31 TN 1987.0 4.0 24410.0 1.0
                  1987-09-30 TN 1987.0 4.0 24410.0 1.0
                  1987-10-31 TN 1987.0 4.0 24410.0 1.0
                  1987-11-30 TN 1987.0 4.0 24410.0 1.0
                  1987-12-31 TN 1987.0 4.0 24410.0 1.0
                  1988-01-31 TN 1987.0 4.0 24410.0 1.0
                  1988-02-29 TN 1987.0 4.0 24410.0 1.0
                  1988-03-31 TN 1987.0 4.0 24410.0 1.0
                  1988-04-30 TN 1987.0 4.0 24410.0 1.0
                  1988-05-31 TN 1987.0 4.0 24410.0 1.0
                  1988-06-30 TN 1987.0 4.0 24410.0 1.0
                  1988-07-31 TN 1987.0 4.0 24410.0 1.0
                  1988-08-31 TN 1987.0 4.0 24410.0 1.0
                  1988-09-30 TN 1987.0 4.0 24410.0 1.0
                  1988-10-31 TN 1987.0 4.0 24410.0 1.0
                  1988-11-30 TN 1987.0 4.0 24410.0 1.0
                  1988-12-31 TN 1987.0 4.0 24410.0 1.0
                  1989-01-31 TN 1987.0 4.0 24410.0 1.0
                  1989-02-28 TN 1987.0 4.0 24410.0 1.0
                  1989-03-31 TN 1987.0 4.0 24410.0 1.0
                  1989-04-30 TN 1987.0 4.0 24410.0 1.0
                  1989-05-31 TN 1987.0 4.0 24410.0 1.0
                  1989-06-30 TN 1987.0 4.0 24410.0 1.0
                  ... ... ... ... ... ...
                  2013-09-30 TN 1996.0 12.0 37109.0 1.0
                  2013-10-31 TN 1996.0 12.0 37109.0 1.0
                  2013-11-30 TN 1996.0 12.0 37109.0 1.0
                  2013-12-31 TN 1996.0 12.0 37109.0 1.0
                  2014-01-31 TN 1996.0 12.0 37109.0 1.0
                  2014-02-28 TN 1996.0 12.0 37109.0 1.0
                  2014-03-31 TN 1996.0 12.0 37109.0 1.0
                  2014-04-30 TN 1996.0 12.0 37109.0 1.0
                  2014-05-31 TN 1996.0 12.0 37109.0 1.0
                  2014-06-30 TN 1996.0 12.0 37109.0 1.0
                  2014-07-31 TN 1996.0 12.0 37109.0 1.0
                  2014-08-31 TN 1996.0 12.0 37109.0 1.0
                  2014-09-30 TN 1996.0 12.0 37109.0 1.0
                  2014-10-31 TN 1996.0 12.0 37109.0 1.0
                  2014-11-30 TN 1996.0 12.0 37109.0 1.0
                  2014-12-31 TN 1996.0 12.0 37109.0 1.0
                  2015-01-31 TN 1996.0 12.0 37109.0 1.0
                  2015-02-28 TN 1996.0 12.0 37109.0 1.0
                  2015-03-31 TN 1996.0 12.0 37109.0 1.0
                  2015-04-30 TN 1996.0 12.0 37109.0 1.0
                  2015-05-31 TN 1996.0 12.0 37109.0 1.0
                  2015-06-30 TN 1996.0 12.0 37109.0 1.0
                  2015-07-31 TN 1996.0 12.0 37109.0 1.0
                  2015-08-31 TN 1996.0 12.0 37109.0 1.0
                  2015-09-30 TN 1996.0 12.0 37109.0 1.0
                  2015-10-31 TN 1996.0 12.0 37109.0 1.0
                  2015-11-30 TN 1996.0 12.0 37109.0 1.0
                  2015-12-31 TN 1996.0 12.0 37109.0 1.0
                  2016-01-31 TN 2016.0 1.0 49808.0 1.0
                  2016-02-29 TN 2016.0 2.0 49808.0 1.0


                  You can get the original structure by doing:



                  >> res.reset_index(drop=True).drop(['Day'], axis=1).head()

                  State Year Month Value
                  0 TN 1987.0 1.0 24410.0
                  1 TN 1987.0 2.0 24410.0
                  2 TN 1987.0 3.0 24410.0
                  3 TN 1987.0 4.0 24410.0
                  4 TN 1987.0 4.0 24410.0
                  5 TN 1987.0 4.0 24410.0
                  6 TN 1987.0 4.0 24410.0
                  7 TN 1987.0 4.0 24410.0
                  8 TN 1987.0 4.0 24410.0






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 13 '18 at 18:04

























                  answered Nov 13 '18 at 17:51









                  Mabel VillalbaMabel Villalba

                  1,485214




                  1,485214






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53286584%2ffilling-in-past-and-future-data-from-partial-data-in-python%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Xamarin.iOS Cant Deploy on Iphone

                      Glorious Revolution

                      Dulmage-Mendelsohn matrix decomposition in Python