Pandas Group then Shift Column and keep last row












3














I want to group column idx then shift column val and keep the last row with idx.



import pandas as pd

df = pd.DataFrame({'idx':['a','a','b','b'],
'val':['a1','a2','b1','b2']})

df

idx val
0 a a1
1 a a2
2 b b1
3 b b2


I tried df['val_shift'] = df.groupby('idx').val.shift(1)



   idx  val val_shift
0 a a1 NaN
1 a a2 a1
2 b b1 NaN
3 b b2 b1


But I want.



   idx  val
0 a NaN
1 a a1
2 a a2
3 b NaN
4 b b1
5 b b2


Are there any way to get this?










share|improve this question



























    3














    I want to group column idx then shift column val and keep the last row with idx.



    import pandas as pd

    df = pd.DataFrame({'idx':['a','a','b','b'],
    'val':['a1','a2','b1','b2']})

    df

    idx val
    0 a a1
    1 a a2
    2 b b1
    3 b b2


    I tried df['val_shift'] = df.groupby('idx').val.shift(1)



       idx  val val_shift
    0 a a1 NaN
    1 a a2 a1
    2 b b1 NaN
    3 b b2 b1


    But I want.



       idx  val
    0 a NaN
    1 a a1
    2 a a2
    3 b NaN
    4 b b1
    5 b b2


    Are there any way to get this?










    share|improve this question

























      3












      3








      3







      I want to group column idx then shift column val and keep the last row with idx.



      import pandas as pd

      df = pd.DataFrame({'idx':['a','a','b','b'],
      'val':['a1','a2','b1','b2']})

      df

      idx val
      0 a a1
      1 a a2
      2 b b1
      3 b b2


      I tried df['val_shift'] = df.groupby('idx').val.shift(1)



         idx  val val_shift
      0 a a1 NaN
      1 a a2 a1
      2 b b1 NaN
      3 b b2 b1


      But I want.



         idx  val
      0 a NaN
      1 a a1
      2 a a2
      3 b NaN
      4 b b1
      5 b b2


      Are there any way to get this?










      share|improve this question













      I want to group column idx then shift column val and keep the last row with idx.



      import pandas as pd

      df = pd.DataFrame({'idx':['a','a','b','b'],
      'val':['a1','a2','b1','b2']})

      df

      idx val
      0 a a1
      1 a a2
      2 b b1
      3 b b2


      I tried df['val_shift'] = df.groupby('idx').val.shift(1)



         idx  val val_shift
      0 a a1 NaN
      1 a a2 a1
      2 b b1 NaN
      3 b b2 b1


      But I want.



         idx  val
      0 a NaN
      1 a a1
      2 a a2
      3 b NaN
      4 b b1
      5 b b2


      Are there any way to get this?







      pandas dataframe






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 12 at 13:29









      yolox

      438




      438
























          3 Answers
          3






          active

          oldest

          votes


















          4














          I believe you need concat last rows extracted by drop_duplicates with change index values for correct ordering first, because shift always remove last value here:



          df1 = df.drop_duplicates('idx', keep='last')
          df1.index += .5

          df = pd.concat([df, df1]).sort_index().reset_index(drop=True)


          Alternative solution:



          df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)

          df['val_shift'] = df.groupby('idx').val.shift(1)
          print (df)
          idx val val_shift
          0 a a1 NaN
          1 a a2 a1
          2 a a2 a2
          3 b b1 NaN
          4 b b2 b1
          5 b b2 b2


          If want remove val after shift use pop with syntactic sugar - grouping by Series df['idx']:



          df['val_shift']  = df.pop('val').groupby(df['idx']).shift(1)
          print (df)
          idx val_shift
          0 a NaN
          1 a a1
          2 a a2
          3 b NaN
          4 b b1
          5 b b2





          share|improve this answer























          • Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
            – yolox
            Nov 12 at 13:57






          • 1




            @yolox - tested too, you are right so necessary testing functions, give me some time.
            – jezrael
            Nov 12 at 14:04






          • 1




            Tested, second solution is faster.
            – jezrael
            Nov 12 at 14:08










          • Thank you so much!
            – yolox
            Nov 12 at 14:09





















          4














          It looks to me like you're just shoving an empty dataframe infront of each group where only 'idx' is populated.



          pd.concat([
          d[['idx']].head(1).append(d)
          for _, d in df.groupby('idx')
          ], ignore_index=True)

          idx val
          0 a NaN
          1 a a1
          2 a a2
          3 b NaN
          4 b b1
          5 b b2


          Alternative



          df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)





          share|improve this answer































            3














            Using concat with tail



            newdf=pd.concat([df,df.groupby('idx').tail(1)])
            newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
            newdf
            Out[885]:
            idx val
            0 a NaN
            1 a a1
            1 a a2
            2 b NaN
            3 b b1
            3 b b2





            share|improve this answer





















              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53263226%2fpandas-group-then-shift-column-and-keep-last-row%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              4














              I believe you need concat last rows extracted by drop_duplicates with change index values for correct ordering first, because shift always remove last value here:



              df1 = df.drop_duplicates('idx', keep='last')
              df1.index += .5

              df = pd.concat([df, df1]).sort_index().reset_index(drop=True)


              Alternative solution:



              df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)

              df['val_shift'] = df.groupby('idx').val.shift(1)
              print (df)
              idx val val_shift
              0 a a1 NaN
              1 a a2 a1
              2 a a2 a2
              3 b b1 NaN
              4 b b2 b1
              5 b b2 b2


              If want remove val after shift use pop with syntactic sugar - grouping by Series df['idx']:



              df['val_shift']  = df.pop('val').groupby(df['idx']).shift(1)
              print (df)
              idx val_shift
              0 a NaN
              1 a a1
              2 a a2
              3 b NaN
              4 b b1
              5 b b2





              share|improve this answer























              • Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
                – yolox
                Nov 12 at 13:57






              • 1




                @yolox - tested too, you are right so necessary testing functions, give me some time.
                – jezrael
                Nov 12 at 14:04






              • 1




                Tested, second solution is faster.
                – jezrael
                Nov 12 at 14:08










              • Thank you so much!
                – yolox
                Nov 12 at 14:09


















              4














              I believe you need concat last rows extracted by drop_duplicates with change index values for correct ordering first, because shift always remove last value here:



              df1 = df.drop_duplicates('idx', keep='last')
              df1.index += .5

              df = pd.concat([df, df1]).sort_index().reset_index(drop=True)


              Alternative solution:



              df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)

              df['val_shift'] = df.groupby('idx').val.shift(1)
              print (df)
              idx val val_shift
              0 a a1 NaN
              1 a a2 a1
              2 a a2 a2
              3 b b1 NaN
              4 b b2 b1
              5 b b2 b2


              If want remove val after shift use pop with syntactic sugar - grouping by Series df['idx']:



              df['val_shift']  = df.pop('val').groupby(df['idx']).shift(1)
              print (df)
              idx val_shift
              0 a NaN
              1 a a1
              2 a a2
              3 b NaN
              4 b b1
              5 b b2





              share|improve this answer























              • Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
                – yolox
                Nov 12 at 13:57






              • 1




                @yolox - tested too, you are right so necessary testing functions, give me some time.
                – jezrael
                Nov 12 at 14:04






              • 1




                Tested, second solution is faster.
                – jezrael
                Nov 12 at 14:08










              • Thank you so much!
                – yolox
                Nov 12 at 14:09
















              4












              4








              4






              I believe you need concat last rows extracted by drop_duplicates with change index values for correct ordering first, because shift always remove last value here:



              df1 = df.drop_duplicates('idx', keep='last')
              df1.index += .5

              df = pd.concat([df, df1]).sort_index().reset_index(drop=True)


              Alternative solution:



              df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)

              df['val_shift'] = df.groupby('idx').val.shift(1)
              print (df)
              idx val val_shift
              0 a a1 NaN
              1 a a2 a1
              2 a a2 a2
              3 b b1 NaN
              4 b b2 b1
              5 b b2 b2


              If want remove val after shift use pop with syntactic sugar - grouping by Series df['idx']:



              df['val_shift']  = df.pop('val').groupby(df['idx']).shift(1)
              print (df)
              idx val_shift
              0 a NaN
              1 a a1
              2 a a2
              3 b NaN
              4 b b1
              5 b b2





              share|improve this answer














              I believe you need concat last rows extracted by drop_duplicates with change index values for correct ordering first, because shift always remove last value here:



              df1 = df.drop_duplicates('idx', keep='last')
              df1.index += .5

              df = pd.concat([df, df1]).sort_index().reset_index(drop=True)


              Alternative solution:



              df = df.drop_duplicates('idx', keep='last').append(df).sort_index().reset_index(drop=True)

              df['val_shift'] = df.groupby('idx').val.shift(1)
              print (df)
              idx val val_shift
              0 a a1 NaN
              1 a a2 a1
              2 a a2 a2
              3 b b1 NaN
              4 b b2 b1
              5 b b2 b2


              If want remove val after shift use pop with syntactic sugar - grouping by Series df['idx']:



              df['val_shift']  = df.pop('val').groupby(df['idx']).shift(1)
              print (df)
              idx val_shift
              0 a NaN
              1 a a1
              2 a a2
              3 b NaN
              4 b b1
              5 b b2






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 12 at 13:46

























              answered Nov 12 at 13:35









              jezrael

              318k22258336




              318k22258336












              • Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
                – yolox
                Nov 12 at 13:57






              • 1




                @yolox - tested too, you are right so necessary testing functions, give me some time.
                – jezrael
                Nov 12 at 14:04






              • 1




                Tested, second solution is faster.
                – jezrael
                Nov 12 at 14:08










              • Thank you so much!
                – yolox
                Nov 12 at 14:09




















              • Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
                – yolox
                Nov 12 at 13:57






              • 1




                @yolox - tested too, you are right so necessary testing functions, give me some time.
                – jezrael
                Nov 12 at 14:04






              • 1




                Tested, second solution is faster.
                – jezrael
                Nov 12 at 14:08










              • Thank you so much!
                – yolox
                Nov 12 at 14:09


















              Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
              – yolox
              Nov 12 at 13:57




              Thank for the good answer but I have some question. Does the last way give the fastest result? I can't test it with %timeit.
              – yolox
              Nov 12 at 13:57




              1




              1




              @yolox - tested too, you are right so necessary testing functions, give me some time.
              – jezrael
              Nov 12 at 14:04




              @yolox - tested too, you are right so necessary testing functions, give me some time.
              – jezrael
              Nov 12 at 14:04




              1




              1




              Tested, second solution is faster.
              – jezrael
              Nov 12 at 14:08




              Tested, second solution is faster.
              – jezrael
              Nov 12 at 14:08












              Thank you so much!
              – yolox
              Nov 12 at 14:09






              Thank you so much!
              – yolox
              Nov 12 at 14:09















              4














              It looks to me like you're just shoving an empty dataframe infront of each group where only 'idx' is populated.



              pd.concat([
              d[['idx']].head(1).append(d)
              for _, d in df.groupby('idx')
              ], ignore_index=True)

              idx val
              0 a NaN
              1 a a1
              2 a a2
              3 b NaN
              4 b b1
              5 b b2


              Alternative



              df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)





              share|improve this answer




























                4














                It looks to me like you're just shoving an empty dataframe infront of each group where only 'idx' is populated.



                pd.concat([
                d[['idx']].head(1).append(d)
                for _, d in df.groupby('idx')
                ], ignore_index=True)

                idx val
                0 a NaN
                1 a a1
                2 a a2
                3 b NaN
                4 b b1
                5 b b2


                Alternative



                df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)





                share|improve this answer


























                  4












                  4








                  4






                  It looks to me like you're just shoving an empty dataframe infront of each group where only 'idx' is populated.



                  pd.concat([
                  d[['idx']].head(1).append(d)
                  for _, d in df.groupby('idx')
                  ], ignore_index=True)

                  idx val
                  0 a NaN
                  1 a a1
                  2 a a2
                  3 b NaN
                  4 b b1
                  5 b b2


                  Alternative



                  df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)





                  share|improve this answer














                  It looks to me like you're just shoving an empty dataframe infront of each group where only 'idx' is populated.



                  pd.concat([
                  d[['idx']].head(1).append(d)
                  for _, d in df.groupby('idx')
                  ], ignore_index=True)

                  idx val
                  0 a NaN
                  1 a a1
                  2 a a2
                  3 b NaN
                  4 b b1
                  5 b b2


                  Alternative



                  df[['idx']].drop_duplicates('idx').append(df).sort_values('idx').reset_index(drop=True)






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 12 at 13:42

























                  answered Nov 12 at 13:37









                  piRSquared

                  151k22142284




                  151k22142284























                      3














                      Using concat with tail



                      newdf=pd.concat([df,df.groupby('idx').tail(1)])
                      newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
                      newdf
                      Out[885]:
                      idx val
                      0 a NaN
                      1 a a1
                      1 a a2
                      2 b NaN
                      3 b b1
                      3 b b2





                      share|improve this answer


























                        3














                        Using concat with tail



                        newdf=pd.concat([df,df.groupby('idx').tail(1)])
                        newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
                        newdf
                        Out[885]:
                        idx val
                        0 a NaN
                        1 a a1
                        1 a a2
                        2 b NaN
                        3 b b1
                        3 b b2





                        share|improve this answer
























                          3












                          3








                          3






                          Using concat with tail



                          newdf=pd.concat([df,df.groupby('idx').tail(1)])
                          newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
                          newdf
                          Out[885]:
                          idx val
                          0 a NaN
                          1 a a1
                          1 a a2
                          2 b NaN
                          3 b b1
                          3 b b2





                          share|improve this answer












                          Using concat with tail



                          newdf=pd.concat([df,df.groupby('idx').tail(1)])
                          newdf=newdf.assign(val=newdf.groupby('idx').shift()).sort_index()
                          newdf
                          Out[885]:
                          idx val
                          0 a NaN
                          1 a a1
                          1 a a2
                          2 b NaN
                          3 b b1
                          3 b b2






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Nov 12 at 13:59









                          W-B

                          99.5k73163




                          99.5k73163






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.





                              Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                              Please pay close attention to the following guidance:


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53263226%2fpandas-group-then-shift-column-and-keep-last-row%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Xamarin.iOS Cant Deploy on Iphone

                              Glorious Revolution

                              Dulmage-Mendelsohn matrix decomposition in Python