Pandas create new column based on first unique values of existing column












I'm trying to add a new column to a DataFrame that keeps only the first occurrence of each value in an existing column. The new column will hold fewer values, with np.nan in the rows where the duplicates would have been.



import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df

   a  b
0  1  3
1  2  4
2  3  3
3  4  4
4  5  5


Goal:



   a  b    c
0  1  3    3
1  2  4    4
2  3  3  nan
3  4  4  nan
4  5  5    5


I've tried:



df['c'] = np.where(df['b'].unique(), df['b'], np.nan)


It raises a ValueError: operands could not be broadcast together with shapes (3,) (5,) ()










Tags: python, python-3.x, pandas, numpy, unique






asked Nov 14 '18 at 17:36 by Derek_P
edited Nov 14 '18 at 17:43 by jpp

3 Answers
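
mask + duplicated

You can use the pandas mask and duplicated methods. duplicated flags every repeat of a value after its first occurrence, and mask replaces the flagged entries with NaN:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
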
answered Nov 14 '18 at 17:42 by jpp
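
As a self-contained illustration of the mask + duplicated approach above, the following sketch rebuilds the question's DataFrame and applies it step by step. The names is_repeat and c_int are introduced here only for illustration, and the nullable Int64 variant is an optional extra that assumes a reasonably recent pandas version; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# duplicated() is True for every repeat after the first occurrence.
is_repeat = df['b'].duplicated()

# mask() replaces entries where the condition is True with NaN.
# NaN is a float, so the integer column is upcast to float64.
df['c'] = df['b'].mask(is_repeat)

# Optional (illustrative): a nullable integer dtype keeps the surviving
# values as integers and marks the gaps as <NA> instead of NaN.
df['c_int'] = df['b'].astype('Int64').mask(is_repeat)

print(df)
```

The float upcast explains why the answer's output shows 3.0 rather than the 3 in the question's goal table.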


























Use duplicated with np.where:

df['c'] = np.where(df['b'].duplicated(), np.nan, df['b'])

Or:

df['c'] = df['b'].where(~df['b'].duplicated(), np.nan)

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

answered Nov 14 '18 at 17:43 by Sandeep Kadapa
edited Nov 14 '18 at 17:48
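
It may help to see why the attempt in the question fails while the np.where version above works. The sketch below is an illustration under the question's own data: unique() returns one entry per distinct value, so its shape cannot be broadcast against the full column, whereas duplicated() returns one boolean per row. The names via_numpy and via_pandas are hypothetical and used only here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() returns only the distinct values, so it is shorter than the column.
print(df['b'].unique().shape)  # (3,)
print(df['b'].shape)           # (5,)

# np.where needs condition, x and y to broadcast to a common shape;
# (3,) and (5,) cannot, hence the ValueError from the question.
try:
    np.where(df['b'].unique(), df['b'], np.nan)
except ValueError as exc:
    print('ValueError:', exc)

# duplicated() yields one boolean per row, so the shapes line up.
via_numpy = np.where(df['b'].duplicated(), np.nan, df['b'])

# Series.where keeps values where the condition is True; other defaults to NaN.
via_pandas = df['b'].where(~df['b'].duplicated())

print(via_numpy)
print(via_pandas)
```

Both variants give the same values; the np.where form returns a NumPy array, while Series.where returns a Series that keeps the original index.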
























jpp wrote:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

I like the code, but the last row should also give NaN:

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  NaN

answered Nov 14 '18 at 18:03 by Michael G.













• I don't understand your answer / point. Can you explain further? – jpp, Jan 13 at 14:12
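
The third answer does not say which rule would blank the last row, and 5 occurs only once in column b. For reference, here is a sketch of the three conventions that duplicated supports through its keep parameter; the variable names are illustrative only. The question's goal table corresponds to the default, keep='first', and none of the three options blanks the single 5.

```python
import pandas as pd

b = pd.Series([3, 4, 3, 4, 5], name='b')

# keep='first' (the default): blank every repeat after the first occurrence.
first = b.mask(b.duplicated(keep='first'))

# keep='last': blank every occurrence except the last one.
last = b.mask(b.duplicated(keep='last'))

# keep=False: blank every value that appears more than once.
only_singletons = b.mask(b.duplicated(keep=False))

print(pd.DataFrame({'b': b, 'first': first, 'last': last,
                    'only_singletons': only_singletons}))
```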


















