Loop through each row value and return column name

I have the table below. For each row, I would like to fill the haves column with the names of the columns whose value in that row equals 1, using Python and pandas.



Location  House  car  Toys  haves
x         1      1    3     House, Car
y         2      1    1     Car, toys
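The question does not show how the table is built. A minimal sketch of a DataFrame that matches it, assuming integer values and the column casing that appears in the answers' output (House, car, Toys); the variable name df is the one the answers use:

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption:
# the real data may have more rows, other dtypes, or a different index).
df = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})
print(df)
```

The haves column is left out on purpose, since it is the output the answers are meant to generate.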









  • Isn't that what you already have (minus some casing differences)? Or are you saying you want to generate that haves column from the existing columns?
    – Jon Clements
    Nov 10 at 16:01

Tags: python, pandas

asked Nov 10 at 15:57 by UJAY
edited Nov 10 at 19:58 by Ayxan

3 Answers

Accepted answer (score 1)

If performance is important, first compare the values with eq (==), then take the dot product of the resulting boolean DataFrame with the column names (each with the separator appended), and finally remove the trailing separator with rstrip:



df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
# alternative that omits the first column
# df['haves'] = df.iloc[:, 1:].eq(1).dot(df.columns[1:] + ', ').str.rstrip(', ')
print(df)
  Location  House  car  Toys       haves
0        x      1    1     3  House, car
1        y      2    1     1   car, Toys


Details:



print(df.eq(1))
   Location  House   car   Toys
0     False   True  True  False
1     False  False  True   True

print(df.eq(1).dot(df.columns + ', '))
0    House, car,
1     car, Toys,
dtype: object
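Why the dot product yields joined names can be seen without pandas. The following is only a pure-Python emulation of one row of the computation, not what pandas runs internally: multiplying a string by True keeps it, multiplying by False gives an empty string, and the "sum" of the products is string concatenation.

```python
# Pure-Python emulation of row 0 of df.eq(1).dot(df.columns + ', ')
columns = ["Location", "House", "car", "Toys"]
row_mask = [False, True, True, False]  # row 0 of df.eq(1)

# bool * str repeats the string 0 or 1 times; joining the pieces plays
# the role of the summation in the dot product.
joined = "".join(flag * (name + ", ") for flag, name in zip(row_mask, columns))
print(repr(joined))         # 'House, car, '
print(joined.rstrip(", "))  # House, car
```

Note that rstrip(', ') removes any run of trailing commas and spaces, not the exact two-character suffix, so a column name that itself ends in a comma or space would be trimmed too.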


Performance depends on the number of 1 values and on the number of rows and columns, but because dot is vectorized it is faster than the loop-based solutions:



# 2k rows
df = pd.concat([df] * 1000, ignore_index=True)

In [183]: %timeit df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
2.65 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# works only if there are no missing values
In [184]: %timeit df['haves'] = [x.rstrip(', ') for x in df.eq(1).dot(df.columns + ', ')]
2.43 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# jpp's answer
In [185]: %timeit df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]
86.5 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Naga Kiran's answer (since removed)
In [186]: %timeit df['have'] = df.apply(lambda x: ','.join(x[x.eq(1)].index),1)
813 ms ± 8.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
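The timings above were taken with IPython's %timeit. For readers who want to repeat the comparison as a plain script, here is a rough sketch using the standard-library timeit module. The function names with_dot and with_list_comprehension and the variables base and big are made up for this illustration, the sample data is assumed from the question, and no timings are quoted because they depend on the machine and on the pandas version.

```python
import timeit

import pandas as pd

# Sample data assumed from the question, repeated to 2k rows as in the timings.
base = pd.DataFrame({"Location": ["x", "y"], "House": [1, 2], "car": [1, 1], "Toys": [3, 1]})
big = pd.concat([base] * 1000, ignore_index=True)


def with_dot(frame):
    """Dot-product approach, skipping the first (Location) column."""
    sub = frame.iloc[:, 1:]
    return sub.eq(1).dot(sub.columns + ", ").str.rstrip(", ")


def with_list_comprehension(frame):
    """List-comprehension approach, skipping the first (Location) column."""
    sub = frame.iloc[:, 1:]
    return [", ".join(sub.columns[idx]) for idx in sub.eq(1).values]


# Check that both approaches agree before comparing their speed.
assert with_dot(big).tolist() == with_list_comprehension(big)

for func in (with_dot, with_list_comprehension):
    seconds = timeit.timeit(lambda: func(big), number=5) / 5
    print(f"{func.__name__}: {seconds * 1e3:.2f} ms per call")
```

Checking that the results match first guards against comparing the speed of two functions that do not compute the same thing.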





  • Even though your solutions are faster, are you sure it's not still an underlying Python-level loop? My understanding is contiguous memory blocks are not possible with string operations / object dtype, including str method accessor.
    – jpp
    Nov 10 at 18:57












  • @jpp - Hard question, really. Maybe it is possible to check with prun, but I have never done that before.
    – jezrael
    Nov 10 at 19:02
  • Worked! Thanks.
    – UJAY
    Nov 10 at 20:29
  • @UJAY - You are welcome!
    – jezrael
    Nov 10 at 20:30


















Answer (score 0)

Assuming you need to create the haves series, you can use a list comprehension:



df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]

print(df)

  Location  House  car  Toys       haves
0        x      1    1     3  House, car
1        y      2    1     1   car, Toys


I don't believe this task is easily vectorisable since you can have a variable number of values satisfying your condition, and your result will be an object dtype series.






Answer (score 0)

Here is a simple way that is only a little slower than the dot method and may be easier to understand. It uses NumPy to create the cols array, which speeds things up considerably compared with using df.columns as a list.

import numpy as np

# NumPy array of the DataFrame's column names
cols = np.array(df.columns)
# boolean array marking where the DataFrame's values equal 1
b = (df.values == 1)
# join the selected column names for each boolean row
df['haves'] = [', '.join(cols[row_mask]) for row_mask in b]
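The same idea can be wrapped in a small helper so that the target value, the columns to inspect and the separator become parameters. This is only a sketch: the function name columns_equal_to and its parameters are hypothetical, the sample data is assumed from the question, and the column names are assumed to be strings.

```python
import numpy as np
import pandas as pd


def columns_equal_to(df, value=1, columns=None, sep=", "):
    """Join, per row, the names of the columns whose entry equals `value`."""
    names = list(df.columns) if columns is None else list(columns)
    cols = np.array(names, dtype=object)
    mask = df[names].to_numpy() == value  # 2-D boolean array, one row per record
    return [sep.join(cols[row_mask]) for row_mask in mask]


# Sample data assumed from the question.
df = pd.DataFrame({"Location": ["x", "y"], "House": [1, 2], "car": [1, 1], "Toys": [3, 1]})
df["haves"] = columns_equal_to(df, columns=["House", "car", "Toys"])
print(df["haves"].tolist())  # ['House, car', 'car, Toys']
```

Passing the columns explicitly keeps Location, and an already existing haves column, out of the comparison.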





Accepted answer (dot product): answered Nov 10 at 16:22 by jezrael, edited Nov 10 at 16:35
List-comprehension answer: answered Nov 10 at 16:05 by jpp
NumPy boolean-indexing answer: answered Nov 10 at 18:11 by b2002, edited Nov 10 at 18:26