Loop through each row value and return column name

I have the table below. For each row, I would like the haves column to contain the names of the columns whose value in that row equals 1, using Python and pandas.

Location   House   car   Toys   haves
x          1       1     3      House, Car
y          2       1     1      Car, toys
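For anyone who wants to reproduce the problem, the input part of the table (everything except the desired haves column) could be built roughly as follows. The values are copied from the table above; the variable name df is simply the conventional choice and matches the answers.

```python
import pandas as pd

# Input data copied from the question's table (without the desired 'haves' column)
df = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})

print(df)
```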

edited Nov 10 at 19:58

Ayxan

89614

asked Nov 10 at 15:57

UJAY

273

Isn't that what you already have (minus some casing differences)? Or are you saying you want to generate that haves column from the existing columns?
– Jon Clements♦
Nov 10 at 16:01

python pandas

3 Answers
accepted

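If performance is important, first compare the values with 1 using eq (==), then take the dot product of the resulting boolean DataFrame with the column names (each with a separator appended), and finally strip the trailing separator with rstrip: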

df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')

# solution omitting the first column

#df['haves'] = df.iloc[:, 1:].eq(1).dot(df.columns[1:] + ', ').str.rstrip(', ')

print (df)

  Location  House  car  Toys       haves

0        x      1    1     3  House, car

1        y      2    1     1   car, Toys
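As a self-contained sketch of the approach above, the following builds the sample frame and applies the variant that skips the first column. The name value_cols is only an illustrative choice. Restricting the comparison to the numeric columns avoids comparing the Location labels with 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})

# Columns to test; the first column (Location) is only a label
value_cols = df.columns[1:]

# Boolean frame: True where the value equals 1
mask = df[value_cols].eq(1)

# Dot product with "name, " strings concatenates the names of the True columns,
# then rstrip removes the trailing separator characters
df["haves"] = mask.dot(value_cols + ", ").str.rstrip(", ")

print(df)
```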

Details:

print (df.eq(1))

   Location  House   car   Toys

0     False   True  True  False

1     False  False  True   True



print (df.eq(1).dot(df.columns + ', '))

0    House, car, 

1     car, Toys, 

dtype: object
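The reason the dot product yields concatenated names can be illustrated in plain Python: a bool is an int, so multiplying a string by True keeps it and multiplying by False gives an empty string, and "adding" the products concatenates them. The helper join_selected below is hypothetical and only mimics what happens for one row.

```python
def join_selected(flags, names, sep=", "):
    """Mimic one row of the boolean-by-string dot product."""
    # True * "House, " == "House, " and False * "House, " == ""
    summed = "".join(flag * (name + sep) for flag, name in zip(flags, names))
    # rstrip removes any trailing characters found in sep (here commas and spaces)
    return summed, summed.rstrip(sep)


raw, cleaned = join_selected([True, True, False], ["House", "car", "Toys"])
print(repr(raw))      # 'House, car, '
print(repr(cleaned))  # 'House, car'
```

Note that rstrip(', ') treats its argument as a set of characters rather than a suffix, so a column name that itself ends in a comma or a space would lose those characters as well.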

Performance depends on the number of 1 values and on the number of columns and rows, but because dot is vectorized it is faster than the loop solutions:

#2k rows

df = pd.concat([df] * 1000, ignore_index=True)



In [183]: %timeit df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')

2.65 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)



# works if there are no missing values

In [184]: %timeit df['haves'] = [x.rstrip(', ') for x in df.eq(1).dot(df.columns + ', ')]

2.43 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)



#jpp answer

In [185]: %timeit df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]

86.5 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



# Naga Kiran's answer (since removed)

In [186]: %timeit df['have'] = df.apply(lambda x: ','.join(x[x.eq(1)].index),1)

813 ms ± 8.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
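Outside IPython, a comparison like the one above could be sketched with the standard timeit module. The function names with_dot and with_loop are made up for this sketch, both skip the Location column, and the absolute numbers will differ with the machine and the pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})
# 2,000 rows, built the same way as in the timings above
big = pd.concat([small] * 1000, ignore_index=True)
cols = big.columns[1:]


def with_dot(frame):
    # boolean frame dotted with "name, " strings, trailing separator stripped
    return frame[cols].eq(1).dot(cols + ", ").str.rstrip(", ")


def with_loop(frame):
    # list comprehension joining the selected column names row by row
    return [", ".join(cols[row]) for row in frame[cols].eq(1).values]


for fn in (with_dot, with_loop):
    seconds = timeit.timeit(lambda: fn(big), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```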

edited Nov 10 at 16:35

answered Nov 10 at 16:22

jezrael

305k20239314

Even though your solutions are faster, are you sure it's not still an underlying Python-level loop? My understanding is contiguous memory blocks are not possible with string operations / object dtype, including str method accessor.
– jpp
Nov 10 at 18:57

@jpp - Hard question, really. Maybe it is possible to check with prun, but I have never done that before.
– jezrael
Nov 10 at 19:02

Worked! Thanks.
– UJAY
Nov 10 at 20:29

@UJAY - You are welcome!
– jezrael
Nov 10 at 20:30


Assuming you need to create the haves series, you can use a list comprehension:

df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]



print(df)



  Location  House  car  Toys       haves

0        x      1    1     3  House, car

1        y      2    1     1   car, Toys

I don't believe this task is easily vectorisable since you can have a variable number of values satisfying your condition, and your result will be an object dtype series.
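To illustrate the point about a variable number of matches, here is a small sketch with two extra rows (z and w) that are not in the question and are invented purely for illustration: one with no value equal to 1 and one where every value equals 1.

```python
import pandas as pd

# Rows z and w are hypothetical additions to the question's data
df = pd.DataFrame({
    "Location": ["x", "y", "z", "w"],
    "House": [1, 2, 5, 1],
    "car": [1, 1, 0, 1],
    "Toys": [3, 1, 7, 1],
})

# Same list comprehension as above: one joined string per row
df["haves"] = [", ".join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]

print(df["haves"].tolist())
# ['House, car', 'car, Toys', '', 'House, car, Toys']
```

A row without any match ends up with an empty string, and the strings differ in length from row to row.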

answered Nov 10 at 16:05

jpp

80.6k194795


Here is a simple approach that is only a little slower than the dot method and may be easier to understand. It uses a NumPy array for the column names (cols), which speeds things up considerably compared with using df.columns as a list.

import numpy as np



# numpy array of dataframe column names

cols = np.array(df.columns)

# boolean array to mark where dataframe values equal 1

b = (df.values == 1)

# list comprehension to join column names for each boolean row result

df['haves'] = [', '.join(cols[row_index]) for row_index in b]
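A self-contained variant of the same idea, assuming only the numeric columns should be tested (the list value_cols is an assumption made for this sketch), might look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})

# Assumption for this sketch: only these columns are compared with 1
value_cols = ["House", "car", "Toys"]

# NumPy array of the column names, so it can be indexed with a boolean mask
cols = np.array(value_cols)

# 2-D boolean array: True where the value equals 1
b = df[value_cols].to_numpy() == 1

# Each row of b is a boolean mask that selects the matching column names
df["haves"] = [", ".join(cols[row_mask]) for row_mask in b]

print(df)
```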

edited Nov 10 at 18:26

answered Nov 10 at 18:11

b2002

526148
