Loop through each row value and return column name
I have the table below. For each row, I would like the haves column to contain the names of the columns whose value in that row equals 1, using Python and pandas.

    Location  House  car  Toys  haves
    x         1      1    3     House, Car
    y         2      1    1     Car, toys
python pandas
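To try the answers that follow, the sample table can be built as a DataFrame. This is a minimal sketch, assuming the three value columns hold integers and that the haves column does not exist yet; the variable name df is chosen to match the name the answers use.

```python
import pandas as pd

# Sample data from the question. The 'haves' column is left out
# on purpose because it is the column to be generated.
df = pd.DataFrame({
    "Location": ["x", "y"],
    "House": [1, 2],
    "car": [1, 1],
    "Toys": [3, 1],
})
print(df)
```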
asked Nov 10 at 15:57 by UJAY, edited Nov 10 at 19:58 by Ayxan
Isn't that what you already have (minus some casing differences)? Or are you saying you want to generate that haves column from the existing columns? – Jon Clements♦ Nov 10 at 16:01
3 Answers
Accepted answer
If performance is important, first compare the values with 1 using eq (==), then take the dot product of the resulting boolean mask with the column names (each followed by a separator), and finally remove the trailing separator with rstrip:
    df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
    # solution omitting the first column
    # df['haves'] = df.iloc[:, 1:].eq(1).dot(df.columns[1:] + ', ').str.rstrip(', ')
    print(df)
      Location  House  car  Toys       haves
    0        x      1    1     3  House, car
    1        y      2    1     1   car, Toys
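One caveat with rstrip(', '): its argument is a set of characters to strip, not a suffix, so a column name that itself ends in a comma or a space would be trimmed as well. The sketch below shows the difference in plain Python (the pandas .str.rstrip accessor follows the same rule). The helper names and the column name "qty," are hypothetical and only serve the illustration.

```python
SEP = ", "

def strip_chars(joined):
    # str.rstrip treats SEP as a set of characters (',' and ' '),
    # so it keeps stripping while the last character is one of them.
    return joined.rstrip(SEP)

def strip_suffix(joined):
    # Remove exactly one trailing separator, if there is one.
    return joined[:-len(SEP)] if joined.endswith(SEP) else joined

ordinary = "House, car, "   # joined names from the example table
tricky = "qty,, "           # hypothetical column "qty," plus the separator

print(repr(strip_chars(ordinary)), repr(strip_suffix(ordinary)))
print(repr(strip_chars(tricky)), repr(strip_suffix(tricky)))
```

For the column names in the question both approaches give the same result. If names may end in a comma or a space, slicing off the separator (for example with .str[:-2] in pandas) is one possible alternative.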
Details:

    print(df.eq(1))
       Location  House   car   Toys
    0     False   True  True  False
    1     False  False  True   True

    print(df.eq(1).dot(df.columns + ', '))
    0    House, car,
    1     car, Toys,
    dtype: object
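The reason a dot product can produce strings is that, in Python, True * 'House, ' is 'House, ' and False * 'House, ' is the empty string, so summing the products along a row concatenates the selected names. The following plain-Python sketch mimics that arithmetic only; it is not a description of how pandas implements dot internally, and row_dot is a hypothetical helper.

```python
# Column names and the boolean mask shown in the Details output above.
columns = ["Location", "House", "car", "Toys"]
mask = [
    [False, True, True, False],   # row 0: House == 1 and car == 1
    [False, False, True, True],   # row 1: car == 1 and Toys == 1
]

def row_dot(flags, names, sep=", "):
    # bool * str repeats the string 1 or 0 times, so only the names
    # flagged True survive; joining the pieces plays the role of the sum.
    return "".join(flag * (name + sep) for flag, name in zip(flags, names))

joined = [row_dot(row, columns) for row in mask]
print(joined)
```

Each result still ends with the separator, which is why the answer strips it afterwards.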
Performance depends on the number of 1 values and on the number of columns and rows, but because dot is vectorized, it is faster than the loop solutions:

    # 2k rows
    df = pd.concat([df] * 1000, ignore_index=True)

    In [183]: %timeit df['haves'] = df.eq(1).dot(df.columns + ', ').str.rstrip(', ')
    2.65 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    # works only if there are no missing values
    In [184]: %timeit df['haves'] = [x.rstrip(', ') for x in df.eq(1).dot(df.columns + ', ')]
    2.43 ms ± 38.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    # jpp's answer
    In [185]: %timeit df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]
    86.5 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    # Naga Kiran's removed answer
    In [186]: %timeit df['have'] = df.apply(lambda x: ','.join(x[x.eq(1)].index),1)
    813 ms ± 8.66 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
answered Nov 10 at 16:22 by jezrael, edited Nov 10 at 16:35
Even though your solutions are faster, are you sure it's not still an underlying Python-level loop? My understanding is that contiguous memory blocks are not possible with string operations / object dtype, including the str method accessor. – jpp Nov 10 at 18:57
@jpp - Hard question, really. Maybe it is possible to check with prun, but I have never done that before. – jezrael Nov 10 at 19:02
Worked! Thanks. – UJAY Nov 10 at 20:29
@UJAY - You are welcome! – jezrael Nov 10 at 20:30
Assuming you need to create the haves series, you can use a list comprehension:

    df['haves'] = [', '.join(df.columns[1:][idx]) for idx in df.iloc[:, 1:].eq(1).values]
    print(df)
      Location  House  car  Toys       haves
    0        x      1    1     3  House, car
    1        y      2    1     1   car, Toys

I don't believe this task is easily vectorisable, since a variable number of values can satisfy your condition and your result will be an object dtype series.
answered Nov 10 at 16:05 by jpp
Here is a simple way that is only a little slower than the dot method and may be easier to understand. It uses NumPy to build the cols array, which speeds things up considerably compared with using df.columns as a list.

    import numpy as np
    # NumPy array of the DataFrame column names
    cols = np.array(df.columns)
    # boolean array marking where the DataFrame values equal 1
    b = (df.values == 1)
    # for each row, join the column names selected by that row's boolean mask
    df['haves'] = [', '.join(cols[row_mask]) for row_mask in b]
answered Nov 10 at 18:11 by b2002, edited Nov 10 at 18:26