Pandas split column of lists into multiple columns
I have a pandas DataFrame with one column that looks like the following:
```
In [207]: df2.teams
Out[207]:
0    [SF, NYG]
1    [SF, NYG]
2    [SF, NYG]
3    [SF, NYG]
4    [SF, NYG]
5    [SF, NYG]
6    [SF, NYG]
7    [SF, NYG]
```
I need to split this column of lists into 2 columns named team1 and team2 using pandas.
Tags: python, pandas
asked Feb 18 '16 at 20:01 by user2938093
3 Answers
Accepted answer (score 80)
You can use the DataFrame constructor with the lists created by converting the column to a NumPy array with values and then calling tolist:
import pandas as pd
d1 = {'teams': [['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],
['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG'],['SF', 'NYG']]}
df2 = pd.DataFrame(d1)
print (df2)
teams
0 [SF, NYG]
1 [SF, NYG]
2 [SF, NYG]
3 [SF, NYG]
4 [SF, NYG]
5 [SF, NYG]
6 [SF, NYG]
df2[['team1','team2']] = pd.DataFrame(df2.teams.values.tolist(), index= df2.index)
print (df2)
teams team1 team2
0 [SF, NYG] SF NYG
1 [SF, NYG] SF NYG
2 [SF, NYG] SF NYG
3 [SF, NYG] SF NYG
4 [SF, NYG] SF NYG
5 [SF, NYG] SF NYG
6 [SF, NYG] SF NYG
And for a new DataFrame:
df3 = pd.DataFrame(df2['teams'].values.tolist(), columns=['team1','team2'])
print (df3)
team1 team2
0 SF NYG
1 SF NYG
2 SF NYG
3 SF NYG
4 SF NYG
5 SF NYG
6 SF NYG
The solution with apply(pd.Series) is very slow:
#7k rows
df2 = pd.concat([df2]*1000).reset_index(drop=True)
In [89]: %timeit df2['teams'].apply(pd.Series)
1 loop, best of 3: 1.15 s per loop
In [90]: %timeit pd.DataFrame(df2['teams'].values.tolist(), columns=['team1','team2'])
1000 loops, best of 3: 820 µs per loop
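Outside IPython, where %timeit is not available, a comparison like the one above can be roughly reproduced with the standard-library timeit module. This is only a sketch and not part of the original answer: the 700-row test frame and the helper names with_apply and with_constructor are assumptions made for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 700 rows of two-item lists (smaller than the 7k rows above).
df = pd.DataFrame({"teams": [["SF", "NYG"]] * 700})


def with_apply():
    # Builds one Series per row, then assembles them into a DataFrame.
    return df["teams"].apply(pd.Series)


def with_constructor():
    # Hands the whole list of lists to the DataFrame constructor at once.
    return pd.DataFrame(df["teams"].values.tolist(), index=df.index)


# Both approaches should produce the same cell values.
same_values = with_apply().values.tolist() == with_constructor().values.tolist()

# Total seconds for 3 calls each.
t_apply = timeit.timeit(with_apply, number=3)
t_constructor = timeit.timeit(with_constructor, number=3)
print(f"apply(pd.Series): {t_apply:.4f} s, constructor: {t_constructor:.4f} s")
```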
answered Feb 18 '16 at 20:06 by jezrael (edited Nov 6 '17 at 15:18)
What if the column name has a space, like "team 1"? I tried to access the column value this way, df2['team 1'], but it does not work.
– Sherlock, Jul 9 '17 at 16:21
Minor caveat: if you are using it on an existing DataFrame, make sure to reset the index, otherwise it will not assign correctly.
– user1700890, Nov 6 '17 at 15:16
@user1700890 - yes, or specify the index in the DataFrame constructor: df2[['team1','team2']] = pd.DataFrame(df2.teams.values.tolist(), index= df2.index)
– jezrael, Nov 6 '17 at 15:18
@Catbuilts - yes, if a vectorized solution exists, it is best to avoid it.
– jezrael, Nov 20 at 11:08
@Catbuilts - yes, obviously. Vectorized generally means no loops, so no apply, no for, no list comprehensions. But it depends on what exactly is needed. Maybe this also helps.
– jezrael, Nov 20 at 11:21
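The two caveats raised in the comments (index alignment and column names containing a space) can be illustrated with a small sketch. The data, the non-default index, and the variable names misaligned, aligned and spaced are assumptions chosen for the example, not taken from the thread.

```python
import pandas as pd

# Hypothetical data: a column of two-item lists with a non-default index.
df = pd.DataFrame(
    {"teams": [["SF", "NYG"], ["DAL", "PHI"], ["GB", "CHI"]]},
    index=[10, 20, 30],
)

# Without index=..., the new frame is indexed 0..n-1. Column assignment
# aligns on index labels, so none of the rows match and NaN is filled in.
misaligned = df.copy()
misaligned[["team1", "team2"]] = pd.DataFrame(misaligned["teams"].tolist())

# Passing index=df.index keeps the rows aligned.
aligned = df.copy()
aligned[["team1", "team2"]] = pd.DataFrame(
    aligned["teams"].tolist(), index=aligned.index
)

# Column names with spaces work with bracket access; attribute access
# cannot be used for them because "spaced.team 1" is not valid Python.
spaced = df.copy()
spaced[["team 1", "team 2"]] = pd.DataFrame(
    spaced["teams"].tolist(), index=spaced.index
)
first_team = spaced["team 1"]
```

Calling reset_index(drop=True) on the existing frame first, as the comment suggests, is the other way to make the default 0..n-1 index line up.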
Answer (score 5)
There seems to be a syntactically simpler, and therefore easier to remember, way than the solutions proposed so far. I'm assuming that the column is called 'meta' in a DataFrame df:
df2 = pd.DataFrame(df['meta'].str.split().values.tolist())
answered Jan 9 at 11:53 by mikkokotila
I got an error, but I resolved it by removing the str.split(). This was much simpler and has the advantage of working when you don't know the number of items in your list.
– otteheng, Jan 11 at 16:29
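The point about not knowing the number of items can be sketched as follows. The ragged example data and the generated team1, team2, ... column names are assumptions for illustration; the constructor call itself is the same one used in the answers.

```python
import pandas as pd

# Hypothetical column of lists with differing lengths.
df = pd.DataFrame({"teams": [["SF", "NYG"], ["DAL"], ["GB", "CHI", "MIN"]]})

# The constructor takes the column count from the longest list and pads
# shorter rows with missing values.
wide = pd.DataFrame(df["teams"].tolist(), index=df.index)

# Name the columns team1, team2, ... without hard-coding how many there are.
wide.columns = [f"team{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

Because the column count is discovered from the data, passing a fixed columns=['team1', 'team2'] list would fail here; generating the names afterwards avoids that.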
Answer (score 4)
Much simpler solution:
pd.DataFrame(df2.teams.tolist(), columns=['team1', 'team2'])
This yields:
team1 team2
-------------
0 SF NYG
1 SF NYG
2 SF NYG
3 SF NYG
4 SF NYG
5 SF NYG
6 SF NYG
7 SF NYG
If you wanted to split a column of delimited strings rather than lists, you could similarly do:
pd.DataFrame(df.teams.str.split('<delim>', expand=True).values,
columns=['team1', 'team2'])
answered Jun 15 at 17:03 by Joseph Davison
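For the delimited-string case, str.split(..., expand=True) already returns a DataFrame, so a slightly shorter variant is possible. The sketch below assumes "-" as the delimiter and uses made-up data; join is used only to show how the new columns can be attached to the original frame.

```python
import pandas as pd

# Hypothetical column of delimited strings; "-" is an assumed delimiter.
df = pd.DataFrame({"teams": ["SF-NYG", "DAL-PHI", "GB-CHI"]})

# expand=True returns a DataFrame with the same index as df.
parts = df["teams"].str.split("-", expand=True)
parts.columns = ["team1", "team2"]

# join aligns on the index, so this also works with a non-default index.
result = df.join(parts)
print(result)
```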