Using a Python dict to replace/clean data in a Pandas DataFrame
up vote
2
down vote
favorite
I have a Dataframe(table2) that looks something like
57 INVERNESS
361 INVERNESS
533 INVERNESS
535 INVERNESS KERRY DOWNS
758 INVERNESS GREEN
807 INVERNESS
970 INVERNESS POINT
971 INVERNESS
And so on..
And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like
NSUBDIVISION
SUBDIVISION
*HUFFMAN**8MILES NE OTHER
0 OTHER
00 OTHER
000 OTHER
INVERNESS POINT INVERNESS
And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:
{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
u'0': u'OTHER',
u'00': u'OTHER',
u'000': u'OTHER',
u'0000': u'OTHER',
u'INVERNESS POINT': u'INVERNESS',
And so ..on (I mention this as I'm not sure the dict is Properly formed)
I want to do something like
table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)
I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS
currently I just get TypeError: unhashable type: 'dict'
python pandas
add a comment |
up vote
2
down vote
favorite
I have a Dataframe(table2) that looks something like
57 INVERNESS
361 INVERNESS
533 INVERNESS
535 INVERNESS KERRY DOWNS
758 INVERNESS GREEN
807 INVERNESS
970 INVERNESS POINT
971 INVERNESS
And so on..
And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like
NSUBDIVISION
SUBDIVISION
*HUFFMAN**8MILES NE OTHER
0 OTHER
00 OTHER
000 OTHER
INVERNESS POINT INVERNESS
And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:
{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
u'0': u'OTHER',
u'00': u'OTHER',
u'000': u'OTHER',
u'0000': u'OTHER',
u'INVERNESS POINT': u'INVERNESS',
And so ..on (I mention this as I'm not sure the dict is Properly formed)
I want to do something like
table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)
I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS
currently I just get TypeError: unhashable type: 'dict'
python pandas
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I have a Dataframe(table2) that looks something like
57 INVERNESS
361 INVERNESS
533 INVERNESS
535 INVERNESS KERRY DOWNS
758 INVERNESS GREEN
807 INVERNESS
970 INVERNESS POINT
971 INVERNESS
And so on..
And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like
NSUBDIVISION
SUBDIVISION
*HUFFMAN**8MILES NE OTHER
0 OTHER
00 OTHER
000 OTHER
INVERNESS POINT INVERNESS
And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:
{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
u'0': u'OTHER',
u'00': u'OTHER',
u'000': u'OTHER',
u'0000': u'OTHER',
u'INVERNESS POINT': u'INVERNESS',
And so ..on (I mention this as I'm not sure the dict is Properly formed)
I want to do something like
table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)
I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS
currently I just get TypeError: unhashable type: 'dict'
python pandas
I have a Dataframe(table2) that looks something like
57 INVERNESS
361 INVERNESS
533 INVERNESS
535 INVERNESS KERRY DOWNS
758 INVERNESS GREEN
807 INVERNESS
970 INVERNESS POINT
971 INVERNESS
And so on..
And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like
NSUBDIVISION
SUBDIVISION
*HUFFMAN**8MILES NE OTHER
0 OTHER
00 OTHER
000 OTHER
INVERNESS POINT INVERNESS
And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:
{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
u'0': u'OTHER',
u'00': u'OTHER',
u'000': u'OTHER',
u'0000': u'OTHER',
u'INVERNESS POINT': u'INVERNESS',
And so ..on (I mention this as I'm not sure the dict is Properly formed)
I want to do something like
table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)
I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS
currently I just get TypeError: unhashable type: 'dict'
python pandas
python pandas
asked Aug 5 '13 at 21:23
dartdog
4,025165094
4,025165094
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
up vote
3
down vote
accepted
I think you want to create a dictionary from the Series (rather than the DataFrame):
In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
'0': 'OTHER',
'00': 'OTHER',
'000': 'OTHER',
'INVERNESS POINT': 'INVERNESS'}
And use this to replace
the column:
In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0 INVERNESS
1 INVERNESS
2 INVERNESS
3 INVERNESS KERRY DOWNS
4 INVERNESS GREEN
5 INVERNESS
6 INVERNESS
7 INVERNESS
Name: SUBDIVISION, dtype: object
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
|
show 5 more comments
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
3
down vote
accepted
I think you want to create a dictionary from the Series (rather than the DataFrame):
In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
'0': 'OTHER',
'00': 'OTHER',
'000': 'OTHER',
'INVERNESS POINT': 'INVERNESS'}
And use this to replace
the column:
In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0 INVERNESS
1 INVERNESS
2 INVERNESS
3 INVERNESS KERRY DOWNS
4 INVERNESS GREEN
5 INVERNESS
6 INVERNESS
7 INVERNESS
Name: SUBDIVISION, dtype: object
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
|
show 5 more comments
up vote
3
down vote
accepted
I think you want to create a dictionary from the Series (rather than the DataFrame):
In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
'0': 'OTHER',
'00': 'OTHER',
'000': 'OTHER',
'INVERNESS POINT': 'INVERNESS'}
And use this to replace
the column:
In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0 INVERNESS
1 INVERNESS
2 INVERNESS
3 INVERNESS KERRY DOWNS
4 INVERNESS GREEN
5 INVERNESS
6 INVERNESS
7 INVERNESS
Name: SUBDIVISION, dtype: object
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
|
show 5 more comments
up vote
3
down vote
accepted
up vote
3
down vote
accepted
I think you want to create a dictionary from the Series (rather than the DataFrame):
In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
'0': 'OTHER',
'00': 'OTHER',
'000': 'OTHER',
'INVERNESS POINT': 'INVERNESS'}
And use this to replace
the column:
In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0 INVERNESS
1 INVERNESS
2 INVERNESS
3 INVERNESS KERRY DOWNS
4 INVERNESS GREEN
5 INVERNESS
6 INVERNESS
7 INVERNESS
Name: SUBDIVISION, dtype: object
I think you want to create a dictionary from the Series (rather than the DataFrame):
In [11]: translate_df['NSUBDIVISION'].to_dict()
Out[11]:
{'*HUFFMAN**8MILES NE': 'OTHER',
'0': 'OTHER',
'00': 'OTHER',
'000': 'OTHER',
'INVERNESS POINT': 'INVERNESS'}
And use this to replace
the column:
In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
Out[12]:
0 INVERNESS
1 INVERNESS
2 INVERNESS
3 INVERNESS KERRY DOWNS
4 INVERNESS GREEN
5 INVERNESS
6 INVERNESS
7 INVERNESS
Name: SUBDIVISION, dtype: object
edited Aug 6 '13 at 13:18
answered Aug 5 '13 at 21:46
Andy Hayden
173k48416403
173k48416403
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
|
show 5 more comments
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38
|
show 5 more comments
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18067982%2fusing-a-python-dict-to-replace-clean-data-in-a-pandas-dataframe%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown