Using a Python dict to replace/clean data in a Pandas DataFrame

up vote
2
down vote

favorite

I have a Dataframe(table2) that looks something like

57                  INVERNESS

361                 INVERNESS

533                 INVERNESS

535     INVERNESS KERRY DOWNS

758           INVERNESS GREEN

807                 INVERNESS

970           INVERNESS POINT

971                 INVERNESS

And so on..

And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like

             NSUBDIVISION

SUBDIVISION 

*HUFFMAN**8MILES NE  OTHER

0                    OTHER

00                   OTHER

000                  OTHER

INVERNESS POINT      INVERNESS

And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:

{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',

  u'0': u'OTHER',

  u'00': u'OTHER',

  u'000': u'OTHER',

  u'0000': u'OTHER',

  u'INVERNESS POINT': u'INVERNESS',

And so ..on (I mention this as I'm not sure the dict is Properly formed)

I want to do something like

 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)

I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS

currently I just get TypeError: unhashable type: 'dict'

asked Aug 5 '13 at 21:23

dartdog

4,025165094

add a comment |

up vote
2
down vote

favorite

I have a Dataframe(table2) that looks something like

57                  INVERNESS

361                 INVERNESS

533                 INVERNESS

535     INVERNESS KERRY DOWNS

758           INVERNESS GREEN

807                 INVERNESS

970           INVERNESS POINT

971                 INVERNESS

And so on..

And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like

             NSUBDIVISION

SUBDIVISION 

*HUFFMAN**8MILES NE  OTHER

0                    OTHER

00                   OTHER

000                  OTHER

INVERNESS POINT      INVERNESS

And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:

{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',

  u'0': u'OTHER',

  u'00': u'OTHER',

  u'000': u'OTHER',

  u'0000': u'OTHER',

  u'INVERNESS POINT': u'INVERNESS',

And so ..on (I mention this as I'm not sure the dict is Properly formed)

I want to do something like

 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)

currently I just get TypeError: unhashable type: 'dict'

asked Aug 5 '13 at 21:23

dartdog

4,025165094

add a comment |

up vote
2
down vote

favorite

I have a Dataframe(table2) that looks something like

57                  INVERNESS

361                 INVERNESS

533                 INVERNESS

535     INVERNESS KERRY DOWNS

758           INVERNESS GREEN

807                 INVERNESS

970           INVERNESS POINT

971                 INVERNESS

And so on..

And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like

             NSUBDIVISION

SUBDIVISION 

*HUFFMAN**8MILES NE  OTHER

0                    OTHER

00                   OTHER

000                  OTHER

INVERNESS POINT      INVERNESS

And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:

{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',

  u'0': u'OTHER',

  u'00': u'OTHER',

  u'000': u'OTHER',

  u'0000': u'OTHER',

  u'INVERNESS POINT': u'INVERNESS',

And so ..on (I mention this as I'm not sure the dict is Properly formed)

I want to do something like

 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)

currently I just get TypeError: unhashable type: 'dict'

asked Aug 5 '13 at 21:23

dartdog

4,025165094

I have a Dataframe(table2) that looks something like

57                  INVERNESS

361                 INVERNESS

533                 INVERNESS

535     INVERNESS KERRY DOWNS

758           INVERNESS GREEN

807                 INVERNESS

970           INVERNESS POINT

971                 INVERNESS

And so on..

And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like

             NSUBDIVISION

SUBDIVISION 

*HUFFMAN**8MILES NE  OTHER

0                    OTHER

00                   OTHER

000                  OTHER

INVERNESS POINT      INVERNESS

And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:

{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',

  u'0': u'OTHER',

  u'00': u'OTHER',

  u'000': u'OTHER',

  u'0000': u'OTHER',

  u'INVERNESS POINT': u'INVERNESS',

And so ..on (I mention this as I'm not sure the dict is Properly formed)

I want to do something like

 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)

currently I just get TypeError: unhashable type: 'dict'

python pandas

asked Aug 5 '13 at 21:23

dartdog

4,025165094

asked Aug 5 '13 at 21:23

dartdog

4,025165094

asked Aug 5 '13 at 21:23

dartdog

4,025165094

asked Aug 5 '13 at 21:23

dartdog

4,025165094

asked Aug 5 '13 at 21:23

dartdog

4,025165094

add a comment |

1 Answer
1

active

oldest

votes

up vote
3
down vote

accepted

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()

Out[11]:

{'*HUFFMAN**8MILES NE': 'OTHER',

 '0': 'OTHER',

 '00': 'OTHER',

 '000': 'OTHER',

 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())

Out[12]:

0                INVERNESS

1                INVERNESS

2                INVERNESS

3    INVERNESS KERRY DOWNS

4          INVERNESS GREEN

5                INVERNESS

6                INVERNESS

7                INVERNESS

Name: SUBDIVISION, dtype: object

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

|
show 5 more comments

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18067982%2fusing-a-python-dict-to-replace-clean-data-in-a-pandas-dataframe%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

up vote
3
down vote

accepted

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()

Out[11]:

{'*HUFFMAN**8MILES NE': 'OTHER',

 '0': 'OTHER',

 '00': 'OTHER',

 '000': 'OTHER',

 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())

Out[12]:

0                INVERNESS

1                INVERNESS

2                INVERNESS

3    INVERNESS KERRY DOWNS

4          INVERNESS GREEN

5                INVERNESS

6                INVERNESS

7                INVERNESS

Name: SUBDIVISION, dtype: object

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

|
show 5 more comments

up vote
3
down vote

accepted

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()

Out[11]:

{'*HUFFMAN**8MILES NE': 'OTHER',

 '0': 'OTHER',

 '00': 'OTHER',

 '000': 'OTHER',

 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())

Out[12]:

0                INVERNESS

1                INVERNESS

2                INVERNESS

3    INVERNESS KERRY DOWNS

4          INVERNESS GREEN

5                INVERNESS

6                INVERNESS

7                INVERNESS

Name: SUBDIVISION, dtype: object

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

|
show 5 more comments

up vote
3
down vote

accepted

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()

Out[11]:

{'*HUFFMAN**8MILES NE': 'OTHER',

 '0': 'OTHER',

 '00': 'OTHER',

 '000': 'OTHER',

 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())

Out[12]:

0                INVERNESS

1                INVERNESS

2                INVERNESS

3    INVERNESS KERRY DOWNS

4          INVERNESS GREEN

5                INVERNESS

6                INVERNESS

7                INVERNESS

Name: SUBDIVISION, dtype: object

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

I think you want to create a dictionary from the Series (rather than the DataFrame):

In [11]: translate_df['NSUBDIVISION'].to_dict()

Out[11]:

{'*HUFFMAN**8MILES NE': 'OTHER',

 '0': 'OTHER',

 '00': 'OTHER',

 '000': 'OTHER',

 'INVERNESS POINT': 'INVERNESS'}

And use this to replace the column:

In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())

Out[12]:

0                INVERNESS

1                INVERNESS

2                INVERNESS

3    INVERNESS KERRY DOWNS

4          INVERNESS GREEN

5                INVERNESS

6                INVERNESS

7                INVERNESS

Name: SUBDIVISION, dtype: object

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

edited Aug 6 '13 at 13:18

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

answered Aug 5 '13 at 21:46

Andy Hayden

173k48416403

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

|
show 5 more comments

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
– dartdog
Aug 5 '13 at 22:07

Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
– dartdog
Aug 5 '13 at 22:25

@dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
– Andy Hayden
Aug 5 '13 at 22:28

@dartdog adding as inplace, like in your example, changes df. :)
– Andy Hayden
Aug 5 '13 at 22:29

Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
– dartdog
Aug 5 '13 at 22:38

|
show 5 more comments

draft saved

draft discarded

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vfrdtyky