Using a Python dict to replace/clean data in a Pandas DataFrame











up vote
2
down vote

favorite












I have a Dataframe(table2) that looks something like



57                  INVERNESS
361 INVERNESS
533 INVERNESS
535 INVERNESS KERRY DOWNS
758 INVERNESS GREEN
807 INVERNESS
970 INVERNESS POINT
971 INVERNESS


And so on..



And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
When I read the translate table into Pandas I get a DF that looks like



             NSUBDIVISION
SUBDIVISION
*HUFFMAN**8MILES NE OTHER
0 OTHER
00 OTHER
000 OTHER
INVERNESS POINT INVERNESS


And so on..
When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:



{u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
u'0': u'OTHER',
u'00': u'OTHER',
u'000': u'OTHER',
u'0000': u'OTHER',
u'INVERNESS POINT': u'INVERNESS',


And so ..on (I mention this as I'm not sure the dict is Properly formed)



I want to do something like



 table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)


I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS



currently I just get TypeError: unhashable type: 'dict'










share|improve this question


























    up vote
    2
    down vote

    favorite












    I have a Dataframe(table2) that looks something like



    57                  INVERNESS
    361 INVERNESS
    533 INVERNESS
    535 INVERNESS KERRY DOWNS
    758 INVERNESS GREEN
    807 INVERNESS
    970 INVERNESS POINT
    971 INVERNESS


    And so on..



    And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
    When I read the translate table into Pandas I get a DF that looks like



                 NSUBDIVISION
    SUBDIVISION
    *HUFFMAN**8MILES NE OTHER
    0 OTHER
    00 OTHER
    000 OTHER
    INVERNESS POINT INVERNESS


    And so on..
    When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:



    {u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
    u'0': u'OTHER',
    u'00': u'OTHER',
    u'000': u'OTHER',
    u'0000': u'OTHER',
    u'INVERNESS POINT': u'INVERNESS',


    And so ..on (I mention this as I'm not sure the dict is Properly formed)



    I want to do something like



     table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)


    I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS



    currently I just get TypeError: unhashable type: 'dict'










    share|improve this question
























      up vote
      2
      down vote

      favorite









      up vote
      2
      down vote

      favorite











      I have a Dataframe(table2) that looks something like



      57                  INVERNESS
      361 INVERNESS
      533 INVERNESS
      535 INVERNESS KERRY DOWNS
      758 INVERNESS GREEN
      807 INVERNESS
      970 INVERNESS POINT
      971 INVERNESS


      And so on..



      And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
      When I read the translate table into Pandas I get a DF that looks like



                   NSUBDIVISION
      SUBDIVISION
      *HUFFMAN**8MILES NE OTHER
      0 OTHER
      00 OTHER
      000 OTHER
      INVERNESS POINT INVERNESS


      And so on..
      When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:



      {u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
      u'0': u'OTHER',
      u'00': u'OTHER',
      u'000': u'OTHER',
      u'0000': u'OTHER',
      u'INVERNESS POINT': u'INVERNESS',


      And so ..on (I mention this as I'm not sure the dict is Properly formed)



      I want to do something like



       table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)


      I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS



      currently I just get TypeError: unhashable type: 'dict'










      share|improve this question













      I have a Dataframe(table2) that looks something like



      57                  INVERNESS
      361 INVERNESS
      533 INVERNESS
      535 INVERNESS KERRY DOWNS
      758 INVERNESS GREEN
      807 INVERNESS
      970 INVERNESS POINT
      971 INVERNESS


      And so on..



      And I need to map/replace the names using a Dict, (which I have in a Excel sheet)
      When I read the translate table into Pandas I get a DF that looks like



                   NSUBDIVISION
      SUBDIVISION
      *HUFFMAN**8MILES NE OTHER
      0 OTHER
      00 OTHER
      000 OTHER
      INVERNESS POINT INVERNESS


      And so on..
      When I convert it to a DICT using xlate=df.to_dict() I get a dict(xlate) that looks like:



      {u'NSUBDIVISION': {u'*HUFFMAN**8MILES NE': u'OTHER',
      u'0': u'OTHER',
      u'00': u'OTHER',
      u'000': u'OTHER',
      u'0000': u'OTHER',
      u'INVERNESS POINT': u'INVERNESS',


      And so ..on (I mention this as I'm not sure the dict is Properly formed)



      I want to do something like



       table2['SUBDIVISION'].replace(to_replace=xlate,inplace=True)


      I want to look up values in the 1st col of the xlate table match them to table2['SUBDIVISION'] and if found replace contents of SUBDIVISION with the values in xlate column 2 if not leave them alone (bonus..actually if col 2 is NAn I'd like to leave it alone as well) for instance above finding INVERNESS POINT will be replaced by INVERNESS



      currently I just get TypeError: unhashable type: 'dict'







      python pandas






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Aug 5 '13 at 21:23









      dartdog

      4,025165094




      4,025165094
























          1 Answer
          1






          active

          oldest

          votes

















          up vote
          3
          down vote



          accepted










          I think you want to create a dictionary from the Series (rather than the DataFrame):



          In [11]: translate_df['NSUBDIVISION'].to_dict()
          Out[11]:
          {'*HUFFMAN**8MILES NE': 'OTHER',
          '0': 'OTHER',
          '00': 'OTHER',
          '000': 'OTHER',
          'INVERNESS POINT': 'INVERNESS'}


          And use this to replace the column:



          In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
          Out[12]:
          0 INVERNESS
          1 INVERNESS
          2 INVERNESS
          3 INVERNESS KERRY DOWNS
          4 INVERNESS GREEN
          5 INVERNESS
          6 INVERNESS
          7 INVERNESS
          Name: SUBDIVISION, dtype: object





          share|improve this answer























          • mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
            – dartdog
            Aug 5 '13 at 22:07












          • Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
            – dartdog
            Aug 5 '13 at 22:25










          • @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
            – Andy Hayden
            Aug 5 '13 at 22:28










          • @dartdog adding as inplace, like in your example, changes df. :)
            – Andy Hayden
            Aug 5 '13 at 22:29










          • Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
            – dartdog
            Aug 5 '13 at 22:38













          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














           

          draft saved


          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18067982%2fusing-a-python-dict-to-replace-clean-data-in-a-pandas-dataframe%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          3
          down vote



          accepted










          I think you want to create a dictionary from the Series (rather than the DataFrame):



          In [11]: translate_df['NSUBDIVISION'].to_dict()
          Out[11]:
          {'*HUFFMAN**8MILES NE': 'OTHER',
          '0': 'OTHER',
          '00': 'OTHER',
          '000': 'OTHER',
          'INVERNESS POINT': 'INVERNESS'}


          And use this to replace the column:



          In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
          Out[12]:
          0 INVERNESS
          1 INVERNESS
          2 INVERNESS
          3 INVERNESS KERRY DOWNS
          4 INVERNESS GREEN
          5 INVERNESS
          6 INVERNESS
          7 INVERNESS
          Name: SUBDIVISION, dtype: object





          share|improve this answer























          • mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
            – dartdog
            Aug 5 '13 at 22:07












          • Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
            – dartdog
            Aug 5 '13 at 22:25










          • @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
            – Andy Hayden
            Aug 5 '13 at 22:28










          • @dartdog adding as inplace, like in your example, changes df. :)
            – Andy Hayden
            Aug 5 '13 at 22:29










          • Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
            – dartdog
            Aug 5 '13 at 22:38

















          up vote
          3
          down vote



          accepted










          I think you want to create a dictionary from the Series (rather than the DataFrame):



          In [11]: translate_df['NSUBDIVISION'].to_dict()
          Out[11]:
          {'*HUFFMAN**8MILES NE': 'OTHER',
          '0': 'OTHER',
          '00': 'OTHER',
          '000': 'OTHER',
          'INVERNESS POINT': 'INVERNESS'}


          And use this to replace the column:



          In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
          Out[12]:
          0 INVERNESS
          1 INVERNESS
          2 INVERNESS
          3 INVERNESS KERRY DOWNS
          4 INVERNESS GREEN
          5 INVERNESS
          6 INVERNESS
          7 INVERNESS
          Name: SUBDIVISION, dtype: object





          share|improve this answer























          • mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
            – dartdog
            Aug 5 '13 at 22:07












          • Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
            – dartdog
            Aug 5 '13 at 22:25










          • @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
            – Andy Hayden
            Aug 5 '13 at 22:28










          • @dartdog adding as inplace, like in your example, changes df. :)
            – Andy Hayden
            Aug 5 '13 at 22:29










          • Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
            – dartdog
            Aug 5 '13 at 22:38















          up vote
          3
          down vote



          accepted







          up vote
          3
          down vote



          accepted






          I think you want to create a dictionary from the Series (rather than the DataFrame):



          In [11]: translate_df['NSUBDIVISION'].to_dict()
          Out[11]:
          {'*HUFFMAN**8MILES NE': 'OTHER',
          '0': 'OTHER',
          '00': 'OTHER',
          '000': 'OTHER',
          'INVERNESS POINT': 'INVERNESS'}


          And use this to replace the column:



          In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
          Out[12]:
          0 INVERNESS
          1 INVERNESS
          2 INVERNESS
          3 INVERNESS KERRY DOWNS
          4 INVERNESS GREEN
          5 INVERNESS
          6 INVERNESS
          7 INVERNESS
          Name: SUBDIVISION, dtype: object





          share|improve this answer














          I think you want to create a dictionary from the Series (rather than the DataFrame):



          In [11]: translate_df['NSUBDIVISION'].to_dict()
          Out[11]:
          {'*HUFFMAN**8MILES NE': 'OTHER',
          '0': 'OTHER',
          '00': 'OTHER',
          '000': 'OTHER',
          'INVERNESS POINT': 'INVERNESS'}


          And use this to replace the column:



          In [12]: df['SUBDIVISION'].replace(translate_df['NSUBDIVISION'].to_dict())
          Out[12]:
          0 INVERNESS
          1 INVERNESS
          2 INVERNESS
          3 INVERNESS KERRY DOWNS
          4 INVERNESS GREEN
          5 INVERNESS
          6 INVERNESS
          7 INVERNESS
          Name: SUBDIVISION, dtype: object






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Aug 6 '13 at 13:18

























          answered Aug 5 '13 at 21:46









          Andy Hayden

          173k48416403




          173k48416403












          • mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
            – dartdog
            Aug 5 '13 at 22:07












          • Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
            – dartdog
            Aug 5 '13 at 22:25










          • @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
            – Andy Hayden
            Aug 5 '13 at 22:28










          • @dartdog adding as inplace, like in your example, changes df. :)
            – Andy Hayden
            Aug 5 '13 at 22:29










          • Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
            – dartdog
            Aug 5 '13 at 22:38




















          • mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
            – dartdog
            Aug 5 '13 at 22:07












          • Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
            – dartdog
            Aug 5 '13 at 22:25










          • @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
            – Andy Hayden
            Aug 5 '13 at 22:28










          • @dartdog adding as inplace, like in your example, changes df. :)
            – Andy Hayden
            Aug 5 '13 at 22:29










          • Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
            – dartdog
            Aug 5 '13 at 22:38


















          mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
          – dartdog
          Aug 5 '13 at 22:07






          mmm I think the to dict helps,, but I have and need a DF as the final result and in any event after I got a good Dict using your suggestion I then ran table2['SUBDIVISION'].replace(xlate['NSUBDIVISION'].to_dict()) and got Key error on NSBDIVISION ??? The actual tables are much larger than the samples.. Nsubdivision (col2) is not unique, but the 1st column is which in the original df For Xlate is 'SUBDIVISION' ?
          – dartdog
          Aug 5 '13 at 22:07














          Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
          – dartdog
          Aug 5 '13 at 22:25




          Ahh I had tried to reassign the translate dict to a new variable which it did not like..so it seems to have replaced but not "inplace" on the DF?
          – dartdog
          Aug 5 '13 at 22:25












          @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
          – Andy Hayden
          Aug 5 '13 at 22:28




          @dartdog a little confused with what you are doing tbh, don't think you should reuse xlate dictionary at all :s If there are multiple columns perhaps use iloc to access them...?
          – Andy Hayden
          Aug 5 '13 at 22:28












          @dartdog adding as inplace, like in your example, changes df. :)
          – Andy Hayden
          Aug 5 '13 at 22:29




          @dartdog adding as inplace, like in your example, changes df. :)
          – Andy Hayden
          Aug 5 '13 at 22:29












          Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
          – dartdog
          Aug 5 '13 at 22:38






          Odd this does not seem to do it?? (no errors though) >> table2['SUBDIVISION'].replace(df['NSUBDIVISION'],inplace=True)
          – dartdog
          Aug 5 '13 at 22:38




















           

          draft saved


          draft discarded



















































           


          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f18067982%2fusing-a-python-dict-to-replace-clean-data-in-a-pandas-dataframe%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Xamarin.iOS Cant Deploy on Iphone

          Glorious Revolution

          Dulmage-Mendelsohn matrix decomposition in Python