Prevent NaN to become index and column in dataframe pivot












1















I have a dataframe which I extend to include values for all increments in 2 columns. Therefor NaN values are introduced, as expected and desired.



However, when I use pivot on this dataframe I'll get a row and column for NaN.
Can I prevent this when doing the pivot? If not, how can I drop a column named NaN?
Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.



Dropping the columns and rows where all values are NaN is not working in this case as the column headings and indexes are used for a seaborn heatmap plot, so eventhough all cell values are NaN it is still useful to have it as the index and key values are not NaN



Sample code;



import pandas as pd
import numpy as np

#generate dummy data
df = pd.DataFrame({'Y': np.random.randint(130,140,10),
'X': np.random.randint(5,10,10),
'Z': np.random.randint(0,25, size=10)})
df = df.round(1)
#create dataset for heatmap
#group by axis to plot
df = df.groupby(['X','Y']).sum().reset_index()
df = df.sort_values(by=['Y'])
dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
df = pd.merge(df,dfY, how='outer', on='Y')
df = pd.merge(df,dfX, how='outer', on='X')
df = df.round(1)

print(df)
#restructure for heatmap
data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
print(data)


Sample DataFrame before pivot:



      X      Y     Z
0 5.0 132.0 0.0
1 5.0 135.0 20.0
2 5.0 137.0 17.0
3 7.0 132.0 15.0
4 7.0 133.0 3.0
5 6.0 133.0 30.0
6 6.0 135.0 22.0
7 6.0 138.0 16.0
8 9.0 135.0 9.0
9 NaN 134.0 NaN
10 NaN 136.0 NaN
11 8.0 NaN NaN


After pivot:



X       NaN    5.0   6.0   7.0   8.0   9.0
Y
138.0 NaN NaN 16.0 NaN NaN NaN
137.0 NaN 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN NaN
135.0 NaN 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN NaN
133.0 NaN NaN 30.0 3.0 NaN NaN
132.0 NaN 0.0 NaN 15.0 NaN NaN
NaN NaN NaN NaN NaN NaN NaN


Desired output:



X        5.0   6.0   7.0   8.0   9.0
Y
138.0 NaN 16.0 NaN NaN NaN
137.0 17.0 NaN NaN NaN NaN
136.0 NaN NaN NaN NaN NaN
135.0 20.0 22.0 NaN NaN 9.0
134.0 NaN NaN NaN NaN NaN
133.0 NaN 30.0 3.0 NaN NaN
132.0 0.0 NaN 15.0 NaN NaN









share|improve this question





























    1















    I have a dataframe which I extend to include values for all increments in 2 columns. Therefor NaN values are introduced, as expected and desired.



    However, when I use pivot on this dataframe I'll get a row and column for NaN.
    Can I prevent this when doing the pivot? If not, how can I drop a column named NaN?
    Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.



    Dropping the columns and rows where all values are NaN is not working in this case as the column headings and indexes are used for a seaborn heatmap plot, so eventhough all cell values are NaN it is still useful to have it as the index and key values are not NaN



    Sample code;



    import pandas as pd
    import numpy as np

    #generate dummy data
    df = pd.DataFrame({'Y': np.random.randint(130,140,10),
    'X': np.random.randint(5,10,10),
    'Z': np.random.randint(0,25, size=10)})
    df = df.round(1)
    #create dataset for heatmap
    #group by axis to plot
    df = df.groupby(['X','Y']).sum().reset_index()
    df = df.sort_values(by=['Y'])
    dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
    dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
    df = pd.merge(df,dfY, how='outer', on='Y')
    df = pd.merge(df,dfX, how='outer', on='X')
    df = df.round(1)

    print(df)
    #restructure for heatmap
    data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
    print(data)


    Sample DataFrame before pivot:



          X      Y     Z
    0 5.0 132.0 0.0
    1 5.0 135.0 20.0
    2 5.0 137.0 17.0
    3 7.0 132.0 15.0
    4 7.0 133.0 3.0
    5 6.0 133.0 30.0
    6 6.0 135.0 22.0
    7 6.0 138.0 16.0
    8 9.0 135.0 9.0
    9 NaN 134.0 NaN
    10 NaN 136.0 NaN
    11 8.0 NaN NaN


    After pivot:



    X       NaN    5.0   6.0   7.0   8.0   9.0
    Y
    138.0 NaN NaN 16.0 NaN NaN NaN
    137.0 NaN 17.0 NaN NaN NaN NaN
    136.0 NaN NaN NaN NaN NaN NaN
    135.0 NaN 20.0 22.0 NaN NaN 9.0
    134.0 NaN NaN NaN NaN NaN NaN
    133.0 NaN NaN 30.0 3.0 NaN NaN
    132.0 NaN 0.0 NaN 15.0 NaN NaN
    NaN NaN NaN NaN NaN NaN NaN


    Desired output:



    X        5.0   6.0   7.0   8.0   9.0
    Y
    138.0 NaN 16.0 NaN NaN NaN
    137.0 17.0 NaN NaN NaN NaN
    136.0 NaN NaN NaN NaN NaN
    135.0 20.0 22.0 NaN NaN 9.0
    134.0 NaN NaN NaN NaN NaN
    133.0 NaN 30.0 3.0 NaN NaN
    132.0 0.0 NaN 15.0 NaN NaN









    share|improve this question



























      1












      1








      1








      I have a dataframe which I extend to include values for all increments in 2 columns. Therefor NaN values are introduced, as expected and desired.



      However, when I use pivot on this dataframe I'll get a row and column for NaN.
      Can I prevent this when doing the pivot? If not, how can I drop a column named NaN?
      Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.



      Dropping the columns and rows where all values are NaN is not working in this case as the column headings and indexes are used for a seaborn heatmap plot, so eventhough all cell values are NaN it is still useful to have it as the index and key values are not NaN



      Sample code;



      import pandas as pd
      import numpy as np

      #generate dummy data
      df = pd.DataFrame({'Y': np.random.randint(130,140,10),
      'X': np.random.randint(5,10,10),
      'Z': np.random.randint(0,25, size=10)})
      df = df.round(1)
      #create dataset for heatmap
      #group by axis to plot
      df = df.groupby(['X','Y']).sum().reset_index()
      df = df.sort_values(by=['Y'])
      dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
      dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
      df = pd.merge(df,dfY, how='outer', on='Y')
      df = pd.merge(df,dfX, how='outer', on='X')
      df = df.round(1)

      print(df)
      #restructure for heatmap
      data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
      print(data)


      Sample DataFrame before pivot:



            X      Y     Z
      0 5.0 132.0 0.0
      1 5.0 135.0 20.0
      2 5.0 137.0 17.0
      3 7.0 132.0 15.0
      4 7.0 133.0 3.0
      5 6.0 133.0 30.0
      6 6.0 135.0 22.0
      7 6.0 138.0 16.0
      8 9.0 135.0 9.0
      9 NaN 134.0 NaN
      10 NaN 136.0 NaN
      11 8.0 NaN NaN


      After pivot:



      X       NaN    5.0   6.0   7.0   8.0   9.0
      Y
      138.0 NaN NaN 16.0 NaN NaN NaN
      137.0 NaN 17.0 NaN NaN NaN NaN
      136.0 NaN NaN NaN NaN NaN NaN
      135.0 NaN 20.0 22.0 NaN NaN 9.0
      134.0 NaN NaN NaN NaN NaN NaN
      133.0 NaN NaN 30.0 3.0 NaN NaN
      132.0 NaN 0.0 NaN 15.0 NaN NaN
      NaN NaN NaN NaN NaN NaN NaN


      Desired output:



      X        5.0   6.0   7.0   8.0   9.0
      Y
      138.0 NaN 16.0 NaN NaN NaN
      137.0 17.0 NaN NaN NaN NaN
      136.0 NaN NaN NaN NaN NaN
      135.0 20.0 22.0 NaN NaN 9.0
      134.0 NaN NaN NaN NaN NaN
      133.0 NaN 30.0 3.0 NaN NaN
      132.0 0.0 NaN 15.0 NaN NaN









      share|improve this question
















      I have a dataframe which I extend to include values for all increments in 2 columns. Therefor NaN values are introduced, as expected and desired.



      However, when I use pivot on this dataframe I'll get a row and column for NaN.
      Can I prevent this when doing the pivot? If not, how can I drop a column named NaN?
      Trying to drop it by calling [NaN],[nan] or ['NaN'] doesn't work.



      Dropping the columns and rows where all values are NaN is not working in this case as the column headings and indexes are used for a seaborn heatmap plot, so eventhough all cell values are NaN it is still useful to have it as the index and key values are not NaN



      Sample code;



      import pandas as pd
      import numpy as np

      #generate dummy data
      df = pd.DataFrame({'Y': np.random.randint(130,140,10),
      'X': np.random.randint(5,10,10),
      'Z': np.random.randint(0,25, size=10)})
      df = df.round(1)
      #create dataset for heatmap
      #group by axis to plot
      df = df.groupby(['X','Y']).sum().reset_index()
      df = df.sort_values(by=['Y'])
      dfY = pd.DataFrame({'Y':np.arange(min(df['Y']), max(df['Y']),1)})
      dfX = pd.DataFrame({'X':np.arange(min(df['X']), max(df['X']),1)})
      df = pd.merge(df,dfY, how='outer', on='Y')
      df = pd.merge(df,dfX, how='outer', on='X')
      df = df.round(1)

      print(df)
      #restructure for heatmap
      data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)
      print(data)


      Sample DataFrame before pivot:



            X      Y     Z
      0 5.0 132.0 0.0
      1 5.0 135.0 20.0
      2 5.0 137.0 17.0
      3 7.0 132.0 15.0
      4 7.0 133.0 3.0
      5 6.0 133.0 30.0
      6 6.0 135.0 22.0
      7 6.0 138.0 16.0
      8 9.0 135.0 9.0
      9 NaN 134.0 NaN
      10 NaN 136.0 NaN
      11 8.0 NaN NaN


      After pivot:



      X       NaN    5.0   6.0   7.0   8.0   9.0
      Y
      138.0 NaN NaN 16.0 NaN NaN NaN
      137.0 NaN 17.0 NaN NaN NaN NaN
      136.0 NaN NaN NaN NaN NaN NaN
      135.0 NaN 20.0 22.0 NaN NaN 9.0
      134.0 NaN NaN NaN NaN NaN NaN
      133.0 NaN NaN 30.0 3.0 NaN NaN
      132.0 NaN 0.0 NaN 15.0 NaN NaN
      NaN NaN NaN NaN NaN NaN NaN


      Desired output:



      X        5.0   6.0   7.0   8.0   9.0
      Y
      138.0 NaN 16.0 NaN NaN NaN
      137.0 17.0 NaN NaN NaN NaN
      136.0 NaN NaN NaN NaN NaN
      135.0 20.0 22.0 NaN NaN 9.0
      134.0 NaN NaN NaN NaN NaN
      133.0 NaN 30.0 3.0 NaN NaN
      132.0 0.0 NaN 15.0 NaN NaN






      python pandas dataframe pivot






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 14 '18 at 13:00







      Chrisvdberge

















      asked Nov 14 '18 at 12:27









      ChrisvdbergeChrisvdberge

      5602822




      5602822
























          1 Answer
          1






          active

          oldest

          votes


















          1














          For me working drop by missing value np.nan:



          data = (df.pivot("Y","X","Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))


          Or:



          data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)

          data = data.reindex(index=data.index.difference([np.nan]),
          columns=data.columns.difference([np.nan]))





          share|improve this answer





















          • 1





            excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

            – Chrisvdberge
            Nov 14 '18 at 12:33













          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300244%2fprevent-nan-to-become-index-and-column-in-dataframe-pivot%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          1














          For me working drop by missing value np.nan:



          data = (df.pivot("Y","X","Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))


          Or:



          data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)

          data = data.reindex(index=data.index.difference([np.nan]),
          columns=data.columns.difference([np.nan]))





          share|improve this answer





















          • 1





            excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

            – Chrisvdberge
            Nov 14 '18 at 12:33


















          1














          For me working drop by missing value np.nan:



          data = (df.pivot("Y","X","Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))


          Or:



          data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)

          data = data.reindex(index=data.index.difference([np.nan]),
          columns=data.columns.difference([np.nan]))





          share|improve this answer





















          • 1





            excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

            – Chrisvdberge
            Nov 14 '18 at 12:33
















          1












          1








          1







          For me working drop by missing value np.nan:



          data = (df.pivot("Y","X","Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))


          Or:



          data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)

          data = data.reindex(index=data.index.difference([np.nan]),
          columns=data.columns.difference([np.nan]))





          share|improve this answer















          For me working drop by missing value np.nan:



          data = (df.pivot("Y","X","Z")
          .sort_values(by=['Y'],ascending=False)
          .drop(np.nan, axis=1)
          .drop(np.nan))


          Or:



          data = df.pivot("Y","X","Z").sort_values(by=['Y'],ascending=False)

          data = data.reindex(index=data.index.difference([np.nan]),
          columns=data.columns.difference([np.nan]))






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Nov 14 '18 at 12:34

























          answered Nov 14 '18 at 12:31









          jezraeljezrael

          333k24274351




          333k24274351








          • 1





            excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

            – Chrisvdberge
            Nov 14 '18 at 12:33
















          • 1





            excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

            – Chrisvdberge
            Nov 14 '18 at 12:33










          1




          1





          excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

          – Chrisvdberge
          Nov 14 '18 at 12:33







          excellent, np.nan is the solution for the drop. Adding a drop for axis=0 as well get's me the desired output

          – Chrisvdberge
          Nov 14 '18 at 12:33




















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300244%2fprevent-nan-to-become-index-and-column-in-dataframe-pivot%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Xamarin.iOS Cant Deploy on Iphone

          Glorious Revolution

          Dulmage-Mendelsohn matrix decomposition in Python