How does one determine the rows that have NaN in only some subset of columns?



























Given a DataFrame with possible NaN values, I'd like to determine which rows have NaN as a value, but only for certain columns.

I believe the following should work:

    my_df.query('colA.isnull() | colZ.isnull() | colN.isnull()')

However, I get the following exception:

    TypeError: unhashable type: 'numpy.ndarray'

I've determined that I can pass engine='python' to get the query to work, but I'd like to use the optimized numexpr engine.

Is such a query possible? Or do I have to iterate over each column I wish to filter on, one at a time?

Thanks.










Tags: python, pandas
















asked Nov 14 '18 at 1:48 by Spencer Lee
























2 Answers














One approach is to build a boolean mask that selects the rows where any of your conditions is satisfied.

    # Method 1: build the boolean mask by OR-ing the per-column null flags
    mask = (df['colA'].isnull() |
            df['colZ'].isnull() |
            df['colN'].isnull())
    null_rows = df[mask]

    # Method 2: select the columns of interest first, then flag rows with any null
    mask = df[['colA', 'colZ', 'colN']].isnull().any(axis=1)
    null_rows = df[mask]
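As a minimal sketch of the two methods above, here is a self-contained run on made-up data. The sample values and the extra column `other` are assumptions for illustration, not part of the question; `other` is there only to show that NaN outside the chosen columns does not select a row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only colA, colZ and colN are checked for NaN.
df = pd.DataFrame({
    "colA": [1.0, np.nan, 3.0, 4.0],
    "colZ": [1.0, 2.0, np.nan, 4.0],
    "colN": [1.0, 2.0, 3.0, 4.0],
    "other": [np.nan, 2.0, 3.0, np.nan],  # NaN here must not select a row
})

cols = ["colA", "colZ", "colN"]

# Method 1: OR together the per-column null flags.
mask_or = df["colA"].isnull() | df["colZ"].isnull() | df["colN"].isnull()

# Method 2: restrict to the columns of interest, then reduce across each row.
mask_any = df[cols].isnull().any(axis=1)

null_rows = df[mask_any]
print(null_rows.index.tolist())  # [1, 2]
```

Both masks agree here. The second form is probably easier to maintain when the list of columns is long or built at runtime, since no iteration over columns is written by hand.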





answered Nov 14 '18 at 2:03 by Peter Leimbigler

























You can slice the columns and use df.isna().

df (generated using code I copied from somewhere else on SO earlier today; sorry, I forget where, but thank you!):

              0         1         2         3         4
    0  0.763847  1.343149  0.096778       NaN  0.532322
    1 -0.364227 -0.560027       NaN       NaN       NaN
    2 -0.556234  0.384970  0.476016       NaN -0.385282
    3  0.604560 -0.390024 -1.697762  1.207321  0.829520
    4       NaN       NaN  0.754011  2.137359 -0.594698
    5  0.513925  0.651509 -1.500094       NaN -0.556604
    6       NaN       NaN -1.388030       NaN       NaN
    7       NaN -0.634743  0.024213 -0.439684  0.765820
    8  0.815948  0.545350 -0.823986       NaN  1.655538
    9  0.687386  1.477326       NaN  0.207531  0.571499

Output of df.isna():

           0      1      2      3      4
    0  False  False  False   True  False
    1  False  False   True   True   True
    2  False  False  False   True  False
    3  False  False  False  False  False
    4   True   True  False  False  False
    5  False  False  False   True  False
    6   True   True  False   True   True
    7   True  False  False  False  False
    8  False  False  False   True  False
    9  False  False   True  False  False

Row-wise NaN counts:

    df.isna().sum(axis=1)
    0    1
    1    3
    2    1
    3    0
    4    2
    5    1
    6    4
    7    1
    8    1
    9    1

Column-wise NaN counts:

    df.isna().sum()
    0    3
    1    2
    2    2
    3    6
    4    2

To slice the df, use something like df.loc[:, 0:2].isna(). Note that .loc slices by label and includes both endpoints, so 0:2 selects columns 0, 1, and 2. You can read up on slicing, .loc, and .iloc here: https://pandas.pydata.org/pandas-docs/stable/indexing.html
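To tie the slicing back to the original question, the following sketch combines a column slice with the row-wise count to pick out the rows that have NaN in the selected columns. The small frame is made up for illustration (it is not the frame shown above), and the names `subset_na`, `na_counts`, and `rows_with_na` are hypothetical.

```python
import numpy as np
import pandas as pd

# Small made-up frame with default integer column labels 0..4.
df = pd.DataFrame(
    [
        [0.76, 1.34, 0.10, np.nan, 0.53],
        [-0.36, -0.56, np.nan, np.nan, np.nan],
        [0.60, -0.39, -1.70, 1.21, 0.83],
        [np.nan, np.nan, 0.75, 2.14, -0.59],
    ]
)

# .loc slices by label and includes both endpoints: columns 0, 1 and 2.
subset_na = df.loc[:, 0:2].isna()

# Count NaN per row within the slice, then keep rows with at least one.
na_counts = subset_na.sum(axis=1)
rows_with_na = df[na_counts > 0]

print(na_counts.tolist())           # [0, 1, 0, 2]
print(rows_with_na.index.tolist())  # [1, 3]
```

Row 0 has a NaN in column 3, but it is not selected because column 3 lies outside the slice.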







answered Nov 14 '18 at 3:59 by Evan





























