How does one determine the rows that have NaN in only some subset of columns?
Given a DataFrame with possible NaN values, I'd like to determine which rows have NaN as a value but only for certain columns.
I believe the following should work...
my_df.query('colA.isnull() | colZ.isnull() | colN.isnull()')
However, I am coming across the following exception
TypeError: unhashable type: 'numpy.ndarray'
Now, I've determined that I can pass the param engine='python' to get the query to work. But I'd like to use the optimized engine, numexpr.
Is such a query possible? Or do I have to iterate over each column I wish to filter on, one at a time?
Thanks.
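For reference, a minimal sketch of the situation described above, on a small made-up frame (only the column names and the name my_df come from the question; the data is an assumption). It shows the engine='python' call and, as one possible workaround that avoids method calls inside the expression, a self-comparison: NaN is the only float value that is not equal to itself, so col != col flags missing values. Whether the second form is actually dispatched to numexpr depends on the installed pandas and numexpr versions and on the column dtypes, so treat it as something to try rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Illustrative data; only the column names come from the question.
my_df = pd.DataFrame({
    "colA": [1.0, np.nan, 3.0, 4.0],
    "colZ": [np.nan, 2.0, 3.0, 4.0],
    "colN": [1.0, 2.0, 3.0, np.nan],
    "other": [np.nan, np.nan, np.nan, 0.0],
})

# The form reported to work in the question: method calls with the Python engine.
via_python = my_df.query(
    "colA.isnull() | colZ.isnull() | colN.isnull()", engine="python"
)

# Workaround sketch: NaN != NaN, so a self-comparison needs no method call.
via_compare = my_df.query("(colA != colA) | (colZ != colZ) | (colN != colN)")

print(via_python.index.tolist())
print(via_compare.index.tolist())
```

Row 2 has a NaN only in the column named other, so it is left out by both queries.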
python pandas
asked Nov 14 '18 at 1:48
Spencer Lee
2 Answers
One approach is to build a boolean mask that picks out the row(s) on which any of your conditions is satisfied.
# Method 1: build the boolean mask using bitwise operations
mask = ((df['colA'].isnull()) |
        (df['colZ'].isnull()) |
        (df['colN'].isnull()))
null_rows = df[mask]
# Method 2: pick desired columns from an element-wise boolean mask of null flags
mask = df.isnull()[['colA', 'colZ', 'colN']].any(axis=1)
null_rows = df[mask]
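Both methods can be checked side by side on a small frame. The sketch below uses made-up data (only the column names come from the question) and adds a third, equivalent spelling that selects the columns before calling isnull(), so null flags are not computed for columns that are not of interest.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({
    "colA": [1.0, np.nan, 3.0, 4.0],
    "colZ": [np.nan, 2.0, 3.0, 4.0],
    "colN": [1.0, 2.0, 3.0, np.nan],
    "other": [np.nan, np.nan, np.nan, 0.0],
})
cols = ["colA", "colZ", "colN"]

# Method 1: combine per-column flags with bitwise OR.
mask1 = df["colA"].isnull() | df["colZ"].isnull() | df["colN"].isnull()

# Method 2: flag every cell, then keep only the columns of interest.
mask2 = df.isnull()[cols].any(axis=1)

# Variant: select the columns first, then flag.
mask3 = df[cols].isnull().any(axis=1)

null_rows = df[mask3]
print(null_rows)
```

All three masks select rows 0, 1, and 3 here; row 2 is skipped because its only NaN sits in a column outside the subset.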
You can slice the columns and use df.isna().
df (generated using code I copied from somewhere else on SO earlier today, sorry I forget where, but thank you!):
          0         1         2         3         4
0  0.763847  1.343149  0.096778       NaN  0.532322
1 -0.364227 -0.560027       NaN       NaN       NaN
2 -0.556234  0.384970  0.476016       NaN -0.385282
3  0.604560 -0.390024 -1.697762  1.207321  0.829520
4       NaN       NaN  0.754011  2.137359 -0.594698
5  0.513925  0.651509 -1.500094       NaN -0.556604
6       NaN       NaN -1.388030       NaN       NaN
7       NaN -0.634743  0.024213 -0.439684  0.765820
8  0.815948  0.545350 -0.823986       NaN  1.655538
9  0.687386  1.477326       NaN  0.207531  0.571499
output of df.isna():
       0      1      2      3      4
0  False  False  False   True  False
1  False  False   True   True   True
2  False  False  False   True  False
3  False  False  False  False  False
4   True   True  False  False  False
5  False  False  False   True  False
6   True   True  False   True   True
7   True  False  False  False  False
8  False  False  False   True  False
9  False  False   True  False  False
Row-wise operations:
df.isna().sum(axis=1)
0 1
1 3
2 1
3 0
4 2
5 1
6 4
7 1
8 1
9 1
Column-wise:
df.isna().sum()
0 3
1 2
2 2
3 6
4 2
To slice the df, use something like df.loc[:, 0:2].isna(). Note that .loc slices by label and includes both endpoints, so this covers columns 0, 1, and 2. You can read up on slicing, .loc, and .iloc here: https://pandas.pydata.org/pandas-docs/stable/indexing.html
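To tie the slicing back to the original question, the sliced flags can be reduced with any(axis=1) to get one boolean per row. A minimal sketch, assuming a randomly generated frame with integer column labels like the one above (the seed, shape, and NaN positions here are arbitrary and will not reproduce the numbers shown):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 5)))

# Punch a few holes at arbitrary positions.
df.iloc[0, 3] = np.nan
df.iloc[2, 1] = np.nan
df.iloc[4, 0] = np.nan
df.iloc[5, 4] = np.nan

# .loc slices by label and includes both endpoints: columns 0, 1 and 2.
subset_flags = df.loc[:, 0:2].isna()

# Rows with at least one NaN in the sliced columns.
rows_with_nan = df[subset_flags.any(axis=1)]

# Per-row NaN count, restricted to the sliced columns.
nan_counts = subset_flags.sum(axis=1)

print(rows_with_nan.index.tolist())
print(nan_counts.tolist())
```

Rows 0 and 5 also contain NaN, but only in columns 3 and 4, so they are not selected.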
answered Nov 14 '18 at 2:03
Peter Leimbigler
answered Nov 14 '18 at 3:59
Evan