Understanding df.isnull().mean() in Python
I have a DataFrame df, and the code is written like this:
df.isnull().mean().sort_values(ascending = False)
Here is part of the output:
inq_fi 1.0
sec_app_fico_range_low 1.0
I want to understand how this works.
If we use df.isnull() on its own, it returns True or False for every cell. How does mean() then give the right output? My objective is to find the percentage of null values in each column. The output above indicates that inq_fi and sec_app_fico_range_low consist entirely of missing values.
Also, why are we not passing by to sort_values?
python python-3.x pandas
edited Nov 13 '18 at 15:18 by Jean-François Corbett
asked Nov 13 '18 at 14:22 by yashul
Think about what mean is actually doing. It simply returns the sum of the values divided by the number of values. When your values are 0 and 1 (boolean), it returns the proportion of 1s. Try it on a Series with only a couple of elements and you'll quickly see the behavior.
– user3483203
Nov 13 '18 at 14:27
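A minimal sketch of the experiment the comment suggests, assuming a small hand-made boolean Series (the name flags and its values are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical boolean Series: one True out of four elements.
flags = pd.Series([True, False, False, False])

# True is counted as 1 and False as 0 when summing.
true_count = flags.sum()

# mean() is the sum divided by the number of elements,
# so for booleans it is the proportion of True values.
true_fraction = flags.mean()

print(true_count, len(flags), true_fraction)
```

Multiplying true_fraction by 100 gives the value as a percentage.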
1 Answer
Breakdown would look like this:
df.isnull()
# Mask every missing value (NaN/None) as True, everything else as False
df.isnull().mean()
# Compute the column-wise mean of the Boolean mask (True counts as 1, False as 0)
df.isnull().mean().sort_values(ascending = False)
# Sort the resulting Series by its values (the null fractions), highest first
For example, a column that has the values:
[np.nan, 2, 3, 4]
is evaluated as:
[True, False, False, False]
interpreted as:
[1, 0, 0, 0]
Resulting in:
0.25
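The breakdown above can be sketched end to end as follows. The DataFrame and its column names are made up for illustration and are not the asker's data. The sketch also touches on the sort_values part of the question: df.isnull().mean() returns a Series, and Series.sort_values has no by parameter because there is only one set of values to sort on (by is needed only for DataFrame.sort_values).

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column fully missing, one with a single NaN,
# and one with no missing values.
df = pd.DataFrame({
    "all_missing": [np.nan, np.nan, np.nan, np.nan],
    "one_missing": [np.nan, 2, 3, 4],
    "none_missing": [1, 2, 3, 4],
})

# Step 1: Boolean mask, True where a cell is missing.
mask = df.isnull()

# Step 2: column-wise mean of the mask, i.e. the fraction of missing
# cells per column. The result is a Series indexed by column name.
null_fraction = mask.mean()

# Step 3: sort the Series by its values, largest fraction first.
# No `by` argument is needed (or accepted) for a Series.
result = null_fraction.sort_values(ascending=False)

# Optional: express the fractions as percentages.
null_percent = result * 100

print(result)
```

A value of 1.0 in the result, as for inq_fi and sec_app_fico_range_low in the question, means every row in that column is missing.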
answered Nov 13 '18 at 14:36 by zipa