Understanding df.isnull().mean() in Python

I have a DataFrame df, and the code is written like this:

df.isnull().mean().sort_values(ascending=False)

Here is part of the output:

inq_fi                                 1.0

sec_app_fico_range_low                 1.0

I want to understand how this works.

If we use df.isnull() alone, it returns True or False for every cell. How does mean() then give the right output? My objective is to find the percentage of null values in every column. The output above says that inq_fi and sec_app_fico_range_low have all of their values missing.

Also, why don't we pass a by argument to sort_values?

edited Nov 13 '18 at 15:18 by Jean-François Corbett
asked Nov 13 '18 at 14:22 by yashul

Think about what mean() actually does: it returns the sum of the values divided by the number of values. When the values are booleans (True counts as 1, False as 0), that is the fraction of True values. Try it on a Series with only a couple of elements and you'll quickly see the behavior.

– user3483203
Nov 13 '18 at 14:27
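Following that suggestion, here is a minimal sketch with a hypothetical four-element boolean Series; it shows that mean() on booleans is just the count of True values divided by the length.

```python
import pandas as pd

# Hypothetical boolean Series: two True values out of four
s = pd.Series([True, False, True, False])

# mean() treats True as 1 and False as 0, so this is 2 / 4
print(s.mean())            # 0.5

# The same calculation done by hand: number of True values / number of values
print(s.sum() / len(s))    # 0.5
```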

Tags: python, python-3.x, pandas


1 Answer

The breakdown looks like this:

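df.isnull()

# Boolean mask: True where a value is missing (NaN/None), False elsewhere

df.isnull().mean()

# column-wise mean of the mask (True counts as 1, False as 0), i.e. the fraction of missing values in each column

df.isnull().mean().sort_values(ascending=False)

# sort the resulting Series by its values (the missing-value fractions), largest first; a Series holds a single set of values, so sort_values needs no by argument (sort_index would sort by column name instead)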
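The three steps can be run end to end on a small DataFrame. This is a sketch with hypothetical columns a, b and c (not the asker's data); it also contrasts Series.sort_values, which sorts the Series' own values, with DataFrame.sort_values, which needs by to know which column to sort on.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" is fully missing, "b" is half missing, "c" is complete
df = pd.DataFrame({
    "a": [np.nan, np.nan, np.nan, np.nan],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [1, 2, 3, 4],
})

mask = df.isnull()          # DataFrame of booleans, same shape as df
fractions = mask.mean()     # Series indexed by column name: a=1.0, b=0.5, c=0.0
ranked = fractions.sort_values(ascending=False)  # sorts the values; no `by` needed
print(ranked)

# DataFrame.sort_values, by contrast, needs `by` to pick the column(s) to sort on
print(df.sort_values(by="c", ascending=False))
```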

For example, a column that has the values:

[np.nan, 2, 3, 4]

is masked by isnull() as:

[True, False, False, False]

which mean() treats as:

[1, 0, 0, 0]

resulting in (1 + 0 + 0 + 0) / 4:

0.25

that is, 25% of the values in that column are missing.
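The worked example above can be checked directly. This minimal sketch uses the same column values and also shows that multiplying by 100 turns the fraction into the percentage the question asks for.

```python
import numpy as np
import pandas as pd

# The column from the example above
col = pd.Series([np.nan, 2, 3, 4])

mask = col.isnull()
print(mask.tolist())               # [True, False, False, False]
print(mask.astype(int).tolist())   # [1, 0, 0, 0]
print(mask.mean())                 # 0.25, i.e. 1 missing value out of 4

# For a percentage rather than a fraction, multiply by 100
print(mask.mean() * 100)           # 25.0
```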

answered Nov 13 '18 at 14:36 by zipa
