Understand df.isnull().mean() in Python



























I have a DataFrame df, and the code is written like this:



df.isnull().mean().sort_values(ascending = False)



Here is part of the output:



inq_fi                    1.0
sec_app_fico_range_low    1.0


I want to understand how this works.



If we call only df.isnull(), it returns True or False for each cell. How does mean() then give the right output? My objective is to find the percentage of null values in each column. The output above indicates that inq_fi and sec_app_fico_range_low consist entirely of missing values.



Also, why are we not passing by to sort_values?



































  • Think about what mean() is actually doing: it returns the sum of the values divided by the number of values. When the values are 0/1 (Boolean), that is the fraction of 1s. Try it on a Series with only a couple of elements and you'll quickly see the behavior.

    – user3483203
    Nov 13 '18 at 14:27
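The experiment suggested in the comment can be sketched as follows. The Series `flags` and its values are made up for illustration; the point is only that summing a Boolean Series counts the True values, so the mean is the fraction of True values.

```python
import pandas as pd

# Hypothetical Boolean Series: two True values out of four.
flags = pd.Series([True, False, True, False])

total = flags.sum()        # True counts as 1, so the sum is 2
fraction = flags.mean()    # sum / count = 2 / 4

print(total, fraction)     # 2 0.5
```

Multiplying the fraction by 100 gives the percentage.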


















python python-3.x pandas














edited Nov 13 '18 at 15:18









Jean-François Corbett











asked Nov 13 '18 at 14:22









yashul




























1 Answer














The breakdown looks like this:



df.isnull()
# mask every cell: True where the value is missing (NaN/None), False otherwise
df.isnull().mean()
# take the column-wise mean of the Boolean mask (True counts as 1, False as 0)
df.isnull().mean().sort_values(ascending=False)
# sort the resulting Series by its values, largest first; a Series has a single set of values, so no by argument is needed
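On the question about by: a minimal sketch of the difference, assuming made-up null fractions in a Series named `null_fraction`. Series.sort_values sorts the Series' own values and has no by parameter, whereas DataFrame.sort_values requires by to name the column to sort on.

```python
import pandas as pd

# Hypothetical fractions of missing values, indexed by column name.
null_fraction = pd.Series({"a": 0.25, "b": 1.0, "c": 0.0})

# Series.sort_values has no `by` parameter: it sorts its own values.
sorted_series = null_fraction.sort_values(ascending=False)

# DataFrame.sort_values needs `by` to know which column to sort on.
frame = null_fraction.to_frame(name="null_fraction")
sorted_frame = frame.sort_values(by="null_fraction", ascending=False)

print(list(sorted_series.index))  # ['b', 'a', 'c']
print(list(sorted_frame.index))   # ['b', 'a', 'c']
```

Because df.isnull().mean() returns a Series (one value per column), the call in the question is the first form.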


For example, a column with the values:



[np.nan, 2, 3, 4]


is masked by isnull() as:



[True, False, False, False]


which mean() treats as:



[1, 0, 0, 0]


giving a mean of 1 / 4, i.e. 25% of the values are missing:



0.25
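Putting the steps together, here is a small runnable sketch. The DataFrame and its column names x, y and z are invented for illustration; column x reuses the [np.nan, 2, 3, 4] example, and y is entirely missing, like inq_fi in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "x" matches the [np.nan, 2, 3, 4] example,
# "y" is entirely missing, "z" has no missing values.
df = pd.DataFrame({
    "x": [np.nan, 2, 3, 4],
    "y": [np.nan, np.nan, np.nan, np.nan],
    "z": [1, 2, 3, 4],
})

# Fraction of missing values per column, largest first.
null_fraction = df.isnull().mean().sort_values(ascending=False)

# The same result expressed as a percentage.
null_percent = null_fraction * 100

print(list(null_fraction.index))  # ['y', 'x', 'z']
print(null_fraction.tolist())     # [1.0, 0.25, 0.0]
print(null_percent.tolist())      # [100.0, 25.0, 0.0]
```

A value of 1.0 therefore means every row in that column is missing, which is what the output in the question shows for inq_fi and sec_app_fico_range_low.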





        answered Nov 13 '18 at 14:36









        zipa






























