include NAs as factor in seaborn boxplot




















Can I show missing data as an extra factor in a seaborn boxplot? I have been searching for a while without success.



This is the simple code I am using:



ax = sns.boxplot(data=df, x=x, y=y)


value_counts has a dropna option that does this:



df['bla'].value_counts(dropna = False)


but I could not find an equivalent for boxplots. Thanks.










      python seaborn














      asked Nov 16 '18 at 13:12 by cs0815 (edited Nov 16 '18 at 14:16)
























          1 Answer














          No, you can't, at least not directly with seaborn.

          Issues related to NaN values have been opened against seaborn for lineplot and pairplot. However, a ticket from 2014 seems to indicate that seaborn has ignored missing values since version 0.4. This can be confirmed in seaborn's source code, in categorical.py:



          box_data = remove_na(group_data)


          The best I could come up with is to create an extra categorical column that records whether each value in the column of interest is valid or missing.
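          A related variant, sketched here under the assumption that the missing values sit in the grouping column itself (the one passed as x), is to replace NaN with an explicit label before plotting so that it survives as its own category. The column names and the "missing" label are hypothetical, and the plotting call is left as a comment so the sketch runs with pandas alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'group' is the factor shown on the x axis and has gaps.
df = pd.DataFrame({
    "group": ["a", "b", np.nan, "a", np.nan, "b"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Give NaN an explicit label so it is kept as a category instead of dropped.
df["group_label"] = df["group"].fillna("missing")

# Rows per label, including the former NaN rows.
counts = df["group_label"].value_counts().to_dict()
print(counts)

# With seaborn installed, the labelled column can then be used as the factor:
# sns.boxplot(data=df, x="group_label", y="value")
```

          This only helps when the factor is what is missing; missing y values still have nothing to draw a box from.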



          Then I would draw two subplots:
          - a countplot that shows the number of valid/invalid values in the column you are focusing on
          - some conventional seaborn plot based on that column

          Additionally, it is possible to annotate each boxplot with the number of points it takes into account.
          Something similar could be done for barplots.



          Another approach is to take the counts from value_counts and add them to the plot as annotations.
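          As a minimal sketch of how such annotation labels could be prepared, the per-group counts of valid and missing values can be computed from one groupby so that they stay in the same order as the medians used to position them. The data and the label format below are made up for illustration; only pandas is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'col_3' has missing values in some categories.
df = pd.DataFrame({
    "category": ["titi", "titi", "tata", "tata", "tata", "toto"],
    "col_3": [-1.0, np.nan, -0.5, np.nan, np.nan, -2.0],
})

grouped = df.groupby("category")["col_3"]
n_valid = grouped.count()       # count() skips NaN
n_missing = grouped.size() - n_valid  # size() includes NaN
medians = grouped.median()      # median of the valid values only

# One label per group, in the (sorted) group order shared by all three Series.
labels = [
    f"n: {n_valid[cat]} (NaN: {n_missing[cat]})" for cat in n_valid.index
]
print(labels)
```

          Each label could then be placed with ax.text at the matching tick position and median height, as in the full example.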



          Example:



          import matplotlib.pyplot as plt
          import numpy as np
          import pandas as pd
          import seaborn as sns


          def custom(val):
              # Turn non-negative values into NaN to simulate missing data
              if val >= 0.0:
                  return np.nan
              return val


          df = pd.DataFrame(np.random.randn(500, 3))
          df = df.rename(columns={0: 'col_1', 1: 'col_2', 2: 'col_3'})
          df['four'] = 'bar'
          df['five'] = df['col_1'] > 0
          df['category'] = pd.cut(df['col_2'], bins=3, labels=['titi', 'tata', 'toto'])
          df['col_3'] = df['col_1'].apply(custom)
          df['is_col_3_na'] = pd.isna(df['col_3'])

          fig, (ax1, ax2) = plt.subplots(1, 2)
          validdf = df[~df['is_col_3_na']].copy()

          sns.countplot(data=df, x='is_col_3_na', ax=ax1).set_title('col_3 valid/invalid data ratios')
          sns.boxplot(data=validdf, x='category', y='col_3',
                      # hue="category",
                      ax=ax2)

          print(df['is_col_3_na'].describe())
          print(df['is_col_3_na'].value_counts())

          # start: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
          # with proper modifications
          # Calculate number of obs per group & median to position labels.
          # Both come from the same groupby, so they follow the category order
          # (value_counts() would sort by frequency and misalign the labels).
          grouped = validdf.groupby('category', observed=False)['col_3']
          medians = grouped.median().values
          nobs = ["n: " + str(n) for n in grouped.count().values]

          # Add it to the plot
          for tick in range(len(nobs)):
              ax2.text(tick, medians[tick] + 0.03, nobs[tick],
                       horizontalalignment='center', size='x-small', color='b', weight='semibold')
          # end: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
          plt.show()


          Output:

          [image not included: countplot of valid/invalid col_3 values beside the annotated boxplot]



          Console output for the column 'col_3' (exact counts vary from run to run because the data is random):

          count      500
          unique       2
          top       True
          freq       254
          Name: is_col_3_na, dtype: object

          True     254
          False    246
          Name: is_col_3_na, dtype: int64















          answered Mar 13 at 0:44 by LoneWanderer































