include NAs as factor in seaborn boxplot
Can I show missing data as an extra factor in a seaborn boxplot? I have been googling for a while now.
This is the simple code I am using:
ax = sns.boxplot(data=df, x=x, y=y)
For value_counts there is a dropna option:
df['bla'].value_counts(dropna=False)
but I could not find an equivalent for boxplots. Thanks.
python seaborn
asked Nov 16 '18 at 13:12 by cs0815, edited Nov 16 '18 at 14:16
1 Answer
No, you can't.
At least, not directly with seaborn.
Issues related to NaN values have been opened in seaborn for lineplot and pairplot. However, a ticket from 2014 seems to indicate that seaborn has ignored missing values since version 0.4. This can be confirmed in seaborn's source code, in categorical.py:
box_data = remove_na(group_data)
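Because NaNs are dropped per group before the box statistics are computed, the number of points behind each box can be smaller than the group size. A minimal pandas sketch with made-up data (the column names `category` and `value` are hypothetical) that compares `size()` (all rows, NaN included) with `count()` (non-NaN rows only) for each group:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'value' has missing entries in some groups
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "c"],
    "value": [1.0, np.nan, 3.0, np.nan, np.nan, 6.0],
})

grouped = df.groupby("category")["value"]
summary = pd.DataFrame({
    "rows": grouped.size(),      # every row in the group, NaN included
    "plotted": grouped.count(),  # non-NaN rows only, i.e. what is left after dropping NaN
})
summary["dropped"] = summary["rows"] - summary["plotted"]
print(summary)
```

Group "b" here has no valid value at all, so after the NaNs are removed there is nothing left to draw a box from.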
The best workaround I could come up with is to create an extra categorical column that records whether each value of the column of interest is valid or missing.
Then I would draw two subplots:
- a countplot that shows the number of valid/missing values in the column you are focusing on
- a conventional seaborn plot (here a boxplot) based on the valid values of that column
Additionally, the boxplots can be annotated to show the number of points taken into account for each box. The same could be done for barplots.
Another approach is to take the counts from value_counts and add them to the plot as an annotation.
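One way to put that information on the axis itself is to build the x tick labels from the counts, so each box states how many values it uses and how many were missing. A minimal sketch; the helper name `labels_with_counts` and the label format are assumptions, and the column names mirror the example below:

```python
import numpy as np
import pandas as pd

def labels_with_counts(df, x, y):
    """Return one tick label per category of x, e.g. 'titi\n(n=12, NaN=3)'."""
    # observed=False keeps every category, in category order
    grouped = df.groupby(x, observed=False)[y]
    valid = grouped.count()           # non-NaN values per category
    missing = grouped.size() - valid  # NaN values per category
    return [f"{cat}\n(n={v}, NaN={m})" for cat, v, m in zip(valid.index, valid, missing)]

# Hypothetical data with a categorical x column
df = pd.DataFrame({
    "category": pd.Categorical(["titi", "titi", "tata", "toto"],
                               categories=["titi", "tata", "toto"]),
    "col_3": [-0.5, np.nan, -1.2, np.nan],
})
labels = labels_with_counts(df, "category", "col_3")
print(labels)
```

After `ax = sns.boxplot(data=df, x='category', y='col_3')`, the labels could be applied with `ax.set_xticks(range(len(labels)))` followed by `ax.set_xticklabels(labels)`. This assumes the boxes follow the category order, which is how seaborn orders a pandas categorical column; for a plain object column the order should be passed explicitly through the `order` parameter.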
Example:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def custom(val):
    # Turn non-negative values into NaN so that col_3 contains missing data
    if val >= 0.0:
        return np.nan
    return val

df = pd.DataFrame(np.random.randn(500, 3))
df = df.rename(index=int, columns={0: 'col_1', 1: 'col_2', 2: 'col_3'})
df['four'] = 'bar'
df['five'] = df['col_1'] > 0
df['category'] = pd.cut(df['col_2'], bins=3, labels=['titi', 'tata', 'toto'])
df['col_3'] = df['col_1'].apply(custom)
# Extra column recording whether col_3 is missing
df['is_col_3_na'] = df['col_3'].isna()

fig, (ax1, ax2) = plt.subplots(1, 2)
# Keep only the rows where col_3 is valid for the boxplot
validdf = df[~df['is_col_3_na']].copy()
sns.countplot(data=df, x='is_col_3_na', ax=ax1).set_title('col_3 valid/invalid data ratios')
sns.boxplot(data=validdf, x='category', y='col_3', ax=ax2)

print(df['is_col_3_na'].describe())
print(df['is_col_3_na'].value_counts())

# start: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
# with proper modifications
# Calculate number of obs per group & median to position labels.
# Both come from the same groupby, so they follow the category order used for
# the x ticks (value_counts() would sort by count and mislabel the boxes).
grouped = validdf.groupby('category', observed=False)['col_3']
medians = grouped.median().values
nobs = ["n: " + str(n) for n in grouped.size().values]
# Add it to the plot
for tick in range(len(nobs)):
    ax2.text(tick, medians[tick] + 0.03, nobs[tick],
             horizontalalignment='center', size='x-small', color='b', weight='semibold')
# end: taken from https://python-graph-gallery.com/38-show-number-of-observation-on-boxplot/
plt.show()
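Output: (figure not preserved: the countplot of valid/invalid col_3 values on the left and the annotated boxplot of col_3 per category on the right)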
Console prints (concerning the column 'col_3'):
count 500
unique 2
top True
freq 254
Name: is_col_3_na, dtype: object
True 254
False 246
Name: is_col_3_na, dtype: int64
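If the missing values are in the grouping column x rather than in y, a workaround in the same spirit is to replace NaN with an explicit label before plotting, so seaborn sees it as an ordinary extra category. A minimal sketch, assuming the label "NA" does not already occur in the data (the helper name `with_na_category` is hypothetical):

```python
import numpy as np
import pandas as pd

def with_na_category(df, col, label="NA"):
    """Return a copy of df where NaN in `col` is replaced by `label`."""
    out = df.copy()
    if isinstance(out[col].dtype, pd.CategoricalDtype):
        # A categorical column only accepts known categories, so register the label first
        if label not in out[col].cat.categories:
            out[col] = out[col].cat.add_categories([label])
        out[col] = out[col].fillna(label)
    else:
        # Cast to object so that numeric grouping columns can also hold the string label
        out[col] = out[col].astype(object).where(out[col].notna(), label)
    return out

df = pd.DataFrame({
    "group": ["a", np.nan, "b", "a", np.nan],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
plot_df = with_na_category(df, "group")
print(plot_df["group"].value_counts())
```

The result can then go straight into the call from the question, e.g. `sns.boxplot(data=plot_df, x='group', y='value')`, and the rows that had no group get their own "NA" box. This only helps for a missing x; rows with a missing y are still dropped by seaborn.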
answered Mar 13 at 0:44 by LoneWanderer