pandas DataFrame: replace nan values with average of a certain group [duplicate]
This question already has an answer here:
Pandas: filling missing values by mean in each group
8 answers
I've got a pandas DataFrame filled with real numbers and categories, but there are a few NaN values in it.
How can I replace the NaNs with the mean or median of their category group?
         A         B
0  model 2  0.979728
1  model 1  0.912674
2  model 2  0.540679
3  model 1  2.027325
4  model 2       NaN
5  model 1       NaN
6  model 3 -0.612343
7  model 1  1.033826
8  model 1  1.025011
9  model 2 -0.795876
In this case I would like to replace the two NaN values with the mean or median of their respective groups.
Thank you in advance.
python pandas missing-data
asked Nov 16 '18 at 13:32
PdF
marked as duplicate by jpp
Nov 16 '18 at 13:48
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
1 Answer
You can use groupby + transform + fillna:
>>> df['B'] = df.B.fillna(df.groupby('A')['B'].transform('mean'))
>>> df
         A         B
0  model 2  0.979728
1  model 1  0.912674
2  model 2  0.540679
3  model 1  2.027325
4  model 2  0.241510
5  model 1  1.249709
6  model 3 -0.612343
7  model 1  1.033826
8  model 1  1.025011
9  model 2 -0.795876
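For the median instead of the mean, the same pattern should work by passing 'median' to transform. The snippet below is a minimal, self-contained sketch: it rebuilds the question's DataFrame by hand, and the names group_median and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# transform returns a Series aligned with df's index, holding the
# median of each row's group (NaN values are skipped when computing it).
group_median = df.groupby("A")["B"].transform("median")

# fillna only touches the rows where B is missing.
filled = df["B"].fillna(group_median)
print(filled)
```

Because transform computes the statistic once per group and broadcasts it back to every row, nothing has to be entered by hand, whatever the number of groups.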
answered Nov 16 '18 at 13:36
Brad Solomon
Very good, thanks, but this way I have to impute the mean manually, and my dataset is very big, so the effort would be very high. Is it possible to use a groupby like this? group_data_median = df.groupby(['A'])['B'].median() # median per group
– PdF
Nov 16 '18 at 13:51
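Regarding the comment above: a precomputed per-group median like group_data_median can be used too, by mapping it onto column A. This is only a sketch under the assumption that the data looks like the sample in the question. The transform approach in the answer already computes the group statistic automatically, so this is an alternative rather than a requirement.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt by hand.
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# One median per group, indexed by the values of column A.
group_data_median = df.groupby(["A"])["B"].median()

# Look up each row's group median via its A value, then fill only the gaps.
df["B"] = df["B"].fillna(df["A"].map(group_data_median))
print(df)
```

If every value in a group is missing, that group's median is itself NaN, so those rows stay unfilled with either approach.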