pandas DataFrame: replace NaN values with the average of a certain group [duplicate]

This question already has an answer here:

Pandas: filling missing values by mean in each group

8 answers

I've got a pandas DataFrame filled with real numbers and categories, but there are a few NaN values in it.

How can I replace the NaNs with the mean or median of their category group?

         A         B
0  model 2  0.979728
1  model 1  0.912674
2  model 2  0.540679
3  model 1  2.027325
4  model 2       NaN
5  model 1       NaN
6  model 3 -0.612343
7  model 1  1.033826
8  model 1  1.025011
9  model 2 -0.795876

In this case, I would like to replace the two NaNs with the mean or median of their respective groups.

Thank you in advance.

asked Nov 16 '18 at 13:32

PdF

marked as duplicate by jpp
Nov 16 '18 at 13:48

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

python pandas missing-data

1 Answer

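You can use groupby + transform + fillna. transform('mean') returns a Series aligned with the original rows, each row holding the mean of its own group, which fillna then uses to fill the gaps: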

>>> df['B'] = df.B.fillna(df.groupby('A')['B'].transform('mean'))
>>> df
         A         B
0  model 2  0.979728
1  model 1  0.912674
2  model 2  0.540679
3  model 1  2.027325
4  model 2  0.241510
5  model 1  1.249709
6  model 3 -0.612343
7  model 1  1.033826
8  model 1  1.025011
9  model 2 -0.795876
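For a self-contained run of the same idea, here is a minimal sketch that rebuilds the question's sample data (assuming pandas and NumPy are installed; the name group_means is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question: category column "A", numeric column "B" with two NaNs
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# transform("mean") returns a Series with the same index as df, where every row
# holds the mean of its own group (NaNs are skipped when computing the mean)
group_means = df.groupby("A")["B"].transform("mean")

# fillna with a Series fills each NaN from the value at the same index label
df["B"] = df["B"].fillna(group_means)
print(df)
```

Because fillna aligns on the index, only the two NaN rows change; rows that already had a value keep it.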

answered Nov 16 '18 at 13:36

Brad Solomon

Very good, thanks! But this way I have to impute the mean manually, and my dataset is very big, so the effort would be very high. Is it possible to use a groupby like this? group_data_median = df.groupby(['A'])['B'].median()  # median per group

– PdF
Nov 16 '18 at 13:51
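On the comment's question: transform already computes the statistic for every group automatically, so nothing has to be imputed by hand, however many groups there are. A minimal sketch (assuming pandas and NumPy; filled_transform and filled_map are illustrative names) showing the median version of the answer, and the comment's group_data_median Series mapped back onto each row's category with Series.map:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# Option 1: the answer's pattern with "median" instead of "mean"
filled_transform = df["B"].fillna(df.groupby("A")["B"].transform("median"))

# Option 2: the comment's per-group Series (indexed by category), looked up
# for each row's category with Series.map, then used to fill the NaNs
group_data_median = df.groupby("A")["B"].median()
filled_map = df["B"].fillna(df["A"].map(group_data_median))

df["B"] = filled_transform
print(df)
```

Both variants give the same result. If a category contains only NaN values, its mean or median is NaN as well, so those rows stay unfilled.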
