pandas DataFrame: replace nan values with average of a certain group [duplicate]





This question already has an answer here:




  • Pandas: filling missing values by mean in each group

    8 answers




I've got a pandas DataFrame filled with real numbers and categories, but there are a few NaN values in it.

How can I replace the NaNs with the mean or median of their group?



         A         B
0  model 2  0.979728
1  model 1  0.912674
2  model 2  0.540679
3  model 1  2.027325
4  model 2       NaN
5  model 1       NaN
6  model 3 -0.612343
7  model 1  1.033826
8  model 1  1.025011
9  model 2 -0.795876


In this case I would like to replace the two NaNs with the mean or median of their respective groups.



Thank you in advance










marked as duplicate by jpp pandas
Nov 16 '18 at 13:48


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

























      python pandas missing-data






      asked Nov 16 '18 at 13:32









      PdF

          1 Answer
          You can use groupby + transform + fillna:



          >>> df['B'] = df.B.fillna(df.groupby('A')['B'].transform('mean'))
          >>> df
                   A         B
          0  model 2  0.979728
          1  model 1  0.912674
          2  model 2  0.540679
          3  model 1  2.027325
          4  model 2  0.241510
          5  model 1  1.249709
          6  model 3 -0.612343
          7  model 1  1.033826
          8  model 1  1.025011
          9  model 2 -0.795876
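
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the same groupby + transform + fillna idea. The intermediate name group_means is only for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# transform("mean") returns a Series with the same index as df,
# where every row holds the mean of B for that row's group in A.
# NaN values are skipped when the mean is computed.
group_means = df.groupby("A")["B"].transform("mean")

# fillna with a Series only touches the missing positions and
# aligns on the index, so existing values of B are left unchanged.
df["B"] = df["B"].fillna(group_means)

print(df)
```

One thing to keep in mind: if every value in a group is NaN, that group's mean is NaN as well, so those rows stay missing after the fill.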





          answered Nov 16 '18 at 13:36









          Brad Solomon

          • Very good, thanks. But this way I have to impute the mean manually, and my dataset is very big, so the effort would be very high. Is it possible to use a groupby like this? group_data_median = df.groupby(['A'])['B'].median() # per-group median

            – PdF
            Nov 16 '18 at 13:51
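
On the question in this comment: as far as I can tell, nothing has to be entered by hand with the approach above, since the per-group value is computed for every row automatically. A per-group Series like group_data_median can also be used directly by mapping it back onto column A. The sketch below shows both routes on the example data; the names filled_map and filled_transform are made up for illustration.

```python
import numpy as np
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "A": ["model 2", "model 1", "model 2", "model 1", "model 2",
          "model 1", "model 3", "model 1", "model 1", "model 2"],
    "B": [0.979728, 0.912674, 0.540679, 2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
})

# One median per group, indexed by the distinct values of A
# (this is the Series from the comment).
group_data_median = df.groupby(["A"])["B"].median()

# Route 1: look up each row's group median via map, then fill
# only the missing entries of B.
filled_map = df["B"].fillna(df["A"].map(group_data_median))

# Route 2: let transform broadcast the group median to every row.
filled_transform = df["B"].fillna(df.groupby("A")["B"].transform("median"))

print(filled_map)
```

Both routes should give the same result; transform is shorter, while map is handy when the per-group Series already exists or is needed elsewhere.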




















