Data Frame: mean over certain variables, ignore but keep others



























I am analysing my data with R for the first time, which is a bit challenging. I have a data frame with my data that looks like this:



head(data)
  subject group age trial cond acc   rt
1      S1     2   1     1    1   1 5045
2      S1     2   1     2    2   1 8034
3      S1     2   1     3    1   1 6236
4      S1     2   1     4    2   1 8087
5      S1     2   1     5    3   0 8756
6      S1     2   1     6    1   1 6619


I would like to compute the mean and standard deviation of rt and the sum of acc for each subject in each condition. All the other variables should remain the same (group and age are subject-specific, and trial can be disregarded).



I have tried using aggregate but that seemed kind of complicated because I had to do it in several steps and re-add information...



I'd be thankful for any help =)



Edit: I realise that I wasn't being clear. I want trial to be disregarded and end up with one row per subject per condition:



head(data_new)
  subject group age cond rt_mean rt_sd acc_sum
1      S1     2   1    1    7581   100       5
2      S2     2   1    2    8034   150       4


Sorry about the confusion!










      r dataframe sum mean reorganize






      edited Nov 16 '18 at 10:33 by Max
      asked Nov 16 '18 at 9:46 by Max
























          3 Answers
































          The package dplyr is made for this:



          library(dplyr)
          d %>%
            group_by(subject, cond) %>% # we group by the two values
            summarise(
              mean_rt = mean(rt, na.rm = TRUE),
              sd_rt = sd(rt, na.rm = TRUE),
              sum_acc = sum(acc, na.rm = TRUE) # here we apply each function to summarise values
            )


          # A tibble: 3 x 5
          # Groups: subject [?]
          subject cond mean_rt sd_rt sum_acc
          <fct> <int> <dbl> <dbl> <int>
          1 S1 1 5967. 821. 3
          2 S1 2 8060. 37.5 2
          3 S1 3 8756 NA 0
          # NA for the last sd_rt is because you can't have
          # sd for a single obs.


          Basically, you group_by the column(s) that define the groups; then, inside summarise, you apply each function you need (mean, sd, sum, etc.) to each variable (rt, acc, etc.). The result has one row per group.
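Conceptually, group_by() plus summarise() is split-apply-combine. The sketch below performs the same three steps by hand in base R, purely as an illustration of the idea (it is not how dplyr is implemented). It recreates the sample data from this answer; `pieces`, `summaries` and `res` are illustrative names.

```r
# Recreate the sample data used in this answer
d <- read.table(text = "
subject group age trial cond acc rt
S1 2 1 1 1 1 5045
S1 2 1 2 2 1 8034
S1 2 1 3 1 1 6236
S1 2 1 4 2 1 8087
S1 2 1 5 3 0 8756
S1 2 1 6 1 1 6619", header = TRUE)

# 1. Split: one data frame per subject/cond combination (empty ones dropped)
pieces <- split(d, list(d$subject, d$cond), drop = TRUE)

# 2. Apply: compute the summaries for each piece as a one-row data frame
summaries <- lapply(pieces, function(p) {
  data.frame(
    subject = p$subject[1],
    cond    = p$cond[1],
    mean_rt = mean(p$rt, na.rm = TRUE),
    sd_rt   = sd(p$rt, na.rm = TRUE),   # NA when a cell has a single trial
    sum_acc = sum(p$acc, na.rm = TRUE)
  )
})

# 3. Combine: stack the one-row results into a single data frame
res <- do.call(rbind, summaries)
rownames(res) <- NULL
res
```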



          Replace summarise with mutate if you want to keep every row (one per trial) and attach the group summaries to each of them:



          d %>%
            select(-trial) %>% # use select with -var_name to eliminate columns
            group_by(subject, cond) %>%
            mutate(
              mean_rt = mean(rt, na.rm = TRUE),
              sd_rt = sd(rt, na.rm = TRUE),
              sum_acc = sum(acc, na.rm = TRUE)
            ) %>%
            ungroup()
          # A tibble: 6 x 9
          subject group age cond acc rt mean_rt sd_rt sum_acc
          <fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
          1 S1 2 1 1 1 5045 5967. 821. 3
          2 S1 2 1 2 1 8034 8060. 37.5 2
          3 S1 2 1 1 1 6236 5967. 821. 3
          4 S1 2 1 2 1 8087 8060. 37.5 2
          5 S1 2 1 3 0 8756 8756 NA 0
          6 S1 2 1 1 1 6619 5967. 821. 3
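For comparison, this mutate() pattern (keep every trial row, attach the group's summaries) has a base-R counterpart in ave(), which returns a vector as long as its input with each element replaced by its group's result. A minimal sketch, assuming the same sample data:

```r
d <- read.table(text = "
subject group age trial cond acc rt
S1 2 1 1 1 1 5045
S1 2 1 2 2 1 8034
S1 2 1 3 1 1 6236
S1 2 1 4 2 1 8087
S1 2 1 5 3 0 8756
S1 2 1 6 1 1 6619", header = TRUE)

# ave() applies FUN within each subject/cond group and recycles the result
# back onto every row of that group
d$mean_rt <- ave(d$rt, d$subject, d$cond, FUN = function(x) mean(x, na.rm = TRUE))
d$sd_rt   <- ave(d$rt, d$subject, d$cond, FUN = function(x) sd(x, na.rm = TRUE))
d$sum_acc <- ave(d$acc, d$subject, d$cond, FUN = function(x) sum(x, na.rm = TRUE))

# Drop trial, as select(-trial) does above
d$trial <- NULL
d
```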


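          Update based on the OP's request: because group and age are constant within each subject, adding them to group_by() carries them into the result without splitting any subject/condition cell. This may be what you need: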



          d %>%
            group_by(subject, cond, group, age) %>%
            summarise(
              mean_rt = mean(rt, na.rm = TRUE),
              sd_rt = sd(rt, na.rm = TRUE),
              sum_acc = sum(acc, na.rm = TRUE)
            )
          # A tibble: 3 x 7
          # Groups: subject, cond, group [?]
          subject cond group age mean_rt sd_rt sum_acc
          <fct> <int> <int> <int> <dbl> <dbl> <int>
          1 S1 1 2 1 5967. 821. 3
          2 S1 2 2 1 8060. 37.5 2
          3 S1 3 2 1 8756 NA 0
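Putting group and age in group_by() only gives one row per subject and condition if they really never change within a subject (the question states they are subject-specific). A quick base-R check, as a sketch; subject_level() is a hypothetical helper name:

```r
d <- read.table(text = "
subject group age trial cond acc rt
S1 2 1 1 1 1 5045
S1 2 1 2 2 1 8034
S1 2 1 3 1 1 6236
S1 2 1 4 2 1 8087
S1 2 1 5 3 0 8756
S1 2 1 6 1 1 6619", header = TRUE)

# TRUE if the given columns take a single value within every subject,
# so adding them to group_by(subject, cond) cannot split any cell
subject_level <- function(df, cols) {
  combos <- unique(df[, c("subject", cols), drop = FALSE])
  anyDuplicated(combos$subject) == 0
}

subject_level(d, c("group", "age"))  # TRUE for the sample data
subject_level(d, "cond")             # FALSE: cond varies within S1
```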


          Data used:



          tt <- "subject group age trial cond acc  rt
          S1 2 1 1 1 1 5045
          S1 2 1 2 2 1 8034
          S1 2 1 3 1 1 6236
          S1 2 1 4 2 1 8087
          S1 2 1 5 3 0 8756
          S1 2 1 6 1 1 6619"

          d <- read.table(text = tt, header = TRUE)





          answered Nov 16 '18 at 10:11 by RLave (edited Nov 16 '18 at 10:34)


























          • Thanks! Generally, it looks good; however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group), but if I use mutate, it doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

            – Max
            Nov 16 '18 at 10:29











          • Hi, see my update: use select(-trial) to remove that column.

            – RLave
            Nov 16 '18 at 10:35











          • If you need to add more grouping conditions, try something like group_by(subject, cond, group); you can add more variables in group_by().

            – RLave
            Nov 16 '18 at 10:36











          • Basically, if you need more grouping variables, just add them in group_by().

            – RLave
            Nov 16 '18 at 10:39

































          If you don't mind using the data.table package:



          library(data.table)
          data <- data.table(data)
          data[, `:=`(rt_mean = mean(rt), rt_sd = sd(rt), acc_sum = sum(acc)), by = .(subject, cond)]
          data

          subject group age trial cond acc rt rt_mean rt_sd acc_sum
          1: S1 2 1 1 1 1 5045 5966.667 820.83758 3
          2: S1 2 1 2 2 1 8034 8060.500 37.47666 2
          3: S1 2 1 3 1 1 6236 5966.667 820.83758 3
          4: S1 2 1 4 2 1 8087 8060.500 37.47666 2
          5: S1 2 1 5 3 0 8756 8756.000 NA 0
          6: S1 2 1 6 1 1 6619 5966.667 820.83758 3


          Edit:



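          If you want to drop some of the variables and the duplicated rows, only a small modification is needed: remove the := assignment operator (instead of adding new columns to data, the expression now returns a new data.table), list the variables you want to keep, and wrap the result in unique(). The unique() call is needed because group and age are returned once per trial, so each group's summaries are repeated on every row of that group: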



          unique(data[, .(group, age, rt_mean = mean(rt), rt_sd = sd(rt), acc_sum = sum(acc)), by = .(subject, cond)])
          subject cond group age rt_mean rt_sd acc_sum
          1: S1 1 2 1 5966.667 820.83758 3
          2: S1 2 2 1 8060.500 37.47666 2
          3: S1 3 2 1 8756.000 NA 0
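As a variant (a suggestion, not part of the original answer), listing group and age in `by` instead of in `j` yields one row per subject and condition directly, so unique() is no longer needed. A minimal sketch, assuming the data.table package is installed and that group and age are constant within each subject:

```r
library(data.table)

# Sample data from the question, converted to a data.table
data <- as.data.table(read.table(text = "
subject group age trial cond acc rt
S1 2 1 1 1 1 5045
S1 2 1 2 2 1 8034
S1 2 1 3 1 1 6236
S1 2 1 4 2 1 8087
S1 2 1 5 3 0 8756
S1 2 1 6 1 1 6619", header = TRUE))

# group and age are subject-level, so adding them to `by` creates no extra
# groups; each subject/cond cell yields exactly one summary row
res <- data[, .(rt_mean = mean(rt), rt_sd = sd(rt), acc_sum = sum(acc)),
            by = .(subject, cond, group, age)]
res
```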


          If you additionally want to get rid of rows with missing values, use the na.omit function.
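A small base-R illustration, using a hypothetical data frame `res` whose values are copied from the output above. Note that na.omit() removes a row if any column is NA, so here it drops condition 3 only because that cell had a single trial (its sd is undefined), which may or may not be what you want:

```r
# Hypothetical summary table; values copied from the output above
res <- data.frame(
  subject = "S1", cond = 1:3, group = 2L, age = 1L,
  rt_mean = c(5966.667, 8060.5, 8756),
  rt_sd   = c(820.83758, 37.47666, NA),
  acc_sum = c(3L, 2L, 0L)
)

# Drops every row containing an NA in any column (here: cond 3)
complete <- na.omit(res)
complete

# Same result here, but only rt_sd is checked, so NAs in other
# columns would not cause rows to be dropped
res[!is.na(res$rt_sd), ]
```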
































          • First of all, thank you for your help. This looks really close, but I'm sorry, I described my issue unclearly: I would like to get rid of trials and end up with one row per subject per condition instead of having the same rt_mean for each trial of a specific condition and subject.

            – Max
            Nov 16 '18 at 10:35













          • @Max OK, it only requires a simple modification; I edited the answer to address it :)

            – MRau
            Nov 16 '18 at 11:12

































          If you want to compute, for example, the mean of rt for subject S1 under condition 1, you can use mean(data[data$subject == "S1" & data$cond == 1, "rt"]).



          I hope this gives you an idea of how to filter your values.
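The filter above handles one subject/condition cell at a time. As a sketch of how to compute every cell at once in base R, using aggregate() (which the question mentions), assuming the same sample data: one aggregate() call per statistic, then merge() on the grouping columns. The names keys, f_rt and f_acc are illustrative.

```r
d <- read.table(text = "
subject group age trial cond acc rt
S1 2 1 1 1 1 5045
S1 2 1 2 2 1 8034
S1 2 1 3 1 1 6236
S1 2 1 4 2 1 8087
S1 2 1 5 3 0 8756
S1 2 1 6 1 1 6619", header = TRUE)

# Grouping columns: group and age ride along because they are subject-level
keys  <- c("subject", "group", "age", "cond")
f_rt  <- rt  ~ subject + group + age + cond
f_acc <- acc ~ subject + group + age + cond

# One aggregate() call per statistic, each renamed to a distinct column
rt_mean <- setNames(aggregate(f_rt,  data = d, FUN = mean), c(keys, "rt_mean"))
rt_sd   <- setNames(aggregate(f_rt,  data = d, FUN = sd),   c(keys, "rt_sd"))
acc_sum <- setNames(aggregate(f_acc, data = d, FUN = sum),  c(keys, "acc_sum"))

# Join the three tables on the grouping columns: one row per subject/cond
data_new <- Reduce(function(x, y) merge(x, y, by = keys),
                   list(rt_mean, rt_sd, acc_sum))
data_new
```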






          share|improve this answer
























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53335207%2fdata-frame-mean-over-certain-variables-ignore-but-keep-others%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            The package dplyr is made for this:



            library(dplyr)
            d %>%
            group_by(subject, cond) %>% # we group by the two values
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T) # here we apply each function to summarise values
            )


            # A tibble: 3 x 5
            # Groups: subject [?]
            subject cond mean_rt sd_rt sum_acc
            <fct> <int> <dbl> <dbl> <int>
            1 S1 1 5967. 821. 3
            2 S1 2 8060. 37.5 2
            3 S1 3 8756 NA 0
            # NA for the last sd_rt is because you can't have
            # sd for a single obs.


            Basically you need to group_by the columns (one or more) that you need to use as grouping, then inside summarise, you apply each function you need (mean, sd, sum, ecc) to each variable (rt, acc, ecc).



            Change summarise with mutate if you want to keep all variables:



            d %>% 
            select(-trial) %>% # use select with -var_name to eliminate columns
            group_by(subject, cond) %>%
            mutate(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            ) %>%
            ungroup()
            # A tibble: 6 x 9
            subject group age cond acc rt mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 2 1 1 1 5045 5967. 821. 3
            2 S1 2 1 2 1 8034 8060. 37.5 2
            3 S1 2 1 1 1 6236 5967. 821. 3
            4 S1 2 1 2 1 8087 8060. 37.5 2
            5 S1 2 1 3 0 8756 8756 NA 0
            6 S1 2 1 1 1 6619 5967. 821. 3


            Update based on op request, maybe this is what you need:



            d %>% 
            group_by(subject, cond, group, age) %>%
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            )
            # A tibble: 3 x 7
            # Groups: subject, cond, group [?]
            subject cond group age mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 1 2 1 5967. 821. 3
            2 S1 2 2 1 8060. 37.5 2
            3 S1 3 2 1 8756 NA 0


            Data used:



            tt <- "subject group age trial cond acc  rt
            S1 2 1 1 1 1 5045
            S1 2 1 2 2 1 8034
            S1 2 1 3 1 1 6236
            S1 2 1 4 2 1 8087
            S1 2 1 5 3 0 8756
            S1 2 1 6 1 1 6619"

            d <- read.table(text=tt, header=T)





            share|improve this answer


























            • Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

              – Max
              Nov 16 '18 at 10:29











            • Hi see my update, use select(-trial) in order to remove that column.

              – RLave
              Nov 16 '18 at 10:35











            • if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

              – RLave
              Nov 16 '18 at 10:36











            • basically if you need more grouping variables just add the min group_by()

              – RLave
              Nov 16 '18 at 10:39
















            0














            The package dplyr is made for this:



            library(dplyr)
            d %>%
            group_by(subject, cond) %>% # we group by the two values
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T) # here we apply each function to summarise values
            )


            # A tibble: 3 x 5
            # Groups: subject [?]
            subject cond mean_rt sd_rt sum_acc
            <fct> <int> <dbl> <dbl> <int>
            1 S1 1 5967. 821. 3
            2 S1 2 8060. 37.5 2
            3 S1 3 8756 NA 0
            # NA for the last sd_rt is because you can't have
            # sd for a single obs.


            Basically you need to group_by the columns (one or more) that you need to use as grouping, then inside summarise, you apply each function you need (mean, sd, sum, ecc) to each variable (rt, acc, ecc).



            Change summarise with mutate if you want to keep all variables:



            d %>% 
            select(-trial) %>% # use select with -var_name to eliminate columns
            group_by(subject, cond) %>%
            mutate(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            ) %>%
            ungroup()
            # A tibble: 6 x 9
            subject group age cond acc rt mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 2 1 1 1 5045 5967. 821. 3
            2 S1 2 1 2 1 8034 8060. 37.5 2
            3 S1 2 1 1 1 6236 5967. 821. 3
            4 S1 2 1 2 1 8087 8060. 37.5 2
            5 S1 2 1 3 0 8756 8756 NA 0
            6 S1 2 1 1 1 6619 5967. 821. 3


            Update based on op request, maybe this is what you need:



            d %>% 
            group_by(subject, cond, group, age) %>%
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            )
            # A tibble: 3 x 7
            # Groups: subject, cond, group [?]
            subject cond group age mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 1 2 1 5967. 821. 3
            2 S1 2 2 1 8060. 37.5 2
            3 S1 3 2 1 8756 NA 0


            Data used:



            tt <- "subject group age trial cond acc  rt
            S1 2 1 1 1 1 5045
            S1 2 1 2 2 1 8034
            S1 2 1 3 1 1 6236
            S1 2 1 4 2 1 8087
            S1 2 1 5 3 0 8756
            S1 2 1 6 1 1 6619"

            d <- read.table(text=tt, header=T)





            share|improve this answer


























            • Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

              – Max
              Nov 16 '18 at 10:29











            • Hi see my update, use select(-trial) in order to remove that column.

              – RLave
              Nov 16 '18 at 10:35











            • if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

              – RLave
              Nov 16 '18 at 10:36











            • basically if you need more grouping variables just add the min group_by()

              – RLave
              Nov 16 '18 at 10:39














            0












            0








            0







            The package dplyr is made for this:



            library(dplyr)
            d %>%
            group_by(subject, cond) %>% # we group by the two values
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T) # here we apply each function to summarise values
            )


            # A tibble: 3 x 5
            # Groups: subject [?]
            subject cond mean_rt sd_rt sum_acc
            <fct> <int> <dbl> <dbl> <int>
            1 S1 1 5967. 821. 3
            2 S1 2 8060. 37.5 2
            3 S1 3 8756 NA 0
            # NA for the last sd_rt is because you can't have
            # sd for a single obs.


            Basically you need to group_by the columns (one or more) that you need to use as grouping, then inside summarise, you apply each function you need (mean, sd, sum, ecc) to each variable (rt, acc, ecc).



            Change summarise with mutate if you want to keep all variables:



            d %>% 
            select(-trial) %>% # use select with -var_name to eliminate columns
            group_by(subject, cond) %>%
            mutate(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            ) %>%
            ungroup()
            # A tibble: 6 x 9
            subject group age cond acc rt mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 2 1 1 1 5045 5967. 821. 3
            2 S1 2 1 2 1 8034 8060. 37.5 2
            3 S1 2 1 1 1 6236 5967. 821. 3
            4 S1 2 1 2 1 8087 8060. 37.5 2
            5 S1 2 1 3 0 8756 8756 NA 0
            6 S1 2 1 1 1 6619 5967. 821. 3


            Update based on op request, maybe this is what you need:



            d %>% 
            group_by(subject, cond, group, age) %>%
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            )
            # A tibble: 3 x 7
            # Groups: subject, cond, group [?]
            subject cond group age mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 1 2 1 5967. 821. 3
            2 S1 2 2 1 8060. 37.5 2
            3 S1 3 2 1 8756 NA 0


            Data used:



            tt <- "subject group age trial cond acc  rt
            S1 2 1 1 1 1 5045
            S1 2 1 2 2 1 8034
            S1 2 1 3 1 1 6236
            S1 2 1 4 2 1 8087
            S1 2 1 5 3 0 8756
            S1 2 1 6 1 1 6619"

            d <- read.table(text=tt, header=T)





            share|improve this answer















            The package dplyr is made for this:



            library(dplyr)
            d %>%
            group_by(subject, cond) %>% # we group by the two values
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T) # here we apply each function to summarise values
            )


            # A tibble: 3 x 5
            # Groups: subject [?]
            subject cond mean_rt sd_rt sum_acc
            <fct> <int> <dbl> <dbl> <int>
            1 S1 1 5967. 821. 3
            2 S1 2 8060. 37.5 2
            3 S1 3 8756 NA 0
            # NA for the last sd_rt is because you can't have
            # sd for a single obs.


            Basically you need to group_by the columns (one or more) that you need to use as grouping, then inside summarise, you apply each function you need (mean, sd, sum, ecc) to each variable (rt, acc, ecc).



            Change summarise with mutate if you want to keep all variables:



            d %>% 
            select(-trial) %>% # use select with -var_name to eliminate columns
            group_by(subject, cond) %>%
            mutate(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            ) %>%
            ungroup()
            # A tibble: 6 x 9
            subject group age cond acc rt mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 2 1 1 1 5045 5967. 821. 3
            2 S1 2 1 2 1 8034 8060. 37.5 2
            3 S1 2 1 1 1 6236 5967. 821. 3
            4 S1 2 1 2 1 8087 8060. 37.5 2
            5 S1 2 1 3 0 8756 8756 NA 0
            6 S1 2 1 1 1 6619 5967. 821. 3


            Update based on op request, maybe this is what you need:



            d %>% 
            group_by(subject, cond, group, age) %>%
            summarise(
            mean_rt = mean(rt, na.rm=T),
            sd_rt = sd(rt, na.rm=T),
            sum_acc = sum(acc, na.rm=T)
            )
            # A tibble: 3 x 7
            # Groups: subject, cond, group [?]
            subject cond group age mean_rt sd_rt sum_acc
            <fct> <int> <int> <int> <dbl> <dbl> <int>
            1 S1 1 2 1 5967. 821. 3
            2 S1 2 2 1 8060. 37.5 2
            3 S1 3 2 1 8756 NA 0


            Data used:



            tt <- "subject group age trial cond acc  rt
            S1 2 1 1 1 1 5045
            S1 2 1 2 2 1 8034
            S1 2 1 3 1 1 6236
            S1 2 1 4 2 1 8087
            S1 2 1 5 3 0 8756
            S1 2 1 6 1 1 6619"

            d <- read.table(text=tt, header=T)






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 16 '18 at 10:34

























            answered Nov 16 '18 at 10:11









            RLaveRLave

            5,21911226




            5,21911226













            • Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

              – Max
              Nov 16 '18 at 10:29











            • Hi see my update, use select(-trial) in order to remove that column.

              – RLave
              Nov 16 '18 at 10:35











            • if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

              – RLave
              Nov 16 '18 at 10:36











            • basically if you need more grouping variables just add the min group_by()

              – RLave
              Nov 16 '18 at 10:39



















            • Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

              – Max
              Nov 16 '18 at 10:29











            • Hi see my update, use select(-trial) in order to remove that column.

              – RLave
              Nov 16 '18 at 10:35











            • if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

              – RLave
              Nov 16 '18 at 10:36











            • basically if you need more grouping variables just add the min group_by()

              – RLave
              Nov 16 '18 at 10:39

















            Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

            – Max
            Nov 16 '18 at 10:29





            Thanks! Generally, it looks good, however, if I use summarise, I lose all the variables that I wanted to keep the same (e.g. group) but if I use mutate, the doesn't eliminate "duplicate" rows... Is there a way to disregard "trial" and get one row per subject in each condition?

            – Max
            Nov 16 '18 at 10:29













            Hi see my update, use select(-trial) in order to remove that column.

            – RLave
            Nov 16 '18 at 10:35





            Hi see my update, use select(-trial) in order to remove that column.

            – RLave
            Nov 16 '18 at 10:35













            if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

            – RLave
            Nov 16 '18 at 10:36





            if you need to add more grouping conditions try something like group_by(subject, cond, group), in group_by you can add more variables.

            – RLave
            Nov 16 '18 at 10:36













            basically if you need more grouping variables just add the min group_by()

            – RLave
            Nov 16 '18 at 10:39





            basically if you need more grouping variables just add the min group_by()

            – RLave
            Nov 16 '18 at 10:39













            1














            If you don't mind using the data.table package:



            library(data.table)
            data <- data.table(data)
            data[, ':=' (rt_mean = mean(rt), rt_sd = sd(rt), acc_sum = sum(acc)), by = .(subject, cond)]
            data

            subject group age trial cond acc rt rt_mean rt_sd acc_sum
            1: S1 2 1 1 1 1 5045 5966.667 820.83758 3
            2: S1 2 1 2 2 1 8034 8060.500 37.47666 2
            3: S1 2 1 3 1 1 6236 5966.667 820.83758 3
            4: S1 2 1 4 2 1 8087 8060.500 37.47666 2
            5: S1 2 1 5 3 0 8756 8756.000 NA 0
            6: S1 2 1 6 1 1 6619 5966.667 820.83758 3


            Edit:



            If you want to get rid of some of the variables and duplicated rows, you need only a small modification - remove the := assignment operator (instead of adding new colums, it will now create a new data.table), add the variables you want to keep and use the unique function:



            unique(dt[, .(group, age, rt_mean = mean(rt), rt_sd = sd(rt), acc_sum = sum(acc)), by = .(subject, cond)])
            subject cond group age rt_mean rt_sd acc_sum
            1: S1 1 2 1 5966.667 820.83758 3
            2: S1 2 2 1 8060.500 37.47666 2
            3: S1 3 2 1 8756.000 NA 0


            If you additionally want to get rid of rows with missing values, use the na.omit function.






            share|improve this answer


















































            edited Nov 16 '18 at 11:00

























            answered Nov 16 '18 at 10:11









            MRau














            • First of all, thank you for your help. This looks really close, but I'm sorry, I described my issue unclearly: I would like to get rid of trials and end up with one row per subject per condition, instead of having the same rt_mean repeated for each trial of a given subject and condition.

              – Max
              Nov 16 '18 at 10:35













            • @Max OK, it only requires a simple modification; I edited the answer to address it :)

              – MRau
              Nov 16 '18 at 11:12












































            If you want to compute, for example, the mean of rt for subject S1 under condition 1, you can use mean(data$rt[data$subject == "S1" & data$cond == 1]).



            I hope this gives you an idea how you can filter your values.
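The same filtering idea can be applied to every subject/condition pair at once. This base-R sketch needs no extra packages: it splits the data frame on the grouping columns, summarises each piece, and binds the pieces back together. The sample rows are rebuilt from the question. `keys` and `pieces` are illustrative names only.

```r
# The six sample rows from the question (stand-in for the real data)
data <- data.frame(
  subject = "S1", group = 2, age = 1, trial = 1:6,
  cond = c(1, 2, 1, 2, 3, 1),
  acc  = c(1, 1, 1, 1, 0, 1),
  rt   = c(5045, 8034, 6236, 8087, 8756, 6619)
)

# A single subject/condition cell (S1, condition 1), selecting rt by name
mean(data$rt[data$subject == "S1" & data$cond == 1])

# All cells at once: split on the grouping columns, summarise each piece, bind rows
keys <- c("subject", "group", "age", "cond")
pieces <- split(data, data[keys], drop = TRUE)
data_new <- do.call(rbind, lapply(pieces, function(d) {
  data.frame(d[1, keys], rt_mean = mean(d$rt), rt_sd = sd(d$rt),
             acc_sum = sum(d$acc))
}))
rownames(data_new) <- NULL
data_new
```

With drop = TRUE, split() skips combinations that never occur in the data, so every row of the result corresponds to an observed subject/condition pair.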












































                answered Nov 16 '18 at 10:02









                Ramona






























