Truncating a dataframe according to count of vector elements in R











I have a dataframe df, containing three vectors:



    subject  condition  value
    01       A          12
    01       A          6
    01       B          10
    01       B          2
    02       A          5
    02       A          11
    02       B          3
    02       B          5
    02       B          9
    ...


There are four observations (and hence four rows) for subject 01, with two observations corresponding to condition A and two corresponding to condition B. Let's say that due to a technical error, there are three condition B observations for subject 02.



My question is this: how can I truncate df to ensure that each condition only has two observations for each individual subject (hence removing the erroneous third row where condition==B for subject 02)?



Thanks in advance for any assistance!










Tags: r, dataframe, vector

edited Nov 11 at 0:01
asked Nov 10 at 23:56 by Lyam

2 Answers

Accepted answer

Here's a dplyr solution:

    library(dplyr)

    df %>%
      group_by(subject, condition) %>%
      filter(row_number() < 3) %>%
      ungroup()

    # A tibble: 8 x 3
      subject condition value
      <chr>   <chr>     <dbl>
    1 01      A            12
    2 01      A             6
    3 01      B            10
    4 01      B             2
    5 02      A             5
    6 02      A            11
    7 02      B             3
    8 02      B             5
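
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It is not part of the original answer: it assumes the dplyr package is installed, rebuilds the data frame from the rows shown in the question, and uses the illustrative names `max_per_group` and `truncated`.

```r
library(dplyr)

# Sample data copied from the question (subject kept as character so "01" survives)
df <- data.frame(
  subject   = c("01", "01", "01", "01", "02", "02", "02", "02", "02"),
  condition = c("A", "A", "B", "B", "A", "A", "B", "B", "B"),
  value     = c(12, 6, 10, 2, 5, 11, 3, 5, 9),
  stringsAsFactors = FALSE
)

# Illustrative name: the maximum number of rows to keep per subject/condition pair
max_per_group <- 2

truncated <- df %>%
  group_by(subject, condition) %>%               # one group per subject/condition pair
  filter(row_number() <= max_per_group) %>%      # keep the first rows of each group
  ungroup()

print(truncated)
```

Rows are kept in their existing order, so this drops whichever rows come last within each pair. If the erroneous row could sit anywhere in the group, the data would need to be sorted or filtered on some other criterion first.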





answered Nov 11 at 0:06 by Shree

• Perfect, thank you! Just as a side note, the solution provided by G. Grothendieck also works, but it has to be modified if the dataframe contains other vectors that are not defined in seq = ave().
  – Lyam
  Nov 11 at 1:44


















For each subject/condition pair, create a sequence number seq for its rows, then keep only the rows whose sequence number is less than 3.



    subset(transform(DF, seq = ave(value, subject, condition, FUN = seq_along)), seq < 3)

giving:

      subject condition value seq
    1      01         A    12   1
    2      01         A     6   2
    3      01         B    10   1
    4      01         B     2   2
    5      02         A     5   1
    6      02         A    11   2
    7      02         B     3   1
    8      02         B     5   2


Note

The input in reproducible form is assumed to be:

    Lines <- "subject  condition  value
    01 A 12
    01 A 6
    01 B 10
    01 B 2
    02 A 5
    02 A 11
    02 B 3
    02 B 5
    02 B 9"
    DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE,
                     colClasses = c("character", "character", "numeric"))
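
The one-liner above passes the value column to ave() only as a vector of the right length, and it leaves the helper column seq in the result. As a hedged variant (not from the original answer), the sketch below numbers the rows using the row indices instead, so it does not depend on any particular data column and adds nothing to the output. It assumes DF as defined in the Note; `max_per_group`, `pos` and `truncated` are illustrative names.

```r
# Sample data, as in the Note above
Lines <- "subject condition value
01 A 12
01 A 6
01 B 10
01 B 2
02 A 5
02 A 11
02 B 3
02 B 5
02 B 9"
DF <- read.table(text = Lines, header = TRUE,
                 colClasses = c("character", "character", "numeric"))

# Illustrative name: the maximum number of rows to keep per subject/condition pair
max_per_group <- 2

# Position of each row within its subject/condition pair (1, 2, 3, ...).
# ave() is given the row indices rather than a data column.
pos <- ave(seq_len(nrow(DF)), DF$subject, DF$condition, FUN = seq_along)

# Keep the first max_per_group rows of each pair; no helper column is added
truncated <- DF[pos <= max_per_group, ]

print(truncated)
```

Because the filter is a plain logical index, any additional columns in DF are carried through unchanged.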






answered Nov 11 at 0:02 by G. Grothendieck