Subset to only return complete cases on group basis with multiple rows












1















If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:



GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5
1 2 4 5 6 6
1 3 4 2 3 4
1 4 NA 6 6 3
1 3 2 1 2 4
2 9 8 2 7 1
2 2 3 3 2 6
2 2 5 8 9 2
3 2 1 5 6 NA
3 NA NA 5 8 9
3 4 2 6 8 7


In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.



How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?










share|improve this question





























    1















    If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:



    GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5
    1 2 4 5 6 6
    1 3 4 2 3 4
    1 4 NA 6 6 3
    1 3 2 1 2 4
    2 9 8 2 7 1
    2 2 3 3 2 6
    2 2 5 8 9 2
    3 2 1 5 6 NA
    3 NA NA 5 8 9
    3 4 2 6 8 7


    In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.



    How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?










    share|improve this question



























      1












      1








      1








      If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:



      GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5
      1 2 4 5 6 6
      1 3 4 2 3 4
      1 4 NA 6 6 3
      1 3 2 1 2 4
      2 9 8 2 7 1
      2 2 3 3 2 6
      2 2 5 8 9 2
      3 2 1 5 6 NA
      3 NA NA 5 8 9
      3 4 2 6 8 7


      In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.



      How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?










      share|improve this question
















      If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:



      GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5
      1 2 4 5 6 6
      1 3 4 2 3 4
      1 4 NA 6 6 3
      1 3 2 1 2 4
      2 9 8 2 7 1
      2 2 3 3 2 6
      2 2 5 8 9 2
      3 2 1 5 6 NA
      3 NA NA 5 8 9
      3 4 2 6 8 7


      In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.



      How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?







      r dplyr data.table plyr tidyr






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 22 '18 at 2:16









      Ronak Shah

      38.2k104161




      38.2k104161










      asked Nov 14 '18 at 12:43









      ChaoChao

      494




      494
























          4 Answers
          4






          active

          oldest

          votes


















          0














          You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."





          library(tidyverse)
          tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

          tbl %>%
          group_by(GROUP_ID) %>%
          filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
          #> # A tibble: 3 x 6
          #> # Groups: GROUP_ID [1]
          #> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
          #> <int> <int> <int> <int> <int> <int>
          #> 1 2 9 8 2 7 1
          #> 2 2 2 3 3 2 6
          #> 3 2 2 5 8 9 2


          Created on 2018-11-14 by the reprex package (v0.2.1)






          share|improve this answer





















          • 1





            If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

            – Chao
            Nov 14 '18 at 13:11











          • all_vars applies to those selected in vars(). variables not selected will be ignored

            – Calum You
            Nov 14 '18 at 19:28











          • Great! Thank you!

            – Chao
            Nov 15 '18 at 12:59



















          1














          Here's a simple one-line base R solution -



          DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]

          GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
          5 2 9 8 2 7 1
          6 2 2 3 3 2 6
          7 2 2 5 8 9 2





          share|improve this answer

































            0














            A base R approach by using ave would be



            df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 
            FUN = function(x) all(x == 0))), ]

            # GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
            #5 2 9 8 2 7 1
            #6 2 2 3 3 2 6
            #7 2 2 5 8 9 2


            We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.





            Or a little more concise which gives the same result



            df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]





            share|improve this answer

































              0














              Here is another option using the number of NA per group (in case it is something you are also interested in).



              GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
              DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
              DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5

              # output
              GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
              5 2 9 8 2 7 1
              6 2 2 3 3 2 6
              7 2 2 5 8 9 2





              share|improve this answer

























                Your Answer






                StackExchange.ifUsing("editor", function () {
                StackExchange.using("externalEditor", function () {
                StackExchange.using("snippets", function () {
                StackExchange.snippets.init();
                });
                });
                }, "code-snippets");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "1"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300515%2fsubset-to-only-return-complete-cases-on-group-basis-with-multiple-rows%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                0














                You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."





                library(tidyverse)
                tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

                tbl %>%
                group_by(GROUP_ID) %>%
                filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
                #> # A tibble: 3 x 6
                #> # Groups: GROUP_ID [1]
                #> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                #> <int> <int> <int> <int> <int> <int>
                #> 1 2 9 8 2 7 1
                #> 2 2 2 3 3 2 6
                #> 3 2 2 5 8 9 2


                Created on 2018-11-14 by the reprex package (v0.2.1)






                share|improve this answer





















                • 1





                  If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                  – Chao
                  Nov 14 '18 at 13:11











                • all_vars applies to those selected in vars(). variables not selected will be ignored

                  – Calum You
                  Nov 14 '18 at 19:28











                • Great! Thank you!

                  – Chao
                  Nov 15 '18 at 12:59
















                0














                You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."





                library(tidyverse)
                tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

                tbl %>%
                group_by(GROUP_ID) %>%
                filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
                #> # A tibble: 3 x 6
                #> # Groups: GROUP_ID [1]
                #> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                #> <int> <int> <int> <int> <int> <int>
                #> 1 2 9 8 2 7 1
                #> 2 2 2 3 3 2 6
                #> 3 2 2 5 8 9 2


                Created on 2018-11-14 by the reprex package (v0.2.1)






                share|improve this answer





















                • 1





                  If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                  – Chao
                  Nov 14 '18 at 13:11











                • all_vars applies to those selected in vars(). variables not selected will be ignored

                  – Calum You
                  Nov 14 '18 at 19:28











                • Great! Thank you!

                  – Chao
                  Nov 15 '18 at 12:59














                0












                0








                0







                You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."





                library(tidyverse)
                tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

                tbl %>%
                group_by(GROUP_ID) %>%
                filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
                #> # A tibble: 3 x 6
                #> # Groups: GROUP_ID [1]
                #> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                #> <int> <int> <int> <int> <int> <int>
                #> 1 2 9 8 2 7 1
                #> 2 2 2 3 3 2 6
                #> 3 2 2 5 8 9 2


                Created on 2018-11-14 by the reprex package (v0.2.1)






                share|improve this answer















                You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."





                library(tidyverse)
                tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))

                tbl %>%
                group_by(GROUP_ID) %>%
                filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
                #> # A tibble: 3 x 6
                #> # Groups: GROUP_ID [1]
                #> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                #> <int> <int> <int> <int> <int> <int>
                #> 1 2 9 8 2 7 1
                #> 2 2 2 3 3 2 6
                #> 3 2 2 5 8 9 2


                Created on 2018-11-14 by the reprex package (v0.2.1)







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Nov 15 '18 at 19:51

























                answered Nov 14 '18 at 12:53









                Calum YouCalum You

                7,3501829




                7,3501829








                • 1





                  If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                  – Chao
                  Nov 14 '18 at 13:11











                • all_vars applies to those selected in vars(). variables not selected will be ignored

                  – Calum You
                  Nov 14 '18 at 19:28











                • Great! Thank you!

                  – Chao
                  Nov 15 '18 at 12:59














                • 1





                  If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                  – Chao
                  Nov 14 '18 at 13:11











                • all_vars applies to those selected in vars(). variables not selected will be ignored

                  – Calum You
                  Nov 14 '18 at 19:28











                • Great! Thank you!

                  – Chao
                  Nov 15 '18 at 12:59








                1




                1





                If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                – Chao
                Nov 14 '18 at 13:11





                If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

                – Chao
                Nov 14 '18 at 13:11













                all_vars applies to those selected in vars(). variables not selected will be ignored

                – Calum You
                Nov 14 '18 at 19:28





                all_vars applies to those selected in vars(). variables not selected will be ignored

                – Calum You
                Nov 14 '18 at 19:28













                Great! Thank you!

                – Chao
                Nov 15 '18 at 12:59





                Great! Thank you!

                – Chao
                Nov 15 '18 at 12:59













                1














                Here's a simple one-line base R solution -



                DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]

                GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                5 2 9 8 2 7 1
                6 2 2 3 3 2 6
                7 2 2 5 8 9 2





                share|improve this answer






























                  1














                  Here's a simple one-line base R solution -



                  DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]

                  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                  5 2 9 8 2 7 1
                  6 2 2 3 3 2 6
                  7 2 2 5 8 9 2





                  share|improve this answer




























                    1












                    1








                    1







                    Here's a simple one-line base R solution -



                    DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]

                    GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                    5 2 9 8 2 7 1
                    6 2 2 3 3 2 6
                    7 2 2 5 8 9 2





                    share|improve this answer















                    Here's a simple one-line base R solution -



                    DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]

                    GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                    5 2 9 8 2 7 1
                    6 2 2 3 3 2 6
                    7 2 2 5 8 9 2






                    share|improve this answer














                    share|improve this answer



                    share|improve this answer








                    edited Nov 14 '18 at 13:06

























                    answered Nov 14 '18 at 13:01









                    ShreeShree

                    3,4111323




                    3,4111323























                        0














                        A base R approach by using ave would be



                        df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 
                        FUN = function(x) all(x == 0))), ]

                        # GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                        #5 2 9 8 2 7 1
                        #6 2 2 3 3 2 6
                        #7 2 2 5 8 9 2


                        We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.





                        Or a little more concise which gives the same result



                        df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]





                        share|improve this answer






























                          0














                          A base R approach by using ave would be



                          df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 
                          FUN = function(x) all(x == 0))), ]

                          # GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                          #5 2 9 8 2 7 1
                          #6 2 2 3 3 2 6
                          #7 2 2 5 8 9 2


                          We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.





                          Or a little more concise which gives the same result



                          df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]





                          share|improve this answer




























                            0












                            0








                            0







                            A base R approach by using ave would be



                            df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 
                            FUN = function(x) all(x == 0))), ]

                            # GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                            #5 2 9 8 2 7 1
                            #6 2 2 3 3 2 6
                            #7 2 2 5 8 9 2


                            We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.





                            Or a little more concise which gives the same result



                            df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]





                            share|improve this answer















                            A base R approach by using ave would be



                            df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 
                            FUN = function(x) all(x == 0))), ]

                            # GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                            #5 2 9 8 2 7 1
                            #6 2 2 3 3 2 6
                            #7 2 2 5 8 9 2


                            We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.





                            Or a little more concise which gives the same result



                            df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]






                            share|improve this answer














                            share|improve this answer



                            share|improve this answer








                            edited Nov 14 '18 at 13:27

























                            answered Nov 14 '18 at 12:52









                            Ronak ShahRonak Shah

                            38.2k104161




                            38.2k104161























                                0














                                Here is another option using the number of NA per group (in case it is something you are also interested in).



                                GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
                                DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
                                DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5

                                # output
                                GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                                5 2 9 8 2 7 1
                                6 2 2 3 3 2 6
                                7 2 2 5 8 9 2





                                share|improve this answer






























                                  0














                                  Here is another option using the number of NA per group (in case it is something you are also interested in).



                                  GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
                                  DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
                                  DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5

                                  # output
                                  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                                  5 2 9 8 2 7 1
                                  6 2 2 3 3 2 6
                                  7 2 2 5 8 9 2





                                  share|improve this answer




























                                    0












                                    0








                                    0







                                    Here is another option using the number of NA per group (in case it is something you are also interested in).



                                    GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
                                    DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
                                    DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5

                                    # output
                                    GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                                    5 2 9 8 2 7 1
                                    6 2 2 3 3 2 6
                                    7 2 2 5 8 9 2





                                    share|improve this answer















                                    Here is another option using the number of NA per group (in case it is something you are also interested in).



                                    GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
                                    DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
                                    DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5

                                    # output
                                    GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
                                    5 2 9 8 2 7 1
                                    6 2 2 3 3 2 6
                                    7 2 2 5 8 9 2






                                    share|improve this answer














                                    share|improve this answer



                                    share|improve this answer








                                    edited Nov 14 '18 at 13:31

























                                    answered Nov 14 '18 at 13:25









                                    ANGANG

                                    4,4712720




                                    4,4712720






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to Stack Overflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300515%2fsubset-to-only-return-complete-cases-on-group-basis-with-multiple-rows%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        Bressuire

                                        Vorschmack

                                        Quarantine