Subset to only return complete cases on group basis with multiple rows

If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:

GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5

1             2        4        5        6        6

1             3        4        2        3        4

1             4       NA        6        6        3

1             3        2        1        2        4

2             9        8        2        7        1

2             2        3        3        2        6

2             2        5        8        9        2

3             2        1        5        6       NA

3            NA       NA        5        8        9

3             4        2        6        8        7

In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.

How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

add a comment |

If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:

GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5

1             2        4        5        6        6

1             3        4        2        3        4

1             4       NA        6        6        3

1             3        2        1        2        4

2             9        8        2        7        1

2             2        3        3        2        6

2             2        5        8        9        2

3             2        1        5        6       NA

3            NA       NA        5        8        9

3             4        2        6        8        7

How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

add a comment |

If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:

GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5

1             2        4        5        6        6

1             3        4        2        3        4

1             4       NA        6        6        3

1             3        2        1        2        4

2             9        8        2        7        1

2             2        3        3        2        6

2             2        5        8        9        2

3             2        1        5        6       NA

3            NA       NA        5        8        9

3             4        2        6        8        7

How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:

GROUP_ID    VAR_1    VAR_2    VAR_3    VAR_4    VAR_5

1             2        4        5        6        6

1             3        4        2        3        4

1             4       NA        6        6        3

1             3        2        1        2        4

2             9        8        2        7        1

2             2        3        3        2        6

2             2        5        8        9        2

3             2        1        5        6       NA

3            NA       NA        5        8        9

3             4        2        6        8        7

How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?

r dplyr data.table plyr tidyr

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

edited Nov 22 '18 at 2:16

Ronak Shah

38.2k104161

asked Nov 14 '18 at 12:43

Chao

494

asked Nov 14 '18 at 12:43

Chao

494

asked Nov 14 '18 at 12:43

Chao

494

add a comment |

4 Answers
4

active

oldest

votes

You can use dplyr for this. First group_by your grouping variable(s), and then use filter_at to filter on the variables that you care about complete cases for. Here we use starts_with to select all the VAR variables (in fact because there are no other columns we could have used filter_all). Then we use all_vars to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars)."

library(tidyverse)

tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))



tbl %>%

  group_by(GROUP_ID) %>%

  filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))

#> # A tibble: 3 x 6

#> # Groups:   GROUP_ID [1]

#>   GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#>      <int> <int> <int> <int> <int> <int>

#> 1        2     9     8     2     7     1

#> 2        2     2     3     3     2     6

#> 3        2     2     5     8     9     2

^{Created on 2018-11-14 by the reprex package (v0.2.1)}

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

1

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

add a comment |

Here's a simple one-line base R solution -

DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]



  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

add a comment |

A base R approach by using ave would be

df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 

                     FUN = function(x) all(x == 0))), ]



#  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#5        2     9     8     2     7     1

#6        2     2     3     3     2     6

#7        2     2     5     8     9     2

We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.

Or a little more concise which gives the same result

df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

add a comment |

Here is another option using the number of NA per group (in case it is something you are also interested in).

GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),

                            DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5

DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5



# output

  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300515%2fsubset-to-only-return-complete-cases-on-group-basis-with-multiple-rows%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

library(tidyverse)

tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))



tbl %>%

  group_by(GROUP_ID) %>%

  filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))

#> # A tibble: 3 x 6

#> # Groups:   GROUP_ID [1]

#>   GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#>      <int> <int> <int> <int> <int> <int>

#> 1        2     9     8     2     7     1

#> 2        2     2     3     3     2     6

#> 3        2     2     5     8     9     2

^{Created on 2018-11-14 by the reprex package (v0.2.1)}

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

1

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

add a comment |

library(tidyverse)

tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))



tbl %>%

  group_by(GROUP_ID) %>%

  filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))

#> # A tibble: 3 x 6

#> # Groups:   GROUP_ID [1]

#>   GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#>      <int> <int> <int> <int> <int> <int>

#> 1        2     9     8     2     7     1

#> 2        2     2     3     3     2     6

#> 3        2     2     5     8     9     2

^{Created on 2018-11-14 by the reprex package (v0.2.1)}

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

1

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

add a comment |

library(tidyverse)

tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))



tbl %>%

  group_by(GROUP_ID) %>%

  filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))

#> # A tibble: 3 x 6

#> # Groups:   GROUP_ID [1]

#>   GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#>      <int> <int> <int> <int> <int> <int>

#> 1        2     9     8     2     7     1

#> 2        2     2     3     3     2     6

#> 3        2     2     5     8     9     2

^{Created on 2018-11-14 by the reprex package (v0.2.1)}

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

library(tidyverse)

tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))



tbl %>%

  group_by(GROUP_ID) %>%

  filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))

#> # A tibble: 3 x 6

#> # Groups:   GROUP_ID [1]

#>   GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#>      <int> <int> <int> <int> <int> <int>

#> 1        2     9     8     2     7     1

#> 2        2     2     3     3     2     6

#> 3        2     2     5     8     9     2

^{Created on 2018-11-14 by the reprex package (v0.2.1)}

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

edited Nov 15 '18 at 19:51

answered Nov 14 '18 at 12:53

Calum You

7,3501829

answered Nov 14 '18 at 12:53

Calum You

7,3501829

answered Nov 14 '18 at 12:53

Calum You

7,3501829

1

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

add a comment |

1

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?

– Chao
Nov 14 '18 at 13:11

all_vars applies to those selected in vars(). variables not selected will be ignored

– Calum You
Nov 14 '18 at 19:28

Great! Thank you!

– Chao
Nov 15 '18 at 12:59

add a comment |

Here's a simple one-line base R solution -

DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]



  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

add a comment |

Here's a simple one-line base R solution -

DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]



  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

add a comment |

Here's a simple one-line base R solution -

DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]



  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

Here's a simple one-line base R solution -

DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]



  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

edited Nov 14 '18 at 13:06

answered Nov 14 '18 at 13:01

Shree

3,4111323

answered Nov 14 '18 at 13:01

Shree

3,4111323

answered Nov 14 '18 at 13:01

Shree

3,4111323

add a comment |

A base R approach by using ave would be

df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 

                     FUN = function(x) all(x == 0))), ]



#  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#5        2     9     8     2     7     1

#6        2     2     3     3     2     6

#7        2     2     5     8     9     2

We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.

Or a little more concise which gives the same result

df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

add a comment |

A base R approach by using ave would be

df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 

                     FUN = function(x) all(x == 0))), ]



#  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#5        2     9     8     2     7     1

#6        2     2     3     3     2     6

#7        2     2     5     8     9     2

We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.

Or a little more concise which gives the same result

df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

add a comment |

A base R approach by using ave would be

df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 

                     FUN = function(x) all(x == 0))), ]



#  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#5        2     9     8     2     7     1

#6        2     2     3     3     2     6

#7        2     2     5     8     9     2

We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.

Or a little more concise which gives the same result

df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

A base R approach by using ave would be

df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID, 

                     FUN = function(x) all(x == 0))), ]



#  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

#5        2     9     8     2     7     1

#6        2     2     3     3     2     6

#7        2     2     5     8     9     2

We count the number of NAs in every row and filter the rows which have 0 NA's for all of the row values in each GROUP_ID.

Or a little more concise which gives the same result

df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

edited Nov 14 '18 at 13:27

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

answered Nov 14 '18 at 12:52

Ronak Shah

38.2k104161

add a comment |

Here is another option using the number of NA per group (in case it is something you are also interested in).

GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),

                            DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5

DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5



# output

  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

add a comment |

Here is another option using the number of NA per group (in case it is something you are also interested in).

GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),

                            DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5

DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5



# output

  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

add a comment |

Here is another option using the number of NA per group (in case it is something you are also interested in).

GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),

                            DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5

DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5



# output

  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

Here is another option using the number of NA per group (in case it is something you are also interested in).

GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),

                            DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5

DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5



# output

  GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5

5        2     9     8     2     7     1

6        2     2     3     3     2     6

7        2     2     5     8     9     2

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

edited Nov 14 '18 at 13:31

answered Nov 14 '18 at 13:25

ANG

4,4712720

answered Nov 14 '18 at 13:25

ANG

4,4712720

answered Nov 14 '18 at 13:25

ANG

4,4712720

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

r Ayypcz TWzffhN,v r25Iyb2wZJnh56jg1Qrd2diUs eOqO7H2Y op,cp6ETi0cHgOX12Plpf4k6lq4DB7Gj Uo,ttyrB 8f

搜尋此網誌

Vfrdtyky