Subset to only return complete cases on group basis with multiple rows
If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
1 2 4 5 6 6
1 3 4 2 3 4
1 4 NA 6 6 3
1 3 2 1 2 4
2 9 8 2 7 1
2 2 3 3 2 6
2 2 5 8 9 2
3 2 1 5 6 NA
3 NA NA 5 8 9
3 4 2 6 8 7
In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.
How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?
r dplyr data.table plyr tidyr
add a comment |
If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
1 2 4 5 6 6
1 3 4 2 3 4
1 4 NA 6 6 3
1 3 2 1 2 4
2 9 8 2 7 1
2 2 3 3 2 6
2 2 5 8 9 2
3 2 1 5 6 NA
3 NA NA 5 8 9
3 4 2 6 8 7
In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.
How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?
r dplyr data.table plyr tidyr
add a comment |
If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
1 2 4 5 6 6
1 3 4 2 3 4
1 4 NA 6 6 3
1 3 2 1 2 4
2 9 8 2 7 1
2 2 3 3 2 6
2 2 5 8 9 2
3 2 1 5 6 NA
3 NA NA 5 8 9
3 4 2 6 8 7
In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.
How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?
r dplyr data.table plyr tidyr
If I have a data.frame, DF, with 5 variables with varying groups with between 1 and 24 rows - exemplified with this:
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
1 2 4 5 6 6
1 3 4 2 3 4
1 4 NA 6 6 3
1 3 2 1 2 4
2 9 8 2 7 1
2 2 3 3 2 6
2 2 5 8 9 2
3 2 1 5 6 NA
3 NA NA 5 8 9
3 4 2 6 8 7
In this, complete case would be defined as groups with has no NA in any of the variables in any of the rows with that respective group_ID. In this example, only GROUP_ID 2 would represent a complete case.
How do I subset the data.frame in such way that it finds the groups with complete cases and return all the rows with that group ID (in this example only returning all the rows for GROUP_ID 2)?
r dplyr data.table plyr tidyr
r dplyr data.table plyr tidyr
edited Nov 22 '18 at 2:16
Ronak Shah
38.2k104161
38.2k104161
asked Nov 14 '18 at 12:43
ChaoChao
494
494
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
You can use dplyr
for this. First group_by
your grouping variable(s), and then use filter_at
to filter on the variables that you care about complete cases for. Here we use starts_with
to select all the VAR
variables (in fact because there are no other columns we could have used filter_all
). Then we use all_vars
to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all
the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars
)."
library(tidyverse)
tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(GROUP_ID) %>%
filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
#> # A tibble: 3 x 6
#> # Groups: GROUP_ID [1]
#> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#> <int> <int> <int> <int> <int> <int>
#> 1 2 9 8 2 7 1
#> 2 2 2 3 3 2 6
#> 3 2 2 5 8 9 2
Created on 2018-11-14 by the reprex package (v0.2.1)
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
add a comment |
Here's a simple one-line base R solution -
DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
A base R approach by using ave
would be
df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID,
FUN = function(x) all(x == 0))), ]
# GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#5 2 9 8 2 7 1
#6 2 2 3 3 2 6
#7 2 2 5 8 9 2
We count the number of NA
s in every row and filter the rows which have 0 NA
's for all
of the row values in each GROUP_ID
.
Or a little more concise which gives the same result
df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]
add a comment |
Here is another option using the number of NA
per group (in case it is something you are also interested in).
GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5
# output
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300515%2fsubset-to-only-return-complete-cases-on-group-basis-with-multiple-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
You can use dplyr
for this. First group_by
your grouping variable(s), and then use filter_at
to filter on the variables that you care about complete cases for. Here we use starts_with
to select all the VAR
variables (in fact because there are no other columns we could have used filter_all
). Then we use all_vars
to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all
the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars
)."
library(tidyverse)
tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(GROUP_ID) %>%
filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
#> # A tibble: 3 x 6
#> # Groups: GROUP_ID [1]
#> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#> <int> <int> <int> <int> <int> <int>
#> 1 2 9 8 2 7 1
#> 2 2 2 3 3 2 6
#> 3 2 2 5 8 9 2
Created on 2018-11-14 by the reprex package (v0.2.1)
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
add a comment |
You can use dplyr
for this. First group_by
your grouping variable(s), and then use filter_at
to filter on the variables that you care about complete cases for. Here we use starts_with
to select all the VAR
variables (in fact because there are no other columns we could have used filter_all
). Then we use all_vars
to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all
the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars
)."
library(tidyverse)
tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(GROUP_ID) %>%
filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
#> # A tibble: 3 x 6
#> # Groups: GROUP_ID [1]
#> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#> <int> <int> <int> <int> <int> <int>
#> 1 2 9 8 2 7 1
#> 2 2 2 3 3 2 6
#> 3 2 2 5 8 9 2
Created on 2018-11-14 by the reprex package (v0.2.1)
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
add a comment |
You can use dplyr
for this. First group_by
your grouping variable(s), and then use filter_at
to filter on the variables that you care about complete cases for. Here we use starts_with
to select all the VAR
variables (in fact because there are no other columns we could have used filter_all
). Then we use all_vars
to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all
the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars
)."
library(tidyverse)
tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(GROUP_ID) %>%
filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
#> # A tibble: 3 x 6
#> # Groups: GROUP_ID [1]
#> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#> <int> <int> <int> <int> <int> <int>
#> 1 2 9 8 2 7 1
#> 2 2 2 3 3 2 6
#> 3 2 2 5 8 9 2
Created on 2018-11-14 by the reprex package (v0.2.1)
You can use dplyr
for this. First group_by
your grouping variable(s), and then use filter_at
to filter on the variables that you care about complete cases for. Here we use starts_with
to select all the VAR
variables (in fact because there are no other columns we could have used filter_all
). Then we use all_vars
to wrap the predicate that checks for the values being non-missing. This basically says "for each group, check if all
the values in each column are nonmissing. Keep the row if this is true for all the columns (all_vars
)."
library(tidyverse)
tbl <- structure(list(GROUP_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), VAR_1 = c(2L, 3L, 4L, 3L, 9L, 2L, 2L, 2L, NA, 4L), VAR_2 = c(4L, 4L, NA, 2L, 8L, 3L, 5L, 1L, NA, 2L), VAR_3 = c(5L, 2L, 6L, 1L, 2L, 3L, 8L, 5L, 5L, 6L), VAR_4 = c(6L, 3L, 6L, 2L, 7L, 2L, 9L, 6L, 8L, 8L), VAR_5 = c(6L, 4L, 3L, 4L, 1L, 6L, 2L, NA, 9L, 7L)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"), spec = structure(list(cols = list(GROUP_ID = structure(list(), class = c("collector_integer", "collector")), VAR_1 = structure(list(), class = c("collector_integer", "collector")), VAR_2 = structure(list(), class = c("collector_integer", "collector")), VAR_3 = structure(list(), class = c("collector_integer", "collector")), VAR_4 = structure(list(), class = c("collector_integer", "collector")), VAR_5 = structure(list(), class = c("collector_integer", "collector"))), default = structure(list(), class = c("collector_guess", "collector"))), class = "col_spec"))
tbl %>%
group_by(GROUP_ID) %>%
filter_at(vars(starts_with("VAR")), all_vars(all(!is.na(.))))
#> # A tibble: 3 x 6
#> # Groups: GROUP_ID [1]
#> GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#> <int> <int> <int> <int> <int> <int>
#> 1 2 9 8 2 7 1
#> 2 2 2 3 3 2 6
#> 3 2 2 5 8 9 2
Created on 2018-11-14 by the reprex package (v0.2.1)
edited Nov 15 '18 at 19:51
answered Nov 14 '18 at 12:53
Calum YouCalum You
7,3501829
7,3501829
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
add a comment |
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
1
1
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
If I have additional variables in the data.frame, but they are to be omitted from this as only VAR_1 to VAR_5 are of interest, do the "all_vars" function apply only to variables specified within filter_at or all variables in data.frame?
– Chao
Nov 14 '18 at 13:11
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
all_vars applies to those selected in vars(). variables not selected will be ignored
– Calum You
Nov 14 '18 at 19:28
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
Great! Thank you!
– Chao
Nov 15 '18 at 12:59
add a comment |
Here's a simple one-line base R solution -
DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
Here's a simple one-line base R solution -
DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
Here's a simple one-line base R solution -
DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
Here's a simple one-line base R solution -
DF[ave(complete.cases(DF), DF$GROUP_ID, FUN = all), ]
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
edited Nov 14 '18 at 13:06
answered Nov 14 '18 at 13:01
ShreeShree
3,4111323
3,4111323
add a comment |
add a comment |
A base R approach by using ave
would be
df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID,
FUN = function(x) all(x == 0))), ]
# GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#5 2 9 8 2 7 1
#6 2 2 3 3 2 6
#7 2 2 5 8 9 2
We count the number of NA
s in every row and filter the rows which have 0 NA
's for all
of the row values in each GROUP_ID
.
Or a little more concise which gives the same result
df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]
add a comment |
A base R approach by using ave
would be
df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID,
FUN = function(x) all(x == 0))), ]
# GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#5 2 9 8 2 7 1
#6 2 2 3 3 2 6
#7 2 2 5 8 9 2
We count the number of NA
s in every row and filter the rows which have 0 NA
's for all
of the row values in each GROUP_ID
.
Or a little more concise which gives the same result
df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]
add a comment |
A base R approach by using ave
would be
df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID,
FUN = function(x) all(x == 0))), ]
# GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#5 2 9 8 2 7 1
#6 2 2 3 3 2 6
#7 2 2 5 8 9 2
We count the number of NA
s in every row and filter the rows which have 0 NA
's for all
of the row values in each GROUP_ID
.
Or a little more concise which gives the same result
df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]
A base R approach by using ave
would be
df[as.logical(ave(rowSums(is.na(df)), df$GROUP_ID,
FUN = function(x) all(x == 0))), ]
# GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
#5 2 9 8 2 7 1
#6 2 2 3 3 2 6
#7 2 2 5 8 9 2
We count the number of NA
s in every row and filter the rows which have 0 NA
's for all
of the row values in each GROUP_ID
.
Or a little more concise which gives the same result
df[ave(rowSums(is.na(df)) == 0, df$GROUP_ID, FUN = all), ]
edited Nov 14 '18 at 13:27
answered Nov 14 '18 at 12:52
Ronak ShahRonak Shah
38.2k104161
38.2k104161
add a comment |
add a comment |
Here is another option using the number of NA
per group (in case it is something you are also interested in).
GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5
# output
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
Here is another option using the number of NA
per group (in case it is something you are also interested in).
GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5
# output
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
add a comment |
Here is another option using the number of NA
per group (in case it is something you are also interested in).
GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5
# output
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
Here is another option using the number of NA
per group (in case it is something you are also interested in).
GROUP_ID.NA.COUNT <- tapply(rowSums(is.na(DF[paste("VAR", 1:5, sep = "_")])),
DF$GROUP_ID, FUN = sum) # number of NA per group based on values from VAR_1 to VAR_5
DF[DF$GROUP_ID %in% which(GROUP_ID.NA.COUNT == 0), ] # Subset to only return group with 0 NA based on values from VAR_1 to VAR_5
# output
GROUP_ID VAR_1 VAR_2 VAR_3 VAR_4 VAR_5
5 2 9 8 2 7 1
6 2 2 3 3 2 6
7 2 2 5 8 9 2
edited Nov 14 '18 at 13:31
answered Nov 14 '18 at 13:25
ANGANG
4,4712720
4,4712720
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300515%2fsubset-to-only-return-complete-cases-on-group-basis-with-multiple-rows%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown