Using is.na with sapply function in R
Can anyone tell me what the line of code below does?
    sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100
What I understand is that it will drop `NA`s when it applies the sum function but keeps them in the matrix.
Any help is appreciated.
Thank you
r lapply na sapply
You are looping through the columns (with `sapply`, assuming `X` is a `data.frame`), getting the number of NA elements (by taking the `sum` of the logical vector `is.na(x)`), and dividing by the number of rows of `airports`.
– akrun
Nov 12 '18 at 21:35
I think it counts the percentage of NA entries by row
– Enrique Pérez Herrero
Nov 12 '18 at 21:38
@EnriquePérezHerrero not by row, by column. (And assuming that `X` is the same as `airports`, or is a subset of the `airports` columns, or has the same number of rows. That part isn't clear at all.)
– Gregor
Nov 12 '18 at 21:39
@srkale it is not dropping anything. Dropping `NA`s in a sum would look like `sum(..something.., na.rm = TRUE)`. In this case what is being summed is whether or not each value is `NA`; `sum(is.na(x))` essentially means "count the number of `NA`s in `x`".
– Gregor
Nov 12 '18 at 21:42
asked Nov 12 '18 at 21:30 by srkale, edited Nov 12 '18 at 21:46 by Joe
1 Answer
Enough comments, time for an answer:
    sapply(X,                  # apply to each item of X (each column, if X is a data frame)
      function(x)              # this function:
        sum(is.na(x))          # count the NAs
    ) / nrow(airports) * 100   # then divide the result by the number of rows in the airports object
                               # and multiply by 100
In words, it counts the number of missing values in each column of `X`, then divides the result by the number of rows in `airports` and multiplies by 100. This calculates the percentage of missing values in each column, assuming `X` has the same number of rows as `airports`.
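To make that concrete, here is a minimal sketch on a made-up data frame. The `airports` object and its columns below are hypothetical stand-ins, not the asker's data, and `X` is assumed to be the same object as `airports`.

```r
# Hypothetical stand-in for the asker's `airports` data (invented for illustration)
airports <- data.frame(
  code = c("AAA", "BBB", NA, "DDD"),
  lat  = c(10.5, NA, NA, 40.1),
  lon  = c(1, 2, 3, 4)
)

# Count the NAs in each column
na_counts <- sapply(airports, function(x) sum(is.na(x)))

# Turn the counts into a percentage of rows
na_pct <- na_counts / nrow(airports) * 100

print(na_counts)  # code = 1, lat = 2, lon = 0
print(na_pct)     # code = 25, lat = 50, lon = 0
```

The result is a named numeric vector with one entry per column.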
It's strange to mix and match the columns of `X` with `nrow(airports)`; I would expect those to be the same (that is, either `sapply(airports, ...) / nrow(airports)` or `sapply(X, ...) / nrow(X)`).
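A small sketch of why the mismatch matters, assuming (hypothetically) that `X` holds only some of the rows of `airports`. Both objects below are invented for illustration.

```r
# Hypothetical data: X is a row subset of airports, so the row counts differ
airports <- data.frame(
  code = c("AAA", "BBB", NA, "DDD"),
  lat  = c(10.5, NA, NA, 40.1)
)
X <- airports[1:2, ]

# Mixed objects: NA counts from X, denominator from airports
mixed <- sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100

# Consistent objects: counts and denominator both from X
consistent <- sapply(X, function(x) sum(is.na(x))) / nrow(X) * 100

print(mixed)       # code = 0, lat = 25
print(consistent)  # code = 0, lat = 50
```

Only the consistent version is the true percentage of missing values in `X`; the mixed version is right only when the two objects happen to have the same number of rows.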
As I mentioned in comments, nothing is being "dropped". If you wanted to do a `sum` ignoring the `NA` values, you would do `sum(foo, na.rm = TRUE)`. Instead, here, what is being summed is `is.na(x)`, that is, we are summing whether or not each value is missing: counting missing values. `sum(is.na(foo))` is the idiomatic way to count the number of `NA` values in `foo`.
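The difference between the two is easy to see on a short vector. The `foo` values here are made up for illustration.

```r
# Hypothetical vector with two missing values
foo <- c(3, NA, 5, NA, 2)

plain_sum   <- sum(foo)                # NA: any missing value makes the sum missing
dropped_sum <- sum(foo, na.rm = TRUE)  # 10: NAs are ignored while summing
na_count    <- sum(is.na(foo))         # 2: TRUE counts as 1, FALSE as 0

print(plain_sum)
print(dropped_sum)
print(na_count)
print(length(foo))  # still 5: nothing was removed from foo itself
```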
In this case, where the goal is a percent not a count, we can simplify by using `mean()` instead of `sum() / n`:
    # slightly simpler, consistent object
    sapply(airports, function(x) mean(is.na(x))) * 100
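As a quick check that the two forms agree, here is a sketch on a hypothetical `airports` data frame (invented for illustration). The mean of a logical vector is the proportion of `TRUE` values, which is the same as the count divided by the length.

```r
# Hypothetical stand-in data
airports <- data.frame(
  code = c("AAA", "BBB", NA, "DDD"),
  lat  = c(10.5, NA, NA, 40.1)
)

# Count divided by number of rows
via_sum  <- sapply(airports, function(x) sum(is.na(x))) / nrow(airports) * 100

# Proportion of TRUE values directly
via_mean <- sapply(airports, function(x) mean(is.na(x))) * 100

print(via_sum)   # code = 25, lat = 50
print(via_mean)  # code = 25, lat = 50
```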
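We could also use `is.na()` on the entire data so we don't need the "anonymous function". Note that `is.na()` on a data frame returns a logical matrix, and `sapply()` over a matrix would visit each element rather than each column, so use `colMeans()` here:
    # rearrange for more simplicity
    colMeans(is.na(airports)) * 100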
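One thing worth verifying when applying `is.na()` to a whole data frame: the result is a logical matrix, and `sapply()` over a matrix iterates over individual elements, not columns. A column-wise summary such as `colMeans()` gives the per-column percentages. The sketch below uses a hypothetical `airports` data frame invented for illustration.

```r
# Hypothetical stand-in data
airports <- data.frame(
  code = c("AAA", "BBB", NA, "DDD"),
  lat  = c(10.5, NA, NA, 40.1),
  lon  = c(1, 2, 3, 4)
)

na_matrix <- is.na(airports)   # logical matrix, 4 rows by 3 columns

# Element-wise: one value per cell of the matrix, not one per column
elementwise <- sapply(na_matrix, mean)

# Column-wise percentage of missing values
by_column <- colMeans(na_matrix) * 100

# Reference result using the per-column anonymous function
reference <- sapply(airports, function(x) mean(is.na(x))) * 100

print(is.matrix(na_matrix))  # TRUE
print(length(elementwise))   # 12
print(by_column)             # code = 25, lat = 50, lon = 0
```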
Thank you for the explanation! I appreciate it!
– srkale
Nov 21 '18 at 18:15
answered Nov 12 '18 at 21:47 by Gregor, edited Nov 12 '18 at 22:04