Using is.na with the sapply function in R












Can anyone tell me what the line of code below does?



sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100



My understanding is that it drops the NAs when it applies the sum function but keeps them in the matrix.



Any help is appreciated.



Thank you










  • You are looping through the columns with sapply (assuming X is a data.frame), getting the number of NA elements in each column (by summing the logical vector is.na(x)), and dividing by the number of rows of airports.
    – akrun
    Nov 12 '18 at 21:35








  • I think it counts the percentage of NA entries by row
    – Enrique Pérez Herrero
    Nov 12 '18 at 21:38






  • @EnriquePérezHerrero not by row, by column. (And assuming that X is the same as airports, or is a subset of the airports columns, or has the same number of rows. That part isn't clear at all.)
    – Gregor
    Nov 12 '18 at 21:39












  • @srkale it is not dropping anything. Dropping NAs in sum would look like sum(..something.., na.rm = TRUE). In this case what is being summed is whether or not each value is NA, so sum(is.na(x)) essentially means "count the number of NAs in x".
    – Gregor
    Nov 12 '18 at 21:42
















r lapply na sapply






edited Nov 12 '18 at 21:46









Joe

asked Nov 12 '18 at 21:30









srkale

1 Answer
Enough comments, time for an answer:



sapply(X,                   # apply to each item of X (each column, if X is a data frame)
       function(x)          # this function:
         sum(is.na(x))      # count the NAs
) / nrow(airports) * 100    # then divide the result by the number of rows in the airports object
                            # and multiply by 100


In words, it counts the number of missing values in each column of X, then divides the result by the number of rows in airports and multiplies by 100. That is, it calculates the percentage of missing values in each column, assuming X has the same number of rows as airports.
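To make this concrete, here is a minimal sketch on a toy data frame. The values are made up for illustration, and it assumes X is simply the airports data frame itself, which the question does not confirm.

```r
# Toy stand-in for the airports data (hypothetical values)
airports <- data.frame(
  code = c("AAA", "BBB", NA, "DDD"),
  lat  = c(10.5, NA, NA, 40.2),
  lon  = c(1, 2, 3, 4)
)
X <- airports  # assumption: X is the same data frame

# Step 1: count the NAs in each column
na_counts <- sapply(X, function(x) sum(is.na(x)))
na_counts
# code  lat  lon
#    1    2    0

# Step 2: turn the counts into percentages of the 4 rows
na_pct <- na_counts / nrow(airports) * 100
na_pct
# code  lat  lon
#   25   50    0
```

Note that airports itself is unchanged afterwards; the NAs are only counted, not removed.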



It's strange to mix and match the columns of X with nrow(airports). I would expect those to refer to the same object, that is, either sapply(airports, ...) / nrow(airports) or sapply(X, ...) / nrow(X).
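A small sketch of why the mismatch matters, using a hypothetical case where X is a row subset of airports (made-up data; the question never says how X was built):

```r
# Hypothetical data: X has fewer rows than airports
airports <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 2, 3, 4))
X <- airports[1:2, ]

# Mixing objects: NA counts from X divided by the row count of airports
mixed <- sapply(X, function(x) sum(is.na(x))) / nrow(airports) * 100
mixed
#  a  b
# 25 25

# Consistent object: the actual percentage of NAs within X
consistent <- sapply(X, function(x) sum(is.na(x))) / nrow(X) * 100
consistent
#  a  b
# 50 50
```

When the two objects have the same number of rows the results agree, which is presumably the situation in the original code.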



As I mentioned in comments, nothing is being "dropped". If you wanted to do a sum ignoring the NA values, you would use sum(foo, na.rm = TRUE). Here, instead, what is being summed is is.na(x): we are summing whether or not each value is missing, which counts the missing values. sum(is.na(foo)) is the idiomatic way to count the number of NA values in foo.
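The difference between the two is easy to see on a short vector (foo here is just an illustrative example):

```r
foo <- c(4, NA, 7, NA, 1)

sum(foo)                # NA: any missing value makes the total missing
sum(foo, na.rm = TRUE)  # 12: NAs are dropped, then 4 + 7 + 1 is summed

is.na(foo)              # FALSE TRUE FALSE TRUE FALSE
sum(is.na(foo))         # 2: TRUE counts as 1, so this counts the NAs
```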



In this case, where the goal is a percentage rather than a count, we can simplify by using mean() instead of sum() / n, because the mean of a logical vector is the proportion of TRUE values:



# slightly simpler, consistent object
sapply(airports, function(x) mean(is.na(x))) * 100


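We could also use is.na() on the entire data so we don't need the "anonymous function". Called on a data frame, is.na() returns a logical matrix, and sapply() over a matrix visits every cell rather than every column, so the column-wise mean is taken with colMeans():

# rearrange for more simplicity
colMeans(is.na(airports)) * 100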
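As a quick check, the sketch below compares the sum-based, mean-based, and whole-table forms on a toy data frame (df is hypothetical). The whole-table form uses colMeans() because is.na() of a data frame is a logical matrix, and sapply() would iterate over its individual cells rather than its columns.

```r
# Hypothetical data with NAs in a numeric and a character column
df <- data.frame(a = c(1, NA, 3, NA), b = c("x", "y", NA, "z"))

by_sum  <- sapply(df, function(x) sum(is.na(x))) / nrow(df) * 100
by_mean <- sapply(df, function(x) mean(is.na(x))) * 100
by_col  <- colMeans(is.na(df)) * 100

by_sum
#  a  b
# 50 25

all.equal(by_sum, by_mean)  # TRUE
all.equal(by_sum, by_col)   # TRUE
```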





  • Thank you for the explanation! I appreciate it!
    – srkale
    Nov 21 '18 at 18:15











edited Nov 12 '18 at 22:04

























answered Nov 12 '18 at 21:47









Gregor
