Successive sum of column only if another column has the good value in R
I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):
Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1
I'd like to have the sum of values for successive criteria so I would get something like this:
Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1
So like this
I can't find what to do, how can I deal with that?
Thanks.
EDITED: More accurate dataframe
r sum
add a comment |
I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):
Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1
I'd like to have the sum of values for successive criteria so I would get something like this:
Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1
So like this
I can't find what to do, how can I deal with that?
Thanks.
EDITED: More accurate dataframe
r sum
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57
add a comment |
I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):
Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1
I'd like to have the sum of values for successive criteria so I would get something like this:
Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1
So like this
I can't find what to do, how can I deal with that?
Thanks.
EDITED: More accurate dataframe
r sum
I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):
Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1
I'd like to have the sum of values for successive criteria so I would get something like this:
Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1
So like this
I can't find what to do, how can I deal with that?
Thanks.
EDITED: More accurate dataframe
r sum
r sum
edited Nov 14 '18 at 13:37
Clément POIRET
asked Nov 14 '18 at 12:52
Clément POIRETClément POIRET
277
277
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57
add a comment |
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57
add a comment |
1 Answer
1
active
oldest
votes
I'm not entirely sure, where the last two rows came from, but here's my take on it:
library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)
Result:
Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2
To have the sum of the last block, use some data.table magic:
setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]
) can also be a join, trydf[ nextblocksums, ]
to see it in action.
– tstenner
Nov 16 '18 at 9:36
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300700%2fsuccessive-sum-of-column-only-if-another-column-has-the-good-value-in-r%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I'm not entirely sure, where the last two rows came from, but here's my take on it:
library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)
Result:
Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2
To have the sum of the last block, use some data.table magic:
setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]
) can also be a join, trydf[ nextblocksums, ]
to see it in action.
– tstenner
Nov 16 '18 at 9:36
add a comment |
I'm not entirely sure, where the last two rows came from, but here's my take on it:
library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)
Result:
Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2
To have the sum of the last block, use some data.table magic:
setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]
) can also be a join, trydf[ nextblocksums, ]
to see it in action.
– tstenner
Nov 16 '18 at 9:36
add a comment |
I'm not entirely sure, where the last two rows came from, but here's my take on it:
library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)
Result:
Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2
To have the sum of the last block, use some data.table magic:
setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]
I'm not entirely sure, where the last two rows came from, but here's my take on it:
library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)
Result:
Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2
To have the sum of the last block, use some data.table magic:
setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]
edited Nov 16 '18 at 9:34
answered Nov 14 '18 at 13:07
tstennertstenner
6,32964277
6,32964277
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]
) can also be a join, trydf[ nextblocksums, ]
to see it in action.
– tstenner
Nov 16 '18 at 9:36
add a comment |
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]
) can also be a join, trydf[ nextblocksums, ]
to see it in action.
– tstenner
Nov 16 '18 at 9:36
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?
– Clément POIRET
Nov 14 '18 at 13:45
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!
– Clément POIRET
Nov 15 '18 at 14:58
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.
– tstenner
Nov 15 '18 at 16:04
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum
– Clément POIRET
Nov 15 '18 at 18:39
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (
df[i, j]
) can also be a join, try df[ nextblocksums, ]
to see it in action.– tstenner
Nov 16 '18 at 9:36
Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (
df[i, j]
) can also be a join, try df[ nextblocksums, ]
to see it in action.– tstenner
Nov 16 '18 at 9:36
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300700%2fsuccessive-sum-of-column-only-if-another-column-has-the-good-value-in-r%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
So you want each row where Zone1 is true to have the length of the previous Zone1 run?
– tstenner
Nov 14 '18 at 20:15
Yes it's that, but your code is working, I'll mark it as a solution :)
– Clément POIRET
Nov 15 '18 at 14:57