Successive sum of column only if another column has the good value in R












1















I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):



Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1


I'd like to have the sum of values for successive criteria so I would get something like this:



Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1


So like this



I can't find what to do, how can I deal with that?
Thanks.



EDITED: More accurate dataframe










share|improve this question

























  • So you want each row where Zone1 is true to have the length of the previous Zone1 run?

    – tstenner
    Nov 14 '18 at 20:15











  • Yes it's that, but your code is working, I'll mark it as a solution :)

    – Clément POIRET
    Nov 15 '18 at 14:57
















1















I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):



Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1


I'd like to have the sum of values for successive criteria so I would get something like this:



Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1


So like this



I can't find what to do, how can I deal with that?
Thanks.



EDITED: More accurate dataframe










share|improve this question

























  • So you want each row where Zone1 is true to have the length of the previous Zone1 run?

    – tstenner
    Nov 14 '18 at 20:15











  • Yes it's that, but your code is working, I'll mark it as a solution :)

    – Clément POIRET
    Nov 15 '18 at 14:57














1












1








1








I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):



Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1


I'd like to have the sum of values for successive criteria so I would get something like this:



Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1


So like this



I can't find what to do, how can I deal with that?
Thanks.



EDITED: More accurate dataframe










share|improve this question
















I currently have a dataframe looking like that (with time in seconds and Zone1 a boolean):



Time Zone1
1 0
3 0
4 1
5 1
6 1
7 0
9 1
10 1


I'd like to have the sum of values for successive criteria so I would get something like this:



Time Zone1 TimeInZone
1 0 NA
3 0 NA
4 1 2
5 1 2
6 1 2
7 0 NA
9 1 1
10 1 1


So like this



I can't find what to do, how can I deal with that?
Thanks.



EDITED: More accurate dataframe







r sum






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 '18 at 13:37







Clément POIRET

















asked Nov 14 '18 at 12:52









Clément POIRETClément POIRET

277




277













  • So you want each row where Zone1 is true to have the length of the previous Zone1 run?

    – tstenner
    Nov 14 '18 at 20:15











  • Yes it's that, but your code is working, I'll mark it as a solution :)

    – Clément POIRET
    Nov 15 '18 at 14:57



















  • So you want each row where Zone1 is true to have the length of the previous Zone1 run?

    – tstenner
    Nov 14 '18 at 20:15











  • Yes it's that, but your code is working, I'll mark it as a solution :)

    – Clément POIRET
    Nov 15 '18 at 14:57

















So you want each row where Zone1 is true to have the length of the previous Zone1 run?

– tstenner
Nov 14 '18 at 20:15





So you want each row where Zone1 is true to have the length of the previous Zone1 run?

– tstenner
Nov 14 '18 at 20:15













Yes it's that, but your code is working, I'll mark it as a solution :)

– Clément POIRET
Nov 15 '18 at 14:57





Yes it's that, but your code is working, I'll mark it as a solution :)

– Clément POIRET
Nov 15 '18 at 14:57












1 Answer
1






active

oldest

votes


















2














I'm not entirely sure, where the last two rows came from, but here's my take on it:



library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)


Result:



   Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2


To have the sum of the last block, use some data.table magic:



setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]





share|improve this answer


























  • Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

    – Clément POIRET
    Nov 14 '18 at 13:45











  • I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

    – Clément POIRET
    Nov 15 '18 at 14:58











  • That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

    – tstenner
    Nov 15 '18 at 16:04











  • Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

    – Clément POIRET
    Nov 15 '18 at 18:39













  • Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

    – tstenner
    Nov 16 '18 at 9:36











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300700%2fsuccessive-sum-of-column-only-if-another-column-has-the-good-value-in-r%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














I'm not entirely sure, where the last two rows came from, but here's my take on it:



library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)


Result:



   Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2


To have the sum of the last block, use some data.table magic:



setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]





share|improve this answer


























  • Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

    – Clément POIRET
    Nov 14 '18 at 13:45











  • I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

    – Clément POIRET
    Nov 15 '18 at 14:58











  • That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

    – tstenner
    Nov 15 '18 at 16:04











  • Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

    – Clément POIRET
    Nov 15 '18 at 18:39













  • Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

    – tstenner
    Nov 16 '18 at 9:36
















2














I'm not entirely sure, where the last two rows came from, but here's my take on it:



library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)


Result:



   Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2


To have the sum of the last block, use some data.table magic:



setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]





share|improve this answer


























  • Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

    – Clément POIRET
    Nov 14 '18 at 13:45











  • I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

    – Clément POIRET
    Nov 15 '18 at 14:58











  • That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

    – tstenner
    Nov 15 '18 at 16:04











  • Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

    – Clément POIRET
    Nov 15 '18 at 18:39













  • Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

    – tstenner
    Nov 16 '18 at 9:36














2












2








2







I'm not entirely sure, where the last two rows came from, but here's my take on it:



library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)


Result:



   Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2


To have the sum of the last block, use some data.table magic:



setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]





share|improve this answer















I'm not entirely sure, where the last two rows came from, but here's my take on it:



library(data.table)
df <- data.table(Value=c(3,4,1,1,2), Criteria=c(1,1,2,1,3))
# First, generate a logical vector that indicates if the criterium changed:
df[, changed:=c(TRUE, Criteria[-1] != Criteria[-length(Criteria)])]
# Then, calculate the cumulative sum to get an index:
df[, index:=cumsum(changed)]
# Calculate the sum for each level of index:
df[, Sum:=sum(Value), by=index]
# print everything:
print(df)


Result:



   Value Criteria changed index Sum
1: 3 1 TRUE 1 7
2: 4 1 FALSE 1 7
3: 1 2 TRUE 2 1
4: 1 1 TRUE 3 1
5: 2 3 TRUE 4 2


To have the sum of the last block, use some data.table magic:



setkey(df, index)
nextblocksums <- df[index!=max(index), .(index=index+1,nextBlockSum=Sum)]
df[ nextblocksums , LastBlocksSum:=i.nextBlockSum]






share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 16 '18 at 9:34

























answered Nov 14 '18 at 13:07









tstennertstenner

6,32964277




6,32964277













  • Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

    – Clément POIRET
    Nov 14 '18 at 13:45











  • I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

    – Clément POIRET
    Nov 15 '18 at 14:58











  • That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

    – tstenner
    Nov 15 '18 at 16:04











  • Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

    – Clément POIRET
    Nov 15 '18 at 18:39













  • Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

    – tstenner
    Nov 16 '18 at 9:36



















  • Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

    – Clément POIRET
    Nov 14 '18 at 13:45











  • I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

    – Clément POIRET
    Nov 15 '18 at 14:58











  • That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

    – tstenner
    Nov 15 '18 at 16:04











  • Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

    – Clément POIRET
    Nov 15 '18 at 18:39













  • Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

    – tstenner
    Nov 16 '18 at 9:36

















Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

– Clément POIRET
Nov 14 '18 at 13:45





Thanks, it's working but it appears that my dataframe was a bit more complicated than just sum, in fact I need a time so I updated my first post. The time is the difference between last and first value in the range. Do you have an idea?

– Clément POIRET
Nov 14 '18 at 13:45













I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

– Clément POIRET
Nov 15 '18 at 14:58





I managed to get it work by replacing sum by doing the difference between last and first line of a df[df$index == i]$Time in a For loop! Thanks!

– Clément POIRET
Nov 15 '18 at 14:58













That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

– tstenner
Nov 15 '18 at 16:04





That works, but it's slow for anything but tiny amounts of data. See my edit for a faster solution.

– tstenner
Nov 15 '18 at 16:04













Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

– Clément POIRET
Nov 15 '18 at 18:39







Hmm pretty complicated ahah, I've troubles understanding this and I'm getting NA in LastBlocksSum

– Clément POIRET
Nov 15 '18 at 18:39















Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

– tstenner
Nov 16 '18 at 9:36





Of course there's a NA in the first LastBlocksSum, because there's no previous block for that one :-) I've clarified the last step somewhat, just remember that with data.tables the first part (df[i, j]) can also be a join, try df[ nextblocksums, ] to see it in action.

– tstenner
Nov 16 '18 at 9:36




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53300700%2fsuccessive-sum-of-column-only-if-another-column-has-the-good-value-in-r%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Xamarin.iOS Cant Deploy on Iphone

Glorious Revolution

Dulmage-Mendelsohn matrix decomposition in Python