in R, perform a function over rows in a data frame and join the result together in a new date frame
I feel I'm in the realm of the right answer but for the life of me I can't figure out how to do this. I want to take an entry on a product ledger and change the shape of the data where I have a row/case for every life month of a product transaction.
As a note, sometimes these ledger entries are reversed/refunded (and have date_from > date_thru). I have an example of a new purchase and a refund in the data frame below. Also, transaction dates are pre-cleaned to always be the first of the month in which they were made, since I only care about monthly recurring revenue (mrr) on a month-by-month basis.
Sample df:
user <- c("1001", "1002")
line_item <- c("abc123", "def456")
date_from <- as.Date(c("2015-01-01", "2015-06-01"), "%Y-%m-%d")
date_thru <- as.Date(c("2015-04-01", "2014-12-01"), "%Y-%m-%d")
mrr <- c(2.22, -4.44)
df <- cbind.data.frame(user, line_item, date_from, date_thru, mrr)
Output:
user line_item date_from date_thru mrr
1 1001 abc123 2015-01-01 2015-04-01 2.22
2 1002 def456 2015-06-01 2014-12-01 -4.44
Desired result:
user line_item month mrr
1001 abc123 2015-01-01 2.22
1001 abc123 2015-02-01 2.22
1001 abc123 2015-03-01 2.22
1001 abc123 2015-04-01 2.22
1002 def456 2015-06-01 -4.44
1002 def456 2015-05-01 -4.44
1002 def456 2015-04-01 -4.44
1002 def456 2015-02-01 -4.44
1002 def456 2015-01-01 -4.44
1002 def456 2014-12-01 -4.44
How can I create a new month
column by vectorizing a function like seq(date_from, date_thru, by="months")
, and then joining all the resulting vectors into one df again like above?
I've been trying lapply
, dplyr::mutate
, and seq
and can't get them to work correctly together.
Thanks in advance!
r dplyr
add a comment |
I feel I'm in the realm of the right answer but for the life of me I can't figure out how to do this. I want to take an entry on a product ledger and change the shape of the data where I have a row/case for every life month of a product transaction.
As a note, sometimes these ledger entries are reversed/refunded (and have date_from > date_thru). I have an example of a new purchase and a refund in the data frame below. Also, transaction dates are pre-cleaned to always be the first of the month in which they were made, since I only care about monthly recurring revenue (mrr) on a month-by-month basis.
Sample df:
user <- c("1001", "1002")
line_item <- c("abc123", "def456")
date_from <- as.Date(c("2015-01-01", "2015-06-01"), "%Y-%m-%d")
date_thru <- as.Date(c("2015-04-01", "2014-12-01"), "%Y-%m-%d")
mrr <- c(2.22, -4.44)
df <- cbind.data.frame(user, line_item, date_from, date_thru, mrr)
Output:
user line_item date_from date_thru mrr
1 1001 abc123 2015-01-01 2015-04-01 2.22
2 1002 def456 2015-06-01 2014-12-01 -4.44
Desired result:
user line_item month mrr
1001 abc123 2015-01-01 2.22
1001 abc123 2015-02-01 2.22
1001 abc123 2015-03-01 2.22
1001 abc123 2015-04-01 2.22
1002 def456 2015-06-01 -4.44
1002 def456 2015-05-01 -4.44
1002 def456 2015-04-01 -4.44
1002 def456 2015-02-01 -4.44
1002 def456 2015-01-01 -4.44
1002 def456 2014-12-01 -4.44
How can I create a new month
column by vectorizing a function like seq(date_from, date_thru, by="months")
, and then joining all the resulting vectors into one df again like above?
I've been trying lapply
, dplyr::mutate
, and seq
and can't get them to work correctly together.
Thanks in advance!
r dplyr
@Shree yes that is correct, it is an example of a reversedline_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).
– Adam Ellsworth
Nov 14 '18 at 2:20
I think you are missing date2015-03-01
for user1002
in your desired output
– Shree
Nov 14 '18 at 2:58
You should redefine theseq.Date
fucntion so it accepts positive and negative sequences gracefully.
– 42-
Nov 14 '18 at 3:31
add a comment |
I feel I'm in the realm of the right answer but for the life of me I can't figure out how to do this. I want to take an entry on a product ledger and change the shape of the data where I have a row/case for every life month of a product transaction.
As a note, sometimes these ledger entries are reversed/refunded (and have date_from > date_thru). I have an example of a new purchase and a refund in the data frame below. Also, transaction dates are pre-cleaned to always be the first of the month in which they were made, since I only care about monthly recurring revenue (mrr) on a month-by-month basis.
Sample df:
user <- c("1001", "1002")
line_item <- c("abc123", "def456")
date_from <- as.Date(c("2015-01-01", "2015-06-01"), "%Y-%m-%d")
date_thru <- as.Date(c("2015-04-01", "2014-12-01"), "%Y-%m-%d")
mrr <- c(2.22, -4.44)
df <- cbind.data.frame(user, line_item, date_from, date_thru, mrr)
Output:
user line_item date_from date_thru mrr
1 1001 abc123 2015-01-01 2015-04-01 2.22
2 1002 def456 2015-06-01 2014-12-01 -4.44
Desired result:
user line_item month mrr
1001 abc123 2015-01-01 2.22
1001 abc123 2015-02-01 2.22
1001 abc123 2015-03-01 2.22
1001 abc123 2015-04-01 2.22
1002 def456 2015-06-01 -4.44
1002 def456 2015-05-01 -4.44
1002 def456 2015-04-01 -4.44
1002 def456 2015-02-01 -4.44
1002 def456 2015-01-01 -4.44
1002 def456 2014-12-01 -4.44
How can I create a new month
column by vectorizing a function like seq(date_from, date_thru, by="months")
, and then joining all the resulting vectors into one df again like above?
I've been trying lapply
, dplyr::mutate
, and seq
and can't get them to work correctly together.
Thanks in advance!
r dplyr
I feel I'm in the realm of the right answer but for the life of me I can't figure out how to do this. I want to take an entry on a product ledger and change the shape of the data where I have a row/case for every life month of a product transaction.
As a note, sometimes these ledger entries are reversed/refunded (and have date_from > date_thru). I have an example of a new purchase and a refund in the data frame below. Also, transaction dates are pre-cleaned to always be the first of the month in which they were made, since I only care about monthly recurring revenue (mrr) on a month-by-month basis.
Sample df:
user <- c("1001", "1002")
line_item <- c("abc123", "def456")
date_from <- as.Date(c("2015-01-01", "2015-06-01"), "%Y-%m-%d")
date_thru <- as.Date(c("2015-04-01", "2014-12-01"), "%Y-%m-%d")
mrr <- c(2.22, -4.44)
df <- cbind.data.frame(user, line_item, date_from, date_thru, mrr)
Output:
user line_item date_from date_thru mrr
1 1001 abc123 2015-01-01 2015-04-01 2.22
2 1002 def456 2015-06-01 2014-12-01 -4.44
Desired result:
user line_item month mrr
1001 abc123 2015-01-01 2.22
1001 abc123 2015-02-01 2.22
1001 abc123 2015-03-01 2.22
1001 abc123 2015-04-01 2.22
1002 def456 2015-06-01 -4.44
1002 def456 2015-05-01 -4.44
1002 def456 2015-04-01 -4.44
1002 def456 2015-02-01 -4.44
1002 def456 2015-01-01 -4.44
1002 def456 2014-12-01 -4.44
How can I create a new month
column by vectorizing a function like seq(date_from, date_thru, by="months")
, and then joining all the resulting vectors into one df again like above?
I've been trying lapply
, dplyr::mutate
, and seq
and can't get them to work correctly together.
Thanks in advance!
r dplyr
r dplyr
asked Nov 14 '18 at 1:48
Adam EllsworthAdam Ellsworth
84
84
@Shree yes that is correct, it is an example of a reversedline_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).
– Adam Ellsworth
Nov 14 '18 at 2:20
I think you are missing date2015-03-01
for user1002
in your desired output
– Shree
Nov 14 '18 at 2:58
You should redefine theseq.Date
fucntion so it accepts positive and negative sequences gracefully.
– 42-
Nov 14 '18 at 3:31
add a comment |
@Shree yes that is correct, it is an example of a reversedline_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).
– Adam Ellsworth
Nov 14 '18 at 2:20
I think you are missing date2015-03-01
for user1002
in your desired output
– Shree
Nov 14 '18 at 2:58
You should redefine theseq.Date
fucntion so it accepts positive and negative sequences gracefully.
– 42-
Nov 14 '18 at 3:31
@Shree yes that is correct, it is an example of a reversed
line_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).– Adam Ellsworth
Nov 14 '18 at 2:20
@Shree yes that is correct, it is an example of a reversed
line_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).– Adam Ellsworth
Nov 14 '18 at 2:20
I think you are missing date
2015-03-01
for user 1002
in your desired output– Shree
Nov 14 '18 at 2:58
I think you are missing date
2015-03-01
for user 1002
in your desired output– Shree
Nov 14 '18 at 2:58
You should redefine the
seq.Date
fucntion so it accepts positive and negative sequences gracefully.– 42-
Nov 14 '18 at 3:31
You should redefine the
seq.Date
fucntion so it accepts positive and negative sequences gracefully.– 42-
Nov 14 '18 at 3:31
add a comment |
3 Answers
3
active
oldest
votes
Just to uncover the special powers of tidyr
with the complete
and fill
functions
library(tidyverse)
df %>%
# turn into a long format to handle the dates as one variable
gather(key, date, date_from:date_thru) %>%
# group the dataframe to apply operations separately on each user
group_by(user) %>%
# now complete the dataset with the implicit dates within the range of dates
# Note the consideration of: date_from > date_to with 'min' and 'max'
complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>%
#now fill in the missing 'mrr'
fill(mrr, line_item)
# A tibble: 11 x 5
# Groups: user [2]
user date line_item mrr key
<fct> <date> <fct> <dbl> <chr>
1 1001 2015-01-01 abc123 2.22 date_from
2 1001 2015-02-01 abc123 2.22 <NA>
3 1001 2015-03-01 abc123 2.22 <NA>
4 1001 2015-04-01 abc123 2.22 date_thru
5 1002 2014-12-01 def456 -4.44 date_thru
6 1002 2015-01-01 def456 -4.44 <NA>
7 1002 2015-02-01 def456 -4.44 <NA>
8 1002 2015-03-01 def456 -4.44 <NA>
9 1002 2015-04-01 def456 -4.44 <NA>
10 1002 2015-05-01 def456 -4.44 <NA>
11 1002 2015-06-01 def456 -4.44 date_from
If you want to remove the key
variable add %>% select(-key)
following after the end of the fill
although you might want to keep it for some reason...
P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign
to handle date_from
> date_to
add a comment |
First define a month_seq
function which produces the desired date sequence given the from
and to
dates. Then Map
it to the date_from
and date_thru
columns producing a column of type list whose components are the date sequences. Then unnest
the month
and select the desired columns.
library(dplyr)
library(tidyr)
month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))
df %>%
mutate(month = Map(month_seq, date_from, date_thru)) %>%
unnest %>%
select(user, line_item, month, mrr)
giving:
user line_item month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
Here's a solution using dplyr
, tidyr
, and lubridate
. I found the elapsed_months
function on this SO post Number of months between two dates. I have modified it a bit for your case.
Also, I am assuming negative mrr
indicates that date_from > date_thru
.
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}
df %>%
uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>%
mutate(
Month = date_from %m+% months(sign(mrr)*(Months - 1))
) %>%
select(user, line_item, Month, mrr)
user line_item Month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53292044%2fin-r-perform-a-function-over-rows-in-a-data-frame-and-join-the-result-together%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
Just to uncover the special powers of tidyr
with the complete
and fill
functions
library(tidyverse)
df %>%
# turn into a long format to handle the dates as one variable
gather(key, date, date_from:date_thru) %>%
# group the dataframe to apply operations separately on each user
group_by(user) %>%
# now complete the dataset with the implicit dates within the range of dates
# Note the consideration of: date_from > date_to with 'min' and 'max'
complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>%
#now fill in the missing 'mrr'
fill(mrr, line_item)
# A tibble: 11 x 5
# Groups: user [2]
user date line_item mrr key
<fct> <date> <fct> <dbl> <chr>
1 1001 2015-01-01 abc123 2.22 date_from
2 1001 2015-02-01 abc123 2.22 <NA>
3 1001 2015-03-01 abc123 2.22 <NA>
4 1001 2015-04-01 abc123 2.22 date_thru
5 1002 2014-12-01 def456 -4.44 date_thru
6 1002 2015-01-01 def456 -4.44 <NA>
7 1002 2015-02-01 def456 -4.44 <NA>
8 1002 2015-03-01 def456 -4.44 <NA>
9 1002 2015-04-01 def456 -4.44 <NA>
10 1002 2015-05-01 def456 -4.44 <NA>
11 1002 2015-06-01 def456 -4.44 date_from
If you want to remove the key
variable add %>% select(-key)
following after the end of the fill
although you might want to keep it for some reason...
P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign
to handle date_from
> date_to
add a comment |
Just to uncover the special powers of tidyr
with the complete
and fill
functions
library(tidyverse)
df %>%
# turn into a long format to handle the dates as one variable
gather(key, date, date_from:date_thru) %>%
# group the dataframe to apply operations separately on each user
group_by(user) %>%
# now complete the dataset with the implicit dates within the range of dates
# Note the consideration of: date_from > date_to with 'min' and 'max'
complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>%
#now fill in the missing 'mrr'
fill(mrr, line_item)
# A tibble: 11 x 5
# Groups: user [2]
user date line_item mrr key
<fct> <date> <fct> <dbl> <chr>
1 1001 2015-01-01 abc123 2.22 date_from
2 1001 2015-02-01 abc123 2.22 <NA>
3 1001 2015-03-01 abc123 2.22 <NA>
4 1001 2015-04-01 abc123 2.22 date_thru
5 1002 2014-12-01 def456 -4.44 date_thru
6 1002 2015-01-01 def456 -4.44 <NA>
7 1002 2015-02-01 def456 -4.44 <NA>
8 1002 2015-03-01 def456 -4.44 <NA>
9 1002 2015-04-01 def456 -4.44 <NA>
10 1002 2015-05-01 def456 -4.44 <NA>
11 1002 2015-06-01 def456 -4.44 date_from
If you want to remove the key
variable add %>% select(-key)
following after the end of the fill
although you might want to keep it for some reason...
P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign
to handle date_from
> date_to
add a comment |
Just to uncover the special powers of tidyr
with the complete
and fill
functions
library(tidyverse)
df %>%
# turn into a long format to handle the dates as one variable
gather(key, date, date_from:date_thru) %>%
# group the dataframe to apply operations separately on each user
group_by(user) %>%
# now complete the dataset with the implicit dates within the range of dates
# Note the consideration of: date_from > date_to with 'min' and 'max'
complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>%
#now fill in the missing 'mrr'
fill(mrr, line_item)
# A tibble: 11 x 5
# Groups: user [2]
user date line_item mrr key
<fct> <date> <fct> <dbl> <chr>
1 1001 2015-01-01 abc123 2.22 date_from
2 1001 2015-02-01 abc123 2.22 <NA>
3 1001 2015-03-01 abc123 2.22 <NA>
4 1001 2015-04-01 abc123 2.22 date_thru
5 1002 2014-12-01 def456 -4.44 date_thru
6 1002 2015-01-01 def456 -4.44 <NA>
7 1002 2015-02-01 def456 -4.44 <NA>
8 1002 2015-03-01 def456 -4.44 <NA>
9 1002 2015-04-01 def456 -4.44 <NA>
10 1002 2015-05-01 def456 -4.44 <NA>
11 1002 2015-06-01 def456 -4.44 date_from
If you want to remove the key
variable add %>% select(-key)
following after the end of the fill
although you might want to keep it for some reason...
P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign
to handle date_from
> date_to
Just to uncover the special powers of tidyr
with the complete
and fill
functions
library(tidyverse)
df %>%
# turn into a long format to handle the dates as one variable
gather(key, date, date_from:date_thru) %>%
# group the dataframe to apply operations separately on each user
group_by(user) %>%
# now complete the dataset with the implicit dates within the range of dates
# Note the consideration of: date_from > date_to with 'min' and 'max'
complete(date = seq.Date(from = min(date),to = max(date),by = "month") ) %>%
#now fill in the missing 'mrr'
fill(mrr, line_item)
# A tibble: 11 x 5
# Groups: user [2]
user date line_item mrr key
<fct> <date> <fct> <dbl> <chr>
1 1001 2015-01-01 abc123 2.22 date_from
2 1001 2015-02-01 abc123 2.22 <NA>
3 1001 2015-03-01 abc123 2.22 <NA>
4 1001 2015-04-01 abc123 2.22 date_thru
5 1002 2014-12-01 def456 -4.44 date_thru
6 1002 2015-01-01 def456 -4.44 <NA>
7 1002 2015-02-01 def456 -4.44 <NA>
8 1002 2015-03-01 def456 -4.44 <NA>
9 1002 2015-04-01 def456 -4.44 <NA>
10 1002 2015-05-01 def456 -4.44 <NA>
11 1002 2015-06-01 def456 -4.44 date_from
If you want to remove the key
variable add %>% select(-key)
following after the end of the fill
although you might want to keep it for some reason...
P.S. I really liked the clever trick in the function creation by @G. Grothendieck with the sign
to handle date_from
> date_to
answered Nov 14 '18 at 14:03
Lefkios PaikousisLefkios Paikousis
837
837
add a comment |
add a comment |
First define a month_seq
function which produces the desired date sequence given the from
and to
dates. Then Map
it to the date_from
and date_thru
columns producing a column of type list whose components are the date sequences. Then unnest
the month
and select the desired columns.
library(dplyr)
library(tidyr)
month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))
df %>%
mutate(month = Map(month_seq, date_from, date_thru)) %>%
unnest %>%
select(user, line_item, month, mrr)
giving:
user line_item month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
First define a month_seq
function which produces the desired date sequence given the from
and to
dates. Then Map
it to the date_from
and date_thru
columns producing a column of type list whose components are the date sequences. Then unnest
the month
and select the desired columns.
library(dplyr)
library(tidyr)
month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))
df %>%
mutate(month = Map(month_seq, date_from, date_thru)) %>%
unnest %>%
select(user, line_item, month, mrr)
giving:
user line_item month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
First define a month_seq
function which produces the desired date sequence given the from
and to
dates. Then Map
it to the date_from
and date_thru
columns producing a column of type list whose components are the date sequences. Then unnest
the month
and select the desired columns.
library(dplyr)
library(tidyr)
month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))
df %>%
mutate(month = Map(month_seq, date_from, date_thru)) %>%
unnest %>%
select(user, line_item, month, mrr)
giving:
user line_item month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
First define a month_seq
function which produces the desired date sequence given the from
and to
dates. Then Map
it to the date_from
and date_thru
columns producing a column of type list whose components are the date sequences. Then unnest
the month
and select the desired columns.
library(dplyr)
library(tidyr)
month_seq <- function(from, to) seq(from, to, paste(sign(to - from), "month"))
df %>%
mutate(month = Map(month_seq, date_from, date_thru)) %>%
unnest %>%
select(user, line_item, month, mrr)
giving:
user line_item month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
edited Nov 14 '18 at 12:50
answered Nov 14 '18 at 3:59
G. GrothendieckG. Grothendieck
147k9130235
147k9130235
add a comment |
add a comment |
Here's a solution using dplyr
, tidyr
, and lubridate
. I found the elapsed_months
function on this SO post Number of months between two dates. I have modified it a bit for your case.
Also, I am assuming negative mrr
indicates that date_from > date_thru
.
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}
df %>%
uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>%
mutate(
Month = date_from %m+% months(sign(mrr)*(Months - 1))
) %>%
select(user, line_item, Month, mrr)
user line_item Month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
Here's a solution using dplyr
, tidyr
, and lubridate
. I found the elapsed_months
function on this SO post Number of months between two dates. I have modified it a bit for your case.
Also, I am assuming negative mrr
indicates that date_from > date_thru
.
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}
df %>%
uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>%
mutate(
Month = date_from %m+% months(sign(mrr)*(Months - 1))
) %>%
select(user, line_item, Month, mrr)
user line_item Month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
add a comment |
Here's a solution using dplyr
, tidyr
, and lubridate
. I found the elapsed_months
function on this SO post Number of months between two dates. I have modified it a bit for your case.
Also, I am assuming negative mrr
indicates that date_from > date_thru
.
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}
df %>%
uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>%
mutate(
Month = date_from %m+% months(sign(mrr)*(Months - 1))
) %>%
select(user, line_item, Month, mrr)
user line_item Month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
Here's a solution using dplyr
, tidyr
, and lubridate
. I found the elapsed_months
function on this SO post Number of months between two dates. I have modified it a bit for your case.
Also, I am assuming negative mrr
indicates that date_from > date_thru
.
elapsed_months <- function(end_date, start_date) {
ed <- as.POSIXlt(end_date)
sd <- as.POSIXlt(start_date)
12 * (ed$year - sd$year) + (ed$mon - sd$mon) + ifelse(ed >= sd, 1, -1)
}
df %>%
uncount(weights = abs(elapsed_months(date_thru, date_from)), .id = "Months") %>%
mutate(
Month = date_from %m+% months(sign(mrr)*(Months - 1))
) %>%
select(user, line_item, Month, mrr)
user line_item Month mrr
1 1001 abc123 2015-01-01 2.22
2 1001 abc123 2015-02-01 2.22
3 1001 abc123 2015-03-01 2.22
4 1001 abc123 2015-04-01 2.22
5 1002 def456 2015-06-01 -4.44
6 1002 def456 2015-05-01 -4.44
7 1002 def456 2015-04-01 -4.44
8 1002 def456 2015-03-01 -4.44
9 1002 def456 2015-02-01 -4.44
10 1002 def456 2015-01-01 -4.44
11 1002 def456 2014-12-01 -4.44
edited Nov 14 '18 at 3:01
answered Nov 14 '18 at 2:56
ShreeShree
3,3811323
3,3811323
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53292044%2fin-r-perform-a-function-over-rows-in-a-data-frame-and-join-the-result-together%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
@Shree yes that is correct, it is an example of a reversed
line_item
. A reversed lineitem has a negative value and will have a date_thru that is before a date_from (sort of like a negative date range).– Adam Ellsworth
Nov 14 '18 at 2:20
I think you are missing date
2015-03-01
for user1002
in your desired output– Shree
Nov 14 '18 at 2:58
You should redefine the
seq.Date
fucntion so it accepts positive and negative sequences gracefully.– 42-
Nov 14 '18 at 3:31