Improve speed of drawdown.duration implementation
I have working code that calculates a running drawdown.duration, where drawdown.duration is defined as the number of months between the current month and the previous peak. However, I implemented it as a for loop, and it runs quite slowly. Is there a more efficient/faster way to implement this in R?

The code takes a data.frame (specifically a tibble, since I have been working with dplyr) named returnsWithValues:
returnsWithValues <- structure(list(
  date = structure(c(789, 820, 850, 881, 911, 942), class = "Date"),
  value = c(0.94031052, 0.930751624153046, 0.926756311376762,
            0.874209664097166, 0.843026010916249, 2.1),
  peak = c(1, 1, 1, 1, 1, 2.1),
  drawdown = c(-0.05968948, -0.0692483758469535, -0.0732436886232377,
               -0.125790335902834, -0.156973989083751, 0)
), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
# A tibble: 6 x 4
  date       value  peak drawdown
  <date>     <dbl> <dbl>    <dbl>
1 1972-02-29 0.940   1    -0.0597
2 1972-03-31 0.931   1    -0.0692
3 1972-04-30 0.927   1    -0.0732
4 1972-05-31 0.874   1    -0.126
5 1972-06-30 0.843   1    -0.157
6 1972-07-31 2.1     2.1   0
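The question does not say how the peak and drawdown columns are built. As an assumption only, the six rows shown are consistent with peak being the running maximum of value starting from an initial level of 1, and drawdown being value / peak - 1. A base R sketch that rebuilds both columns under that assumption:

```r
# Assumption (not stated in the question): peak is the running maximum of
# value, starting from an initial level of 1, and drawdown = value / peak - 1.
value <- c(0.94031052, 0.930751624153046, 0.926756311376762,
           0.874209664097166, 0.843026010916249, 2.1)

# Prepend the assumed starting level of 1, take the running maximum, drop it again
peak <- cummax(c(1, value))[-1]
drawdown <- value / peak - 1

peak
round(drawdown, 4)
```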
I have implemented drawdown.duration using a for loop:
library(dplyr)

# add an empty numeric drawdown.duration column
returnsWithValues <- returnsWithValues %>% mutate(drawdown.duration = NA_real_)

for (row in seq_len(nrow(returnsWithValues))) {
  if (returnsWithValues$value[row] == returnsWithValues$peak[row]) {
    # at a peak: the duration is 0
    returnsWithValues$drawdown.duration[row] <- 0
  } else if (row == 1) {
    # the series starts below its peak
    returnsWithValues$drawdown.duration[row] <- 1
  } else {
    # still below the peak: one more month than the previous row
    returnsWithValues$drawdown.duration[row] <- returnsWithValues$drawdown.duration[row - 1] + 1
  }
}
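For reference, the rule the loop above implements can be written as a small function on plain numeric vectors. drawdown_duration_loop is a hypothetical name, not from the question. Working on plain vectors, rather than assigning into the tibble on every iteration, is likely to remove much of the per-row overhead, although it is still an explicit loop:

```r
# Hypothetical helper: the same recurrence as the loop above, on plain vectors.
# d[i] = 0 when value[i] equals peak[i]; otherwise d[i - 1] + 1 (1 for the first row).
drawdown_duration_loop <- function(value, peak) {
  n <- length(value)
  d <- numeric(n)
  for (i in seq_len(n)) {
    if (value[i] == peak[i]) {
      d[i] <- 0
    } else {
      d[i] <- if (i == 1) 1 else d[i - 1] + 1
    }
  }
  d
}

value <- c(0.94031052, 0.930751624153046, 0.926756311376762,
           0.874209664097166, 0.843026010916249, 2.1)
peak  <- c(1, 1, 1, 1, 1, 2.1)
drawdown_duration_loop(value, peak)
```

With the tibble, it could be used as returnsWithValues$drawdown.duration <- drawdown_duration_loop(returnsWithValues$value, returnsWithValues$peak).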
This gives the correct result:
> returnsWithValues
# A tibble: 6 x 5
  date       value  peak drawdown drawdown.duration
  <date>     <dbl> <dbl>    <dbl>             <dbl>
1 1972-02-29 0.940   1    -0.0597                 1
2 1972-03-31 0.931   1    -0.0692                 2
3 1972-04-30 0.927   1    -0.0732                 3
4 1972-05-31 0.874   1    -0.126                  4
5 1972-06-30 0.843   1    -0.157                  5
6 1972-07-31 2.1     2.1   0                      0
r performance dplyr
asked Nov 15 '18 at 2:35 by cpage
2 Answers
I think this will do it, as long as each peak value is unique and not repeated in another group later on:
returnsWithValues %>%
  group_by(peak) %>%
  # within each peak, count the rows that are below the peak
  mutate(drawdown.duration = cumsum(value != peak)) %>%
  ungroup()
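For comparison, the same idea as the group_by(peak) snippet above can be sketched in base R (this is not part of the answer): ave() applies cumsum() to a below-peak indicator separately within each distinct peak value.

```r
value <- c(0.94031052, 0.930751624153046, 0.926756311376762,
           0.874209664097166, 0.843026010916249, 2.1)
peak  <- c(1, 1, 1, 1, 1, 2.1)

# 1 for rows below their peak, 0 at the peak; ave() applies cumsum()
# separately within each distinct peak value
duration <- ave(as.integer(value != peak), peak, FUN = cumsum)
duration
```

Like the dplyr version, this assumes each peak value marks a single drawdown period. It also does not reset if value later returns exactly to the current peak within the same period, where the loop would reset to 0.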
If you do have repeated peak values, you may need to group only within runs of consecutive identical peak values, e.g.:
returnsWithValues %>%
  # Start counting the number of groups at 1, and every time
  # peak changes compared to the previous row, add 1
  mutate(peak_group = cumsum(c(1, peak[-1] != head(peak, -1)))) %>%
  group_by(peak_group) %>%
  mutate(drawdown.duration = cumsum(value != peak)) %>%
  ungroup()
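To see what the peak_group expression above does, here it is on a hypothetical peak vector in which the value 1 occurs in two separate runs. Each change in peak starts a new group number, whereas grouping by peak alone would merge the two runs:

```r
# Hypothetical peak column in which the value 1 occurs in two separate runs
peak <- c(1, 1, 2.1, 2.1, 1, 1)

# peak[-1] != head(peak, -1) is TRUE wherever a row's peak differs from the
# previous row's; prepending 1 for the first row and taking cumsum() numbers the runs
peak_group <- cumsum(c(1, peak[-1] != head(peak, -1)))
peak_group

# Grouping by peak alone would put rows 1, 2, 5 and 6 in one group
split(seq_along(peak), peak)
```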
edited Nov 15 '18 at 3:03
answered Nov 15 '18 at 2:56 by Marius
As you asked, I will remove the for loop and use indexing instead.
indices <- function(returnsWithValues) {
  # logical vector: TRUE where value equals peak, FALSE otherwise
  indices_logical <- returnsWithValues[["value"]] == returnsWithValues[["peak"]]
  indices_to_zero <- which(indices_logical)    # rows at a peak
  indices_drawdown <- which(!indices_logical)  # rows below a peak
  returnsWithValues[indices_to_zero, "drawdown.duration"] <- 0
  # Numbers all below-peak rows 1, 2, 3, ... across the whole series. This
  # matches the loop only when there is a single drawdown period, because the
  # count is not reset after a new peak.
  returnsWithValues[indices_drawdown, "drawdown.duration"] <- seq_along(indices_drawdown)
  returnsWithValues
}
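A quick check on a hypothetical series with two separate drawdowns (not from the question) shows how numbering every below-peak row consecutively, as indices() above does, differs from the loop, which restarts its count after each new peak. A reset-aware sketch that stays within the indexing idea records the row number of the most recent peak with cummax() and subtracts it from the current row number; duration_by_index is a hypothetical name:

```r
# Hypothetical series with two separate drawdown periods (not from the question)
value <- c(0.95, 0.90, 1.20, 1.10, 1.00)
peak  <- c(1.00, 1.00, 1.20, 1.20, 1.20)
at_peak <- value == peak

# Consecutive numbering of all below-peak rows: 1 2 0 3 4 (the loop gives 1 2 0 1 2)
numbered <- numeric(length(value))
numbered[!at_peak] <- seq_len(sum(!at_peak))
numbered

# Hypothetical reset-aware helper: rows elapsed since the most recent row where
# value == peak. Before the first such row the count starts at 1, like the loop.
duration_by_index <- function(value, peak) {
  rows <- seq_along(value)
  last_peak_row <- cummax(ifelse(value == peak, rows, 0L))
  rows - last_peak_row
}

duration_by_index(value, peak)  # 1 2 0 1 2
```

On the tibble, this would be returnsWithValues$drawdown.duration <- duration_by_index(returnsWithValues$value, returnsWithValues$peak). It uses only vectorized operations (ifelse, cummax, subtraction).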
Here is your for loop wrapped in a function:
for_loop <- function(returnsWithValues) {
  # fills in (and creates) the drawdown.duration column row by row
  for (row in 1:nrow(returnsWithValues)) {
    if (returnsWithValues[row, "value"] == returnsWithValues[row, "peak"]) {
      returnsWithValues[row, "drawdown.duration"] = 0
    } else {
      if (row == 1) {
        returnsWithValues[row, "drawdown.duration"] = 1
      } else {
        returnsWithValues[row, "drawdown.duration"] = returnsWithValues[row - 1, "drawdown.duration"] + 1
      }
    }
  }
  returnsWithValues
}
Here is a benchmark against your for loop:
microbenchmark::microbenchmark(
  "for loop" = flp <- for_loop(returnsWithValues),
  indices = ind <- indices(returnsWithValues),
  times = 10
)
Unit: microseconds
     expr      min       lq     mean    median       uq      max neval
 for loop 8671.228 8699.555 8857.198 8826.8185 8967.631 9196.708    10
  indices   92.781   99.349  106.328  102.8385  115.360  122.749    10
all.equal(ind,flp)
[1] TRUE
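The benchmark above appears to run on the six-row example, so it says little about how each approach scales. As a sketch for comparing on a longer input, the following builds a synthetic monthly series (random data, purely illustrative; the return parameters are arbitrary assumptions), takes peak as the running maximum, and checks that a plain-vector loop and a cummax-based vectorized version agree before timing them:

```r
set.seed(1)
n <- 1e5

# Synthetic wealth index from random monthly returns (illustrative only)
value <- cumprod(1 + rnorm(n, mean = 0.005, sd = 0.04))
peak  <- cummax(value)  # here peak is simply the running maximum of value

# Explicit loop over plain vectors, following the question's rule
loop_version <- function(value, peak) {
  d <- numeric(length(value))
  for (i in seq_along(value)) {
    if (value[i] == peak[i]) {
      d[i] <- 0
    } else {
      d[i] <- if (i == 1) 1 else d[i - 1] + 1
    }
  }
  d
}

# Vectorized version: rows since the most recent row where value == peak
vectorized_version <- function(value, peak) {
  rows <- seq_along(value)
  rows - cummax(ifelse(value == peak, rows, 0L))
}

# Check that both agree before comparing their speed
stopifnot(all(loop_version(value, peak) == vectorized_version(value, peak)))

system.time(loop_version(value, peak))
system.time(vectorized_version(value, peak))
```

The actual timings will depend on the machine and R version.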
answered Nov 16 '18 at 18:29 by Csd