R and Simmer: Performance boost on large data frames
I've got my own data frame of actual events/tasks, and I use the simmer R package to simulate how many tasks could be completed if different numbers of resources were available. The simulation runs very fast with up to 120,000 rows in the data frame.
rm(list = ls())

library(dplyr)
library(simmer)
library(simmer.plot)

load("task_df.RDATA")   # loads the task data frame task_df

working_hours <- 7.8
productivity  <- 0.7
no.employees  <- 292
SIM_TIME <- round((working_hours * productivity * 60), 0) + 1   # simulation horizon in productive minutes

# one resource name per employee: "employee_1" ... "employee_292"
employees <- vector("character")
for (i in 1:no.employees) {
  employees[i] <- paste("employee", i, sep = "_")
}

# each task selects the employee with the shortest queue, holds that resource
# for its "duration" attribute, then releases it
taskTraj <- trajectory(name = "task simulation") %>%
  simmer::select(resources = employees, policy = "shortest-queue") %>%
  seize_selected(amount = 1) %>%
  timeout_from_attribute("duration") %>%
  release_selected(amount = 1)

# simmer environments have reference semantics, so add_resource() modifies
# arrivals_gen in place inside the loop
arrivals_gen <- simmer()
for (i in 1:no.employees) {
  arrivals_gen %>% add_resource(paste("employee", i, sep = "_"), capacity = 1)
}

ptm <- proc.time()
arrivals_gen <- arrivals_gen %>%
  add_dataframe("Task_", taskTraj, task_df, mon = 2, col_time = "time",
                time = "absolute", col_priority = "priority") %>%
  run(SIM_TIME)
proc.time() - ptm
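With mon = 2, the run collects per-arrival monitoring data; to see how many tasks were actually completed within SIM_TIME, the arrivals log can be queried afterwards. A minimal sketch (not part of the original snippet, using simmer's standard monitoring getter):

arrivals <- get_mon_arrivals(arrivals_gen)   # one row per task that left the trajectory
sum(arrivals$finished)                       # tasks that finished before the run ended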
But my data frame task_df contains about 350k rows, and that is where the simulation starts to take disproportionately more time.
head(task_df, n = 50)
workload_shift task_id duration priority time
1 20180403 68347632 3 2.502 0
2 20180403 68151881 10 24.478 0
3 20180403 68069718 3 0.724 0
4 20180403 68345621 4 2.226 0
5 20180403 68508858 3 36.062 0
6 20180403 66148996 3 9.421 0
7 20180403 68565066 2 24.478 0
8 20180403 68005344 3 7.910 0
9 20180403 55979902 3 3.732 0
10 20180403 66452138 2 2.502 0
11 20180403 68051869 10 2.226 0
12 20180403 68561364 10 3.584 0
13 20180403 59292591 3 2.138 0
14 20180403 68415657 10 2.853 0
15 20180403 66848400 3 2.290 0
16 20180403 68454851 10 6.167 0
17 20180403 68361846 10 11.688 0
18 20180403 68572723 2 6.259 0
19 20180403 68520328 2 24.478 0
20 20180403 68500955 10 1.855 0
21 20180403 67000753 3 219.751 0
22 20180403 68487613 3 8.131 0
23 20180403 68333674 4 5.263 0
24 20180403 66423486 3 2.290 0
25 20180403 68241616 5 1.470 0
26 20180403 68415001 4 3.584 0
27 20180403 67487967 3 2.636 0
28 20180403 68494771 10 6.259 0
29 20180403 67673981 10 2.226 0
30 20180403 68355727 3 2.613 0
31 20180403 36942995 3 0.590 0
32 20180403 66633446 3 5.968 0
33 20180403 68461510 2 24.478 0
34 20180403 67126138 3 0.357 0
35 20180403 68485682 3 8.131 0
36 20180403 67852953 10 2.290 0
37 20180403 68150106 10 6.259 0
38 20180403 67833053 10 4.114 0
39 20180403 67816673 3 6.259 0
40 20180403 68041431 5 2.502 0
41 20180403 66283761 5 2.502 0
42 20180403 68543314 2 26.302 0
43 20180403 68492843 3 2.290 0
44 20180403 68556960 4 2.853 0
45 20180403 66885335 3 5.975 0
46 20180403 66249231 5 2.636 0
47 20180403 68242565 12 1.470 0
48 20180403 68530355 2 2.290 0
49 20180403 66683717 5 5.705 0
50 20180403 67802538 4 0.864 0
Timing with ~120k rows:
   user  system elapsed
 76.745   0.039  76.717
vs. with the full 350k rows:
   user  system elapsed
608.443   0.270 608.186
My CPU: (screenshot of the CPU specifications omitted)
Is there a way to speed up my simulation? I use simmer 4.1.0 and Rcpp 1.0.0. Memory does not seem to be an issue.
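For reference, since task_df.RDATA is not attached, a self-contained stand-in with the same columns can be generated with random data; the id range and the duration/priority distributions below are assumptions for illustration only, not the real workload:

set.seed(42)
n <- 350000
task_df <- data.frame(
  workload_shift = rep(20180403L, n),                 # single shift id, as in the sample above
  task_id        = sample.int(7e7, n, replace = TRUE),
  duration       = round(rexp(n, rate = 1/5), 3),     # assumed task durations in minutes
  priority       = sample(1:12, n, replace = TRUE),   # assumed priority levels
  time           = rep(0, n)                          # all tasks available at t = 0
)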
Tags: c++, r, simulation
asked Nov 13 '18 at 14:18 by MCR90, last edited Nov 14 '18 at 13:08
Based on your code above, I tried dataframes with 100k and 1M observations (with random data) and I see no performance issues (i.e., 1M takes x10 the time of 100k rows, as expected). Could you provide a reproducible example? – Iñaki Úcar, Nov 14 '18 at 9:29
@IñakiÚcar Thanks for your fast reply. I have updated my code snippet above to give a reproducible example. – MCR90, Nov 14 '18 at 13:09
1 Answer
I took your table and simply replicated it to build 100k and 400k datasets, and I confirm the issue: the execution time is not linear.

Internally, attributes are always double, so there are lots of conversions, row by row, which apparently take most of the execution time (!). Try converting your table before feeding it into simmer. Using dplyr:

task_df <- mutate_all(task_df, as.double)

The simulation should be much faster, and the execution time for an increasing number of rows should grow more or less linearly. It's evident why so many casts degrade the performance, though I'm not sure why they make the execution time non-linear.

Anyway, in future releases we may want to apply this automatically, so that the user doesn't have to bother about these performance issues.

– Iñaki Úcar, answered Nov 15 '18 at 13:32
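A quick way to double-check that the conversion took effect before re-running the simulation is to inspect the column types; a base-R equivalent of the dplyr call above is shown as well (both are sketches, assuming task_df as in the question):

task_df[] <- lapply(task_df, as.double)   # same effect as mutate_all(task_df, as.double)
sapply(task_df, is.double)                # should report TRUE for every column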
Thank you! It worked very well and the execution time seems to be linear now! – MCR90, Nov 15 '18 at 16:45