Memory leakage when using `ggplot` on large binned datasets
I am making various `ggplot`s on a very large dataset (much larger than this example). To make plotting feasible, I wrote a binning function that bins on both the x- and y-axes.

In the example below, `memory.size()` is recorded at the start. The large dataset is then simulated as `dt`, and `dt`'s `x2` is plotted against `x1` with binning. Plotting is repeated with different subsets of `dt`. The size of the plot objects is checked with `object.size()` and stored. After the plot objects have been created, `rm(dt)` is executed, followed by a double `gc()`. At that point `memory.size()` is recorded again, and the value at the end is compared with the value at the beginning and printed.

Given the small size of the plot objects, I expected the `memory.size()` at the end to be similar to that at the beginning. But it is not: `memory.size()` never goes back down until I start a new R session.
REPRODUCIBLE EXAMPLE
```r
library(data.table)
library(ggplot2)
library(magrittr)

# The binning function
# x = column name for x-axis (character)
# y = column name for y-axis (character)
# xNItv = number of bins for x-axis
# yNItv = number of bins for y-axis
# Value: a binned data.table
tab_by_bin_idxy <- function(dt, x, y, xNItv, yNItv) {
  # Binning
  xBreaks = dt[, seq(min(get(x), na.rm = T), max(get(x), na.rm = T), length.out = xNItv + 1)]
  yBreaks = dt[, seq(min(get(y), na.rm = T), max(get(y), na.rm = T), length.out = yNItv + 1)]
  xbinCode = dt[, .bincode(get(x), breaks = xBreaks, include.lowest = T)]
  xbinMid = sapply(seq(xNItv), function(i) {return(mean(xBreaks[c(i, i+1)]))})[xbinCode]
  ybinCode = dt[, .bincode(get(y), breaks = yBreaks, include.lowest = T)]
  ybinMid = sapply(seq(yNItv), function(i) {return(mean(yBreaks[c(i, i+1)]))})[ybinCode]
  # Creating table
  tab_match = CJ(xbinCode = seq(xNItv), ybinCode = seq(yNItv))
  tab_plot = data.table(xbinCode, xbinMid, ybinCode, ybinMid)[
    tab_match, .(xbinMid = xbinMid[1], ybinMid = ybinMid[1], N = .N), keyby = .EACHI, on = c("xbinCode", "ybinCode")
  ]
  # Returning table
  return(tab_plot)
}
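# Illustrative sanity check (hypothetical small input, not part of the memory
# measurements below): the function returns one row per (x, y) bin, i.e.
# xNItv * yNItv rows, with the bin midpoints and counts.
#   small <- data.table(a = runif(1e4), b = runif(1e4))
#   tab_by_bin_idxy(small, x = "a", y = "b", xNItv = 5, yNItv = 5)
#   # -> 25-row data.table with columns xbinCode, ybinCode, xbinMid, ybinMid, N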
before.mem.size <- memory.size()
# Simulation of dataset
nrow <- 6e5
ncol <- 60
dt <- do.call(data.table, lapply(seq(ncol), function(i) {return(runif(nrow))}) %>% set_names(paste0("x", seq(ncol))))
# Graph plotting
dummyEnv <- new.env()
with(dummyEnv, {
  fcn <- function(tab) {
    binned.dt <- tab_by_bin_idxy(dt = tab, x = "x1", y = "x2", xNItv = 50, yNItv = 50)
    plot <- ggplot(binned.dt, aes(x = xbinMid, y = ybinMid)) + geom_point(aes(size = N))
    return(plot)
  }
  lst_plots <- list(
    plot1 = fcn(dt),
    plot2 = fcn(dt[x1 <= 0.7]),
    plot3 = fcn(dt[x5 <= 0.3])
  )
  assign("size.of.plots", object.size(lst_plots), envir = .GlobalEnv)
})
rm(dummyEnv)
# After use, remove and clean up the dataset
rm(dt)
gc();gc()
after.mem.size <- memory.size()
# Memory reports
print(paste0("before.mem.size = ", before.mem.size))
print(paste0("after.mem.size = ", after.mem.size))
print(paste0("plot.objs.size = ", size.of.plots / 1000000))
I have tried the following modifications to the code:
- Inside `fcn`, removing the `ggplot` call and returning `NULL` instead of a plot object: the memory leakage is totally gone. But this is not a solution; I need the plots.
- The fewer plots requested, and the fewer columns or rows passed to `fcn`, the smaller the memory leakage.
- The leakage also exists if I make no subsets and build only one plot object (in the example I build 3).
- After the process, even after calling `rm(list = ls())`, the memory is still not recovered.
I wish to know why this happens and how to get rid of it, without giving up making binned plots from different subsets of `dt`.
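In case it helps to pin down the cause, here is a minimal inspection sketch of what one plot object keeps referenced (assuming the plot object stores the environment it was built in as a `plot_env` element, as current ggplot2 versions do); it would need to be run before `dummyEnv` and `dt` are removed.

```r
# Hypothetical inspection, not part of the measurements above.
p <- dummyEnv$lst_plots$plot1

ls(p$plot_env)  # objects that stay reachable for as long as the plot exists
object.size(mget(ls(p$plot_env), envir = p$plot_env))  # rough size of what is retained
```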
Thanks for your attention!
r memory ggplot2 memory-leaks data.table
COMMENTS

Add `with(dummyEnv, rm(list = ls()))` before removing the environment. – Roland, Nov 15 '18 at 7:49

Thank you for your comment. Yes, the suggestion helps mitigate the growth in memory used, but substantial leakage still occurs, especially when the data is large. Are there any other possible sources of leakage? – Matthew Hui, Nov 15 '18 at 9:36

I would be more careful about calling something a "memory leak". It's not easy to investigate this, as you have (i) use of environments, (ii) a package object that has some special behavior regarding memory, and (iii) two other packages in your example. My suggestion would be not to create `dummyEnv`. – Roland, Nov 15 '18 at 11:18

It might well be that you have found a bug in R or data.table, but right now I can't confirm this and can't see which one. – Roland, Nov 15 '18 at 11:19

Agreed that it is not necessarily a memory leak. Not creating `dummyEnv` does not help; it actually intensifies the increase in memory usage. I suppose it is something in how `ggplot` handles the dataset... Thank you for helping. – Matthew Hui, Nov 15 '18 at 16:27