TensorFlow efficient shared memory allocation for recursive concatenation

DenseNets tend to take up a lot of memory in TensorFlow because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory utilization can be dramatically reduced by sharing allocations. This image from the paper and its PyTorch implementation illustrates the shared-memory approach:



[Image: DenseNet shared-memory allocation strategy, from the paper]



How can this be implemented with TensorFlow? If it can't be done via Python, how can it be properly implemented in an Op with CPU and GPU support?




  • PyTorch efficient DenseNet implementation


  • Keras DenseNet implementation with "naive" allocations, which works with the TensorFlow backend.


I've created a TensorFlow feature request for the necessary allocation functionality.
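
For context, here is a minimal sketch of the "naive" allocation pattern in question, written against the TensorFlow 1.x layers API (the function name and hyperparameters are illustrative, not from the linked implementations). Every concat materializes a fresh buffer containing copies of all earlier feature maps, which is where the quadratic memory growth comes from:

import tensorflow as tf

def naive_dense_block(x, num_layers=4, growth_rate=12):
    # "Naive" DenseNet block: every iteration concatenates all earlier
    # feature maps, and each tf.concat materializes a new buffer, so
    # early features end up copied once per subsequent layer.
    features = [x]
    for i in range(num_layers):
        with tf.variable_scope('layer_%d' % i):
            inputs = tf.concat(features, axis=-1)  # new allocation each time
            y = tf.layers.batch_normalization(inputs)
            y = tf.nn.relu(y)
            y = tf.layers.conv2d(y, growth_rate, 3, padding='same')
            features.append(y)
    return tf.concat(features, axis=-1)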










  • This is a powerful idea, and if it can't be done in Python I wouldn't know why, because absolutely everything else in Python is susceptible to this kind of technique.
    – Alex Weavers
    Sep 28 at 6:14










  • However, if the needed parts of TensorFlow are written in another language, or the shared memory is treated differently or specially arranged, or something else comes up, you're going to end up doing major surgery and will probably have a lot of cascading issues and errors when you try to replace it.
    – Alex Weavers
    Sep 28 at 6:27















Tags: python, c++, memory-management, tensorflow, tensorflow-gpu






asked Sep 8 '17 at 21:50 by Andrew Hundt, edited Oct 3 '17 at 21:52
1 Answer






























A memory-efficient implementation is now available at:



https://github.com/joeyearsley/efficient_densenet_tensorflow



The relevant function from the above link is:



# Gradient-checkpoint the layer. Note that _x is a function that builds
# the layer, not a tensor: recompute_grad wraps a callable so that its
# activations are recomputed during the backward pass rather than stored.
_x = tf.contrib.layers.recompute_grad(_x)
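
In context, a dense block might use the wrapper roughly as follows. This is a minimal sketch assuming TensorFlow 1.x with tf.contrib available; the helper names and hyperparameters are illustrative rather than taken from the linked repository, and batch-norm training behavior needs the usual extra care because the wrapped function is executed twice:

import tensorflow as tf

def _layer_fn(ip, growth_rate=12):
    # Composite layer whose intermediate activations we are willing to
    # recompute during the backward pass.
    x = tf.layers.batch_normalization(ip)
    x = tf.nn.relu(x)
    return tf.layers.conv2d(x, growth_rate, 3, padding='same')

def efficient_dense_block(x, num_layers=4):
    features = [x]
    for i in range(num_layers):
        with tf.variable_scope('layer_%d' % i):
            # Gradient checkpoint the layer: forward activations inside
            # _layer_fn are dropped after use and recomputed for gradients.
            checkpointed = tf.contrib.layers.recompute_grad(_layer_fn)
            features.append(checkpointed(tf.concat(features, axis=-1)))
    return tf.concat(features, axis=-1)

This trades compute for memory: each layer's internals are run twice (once forward, once during backprop), in exchange for not keeping every concatenated intermediate alive for the whole backward pass.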





answered Nov 10 at 22:15 by Andrew Hundt