TensorFlow efficient shared memory allocation for recursive concatenation
DenseNets tend to use a lot of memory in TensorFlow because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory utilization can be dramatically reduced by sharing allocations; a figure in the paper and the accompanying PyTorch implementation illustrates the shared-memory approach.
How can this be implemented with TensorFlow? If it can't be done via Python, how can it be properly implemented in an Op with CPU and GPU support? A minimal sketch of the naive pattern that causes the problem is shown below.
- PyTorch efficient DenseNet implementation
- Keras DenseNet implementation with "naive" allocations (works with the TensorFlow backend)
I've created a TensorFlow feature request for the necessary allocation functionality.
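For reference, here is a minimal sketch of the naive dense-block pattern I mean, where every tf.concat produces a fresh, ever-larger buffer (layer composition and sizes are illustrative, TF 1.x API):

import tensorflow as tf

def naive_dense_block(x, num_layers=4, growth_rate=12):
    features = [x]
    for _ in range(num_layers):
        # each iteration concatenates all previous feature maps into a new,
        # larger allocation, so pre-activation memory grows quadratically
        concat = tf.concat(features, axis=-1)
        out = tf.layers.conv2d(tf.nn.relu(concat), growth_rate, 3, padding='same')
        features.append(out)
    return tf.concat(features, axis=-1)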
python c++ memory-management tensorflow tensorflow-gpu
asked Sep 8 '17 at 21:50 by Andrew Hundt, edited Oct 3 '17 at 21:52
This is a powerful idea, and if it can't be done in Python I wouldn't know why, since just about everything else in Python is amenable to this kind of technique.
– Alex Weavers
Sep 28 at 6:14
However, if the needed parts of TensorFlow are written in another language, or the shared memory is treated differently or specially arranged, or something else comes up, you're going to end up doing major surgery and will probably have a lot of cascading issues and errors when you try to replace it.
– Alex Weavers
Sep 28 at 6:27
1 Answer
A memory-efficient implementation is now available at:
https://github.com/joeyearsley/efficient_densenet_tensorflow
The relevant call from the above link is:

# Gradient-checkpoint the layer: _x is the function that builds one dense layer
_x = tf.contrib.layers.recompute_grad(_x)
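A rough sketch of how that call fits into a dense block, loosely following the pattern in the linked repository (function and argument names here are illustrative, TF 1.x API):

import tensorflow as tf

def dense_block(x, num_layers=4, growth_rate=12, efficient=True):
    features = [x]
    for _ in range(num_layers):
        def _x(ip):
            # composite function for one dense layer: ReLU then 3x3 conv
            return tf.layers.conv2d(tf.nn.relu(ip), growth_rate, 3, padding='same')
        if efficient:
            # recompute this layer's activations during the backward pass
            # instead of keeping every intermediate concat result alive
            _x = tf.contrib.layers.recompute_grad(_x)
        features.append(_x(tf.concat(features, axis=-1)))
    return tf.concat(features, axis=-1)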
answered Nov 10 at 22:15 by Andrew Hundt