TensorFlow efficient shared memory allocation for recursive concatenation

DenseNets tend to take up a lot of memory in TensorFlow because each concat operation is stored in a separate allocation. A recent paper, Memory-Efficient Implementation of DenseNets, demonstrates that this memory utilization can be dramatically reduced by sharing allocations. This image from the paper and its PyTorch implementation illustrates the shared-memory approach:



[Image: DenseNet shared-memory allocation strategy, from the paper]



How can this be implemented with TensorFlow? If it can't be done via Python, how can it be properly implemented in an Op with CPU and GPU support?




  • PyTorch efficient DenseNet implementation


  • Keras DenseNet implementation with "naive" allocations, which works with the TensorFlow backend.


I've created a TensorFlow feature request for the necessary allocation functionality.
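
For context, here is a minimal sketch of the "naive" allocation pattern in question, written against the TensorFlow 1.x layers API (the function name and hyperparameters are illustrative, not from the linked implementations). Every concat materializes a fresh buffer containing copies of all earlier feature maps, which is where the quadratic memory growth comes from:

import tensorflow as tf

def naive_dense_block(x, num_layers=4, growth_rate=12):
    # "Naive" DenseNet block: every iteration concatenates all earlier
    # feature maps, and each tf.concat materializes a new buffer, so
    # early features end up copied once per subsequent layer.
    features = [x]
    for i in range(num_layers):
        with tf.variable_scope('layer_%d' % i):
            inputs = tf.concat(features, axis=-1)  # new allocation each time
            y = tf.layers.batch_normalization(inputs)
            y = tf.nn.relu(y)
            y = tf.layers.conv2d(y, growth_rate, 3, padding='same')
            features.append(y)
    return tf.concat(features, axis=-1)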










  • This is a powerful idea, and if it can't be done in Python I wouldn't know why, because absolutely everything else in Python is susceptible to this kind of technique.
    – Alex Weavers
    Sep 28 at 6:14










  • However, if the needed parts of TensorFlow are written in another language, or the shared memory is treated differently or specially arranged, or something else comes up, you're going to end up doing major surgery and will probably have a lot of cascading issues and errors when you try to replace it.
    – Alex Weavers
    Sep 28 at 6:27















Tags: python, c++, memory-management, tensorflow, tensorflow-gpu






asked Sep 8 '17 at 21:50 by Andrew Hundt, edited Oct 3 '17 at 21:52
1 Answer






























A memory-efficient implementation is now available at:



https://github.com/joeyearsley/efficient_densenet_tensorflow



The relevant function from the above link is:



# Gradient-checkpoint the layer. Note that _x is a function that builds
# the layer, not a tensor: recompute_grad wraps a callable so that its
# activations are recomputed during the backward pass rather than stored.
_x = tf.contrib.layers.recompute_grad(_x)
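
In context, a dense block might use the wrapper roughly as follows. This is a minimal sketch assuming TensorFlow 1.x with tf.contrib available; the helper names and hyperparameters are illustrative rather than taken from the linked repository, and batch-norm training behavior needs the usual extra care because the wrapped function is executed twice:

import tensorflow as tf

def _layer_fn(ip, growth_rate=12):
    # Composite layer whose intermediate activations we are willing to
    # recompute during the backward pass.
    x = tf.layers.batch_normalization(ip)
    x = tf.nn.relu(x)
    return tf.layers.conv2d(x, growth_rate, 3, padding='same')

def efficient_dense_block(x, num_layers=4):
    features = [x]
    for i in range(num_layers):
        with tf.variable_scope('layer_%d' % i):
            # Gradient checkpoint the layer: forward activations inside
            # _layer_fn are dropped after use and recomputed for gradients.
            checkpointed = tf.contrib.layers.recompute_grad(_layer_fn)
            features.append(checkpointed(tf.concat(features, axis=-1)))
    return tf.concat(features, axis=-1)

This trades compute for memory: each layer's internals are run twice (once forward, once during backprop), in exchange for not keeping every concatenated intermediate alive for the whole backward pass.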





answered Nov 10 at 22:15 by Andrew Hundt