Use and modify variables in tensorflow bijectors





The reference paper for TensorFlow Distributions (now TensorFlow Probability) mentions that TensorFlow Variables can be used to construct Bijector and TransformedDistribution objects, for example:



import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

tf.enable_eager_execution()

shift = tf.Variable(1., dtype=tf.float32)
myBij = tfp.bijectors.Affine(shift=shift)

# Normal distribution centered at zero, then shifted to 1 using the bijection
myDistr = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0., scale=1.),
    bijector=myBij,
    name="test")

# 2 samples of a normal centered at 1:
y = myDistr.sample(2)
# 2 samples of a normal centered at 0, obtained using inverse transform of myBij:
x = myBij.inverse(y)


I would now like to modify the shift variable (for example, I might compute the gradient of some likelihood function with respect to the shift and update its value), so I do:



shift.assign(2.)
gx = myBij.forward(x)


I would expect gx = y + 1, but I see that gx = y. Indeed, myBij.shift still evaluates to 1.



If I try to modify the bijector directly, i.e.:



myBij.shift.assign(2.)


I get



AttributeError: 'tensorflow.python.framework.ops.EagerTensor' object has no attribute 'assign'


Computing gradients also does not work as expected:



with tf.GradientTape() as tape:
    gx = myBij.forward(x)
grad = tape.gradient(gx, shift)


This yields None, and the following exception is reported when the script ends:



Exception ignored in: <bound method GradientTape.__del__ of <tensorflow.python.eager.backprop.GradientTape object at 0x7f529c4702e8>>
Traceback (most recent call last):
  File "~/.local/lib/python3.6/site-packages/tensorflow/python/eager/backprop.py", line 765, in __del__
AttributeError: 'NoneType' object has no attribute 'context'


What am I missing here?



Edit: I got it working with a graph/session, so it seems there is an issue with eager execution...



Note: I am using tensorflow version 1.12.0 and tensorflow_probability version 0.5.0.










      python tensorflow machine-learning tensorflow-probability














asked Nov 16 '18 at 13:38 by swertz, edited Nov 16 '18 at 16:33
























          1 Answer














In eager mode, you need to recompute everything from the variable forward whenever its value changes. It is best to capture this logic in a function:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

tf.enable_eager_execution()

shift = tf.Variable(1., dtype=tf.float32)

def f():
    # Rebuild the bijector on every call so that it reads the current shift
    myBij = tfp.bijectors.Affine(shift=shift)

    # Normal distribution centered at zero, then shifted using the bijection
    myDistr = tfd.TransformedDistribution(
        distribution=tfd.Normal(loc=0., scale=1.),
        bijector=myBij,
        name="test")

    # 2 samples of a normal centered at the current shift:
    y = myDistr.sample(2)
    # 2 samples of a normal centered at 0, obtained using inverse
    # transform of myBij:
    x = myBij.inverse(y)
    return x, y

x, y = f()
shift.assign(2.)
gx, _ = f()

Regarding gradients, you will need to wrap the calls to f() in a GradientTape.
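
A plain-Python analogy may make this behaviour concrete. The sketch below is not TensorFlow code: `Box`, `SnapshotShift`, and `make_shift` are hypothetical names invented for illustration, with `Box` standing in for a tf.Variable and a Python float standing in for the fixed tensor that the bijector ends up holding. It only shows why an object that copies a value when it is constructed does not see later assignments, while rebuilding it inside a function does.

```python
class Box:
    """Hypothetical stand-in for a mutable variable (not a TensorFlow class)."""

    def __init__(self, value):
        self.value = value

    def assign(self, value):
        self.value = value


class SnapshotShift:
    """Toy shift transform that copies the variable's value when it is built."""

    def __init__(self, shift):
        # The value is read once here; later assignments to `shift`
        # are not seen by this object.
        self.shift = float(shift.value)

    def forward(self, x):
        return x + self.shift

    def inverse(self, y):
        return y - self.shift


shift = Box(1.0)


def make_shift():
    # Rebuilding the transform re-reads the current value of `shift`.
    return SnapshotShift(shift)


bij = make_shift()
y = 3.0
x = bij.inverse(y)               # 3.0 - 1.0 = 2.0

shift.assign(2.0)
stale = bij.forward(x)           # still uses the old shift of 1.0
fresh = make_shift().forward(x)  # uses the new shift of 2.0

print(stale, fresh)  # 3.0 4.0
```

Here `stale` equals y, which mirrors the gx = y observation in the question, while `fresh` equals y + 1. The stored `bij.shift` is a plain float with no `assign` method, much like the EagerTensor in the reported AttributeError.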






answered Nov 26 '18 at 19:10 by Brian Patton













• Thanks, I see! Now, if I don't want to simply sample again but instead (for instance) compute the likelihood of some fixed data using myDistr.log_prob() and gradient-ascend the shift variable to maximise that likelihood, does that mean I have to re-create the bijector and the transformed distribution object at each step? That seems to involve a lot of overhead (especially if the bijector is a complex normalising flow) compared to what is possible in "regular" graph mode.
  – swertz, Nov 27 '18 at 20:36

• Yes, you should think of eager mode as more or less like using Python floats or NumPy arrays. If you change the value of shift and want to compute some chain using its updated value, you have to repeat the computation. The Bijector will have converted shift to a tensor (tf.convert_to_tensor), which induces a read of the variable and returns a tf.Tensor with a fixed .numpy() value acquired at the time it was read. Note that Keras does things a little differently, and we are looking at a revamped trainable layers API in TFP.
  – Brian Patton, Dec 5 '18 at 15:46
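
The likelihood follow-up can be sketched in the same plain-Python spirit. This is an illustration under stated assumptions rather than TFP code: the data values, the learning rate, and the helper names `log_prob` and `grad_log_prob` are made up for the example, the base distribution is assumed to be a standard normal, and the gradient is derived by hand instead of coming from a GradientTape. The point is only that every step recomputes the chain from the current value of the shift.

```python
import math

# Hypothetical fixed data; any list of floats would do.
data = [0.5, 1.5, 2.0, 4.0]


def log_prob(shift, ys):
    """Log-likelihood of ys under a standard normal shifted by `shift`."""
    # The inverse transform is x = y - shift, and a pure shift has a
    # log-determinant Jacobian of zero, so only the base density remains.
    return sum(-0.5 * (y - shift) ** 2 - 0.5 * math.log(2.0 * math.pi)
               for y in ys)


def grad_log_prob(shift, ys):
    """Hand-derived derivative of log_prob with respect to shift."""
    return sum(y - shift for y in ys)


shift = 1.0          # initial value, as in the question
learning_rate = 0.1  # arbitrary choice for this sketch

for step in range(200):
    # Each step recomputes the gradient from the current value of shift;
    # nothing from the previous step is reused.
    shift += learning_rate * grad_log_prob(shift, data)

print(round(shift, 4))  # approaches the sample mean of the data, 2.0
```

In TensorFlow's eager mode, the analogous loop would call a function that rebuilds the bijector and distribution inside a GradientTape at each step, as the answer suggests.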




















