Content Overview
- Cases where gradient returns None
- No gradient registered
- Zeros instead of None
Cases where gradient returns None
When a target is not connected to a source, `gradient` will return `None`.
import tensorflow as tf

x = tf.Variable(2.)
y = tf.Variable(3.)

with tf.GradientTape() as tape:
  z = y * y
print(tape.gradient(z, x))
None
Here `z` is obviously not connected to `x`, but there are several less-obvious ways that a gradient can be disconnected.
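When a target depends on only some of several sources, `tape.gradient` returns `None` in the slots of the disconnected ones. The sketch below uses a hypothetical helper, `disconnected_sources` (not part of TensorFlow), to report which sources are disconnected. It assumes a non-persistent tape, which can only be queried once, so the helper consumes it.

```python
import tensorflow as tf


def disconnected_sources(tape, target, sources):
  """Hypothetical helper: indices of `sources` that `target` does not depend on."""
  grads = tape.gradient(target, sources)  # one entry per source, None if disconnected
  return [i for i, g in enumerate(grads) if g is None]


x = tf.Variable(2.)
y = tf.Variable(3.)

with tf.GradientTape() as tape:
  z = y * y

missing = disconnected_sources(tape, z, [x, y])
print(missing)  # [0]: only x is disconnected from z
```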
1. Replaced a variable with a tensor
In the section on “controlling what the tape watches” you saw that the tape will automatically watch a `tf.Variable` but not a `tf.Tensor`.
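As a reminder of that rule, here is a short sketch: a trainable `tf.Variable` is watched automatically, while a plain tensor such as a `tf.constant` yields `None` unless you call `tape.watch` on it. The names `v` and `c` are illustrative.

```python
import tensorflow as tf

v = tf.Variable(3.0)   # trainable variable: watched automatically
c = tf.constant(2.0)   # plain tensor: not watched unless asked

with tf.GradientTape() as tape:
  y = v * c
dy_dv, dy_dc = tape.gradient(y, [v, c])
print(dy_dv, dy_dc)    # dy_dc is None because c was never watched

with tf.GradientTape() as tape:
  tape.watch(c)        # explicitly watch the tensor
  y = v * c
dy_dc_watched = tape.gradient(y, c)
print(dy_dc_watched)   # tf.Tensor(3.0, shape=(), dtype=float32)
```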
One common error is to inadvertently replace a `tf.Variable` with a `tf.Tensor`, instead of using `Variable.assign` to update the `tf.Variable`. Here is an example:
x = tf.Variable(2.0)

for epoch in range(2):
  with tf.GradientTape() as tape:
    y = x+1

  print(type(x).__name__, ":", tape.gradient(y, x))
  x = x + 1   # This should be `x.assign_add(1)`
ResourceVariable : tf.Tensor(1.0, shape=(), dtype=float32)
EagerTensor : None
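The fix suggested in the code comment is to update the variable in place. In this sketch of the corrected loop, `x.assign_add(1)` keeps `x` a `tf.Variable`, so the tape still watches it on the second pass and both epochs produce a gradient.

```python
import tensorflow as tf

x = tf.Variable(2.0)
grads = []

for epoch in range(2):
  with tf.GradientTape() as tape:
    y = x + 1
  grads.append(tape.gradient(y, x))
  print(type(x).__name__, ":", grads[-1])
  x.assign_add(1)  # in-place update: x stays a tf.Variable

print(x.numpy())  # 4.0
```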
2. Did calculations outside of TensorFlow
The tape can’t record the gradient path if the calculation exits TensorFlow. For example:
import numpy as np

x = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
  x2 = x**2

  # This step is calculated with NumPy
  y = np.mean(x2, axis=0)

  # Like most ops, reduce_mean will convert the NumPy array to a constant tensor
  # using `tf.convert_to_tensor`.
  y = tf.reduce_mean(y, axis=0)

print(tape.gradient(y, x))
None
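A minimal sketch of the repair is to keep every step inside TensorFlow, replacing `np.mean` with `tf.reduce_mean`, so the tape can record the whole path. Here `y` is the mean of all squared entries, so the expected gradient is `2x / 4 = x / 2`.

```python
import tensorflow as tf

x = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
  x2 = x**2
  # Stay inside TensorFlow: tf.reduce_mean instead of np.mean.
  y = tf.reduce_mean(x2, axis=0)
  y = tf.reduce_mean(y, axis=0)   # mean of all four squares = 7.5

grad = tape.gradient(y, x)        # dy/dx = 2x / 4 = x / 2
print(grad)
```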
3. Took gradients through an integer or string
Integers and strings are not differentiable. If a calculation path uses these data types there will be no gradient.
Nobody expects strings to be differentiable, but it’s easy to accidentally create an `int` constant or variable if you don’t specify the `dtype`.
x = tf.constant(10)

with tf.GradientTape() as g:
  g.watch(x)
  y = x * x

print(g.gradient(y, x))
WARNING:tensorflow:The dtype of the watched tensor must be floating (e.g. tf.float32), got tf.int32
None
TensorFlow doesn’t automatically cast between types, so, in practice, you’ll often get a type error instead of a missing gradient.
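One way to avoid the integer trap, sketched below, is to make the watched tensor floating point, either by writing the literal as `10.0`, by passing `dtype=tf.float32`, or, as here, by casting an existing integer tensor with `tf.cast` before watching it. Note that the gradient then stops at the float tensor `x`; it does not flow back into the integer `x_int`.

```python
import tensorflow as tf

x_int = tf.constant(10)          # dtype inferred as tf.int32
x = tf.cast(x_int, tf.float32)   # cast to float before watching

with tf.GradientTape() as g:
  g.watch(x)
  y = x * x

dy_dx = g.gradient(y, x)
print(dy_dx)  # tf.Tensor(20.0, shape=(), dtype=float32)
```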
4. Took gradients through a stateful object
State stops gradients. When you read from a stateful object, the tape can only observe the current state, not the history that led to it.
A `tf.Tensor` is immutable. You can’t change a tensor once it’s created. It has a value, but no state. All the operations discussed so far are also stateless: the output of a `tf.matmul` only depends on its inputs.
A `tf.Variable` has internal state: its value. When you use the variable, the state is read. It’s normal to calculate a gradient with respect to a variable, but the variable’s state blocks gradient calculations from going farther back. For example:
x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
  # Update x1 = x1 + x0.
  x1.assign_add(x0)
  # The tape starts recording from x1.
  y = x1**2   # y = (x1 + x0)**2

# This doesn't work.
print(tape.gradient(y, x0))   # dy/dx0 = 2*(x1 + x0)
None
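If you need the gradient with respect to `x0`, one possible workaround, sketched here under the assumption that you only need the update stored in `x1` afterwards, is to compute the new value with an ordinary stateless op that the tape records, and write it back to the variable after taking the gradient.

```python
import tensorflow as tf

x0 = tf.Variable(3.0)
x1 = tf.Variable(0.0)

with tf.GradientTape() as tape:
  # A stateless add the tape can record, instead of assign_add + read.
  x1_new = x1 + x0
  y = x1_new**2                  # y = (x1 + x0)**2

dy_dx0 = tape.gradient(y, x0)    # 2*(x1 + x0) = 6.0
x1.assign(x1_new)                # update the stored state afterwards
print(dy_dx0)
```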
Similarly, `tf.data.Dataset` iterators and `tf.queue`s are stateful, and will stop all gradients on tensors that pass through them.
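The usual way to live with this, shown here as a sketch with an illustrative weight `w` and a toy dataset, is to pull each element out of the iterator first and do all of the math involving variables inside the tape. The data then only enters the computation as an input, and no gradient has to pass back through the iterator.

```python
import tensorflow as tf

w = tf.Variable(2.0)
dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0])

grads = []
for xb in dataset:              # read the element from the iterator first
  with tf.GradientTape() as tape:
    loss = (w * xb) ** 2        # variable math happens inside the tape
  grads.append(float(tape.gradient(loss, w)))

print(grads)  # d/dw (w*x)^2 = 2*w*x^2 -> [4.0, 16.0, 36.0]
```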
No gradient registered
Some `tf.Operation`s are registered as being non-differentiable and will return `None`. Others have no gradient registered.
The `tf.raw_ops` page shows which low-level ops have gradients registered.
If you attempt to take a gradient through a float op that has no gradient registered, the tape will raise an error instead of silently returning `None`. This way you know something has gone wrong.
For example, the `tf.image.adjust_contrast` function wraps `raw_ops.AdjustContrastv2`, which could have a gradient, but that gradient is not implemented:
image = tf.Variable([[[0.5, 0.0, 0.0]]])
delta = tf.Variable(0.1)

with tf.GradientTape() as tape:
  new_image = tf.image.adjust_contrast(image, delta)

try:
  print(tape.gradient(new_image, [image, delta]))
  assert False   # This should not happen.
except LookupError as e:
  print(f'{type(e).__name__}: {e}')
LookupError: gradient registry has no entry for: AdjustContrastv2
If you need to differentiate through this op, you’ll either need to implement the gradient and register it (using `tf.RegisterGradient`) or re-implement the function using other ops.
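As a sketch of the second option, the contrast adjustment can be rebuilt from ordinary differentiable ops. It relies on the documented behaviour of `tf.image.adjust_contrast`: for each channel, compute the mean over height and width, then map each value `x` to `(x - mean) * contrast_factor + mean`. The function name `adjust_contrast_differentiable` is hypothetical, and the sketch assumes float images whose last three dimensions are `[height, width, channels]`. The forward pass is checked against the built-in op.

```python
import tensorflow as tf


def adjust_contrast_differentiable(images, contrast_factor):
  """Hypothetical re-implementation of tf.image.adjust_contrast using
  differentiable ops. Assumes float images shaped [..., height, width, channels]."""
  # Per-channel mean over the height and width dimensions.
  mean = tf.reduce_mean(images, axis=[-3, -2], keepdims=True)
  return (images - mean) * contrast_factor + mean


tf.random.set_seed(0)
image = tf.Variable(tf.random.uniform([4, 4, 3]))
delta = tf.Variable(0.1)

with tf.GradientTape() as tape:
  new_image = adjust_contrast_differentiable(image, delta)

# Both gradients now exist instead of raising LookupError.
grad_image, grad_delta = tape.gradient(new_image, [image, delta])

# The forward pass should agree with the built-in op.
reference = tf.image.adjust_contrast(image, delta)
max_diff = float(tf.reduce_max(tf.abs(new_image - reference)))
print(max_diff, grad_delta)
```

Summed over all outputs, each pixel's gradient works out to 1 and the contrast factor's gradient to the sum of `(x - mean)`, which is 0, so both are easy sanity checks.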
Zeros instead of None
In some cases it would be convenient to get 0 instead of `None` for unconnected gradients. You can decide what to return when you have unconnected gradients using the `unconnected_gradients` argument:
x = tf.Variable([2., 2.])
y = tf.Variable(3.)

with tf.GradientTape() as tape:
  z = y**2
print(tape.gradient(z, x, unconnected_gradients=tf.UnconnectedGradients.ZERO))
tf.Tensor([0. 0.], shape=(2,), dtype=float32)
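To contrast the two settings side by side, the sketch below queries one persistent tape twice: once with the default (`tf.UnconnectedGradients.NONE`) and once with `ZERO` for a list of sources. With `ZERO`, the unconnected source gets a zero tensor shaped like that source, while the connected source still gets its real gradient.

```python
import tensorflow as tf

x = tf.Variable([2., 2.])
y = tf.Variable(3.)

# persistent=True so the same tape can be queried twice.
with tf.GradientTape(persistent=True) as tape:
  z = y**2

default = tape.gradient(z, x)  # UnconnectedGradients.NONE is the default
dz_dx, dz_dy = tape.gradient(
    z, [x, y], unconnected_gradients=tf.UnconnectedGradients.ZERO)
del tape  # release the persistent tape's resources

print(default)  # None
print(dz_dx)    # zeros shaped like x: [0. 0.]
print(dz_dy)    # 2*y = 6.0
```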
:::info
Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.
:::