
How do I identify a vanishing or exploding gradient problem?

Kayley Marshall

A gradient is the partial derivative of the loss function with respect to a weight. During the backpropagation phase of neural network training, gradients are used to update the weights so as to minimize the loss function.

While we know that problems with the gradients are undesirable and should be avoided wherever possible, how can we determine whether a model is experiencing vanishing or exploding gradients?

Vanishing Gradients Problem

During backpropagation, a vanishing gradient occurs when the derivative, or slope, shrinks as it propagates backward through each layer.

Because the weight updates become vanishingly small, training times grow unacceptably long and, in the worst case, the neural network may stop learning altogether.

Since the derivative of the sigmoid activation function lies between 0 and 0.25, and that of tanh between 0 and 1, both functions are prone to the vanishing gradient problem: the chain rule multiplies these small factors together across layers.
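These bounds are easy to verify numerically. The short NumPy sketch below evaluates both derivatives on a grid and shows how a product of such sub-unity factors shrinks with depth (the variable names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 10001)
sig_grad = sigmoid(x) * (1.0 - sigmoid(x))  # derivative of sigmoid
tanh_grad = 1.0 - np.tanh(x) ** 2           # derivative of tanh

print(sig_grad.max())   # peaks at 0.25 (at x = 0)
print(tanh_grad.max())  # peaks at 1.0 (at x = 0)

# Chain rule: a 10-layer sigmoid network scales the gradient by at most
# 0.25 per layer, so the factor reaching the first layer is tiny.
print(0.25 ** 10)
```

Even in this best case (every pre-activation exactly at 0), ten sigmoid layers shrink the gradient by a factor of roughly one in a million, which is why early layers barely learn.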

Exploding Gradients Problem

An exploding gradient occurs when the derivatives, or slopes, grow larger as we move backward through each layer, the opposite of the vanishing-gradient case.

This issue arises because of the weights, typically large initial values, rather than the activation function.
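The role of the weight scale can be illustrated with a minimal sketch: push a gradient vector backward through a stack of random linear layers and watch its norm. The layer count, width, and weight scales below are arbitrary choices for demonstration, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 30, 64

def backprop_norms(weights, grad):
    """Propagate a gradient backward through linear layers,
    recording its norm after each multiplication by W^T."""
    g = grad.copy()
    norms = []
    for W in weights:
        g = W.T @ g
        norms.append(np.linalg.norm(g))
    return norms

grad = rng.normal(size=width)
# The weight scale decides the outcome: small weights shrink the
# gradient layer by layer, large weights blow it up.
small_w = [rng.normal(scale=0.05, size=(width, width)) for _ in range(n_layers)]
large_w = [rng.normal(scale=0.50, size=(width, width)) for _ in range(n_layers)]

vanish = backprop_norms(small_w, grad)
explode = backprop_norms(large_w, grad)
print(f"after {n_layers} layers: small-weight norm {vanish[-1]:.2e}, "
      f"large-weight norm {explode[-1]:.2e}")
```

The same backward pass either vanishes or explodes depending purely on weight magnitude, which is why techniques such as careful initialization and gradient clipping target the weights rather than the activation.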

Vanishing Gradients vs. Exploding Gradients

Vanishing

  • Parameters of later layers change significantly, whereas those of the early layers change very little, if at all.
  • During training, the weights of the early layers may shrink toward zero.
  • Training often stalls after a small number of iterations because the model learns so slowly.
  • Poor model performance.

Exploding

  • Model weights may suddenly become NaN, in contrast to the vanishing scenario's gradual shrinking of weights over time.
  • The model loss may also become NaN during training.
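In practice, both problems can be identified by logging per-layer gradient norms during training and flagging values that are non-finite, very large, or near zero. The helper below is a hypothetical sketch of such a check (the thresholds and layer names are illustrative assumptions, not standard values):

```python
import numpy as np

def check_gradient_health(grad_norms, low=1e-7, high=1e3):
    """Hypothetical helper: flag per-layer gradient norms that look
    like vanishing or exploding gradients."""
    issues = []
    for name, norm in grad_norms.items():
        if not np.isfinite(norm):
            issues.append(f"{name}: gradient is NaN/Inf (exploding)")
        elif norm > high:
            issues.append(f"{name}: norm {norm:.1e} is very large (exploding?)")
        elif norm < low:
            issues.append(f"{name}: norm {norm:.1e} is near zero (vanishing?)")
    return issues

# Usage with made-up norms, e.g. collected from a framework's
# gradient hooks after a backward pass:
issues = check_gradient_health({"layer1": 3e-9, "layer8": float("nan")})
print(issues)
```

Plotting these norms per layer over training steps makes the diagnosis visual: a curve decaying toward zero in the early layers suggests vanishing gradients, while sudden spikes or NaNs suggest exploding gradients.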
