A gradient is the vector of partial derivatives of the loss function with respect to the model's weights. During the backpropagation phase of neural network training, gradients are used to update the weights so as to minimize the loss.
We know that gradient problems are undesirable and should be avoided wherever possible, but how can we determine whether a model is suffering from vanishing or exploding gradients?
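Before diagnosing gradient problems, it helps to see the basic role a gradient plays. The following is a minimal NumPy sketch (with a made-up one-parameter linear model and hypothetical data) of gradient descent using the analytic gradient of a mean-squared-error loss:

```python
import numpy as np

# Toy example: loss L(w) = mean((w*x - y)^2) for a 1-parameter linear model.
# The gradient dL/dw indicates the direction of steepest loss increase;
# gradient descent updates w by stepping in the opposite direction.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # data generated by y = 2x

w = 0.0
lr = 0.1  # learning rate
for _ in range(100):
    grad = np.mean(2 * (w * x - y) * x)  # analytic dL/dw
    w -= lr * grad                        # gradient-descent update

print(round(w, 3))  # converges toward the true weight, 2.0
```

The same update rule drives every layer of a deep network; the gradient problems below concern what happens to `grad` as it flows through many layers.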
Vanishing Gradients Problem
During backpropagation, the vanishing gradient problem occurs when the gradient shrinks as it is propagated backward through each layer.
The resulting exponentially small weight updates make training unacceptably slow for the early layers and, in the worst case, can cause the network to stop learning altogether.
The sigmoid and tanh activation functions are prone to this issue because their derivatives are bounded: the sigmoid's derivative never exceeds 0.25 and the tanh's never exceeds 1, so multiplying many such factors together during backpropagation drives the gradient toward zero.
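The bound on the sigmoid's derivative, and its consequence for deep networks, can be checked numerically. This sketch evaluates the derivative over a wide range and shows how quickly a product of such factors decays:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25, attained at z = 0

z = np.linspace(-10.0, 10.0, 10001)
max_deriv = sigmoid_deriv(z).max()
print(max_deriv)  # 0.25: the sigmoid derivative never exceeds 1/4

# Backprop multiplies one such factor per layer, so through 20 sigmoid
# layers the gradient can be scaled by as little as 0.25**20.
scale_20_layers = 0.25 ** 20
print(scale_20_layers)  # ~9.1e-13: effectively vanished
```

Tanh behaves better (its derivative reaches 1 at zero) but still shrinks gradients whenever activations stray from zero.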
Exploding Gradients Problem
An exploding gradient occurs when the derivatives grow larger as we move backward through each layer, the opposite of the vanishing case.
This issue arises from excessively large weight values rather than from the activation function: each backward step multiplies the gradient by the weights, so large weights amplify it layer after layer.
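To illustrate the role of the weights, here is a small sketch (with a hypothetical stack of purely linear layers and arbitrary depth/width choices) that propagates a gradient backward through randomly initialized weight matrices at two different scales:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_norm_through_layers(weight_scale, depth=30, width=64):
    # Propagate a gradient vector backward through `depth` linear layers;
    # each step multiplies by a weight matrix transpose, so the weight
    # scale compounds multiplicatively across layers.
    g = np.ones(width)
    for _ in range(depth):
        W = rng.normal(0.0, weight_scale / np.sqrt(width), (width, width))
        g = W.T @ g
    return np.linalg.norm(g)

stable_norm = grad_norm_through_layers(1.0)    # stays near its initial scale
exploding_norm = grad_norm_through_layers(3.0)  # grows roughly 3x per layer
print(stable_norm, exploding_norm)
```

With a weight scale of 1 the gradient norm remains moderate, while a scale of 3 blows it up by many orders of magnitude over 30 layers, which is why careful weight initialization matters.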
Vanishing Gradients vs. Exploding Gradients
- In the vanishing case, the parameters of the later layers change significantly, whereas those of the early layers change very little, if at all.
- During training, the weights of the early layers may shrink toward zero.
- Training often stalls after a small number of iterations because the model learns so slowly.
- The result is poor modeling performance.
- In the exploding case, model weights can suddenly become NaN, in contrast to the gradual decay of weights in the vanishing case.
- When gradients explode, the model loss can also become NaN.
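One practical way to tell the two problems apart is to monitor per-layer gradient norms during training. The following is a minimal NumPy sketch (with a made-up deep sigmoid stack and manual backpropagation) of that diagnostic: roughly similar norms across layers suggest healthy training, norms shrinking toward the early layers indicate vanishing gradients, and rapidly growing or NaN norms indicate exploding gradients.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, width = 12, 32
Ws = [rng.normal(0.0, 1.0, (width, width)) for _ in range(depth)]

# Forward pass, caching each layer's activation for the backward pass.
a = rng.normal(0.0, 1.0, width)
acts = []
for W in Ws:
    a = sigmoid(W @ a)
    acts.append(a)

# Backward pass: start from a dummy upstream gradient of ones and record
# the gradient norm after backpropagating through each layer.
g = np.ones(width)
norms = []
for W, a in zip(reversed(Ws), reversed(acts)):
    g = W.T @ (g * a * (1.0 - a))  # chain rule: sigmoid, then linear layer
    norms.append(np.linalg.norm(g))

for i, n in enumerate(norms):
    print(f"after backprop through layer {depth - i}: grad norm = {n:.2e}")
```

In deep-learning frameworks the same check can be done on the real model's parameter gradients after a backward pass, without writing backpropagation by hand.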