What is RMSProp?

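RMSprop is a gradient-based optimization algorithm for training neural networks. It relies on the gradients of the loss function, which backpropagation computes, to decide how to update each parameter.

As data travels through very complicated functions, such as neural networks, the resulting gradients often vanish or explode. RMSprop is a stochastic mini-batch learning method that addresses this by normalizing each gradient by a running estimate of its recent magnitude.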

  • RMSprop (Root Mean Square Propagation) is an optimization algorithm used in deep learning and other Machine Learning techniques.

It is a variant of the gradient descent algorithm that helps to improve the convergence speed and stability of the model training process.

RMSProp algorithm

Like other gradient descent algorithms, RMSprop works by calculating the gradient of the loss function with respect to the model’s parameters and updating the parameters in the opposite direction of the gradient to minimize the loss. What RMSprop adds is a separate, adaptive step size for each parameter.

Its key feature is the use of a moving average of the squared gradients to scale the learning rate for each parameter. Each update is divided by the square root of this average, so parameters with consistently large gradients take smaller steps and parameters with small gradients take relatively larger ones. This helps to stabilize the learning process and dampen oscillations in the optimization trajectory.
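
The scaling effect can be illustrated with a small sketch. It assumes two hypothetical parameters whose gradients stay constant at very different magnitudes (0.01 and 100); the names and values are chosen only for illustration. Plain gradient descent would take steps that differ by a factor of 10,000, while the RMSprop-style step ends up close to the learning rate for both.

```python
import math

# Illustrative values, not recommendations.
learning_rate, decay_rate, epsilon = 0.01, 0.9, 1e-8

# Two hypothetical parameters with constant gradients of very different size.
gradients = {"small": 0.01, "large": 100.0}

steps = {}
for name, g in gradients.items():
    v = 0.0  # moving average of the squared gradient, started at zero
    for _ in range(100):
        v = decay_rate * v + (1.0 - decay_rate) * g * g
    # Size of the update after the moving average has settled.
    steps[name] = learning_rate * g / (math.sqrt(v) + epsilon)

for name, step in steps.items():
    plain_step = learning_rate * gradients[name]
    print(f"{name}: plain step = {plain_step:g}, RMSprop step = {step:.6f}")
```

Both RMSprop steps land near `learning_rate`, because once the average has settled, `sqrt(v)` is close to the absolute value of the gradient.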

The algorithm can be summarized by the following RMSProp formula:

v_t = decay_rate * v_{t-1} + (1 - decay_rate) * gradient^2 
parameter = parameter - learning_rate * gradient / (sqrt(v_t) + epsilon)


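  • v_t is the moving average of the squared gradients, kept separately for each parameter and usually initialized to zero;
  • decay_rate is a hyperparameter between 0 and 1 (0.9 is a common choice) that controls how quickly older squared gradients are forgotten;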
  • learning_rate is a hyperparameter that controls the step size of the update;
  • gradient is the gradient of the loss function with respect to the parameter; and
  • epsilon is a small constant added to the denominator to prevent division by zero.
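
The update rule above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `rmsprop_step`, the toy loss f(x, y) = x^2 + 10 * y^2, the starting point, and the default hyperparameter values are all assumptions made for the example.

```python
import math

def rmsprop_step(params, grads, v, learning_rate=0.01, decay_rate=0.9, epsilon=1e-8):
    """Apply one RMSprop update to a list of parameters.

    Returns the updated parameters and the updated moving averages.
    """
    new_params, new_v = [], []
    for p, g, v_prev in zip(params, grads, v):
        # v_t = decay_rate * v_{t-1} + (1 - decay_rate) * gradient^2
        v_t = decay_rate * v_prev + (1.0 - decay_rate) * g * g
        # parameter = parameter - learning_rate * gradient / (sqrt(v_t) + epsilon)
        new_params.append(p - learning_rate * g / (math.sqrt(v_t) + epsilon))
        new_v.append(v_t)
    return new_params, new_v

def toy_grad(params):
    """Gradient of the toy loss f(x, y) = x^2 + 10 * y^2 (illustrative only)."""
    x, y = params
    return [2.0 * x, 20.0 * y]

params = [5.0, -3.0]   # arbitrary starting point
v = [0.0, 0.0]         # moving averages start at zero
for _ in range(2000):
    params, v = rmsprop_step(params, toy_grad(params), v)

print(params)  # both coordinates end up close to the minimum at (0, 0)
```

With a constant learning rate the parameters do not settle exactly on the minimum but keep moving within a small band around it, which is one reason the learning rate is often lowered over the course of training.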

Adam vs RMSProp

RMSProp is often compared to the Adam (Adaptive Moment Estimation) optimization algorithm, another popular optimization method for deep learning. Both algorithms adapt the learning rate of each parameter using a moving average of the squared gradients. Adam additionally keeps a moving average of the gradients themselves, which acts like momentum, and applies a bias correction to both averages to compensate for their initialization at zero. Adam is generally more widely used than the RMSProp optimizer, but both algorithms can be effective in different settings.
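
To make the difference concrete, here is a hedged side-by-side sketch of a single update for one scalar parameter. The names `beta1`, `beta2`, `m_hat` and `v_hat` follow the usual Adam notation and do not come from this article; the function names and default values are likewise assumptions for illustration.

```python
import math

def rmsprop_update(param, grad, v, learning_rate=0.001, decay_rate=0.9, epsilon=1e-8):
    """One RMSprop step: scale the gradient by a moving average of its square."""
    v = decay_rate * v + (1.0 - decay_rate) * grad ** 2
    param = param - learning_rate * grad / (math.sqrt(v) + epsilon)
    return param, v

def adam_update(param, grad, m, v, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step; t is the 1-based step counter."""
    m = beta1 * m + (1.0 - beta1) * grad        # moving average of gradients (momentum-like)
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1.0 - beta1 ** t)              # bias correction for the zero initialization
    v_hat = v / (1.0 - beta2 ** t)
    param = param - learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return param, m, v

# First step from the same starting point with the same gradient.
p_rms, v_rms = rmsprop_update(1.0, 2.0, 0.0)
p_adam, m_adam, v_adam = adam_update(1.0, 2.0, 0.0, 0.0, t=1)
print(p_rms, p_adam)
```

On the very first step, Adam's bias correction makes the update size almost exactly `learning_rate`, whereas the uncorrected RMSprop average is still small and produces a larger first step.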

RMSProp advantages

  • Fast convergence. RMSprop often converges quickly, which means that it can find good solutions to optimization problems in fewer iterations than plain gradient descent on many problems. This can be especially useful for training large or complex models, where training time is a critical concern.
  • Stable learning. The use of a moving average of the squared gradients in RMSprop helps to stabilize the learning process and dampen oscillations in the optimization trajectory. Because the size of each update is normalized, the optimization process tends to be more robust and less prone to diverging when gradients are very large.
  • Fewer hyperparameters. RMSprop has fewer hyperparameters than some other optimization algorithms, which makes it easier to tune and use in practice. The main hyperparameters in RMSprop are the learning rate and the decay rate, which can be chosen using techniques like grid search or random search.
  • Good performance on non-convex problems. RMSprop tends to perform well on non-convex optimization problems, which are common in Machine Learning and deep learning. Non-convex optimization problems can have many local minima, and RMSprop’s fast convergence speed and stable learning can help it find good solutions even in these cases.
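
As a rough illustration of tuning the two main hyperparameters, the sketch below runs a small grid search over the learning rate and decay rate. The toy loss f(x, y) = x^2 + 10 * y^2, the helper `final_loss`, the candidate values and the step budget are all assumptions for the example; a real search would score each setting on a validation metric of the actual model.

```python
import itertools
import math

def final_loss(learning_rate, decay_rate, steps=200, epsilon=1e-8):
    """Run RMSprop on the toy loss f(x, y) = x^2 + 10 * y^2 and return the final loss."""
    x, y = 5.0, -3.0   # arbitrary starting point
    vx = vy = 0.0      # moving averages of squared gradients
    for _ in range(steps):
        gx, gy = 2.0 * x, 20.0 * y
        vx = decay_rate * vx + (1.0 - decay_rate) * gx * gx
        vy = decay_rate * vy + (1.0 - decay_rate) * gy * gy
        x -= learning_rate * gx / (math.sqrt(vx) + epsilon)
        y -= learning_rate * gy / (math.sqrt(vy) + epsilon)
    return x * x + 10.0 * y * y

# Hypothetical candidate values for the grid.
learning_rates = [0.001, 0.01, 0.1]
decay_rates = [0.9, 0.99]

results = {
    (lr, dr): final_loss(lr, dr)
    for lr, dr in itertools.product(learning_rates, decay_rates)
}
best = min(results, key=results.get)

for setting, loss in sorted(results.items()):
    print(setting, f"{loss:.4f}")
print("best (learning_rate, decay_rate):", best)
```

Random search follows the same pattern, with the two lists replaced by values sampled from chosen ranges.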

Overall, RMSprop is a powerful and widely used optimization algorithm that can be effective for training a variety of Machine Learning models, especially deep learning models.