## Introduction

Backpropagation (short for backward propagation) is a core mathematical method for improving the accuracy of predictions in data mining and machine learning.

**Backpropagation is a technique for swiftly calculating derivatives.**

During training, backpropagation computes the gradient of the loss function with respect to the weights of an artificial neural network. An optimizer such as gradient descent then uses that gradient to update the weights. We compare the desired outputs with the actual outputs of the system and adjust the connection weights to minimize the difference. The backpropagation learning algorithm’s name comes from the fact that the error is propagated, and the weight gradients computed, backward from output to input.
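As a minimal sketch of this idea, the following NumPy code trains a tiny two-layer network on a single example. The network shape, the tanh hidden layer, the squared-error loss, the learning rate, and all names (`forward`, `loss_and_grads`, `W1`, `W2`) are assumptions chosen for illustration, not part of any particular system described here.

```python
import numpy as np

def forward(x, W1, W2):
    """Forward pass of a tiny two-layer network with a tanh hidden layer."""
    h = np.tanh(W1 @ x)   # hidden activations
    y = W2 @ h            # linear output
    return h, y

def loss_and_grads(x, target, W1, W2):
    """Squared-error loss and its gradients, computed from output to input."""
    h, y = forward(x, W1, W2)
    err = y - target                  # dL/dy for L = 0.5 * ||y - target||^2
    loss = 0.5 * float(err @ err)
    grad_W2 = np.outer(err, h)        # output layer first
    dh = W2.T @ err                   # error passed back to the hidden layer
    dpre = dh * (1.0 - h ** 2)        # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    grad_W1 = np.outer(dpre, x)       # then the input-side layer
    return loss, grad_W1, grad_W2

# Hypothetical data and initial weights, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
target = np.array([0.5, -0.5])
W1 = rng.normal(scale=0.5, size=(4, 3))
W2 = rng.normal(scale=0.5, size=(2, 4))

lr = 0.05                             # assumed learning rate
losses = []
for _ in range(200):
    loss, g1, g2 = loss_and_grads(x, target, W1, W2)
    losses.append(loss)
    W1 -= lr * g1                     # gradient descent step on each weight matrix
    W2 -= lr * g2
```

Backpropagation supplies `g1` and `g2`; the subtraction in the loop is the separate gradient descent step that uses them.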

Because it was difficult to understand how adjusting weights and biases affected the overall behavior of an artificial neural network, neural network applications were not widely used until the early 2000s, when computers became capable of providing that insight. Since then, neural networks have been used in a variety of artificial intelligence (AI) applications, including **natural language processing**, **optical character recognition**, and **image processing**.

To calculate the gradient of the loss function, backpropagation requires that each input value be associated with a known, intended outcome. This makes backpropagation an example of a supervised machine learning technique.

Backpropagation has become a key component of machine learning applications that use predictive analytics, alongside classifiers such as naive Bayes filters and decision trees.

## Backpropagation through time

**Backpropagation works by expressing a complete neural network as a function of a function, that is, as a composition of functions.**

If the neural network can’t be reduced to a single expression of compounded functions, or if it can’t be expressed as a directed acyclic graph, this method can’t be used.
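To make the "function of a function" view concrete, here is a small sketch of the chain rule applied to a composition of three scalar functions. The three stages and the name `value_and_derivative` are hypothetical choices for illustration; a real network composes layers in the same way, with vectors and matrices instead of scalars.

```python
import math

# Each stage is a (function, derivative) pair; the composition is f3(f2(f1(x))).
stages = [
    (lambda v: 3.0 * v, lambda v: 3.0),       # f1(v) = 3v
    (math.sin, math.cos),                     # f2(v) = sin(v)
    (lambda v: v ** 2, lambda v: 2.0 * v),    # f3(v) = v^2
]

def value_and_derivative(x):
    # Forward pass: remember the input that each stage saw.
    inputs = []
    v = x
    for f, _ in stages:
        inputs.append(v)
        v = f(v)
    # Backward pass: multiply local derivatives from the last stage to the first.
    grad = 1.0
    for (_, df), v_in in zip(reversed(stages), reversed(inputs)):
        grad *= df(v_in)
    return v, grad

y, dy_dx = value_and_derivative(0.4)   # y = sin(1.2)^2
```

The backward loop visits the stages in reverse order, which is only possible because the composition has a well-defined order with no cycles.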

A recurrent neural network processes an incoming time series, and the output of a node at one point in time is fed back into the network at the next time point. Because a recurrent neural network has cycles, it cannot be described as a directed acyclic graph.

A recurrent neural network can, however, be ‘unrolled’ and viewed as a feedforward neural network, in which each timestep is represented by one copy of the original network.

Backpropagation through time is achieved by unrolling a recurrent neural network in this way and applying backpropagation to the resulting feedforward network.
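The unrolling can be sketched as follows for a minimal recurrent cell. The cell equation `h_t = tanh(W h_{t-1} + U x_t)`, the choice to place a squared-error loss on the final hidden state only, and the names `bptt`, `W`, and `U` are all assumptions made for this illustration.

```python
import numpy as np

def bptt(xs, target, W, U):
    """Unroll h_t = tanh(W h_{t-1} + U x_t) over the sequence, then backpropagate."""
    hs = [np.zeros(W.shape[0])]               # h_0
    for x in xs:                              # forward: one copy of the cell per timestep
        hs.append(np.tanh(W @ hs[-1] + U @ x))
    err = hs[-1] - target
    loss = 0.5 * float(err @ err)             # loss on the final hidden state only

    grad_W = np.zeros_like(W)
    grad_U = np.zeros_like(U)
    dh = err                                  # dL/dh_T
    for t in range(len(xs), 0, -1):           # backward: last timestep to first
        dpre = dh * (1.0 - hs[t] ** 2)        # through tanh at timestep t
        grad_W += np.outer(dpre, hs[t - 1])   # shared weights: contributions add up
        grad_U += np.outer(dpre, xs[t - 1])
        dh = W.T @ dpre                       # pass the error to the previous timestep
    return loss, grad_W, grad_U

# Hypothetical sequence of five 2-dimensional inputs and a 3-unit hidden state.
rng = np.random.default_rng(1)
xs = [rng.normal(size=2) for _ in range(5)]
target = rng.normal(size=3)
W = rng.normal(scale=0.5, size=(3, 3))
U = rng.normal(scale=0.5, size=(3, 2))
loss, grad_W, grad_U = bptt(xs, target, W, U)
```

Because every unrolled copy shares the same `W` and `U`, the gradient for each matrix is the sum of the contributions from all timesteps.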

Because backpropagation through time entails replicating the network once per timestep, it can produce a very large feedforward neural network that is difficult to train, with many opportunities for the optimization to become trapped in local optima.

Furthermore, interactions between inputs that are separated in time can be difficult for the network to learn, since the gradient contributions from such interactions are tiny in comparison to local effects. This is known as the vanishing gradient problem. It can be mitigated, though not eliminated, by selecting ReLU activation functions and incorporating regularization into the network.
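A rough numerical illustration of why the activation function matters for vanishing gradients: the gradient that reaches an early timestep includes a product of one activation derivative per step. The sketch below deliberately ignores the weight matrices and assumes the same hypothetical pre-activation value at every step, so it shows only the activation's contribution.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)              # never larger than 0.25

def relu_grad(z):
    return (z > 0).astype(float)      # exactly 1 for positive inputs, 0 otherwise

steps = 30
z = np.full(steps, 0.5)               # hypothetical pre-activation at each timestep

# Product of the per-step derivatives along the unrolled chain.
sigmoid_factor = np.prod(sigmoid_grad(z))
relu_factor = np.prod(relu_grad(z))
```

With a sigmoid, each step shrinks the gradient by a factor of at most 0.25, so the product collapses toward zero over 30 steps, while the ReLU factor stays at 1 as long as the units are active.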

## Implementation of Backpropagation

Backpropagation is used to train practically all types of neural networks, which has contributed to deep learning’s recent rise in popularity. Below are a few examples of how backpropagation can be used.

**Several supervised learning techniques use the backpropagation algorithm to train neural networks.**

## Face recognition

Convolutional neural networks, a common deep learning technique for image processing and image recognition, are trained with the backpropagation algorithm.

Face recognition was discussed in detail by Parkhi, Vedaldi, and Zisserman in 2015. They employed an 18-layer convolutional neural network and a celebrity face database.

At first, backpropagation was used to train the network over all 18 layers. Images were fed into the network in batches, the loss function was evaluated, and gradients were calculated for layer 18 first, then for layer 17, and so on back to layer 1. The network weights were updated after each batch of images.

After running backpropagation for all 18 layers, the researchers ran an extra training step for layer 18 alone, to improve the network’s ability to recognize the intricacies of human faces. For this step, the researchers chose a loss function known as a **triplet loss**.

With a triplet loss, the neural network learns from three images at once: for example, two images of Matt Damon and one image of Brad Pitt are fed into the network together. The loss function penalizes the network when it concludes that two photographs of the same person are different, and also when it classifies images of different people as similar. Triplets of images are passed through the network repeatedly; each time, the loss function is calculated and the weights of the last layer are updated.
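A triplet loss of this kind can be sketched in a few lines. The version below is a common textbook form, not necessarily the exact formulation the researchers used: the squared Euclidean distance, the `margin` value of 0.2, and the two-dimensional toy embeddings are all assumptions for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    anchor and positive are embeddings of the same person;
    negative is an embedding of a different person.
    """
    d_pos = np.sum((anchor - positive) ** 2)   # same person: should be small
    d_neg = np.sum((anchor - negative) ** 2)   # different person: should be large
    return float(max(0.0, d_pos - d_neg + margin))

# Hypothetical 2-dimensional embeddings produced by the last layer.
damon_a = np.array([1.0, 0.0])
damon_b = np.array([0.9, 0.1])
pitt = np.array([-1.0, 0.0])

good = triplet_loss(damon_a, damon_b, pitt)    # same-person pair is already closer
bad = triplet_loss(damon_a, pitt, damon_b)     # roles swapped: heavily penalized
```

The loss is zero once the same-person pair is closer than the different-person pair by at least the margin, so only triplets the network still gets wrong contribute gradients.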

## Speech recognition

Backpropagation has been used for speech recognition in a variety of ways. Sony Corporation of Japan has produced an example implementation of a speech recognition system for English and Japanese that can run on embedded devices. Users can issue only a limited set of commands to the system.

The incoming sound signal is divided into time frames, and a Fast Fourier Transform is applied to each frame. The sound intensity at different frequencies is then fed as features into a five-layer neural network.
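The feature extraction step might look roughly like the sketch below. The frame length, hop size, Hann window, and the function name `frame_spectra` are assumptions for illustration; the actual parameters of the system described above are not given here.

```python
import numpy as np

def frame_spectra(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames and return magnitude spectra.

    Returns an array of shape (n_frames, frame_len // 2 + 1): one row of
    frequency intensities per time frame.
    """
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)             # taper each frame to reduce leakage
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: a 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
t = np.arange(2048) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
features = frame_spectra(tone)                 # each row could be one network input
```

Each row of `features` describes how strongly each frequency band is present in one time frame, which is the kind of input a small feedforward network can consume.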

The researchers chose a softmax cross-entropy loss function and used backpropagation to train the five layers on Japanese commands. Later, they retrained the network on English voice recordings and adapted the system to recognize commands in English. This illustrates how a machine learning model can be trained to perform one task and then retrained and adjusted to perform a different one, an approach known as transfer learning.
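For reference, a softmax cross-entropy loss and its gradient with respect to the network's raw output scores can be sketched as follows. The function name, the four-command setup, and the use of a single integer `label` are assumptions for illustration.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Return the loss and its gradient with respect to the logits.

    logits: 1-D array of raw scores, one per command class.
    label:  integer index of the correct command.
    """
    shifted = logits - np.max(logits)                        # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))    # log of the softmax
    loss = -log_probs[label]
    grad = np.exp(log_probs)                                 # softmax probabilities
    grad[label] -= 1.0                                       # gradient is probs - one_hot
    return float(loss), grad

# Hypothetical case: four possible commands, all scored equally, command 2 is correct.
loss, grad = softmax_cross_entropy(np.zeros(4), label=2)
```

The gradient returned here is the starting point for backpropagation: it is the error signal at the output layer that is then passed backward through the remaining layers.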