Computer vision has seen a series of advances in recent years. Deep convolutional neural networks, in particular, have produced very promising results on tasks such as image recognition and classification.
As a result, to tackle such complex tasks and improve recognition accuracy, researchers have tended to build ever deeper neural networks by stacking additional layers. The idea behind adding more layers is that these layers will learn increasingly complex features as the network grows deeper.
However, as more layers are added to a neural network, it becomes harder to train, and its accuracy begins to saturate and ultimately degrade.
There have been numerous attempts to deal with the vanishing-gradient problem, such as adding extra supervision in the form of auxiliary losses, but none solved the problem completely.
The ResNet architecture comes to the rescue and helps resolve this issue.
Residual Network (ResNet) was first introduced by Kaiming He et al. in the paper “Deep Residual Learning for Image Recognition”.
The emergence of ResNet, or residual networks, which are built from Residual Blocks, has eased the challenge of training very deep networks.
Looking at a residual block, the first thing we notice is a direct connection that bypasses several layers in between. This connection, known as the ‘skip connection,’ is the core of residual blocks.
The output of the layer is no longer the same because of this skip connection. Without it, the input ‘x’ is multiplied by the layer’s weights and a bias term is added; the activation function f() is then applied to this term, giving the output H(x) = f(wx + b), which we can write more compactly as H(x) = f(x). With the skip connection, the input is added back to the layer’s output, so the block instead computes H(x) = f(x) + x.
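To make the difference concrete, here is a minimal NumPy sketch of the two formulations above (the names `plain_layer` and `residual_layer` are illustrative, not from the paper):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def plain_layer(x, W, b):
    # Ordinary layer: output is H(x) = f(wx + b)
    return relu(W @ x + b)

def residual_layer(x, W, b):
    # With a skip connection, the input is added back,
    # so the block computes H(x) = f(x) + x
    return relu(W @ x + b) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

y_plain = plain_layer(x, W, b)
y_residual = residual_layer(x, W, b)
```

The residual output differs from the plain output by exactly the input x, which is what the shortcut contributes.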
There appears to be a minor issue with this approach when the dimensions of the input and output differ, which can happen with convolutional and pooling layers. When the dimensions of f(x) differ from those of x, we can choose one of two approaches: either pad the skip connection with extra zero entries to increase its dimensions, or apply a projection (such as a 1×1 convolution) to x so that it matches the dimensions of f(x).
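Both options can be sketched with made-up dimensions in NumPy (`zero_pad_shortcut` and `projection_shortcut` are illustrative names, not part of any library):

```python
import numpy as np

def zero_pad_shortcut(x, out_dim):
    # Option 1: pad the identity with extra zero entries
    padded = np.zeros(out_dim)
    padded[: x.shape[0]] = x
    return padded

def projection_shortcut(x, W_s):
    # Option 2: a learned linear projection (a 1x1 convolution
    # in the convolutional case) maps x to the new dimensions
    return W_s @ x

x = np.arange(3.0)        # input of dimension 3
fx = np.ones(5)           # block output f(x) of dimension 5

y_pad = fx + zero_pad_shortcut(x, 5)
W_s = np.random.default_rng(1).standard_normal((5, 3))
y_proj = fx + projection_shortcut(x, W_s)
```

Zero-padding adds no parameters, while the projection introduces extra weights that are learned along with the rest of the network.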
According to the authors, stacking layers should not degrade network performance, because we could simply stack identity mappings (layers that do not change the current network’s output) on top of it, and the resulting architecture would perform the same.
As a result, the deeper model should not produce a higher training error than its shallower counterpart. The authors hypothesize that it is easier to let the stacked layers fit a residual mapping than to have them fit the desired underlying mapping directly, and the residual block above explicitly allows them to do so.
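This point can be seen in a small sketch: the stacked layers fit the residual F(x) = H(x) − x, so driving the weights toward zero recovers the identity mapping exactly (a simplified linear block, with illustrative names):

```python
import numpy as np

def residual_block(x, W, b):
    # The stacked layers fit the residual F(x) = H(x) - x,
    # so the block outputs H(x) = F(x) + x
    F = W @ x + b
    return F + x

x = np.array([1.0, -2.0, 3.0])

# Driving the weights to zero makes F(x) = 0, so the block
# collapses to the identity mapping H(x) = x
W_zero = np.zeros((3, 3))
b_zero = np.zeros(3)
y = residual_block(x, W_zero, b_zero)
```

Learning "do nothing" thus amounts to pushing weights toward zero, which is easier for an optimizer than fitting an identity mapping through a stack of non-linear layers.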
The authors later improved the residual block and proposed a pre-activation variant, in which gradients can flow unimpeded through the shortcut connections all the way back to any earlier layer.
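Omitting batch normalization for brevity, the reordering can be sketched roughly as follows (a deliberate simplification of the paper’s blocks, with illustrative names):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def post_activation_block(x, W, b):
    # Original ordering: the activation comes after the addition,
    # so even the shortcut path passes through the non-linearity
    return relu(W @ x + b + x)

def pre_activation_block(x, W, b):
    # Pre-activation ordering: the activation moves before the weights,
    # leaving the shortcut as a clean identity path for gradients
    return W @ relu(x) + b + x

x = np.array([-1.0, 2.0])
W = np.zeros((2, 2))   # zeroed weights isolate the shortcut's behaviour
b = np.zeros(2)

y_post = post_activation_block(x, W, b)  # relu distorts the identity
y_pre = pre_activation_block(x, W, b)    # the input passes through exactly
```

With the weights zeroed, the post-activation block still clips negative values, while the pre-activation block passes the input through unchanged; this clean identity path is what lets gradients reach any earlier layer.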
ResNet’s skip connections alleviate the problem of vanishing gradients in deep neural networks by giving the gradient an additional shortcut path to flow through. These connections also help the model learn identity functions, ensuring that a higher layer performs at least as well as a lower layer, if not better.
Let’s say we have a shallow network and a deep network, each mapping an input ‘x’ to an output ‘y’ using a function H(x). We want the deep network to perform at least as well as the shallow one, without the performance degradation we have seen in ordinary neural networks (those without residual blocks).
One way to achieve this is for the additional layers in the deep network to learn the identity function, so that their output equals their input; the extra layers then cannot degrade performance.
In the optimistic case, the extra layers of the deep neural network approximate the mapping from input ‘x’ to output ‘y’ better than the shallower counterpart does, reducing the error by a significant margin. Either way, we expect ResNet to perform as well as or better than plain deep neural networks.