Deep learning is one of the most creative and intriguing topics in artificial intelligence, alongside NLP, computer vision, and reinforcement learning. Deep learning models have sophisticated architectures that present a number of difficulties: deep neural networks contain many layers and are challenging to train because they are sensitive to the initial random weights and to the configuration of the learning algorithm.
One such difficulty in a deep neural network is known as Internal Covariate Shift: the distribution of inputs to each of the network’s layers keeps changing during training because the parameters of the preceding layers are constantly being updated.
Certain features of the input may also dominate training simply because of their large numerical values. This biases the network, since only those features meaningfully contribute to the training output. Imagine the first feature takes values between 1 and 5, whereas the second ranges from 100 to 10,000. Because of the stark difference in magnitude, the second feature would dominate the network during training and would be the only one contributing to the model’s output.
The idea of normalization was developed to address these problems.
Normalization is a technique in data processing that adapts the values of numerical columns in a dataset to a comparable scale when the data features have varied ranges. Normalization benefits include:
- Reducing internal covariate shift to improve training;
- Scaling each feature to a comparable range to avoid or reduce network bias;
- Accelerating optimization by keeping weights within a restricted range, preventing them from exploding; and
- Reducing network overfitting by acting as a mild form of regularization.
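To make the idea concrete, here is a minimal NumPy sketch of min-max normalization applied to the two-feature example above (the dataset values and function name are illustrative, not from a specific library):

```python
import numpy as np

# Hypothetical dataset: two features with very different ranges,
# mirroring the 1-5 vs 100-10,000 example above.
X = np.array([[1.0,   100.0],
              [3.0,  5000.0],
              [5.0, 10000.0]])

def min_max_normalize(X):
    """Scale each feature (column) to the [0, 1] range."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

X_norm = min_max_normalize(X)
print(X_norm)
```

After this transformation, both columns lie in [0, 1], so neither feature can dominate training purely because of its magnitude.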
Here are some variations of normalization developed over the years:
- Batch Normalization
- Layer Normalization
- Instance Normalization
- Group Normalization
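As one example of these variants, here is a minimal NumPy sketch of Batch Normalization in training mode; the function name, shapes, and random data are assumptions for illustration, and learnable parameters gamma and beta are shown fixed rather than trained:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch Normalization (training mode) for a 2-D batch.

    x: array of shape (batch, features).
    gamma, beta: per-feature scale and shift (learnable in practice).
    Statistics are computed per feature across the batch dimension.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features, deliberately shifted
# and scaled so the raw statistics are far from standardized.
np.random.seed(0)
x = np.random.randn(4, 3) * 10 + 5
out = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```

The other variants differ mainly in which axes the mean and variance are computed over: Layer Normalization averages across features per sample, Instance Normalization per sample and channel, and Group Normalization over groups of channels.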