What is a Convolutional Neural Network?
A convolutional neural network (CNN) is a kind of deep learning network that uses a sequence of convolutional and pooling layers to extract information from an image. These features are then fed into one or more fully connected layers, which produce a prediction or classification.
- A CNN is a popular choice for computer vision and image processing tasks.
In a CNN, the convolutional layers extract features such as edges, textures, and parts of objects by applying a convolution operation to the input image with a set of learnable filters (also called kernels or weights). Each filter slides over the input image, and a feature map is generated by computing the dot product of the filter and the region of the image beneath it at each position. The feature maps are then fed into pooling layers, which apply an operation such as max pooling to reduce the spatial dimensionality of the feature maps and make the network more resilient to small translations of the input image.
After feature extraction, the features are passed into one or more fully connected layers, which produce a prediction or classification.
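As a rough sketch of these two operations, here is a minimal NumPy implementation of a single-channel, single-filter "valid" convolution (technically cross-correlation, as in most deep learning libraries) followed by 2x2 max pooling; the input and filter values are purely illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image, taking a dot product at each position."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)       # toy 6x6 "image"
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)         # crude vertical-edge detector
fmap = conv2d(image, edge_filter)                      # 6x6 input, 3x3 kernel -> 4x4 map
pooled = max_pool(fmap)                                # 4x4 map -> 2x2 map
```

In a real CNN the filter values are learned by backpropagation rather than hand-picked, and each layer applies many filters across all input channels at once.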
The architecture of a CNN
A CNN architecture can vary depending on the problem it is meant to solve. A standard CNN layout, however, involves the following components:
- The input layer, the first layer of a CNN, receives the input image. The image typically undergoes some preprocessing to better suit the network’s needs.
- The convolutional layers are the core of a CNN, where features are extracted from the input image by applying the convolution operation with learnable filters (also called kernels or weights). These filters, which are learned during training, are generally small and square, and each one generates a feature map by sliding over the whole input image. Stacking several convolutional layers, each producing its own feature maps, lets the network extract progressively more complex features.
- Pooling layers reduce the spatial dimensionality of the feature maps, making the CNN more robust to small translations of the input image. Max pooling, average pooling, and Lp pooling are among the most common pooling operations.
- Normalization layers bring the values of the feature maps into a predetermined range, for example via batch normalization or local response normalization.
- Fully connected layers take the features from the previous layers and use them to produce a prediction or classification.
- The output layer, the CNN’s last layer, produces the predictions or classifications. Depending on the task, it may use an activation function such as softmax for multi-class classification problems or a sigmoid for binary classification problems.
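To make the output-layer choice concrete, here is a small self-contained sketch of the two activation functions mentioned above; the logit values are made up for illustration:

```python
import math

def softmax(logits):
    """Exponentiate (shifted by the max for numerical stability) and normalize."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash a single logit into (0, 1) for binary classification."""
    return 1.0 / (1.0 + math.exp(-x))

probs = softmax([2.0, 1.0, 0.1])  # multi-class: one probability per class
print(probs)                      # sums to 1; largest logit gets largest probability
print(sigmoid(0.0))               # 0.5: no preference between the two classes
```

Softmax turns a vector of scores into a probability distribution over classes, while a single sigmoid output is read as the probability of the positive class.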
Overfitting can be mitigated by applying regularization methods such as dropout or L1/L2 weight decay to several layers.
Keep in mind that this is just a basic architecture; there are many variations depending on the task and the problem, including:
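A useful sanity check when designing such a stack is tracing the spatial size through the layers with the standard formula out = floor((in - kernel + 2 * padding) / stride) + 1, which covers both convolution and pooling. The layer sizes below are illustrative, not a prescribed design:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard spatial output-size formula for convolution and pooling layers."""
    return (size - kernel + 2 * padding) // stride + 1

size = 28                                    # e.g. a 28x28 grayscale input
size = conv_out(size, kernel=3, padding=1)   # 3x3 conv, "same" padding -> 28
size = conv_out(size, kernel=2, stride=2)    # 2x2 max pool -> 14
size = conv_out(size, kernel=3, padding=1)   # 3x3 conv -> 14
size = conv_out(size, kernel=2, stride=2)    # 2x2 max pool -> 7
print(size)  # 7: these 7x7 feature maps are flattened for the fully connected layers
```

Each pooling step halves the spatial size while the padded convolutions preserve it, which is the usual pattern of trading spatial resolution for feature depth.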
- RNN, LSTM: architectures that explicitly model sequences,
- ResNet: architectures with residual connections,
- Transformer: architectures with attention mechanisms,
- YOLO, Faster R-CNN: architectures for specific tasks like object detection, and
- BERT: architectures for language models.
Parameters of a CNN
To determine the number of parameters in a CNN, the following information is required:
- The number of filters used in each convolutional layer.
- The size of the filters (the kernel size) in each convolutional layer.
- The stride used in each convolutional layer.
- The number of neurons in each fully connected layer of the network.
With this information, you can calculate the number of parameters in a convolutional neural network.
For a single convolutional layer:
- Parameters = (filter size * filter size * number of input channels + 1) * number of filters
And for a single fully connected layer:
- Parameters = (number of inputs + 1) * number of outputs
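Plugging illustrative numbers into the two formulas above (a 3x3 convolution over 3 input channels with 32 filters, then a fully connected layer from 128 inputs to 10 outputs):

```python
def conv_params(filter_size, in_channels, num_filters):
    """(filter_size * filter_size * in_channels weights + 1 bias) per filter."""
    return (filter_size * filter_size * in_channels + 1) * num_filters

def fc_params(num_inputs, num_outputs):
    """(num_inputs weights + 1 bias) per output neuron."""
    return (num_inputs + 1) * num_outputs

print(conv_params(filter_size=3, in_channels=3, num_filters=32))  # (3*3*3 + 1) * 32 = 896
print(fc_params(num_inputs=128, num_outputs=10))                  # (128 + 1) * 10 = 1290
```

The "+ 1" in each formula accounts for the bias term, and note that a convolutional layer's parameter count is independent of the input's spatial size, since the same filters are reused at every position.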