Long Short-Term Memory (LSTM)

What does LSTM mean?

LSTM (Long Short-Term Memory) networks are an extension of the RNN (Recurrent Neural Network) designed to learn sequence data and its long-term dependencies more accurately than regular RNNs. Simply put, an LSTM preserves information over longer spans. Traditional RNNs suffer from vanishing gradients, a problem that LSTMs are designed to mitigate.
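As a rough illustration of the vanishing-gradient problem, the sketch below multiplies a gradient signal by the same factor at every time step, which is a simplified picture of what backpropagation through a plain RNN does. The factor 0.5 and the function name `backprop_scale` are assumptions chosen for illustration, not values taken from any particular network.

```python
def backprop_scale(per_step_factor: float, num_steps: int) -> float:
    """Scale applied to a gradient after flowing back through num_steps steps.

    Assumes, for simplicity, that every step multiplies the gradient by the
    same scalar (a stand-in for weight times activation derivative).
    """
    scale = 1.0
    for _ in range(num_steps):
        scale *= per_step_factor
    return scale


# Hypothetical per-step factor below 1: the signal shrinks exponentially
# with the number of time steps it has to travel back through.
for steps in (1, 10, 50):
    print(steps, backprop_scale(0.5, steps))
```

With a factor below 1, the contribution of inputs many steps back becomes negligible. An LSTM's cell state is updated mainly by element-wise scaling and addition, which can keep this per-step factor close to 1.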

How does LSTM work?

A recurrent neural network uses long short-term memory blocks to provide context for how it receives inputs and generates outputs. The long short-term memory block is a complex unit that includes weighted inputs, activation functions, inputs from previous blocks, and eventual outputs.

Because the program uses a structure based on short-term memory processes to build longer-term memory, the unit is named a long short-term memory block. These systems are frequently employed in natural language processing, for example.

  • Deep learning tasks such as speech recognition, natural language processing, stock market prediction, and handwriting recognition commonly use LSTM models.

The recurrent neural network uses long short-term memory blocks to evaluate a single word or phoneme in the context of the others in a string, where memory can help filter and categorize these inputs. In general, the long short-term memory network is a well-known and widely used concept in the development of recurrent neural networks.

LSTM Architecture

The cell state and its regulators (the gates) are the primary components of a traditional LSTM architecture. The cell state is the network’s memory unit. Gates that open and close control whether information is written to, read from, or retained in the cell state.

Information from earlier steps can enter the cell state and be carried through the processing of the whole sequence. The gates are analog rather than binary: they are implemented as element-wise multiplication by the output of a sigmoid function, which lies between 0 and 1, while a tanh function is used to squash candidate values into the range -1 to +1. Each gate works in the same way to determine how much information is allowed through.
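The gating idea can be sketched in a few lines. This is a minimal illustration, assuming scalar gate inputs ("logits") are already computed; the helper name `apply_gate` and the example numbers are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_gate(gate_logits: list[float], values: list[float]) -> list[float]:
    """Scale each value by its gate activation, element by element."""
    return [sigmoid(g) * v for g, v in zip(gate_logits, values)]


# Hypothetical numbers: a very negative logit nearly closes the gate,
# a very positive one nearly opens it, and 0 lets half the value through.
gated = apply_gate([-10.0, 0.0, 10.0], [2.0, 2.0, 2.0])
print(gated)
```

Because the sigmoid is smooth, a gate can be partially open, and the gate's behavior can be adjusted gradually by gradient-based training.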

During training, the gates learn which data to remember and which to discard.

A single classic LSTM unit is made up of a cell state and three gates: a forget gate, an input gate, and an output gate. What sets the LSTM apart is the gating mechanism within each cell. By contrast, a standard RNN cell simply applies a tanh activation function to the input at each time step and the hidden state from the previous time step to produce a new hidden state and output.
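For comparison, the standard RNN update mentioned above can be sketched as follows. This is a minimal pure-Python sketch; the function name `rnn_step`, the weight names `W_x`, `W_h`, `b`, and all sizes and values are assumptions made for illustration.

```python
import math


def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a plain RNN cell: h_t = tanh(W_x x_t + W_h h_prev + b)."""
    h_new = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w * xi for w, xi in zip(W_x[i], x))       # input term
        total += sum(w * hi for w, hi in zip(W_h[i], h_prev))  # recurrent term
        h_new.append(math.tanh(total))
    return h_new


# Hypothetical sizes and weights: 2 inputs, 2 hidden units.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_h = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.0]

h = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = rnn_step(x, h, W_x, W_h, b)  # the hidden state is the only memory
print(h)
```

Here the hidden state is overwritten at every step. The LSTM unit replaces this single tanh update with a separate cell state regulated by the three gates.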


In essence:

  • The forget gate determines what is important to retain from the previous cell state. In simple terms, it removes information from the cell state that is no longer helpful. It is the first block in the architecture. The gate receives two inputs, the input at the current time step and the previous hidden state, which are multiplied by weight matrices before a bias is added. The result is passed through a sigmoid activation function, which outputs a value between 0 and 1 for each element of the cell state. A value close to 0 means forget, and a value close to 1 means keep.
  • The input gate decides which information in the cell state of the LSTM unit should be updated. It is how new information enters the cell state. It has two parts. First, a sigmoid function determines which values will be updated, based on the previous hidden state and the current input. Then the same two inputs are passed through a tanh function, which produces a vector of candidate values ranging from -1 to +1. The candidate vector and the sigmoid-regulated values are multiplied element-wise to extract the meaningful information, which is added to the cell state.

  • The output gate determines the current hidden state, which is then passed to the following LSTM unit. The hidden state is used for prediction and contains information from prior inputs. The sigmoid function is given the previous hidden state and the current input, and the updated cell state is passed through a tanh function. The current hidden state is obtained by multiplying the sigmoid output by the tanh output, and it serves as both the output of the cell and an input to the next cell.
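The three gates described above can be put together in one step function. The following is a minimal pure-Python sketch of a classic LSTM step, not a production implementation. The names `lstm_step`, `affine`, and `params`, the choice to apply each weight matrix to the concatenation of the previous hidden state and the current input, and all sizes and weight values are assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, v, b):
    """Compute W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a classic LSTM unit.

    params maps "f", "i", "g", "o" to (W, b) pairs; each W acts on the
    concatenation [h_prev, x] (an assumed layout for this sketch).
    """
    z = h_prev + x  # list concatenation: previous hidden state, then input
    W_f, b_f = params["f"]
    W_i, b_i = params["i"]
    W_g, b_g = params["g"]
    W_o, b_o = params["o"]

    f = [sigmoid(a) for a in affine(W_f, z, b_f)]    # forget gate, in (0, 1)
    i = [sigmoid(a) for a in affine(W_i, z, b_i)]    # input gate, in (0, 1)
    g = [math.tanh(a) for a in affine(W_g, z, b_g)]  # candidates, in (-1, 1)
    o = [sigmoid(a) for a in affine(W_o, z, b_o)]    # output gate, in (0, 1)

    # New cell state: keep part of the old state, add gated candidates.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hypothetical setup: 1 input feature, 2 hidden units, shared toy weights.
hidden, inputs = 2, 1
W = [[0.1] * (hidden + inputs) for _ in range(hidden)]
b = [0.0] * hidden
params = {name: (W, b) for name in ("f", "i", "g", "o")}

h, c = [0.0] * hidden, [0.0] * hidden
for x in ([1.0], [0.5], [-1.0]):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

In a trained network each gate would have its own learned weights; sharing one toy matrix here only keeps the example short.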