Introduction to PyTorch LSTM

LSTM (long short-term memory) is a type of recurrent neural network used in deep learning to classify, process, and predict sequence and time series data. It is designed to cope with long lags between the relevant events in a sequence, and PyTorch provides it as the nn.LSTM module. LSTM is mostly used to predict sequences of events in time-bound tasks such as speech recognition and machine translation. For text tasks, the data must first be preprocessed before the network can consume it, and then converted to numeric vectors, because an LSTM only accepts vector inputs.
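As a minimal sketch of the text-to-vector step described above, the snippet below builds a vocabulary from a toy sentence, converts the words to indexes, and passes them through an embedding layer. The sentence, the names `word_to_index` and `indexes`, and the embedding size of 8 are assumptions chosen for illustration only.

```python
import torch
from torch import nn

# Hypothetical toy corpus; a real pipeline would read and clean a text file.
text = "the cat sat on the mat"
words = text.split()

# Build a vocabulary: each unique word gets an integer index.
uniq_words = sorted(set(words))
word_to_index = {word: i for i, word in enumerate(uniq_words)}

# Convert the sentence to a tensor of word indexes.
indexes = torch.tensor([word_to_index[w] for w in words])

# An embedding layer maps each index to a dense vector (size 8 is arbitrary).
embedding = nn.Embedding(num_embeddings=len(uniq_words), embedding_dim=8)
vectors = embedding(indexes)

print(indexes.tolist())  # one index per word
print(vectors.shape)     # one 8-dimensional vector per word
```

The resulting tensor of vectors is the kind of input an LSTM layer can consume.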

What is PyTorch LSTM?

It helps to understand recurrent neural networks (RNNs) before working with LSTM. An RNN remembers its previous output and combines it with the current input, so the data flows through the network sequentially. Unlike a plain RNN, an LSTM can retain information over a long sequence because it uses a gating mechanism to control the flow of data through its memory. When working with text, non-letter characters are usually removed to clean the data, and more layers can be added to increase the model capacity. The dataset should be divided into training, validation, and test sets. Checkpoints let us save and restore the model so that it does not have to be retrained from scratch every time.
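The splitting and checkpointing steps can be sketched as follows. This is only an illustration under assumed values: the random dataset, the 80/10/10 split, the small stand-in model, and the file name `lstm_checkpoint.pt` are all hypothetical.

```python
import os
import tempfile

import torch
from torch import nn
from torch.utils.data import TensorDataset, random_split

# Hypothetical dataset: 100 samples with 10 features each and a binary label.
dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))

# 80/10/10 split into training, validation, and test subsets.
train_set, val_set, test_set = random_split(dataset, [80, 10, 10])

# A small stand-in model; any nn.Module is saved the same way.
model = nn.LSTM(input_size=10, hidden_size=16)

# Save a checkpoint of the weights, then restore it into a fresh instance.
checkpoint_path = os.path.join(tempfile.gettempdir(), "lstm_checkpoint.pt")
torch.save(model.state_dict(), checkpoint_path)

restored = nn.LSTM(input_size=10, hidden_size=16)
restored.load_state_dict(torch.load(checkpoint_path))
```

Saving the state dict rather than the whole model object is a common choice because it keeps the checkpoint independent of how the model class is defined on disk.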

How to work with PyTorch LSTM?

First, create a new folder to store all the code used for the LSTM model.

$ mkdir code-input

Create an LSTM model inside the directory.

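import torch
from torch import nn


class Rods(nn.Module):
    def __init__(self, dataset):
        super().__init__()
        self.lstm_size = 128
        self.embedding_dim = 128
        self.num_layers = 3

        # dataset is expected to expose uniq_words, the list of unique words.
        n_vocab = len(dataset.uniq_words)

        # Maps each word index to a dense vector of size embedding_dim.
        self.embedding = nn.Embedding(
            num_embeddings=n_vocab,
            embedding_dim=self.embedding_dim,
        )
        # The LSTM consumes the embedding vectors, so input_size is embedding_dim.
        self.lstm = nn.LSTM(
            input_size=self.embedding_dim,
            hidden_size=self.lstm_size,
            num_layers=self.num_layers,
            dropout=0.2,
        )
        # Projects each LSTM output to a score for every word in the vocabulary.
        self.fc = nn.Linear(self.lstm_size, n_vocab)

    def forward(self, x, prev_state):
        embed = self.embedding(x)
        output, state = self.lstm(embed, prev_state)
        logits = self.fc(output)
        return logits, state

    def init_state(self, sequence_length):
        # Zero hidden and cell states, shaped (num_layers, N, lstm_size);
        # N must match dimension 1 of the input passed to forward.
        return (torch.zeros(self.num_layers, sequence_length, self.lstm_size),
                torch.zeros(self.num_layers, sequence_length, self.lstm_size))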

Word indexes are converted to word vectors by the embedding layer. The LSTM then carries its state from one step of the sequence to the next, which keeps the sequence moving, and the final linear layer turns each LSTM output into scores over the vocabulary.
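To make the data flow concrete, here is a small standalone sketch of the same embedding, LSTM, and linear pipeline with the tensor shapes written out. The sizes (vocabulary of 50, 16-dimensional vectors, 2 layers, sequence length 4, batch size 3) are arbitrary assumptions and are not taken from the model above.

```python
import torch
from torch import nn

# Arbitrary illustrative sizes.
n_vocab, embedding_dim, lstm_size, num_layers = 50, 16, 16, 2
seq_len, batch_size = 4, 3

embedding = nn.Embedding(n_vocab, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=lstm_size,
               num_layers=num_layers)
fc = nn.Linear(lstm_size, n_vocab)

# Random word indexes shaped (seq_len, batch), the default nn.LSTM layout.
x = torch.randint(0, n_vocab, (seq_len, batch_size))

# Initial hidden and cell state: (num_layers, batch, hidden_size).
state = (torch.zeros(num_layers, batch_size, lstm_size),
         torch.zeros(num_layers, batch_size, lstm_size))

embed = embedding(x)                # (seq_len, batch, embedding_dim)
output, state = lstm(embed, state)  # (seq_len, batch, lstm_size)
logits = fc(output)                 # (seq_len, batch, n_vocab)

print(embed.shape, output.shape, logits.shape)
```

The returned state has the same shape as the initial state, so it can be passed straight into the next call.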

Types of Sequential Data with examples

Sequential data is data in which the order of the values matters, and it is mostly used to measure an activity over time. Examples include how stocks rise over time or how customer purchases at a supermarket change with age. Python offers several built-in sequence types for holding such ordered values. First, strings are immutable sequences of Unicode code points. Next, lists are mutable sequences, typically used to collect similar items. Tuples are immutable sequences that often store heterogeneous data. A range is an immutable sequence of numbers, and bytes and bytearray objects store binary data, with bytes being immutable and bytearray mutable.

String:

string = "Hello World!"

Lists:

Continents = ['Asia', 'Africa', 'North America', 'South America', 'Antarctica', 'Europe', 'Australia']

Tuple:

Numbers = (12, 15, 17, 18, 20)

Bytes:

a = "Doppleganger"
a_bytes = a.encode('utf-8')

A time series is a special kind of sequential data in which the values are recorded against time. Time series data can be univariate or multivariate. A univariate series tracks a single variable, such as a stock price, a temperature, or an ECG curve, while a multivariate series tracks several variables at each time step, such as video data or readings from several different sensors. Stock prices and weather records are typical examples of time series data.
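The difference between univariate and multivariate series shows up in the last dimension of the tensor passed to nn.LSTM. The following sketch uses random values in place of real measurements; the sequence length of 30, the 5 features, and the hidden size of 8 are assumptions for illustration.

```python
import torch
from torch import nn

seq_len, batch_size = 30, 1

# Univariate: one value per time step (for example, a daily closing price).
univariate = torch.randn(seq_len, batch_size, 1)

# Multivariate: several values per time step (for example, 5 sensor readings).
multivariate = torch.randn(seq_len, batch_size, 5)

# input_size must equal the number of values per time step.
uni_lstm = nn.LSTM(input_size=1, hidden_size=8)
multi_lstm = nn.LSTM(input_size=5, hidden_size=8)

# When no initial state is given, nn.LSTM starts from zeros.
uni_out, _ = uni_lstm(univariate)
multi_out, _ = multi_lstm(multivariate)

print(uni_out.shape, multi_out.shape)
```

Both outputs have one hidden vector of size 8 per time step, regardless of how many input features each step had.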

Importance of PyTorch LSTM

LSTM is an improved version of the RNN, which can be arranged as one-to-one and one-to-many networks. Traditional neural networks are a poor fit for sequential data: they expect inputs of a fixed length, they do not retain the order of the data sequence, and their parameters cannot be shared across the positions of a sequence. Inputs of the same length are easy to obtain when the data is mainly numeric, but difficult when it comes to strings. Hence, it is difficult to handle sequential data with traditional neural networks.

Recurrent neural networks solve some of these issues by passing a hidden state from one step to the next, so each step sees information from earlier inputs. Bidirectional variants also read the sequence from both directions. However, RNNs suffer from gradient problems during training, which LSTM largely solves. LSTM uses gated units to control the flow of information, which addresses the gradient and long-sequence issues of RNNs, and this is why LSTM is often preferred in PyTorch over a plain RNN or a traditional neural network.

Two main problems LSTM helps solve

LSTM helps with two main issues of RNNs: vanishing gradients and exploding gradients. A vanishing gradient occurs when the values that are multiplied together repeatedly during backpropagation are less than one, so the product becomes smaller with every step. This leads to the long-term dependency problem, where an RNN fails to remember values from early in a long sequence. LSTM addresses this with its gates: the forget gate discards irrelevant details, the input gate decides which relevant new information to store, a self-loop with a gated weight keeps the stored information, and the output gate controls which values are read out.
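The effect of repeated multiplication can be seen with plain arithmetic. This sketch is a simplification: it treats the gradient as a single factor applied once per time step, and the factors 0.9 and 1.1 and the 100 steps are assumed values.

```python
steps = 100
small_factor, large_factor = 0.9, 1.1

# A factor below one shrinks toward zero: the vanishing case.
vanishing = small_factor ** steps

# A factor above one grows without bound: the exploding case.
exploding = large_factor ** steps

print(f"{small_factor} ** {steps} = {vanishing:.2e}")
print(f"{large_factor} ** {steps} = {exploding:.2e}")
```

Even factors this close to one produce a tiny or a huge result after 100 steps, which is why long sequences are hard for a plain RNN.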

Exploding gradients occur when the repeatedly multiplied values are greater than one, so the product grows with every step. Gradient clipping can be used here to rescale the gradients so that they stay below a chosen maximum. The self-loop in LSTM lets the gradient flow for a long time, and because the weight of the self-loop is gated, the time scale over which information is kept can change with the input sequence.
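In PyTorch, gradient clipping is available as torch.nn.utils.clip_grad_norm_, which rescales all gradients so that their combined norm does not exceed a chosen limit. The sketch below is illustrative only: the loss is scaled by 1000 purely to force large gradients, and max_norm = 1.0 is an assumed value.

```python
import torch
from torch import nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=4)
inputs = torch.randn(6, 1, 4)

out, _ = lstm(inputs)
loss = out.sum() * 1000.0  # artificially scaled to produce large gradients
loss.backward()

max_norm = 1.0
# Rescales gradients in place and returns the total norm before clipping.
total_norm_before = torch.nn.utils.clip_grad_norm_(lstm.parameters(), max_norm)

# Recompute the total gradient norm after clipping.
total_norm_after = torch.sqrt(
    sum(p.grad.norm() ** 2 for p in lstm.parameters())
)

print(float(total_norm_before), float(total_norm_after))
```

In a training loop, the clipping call usually sits between loss.backward() and optimizer.step().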

PyTorch LSTM Example

Code:

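import torch
import torch.nn as nn

torch.manual_seed(1)

# Input dimension is 4 and hidden (output) dimension is 4.
lstm = nn.LSTM(4, 4)

# A sequence of 6 inputs, each of shape (1, 4).
inputs = [torch.randn(1, 4) for _ in range(6)]

# Initial hidden and cell state, each of shape (num_layers, batch, hidden_size).
hidden = (torch.randn(1, 1, 4),
          torch.randn(1, 1, 4))

# Option 1: step through the sequence one element at a time,
# passing the updated state into the next step.
for i in inputs:
    out, hidden = lstm(i.view(1, 1, -1), hidden)

# Option 2: process the whole sequence in one call.
# Stack the inputs into a tensor of shape (seq_len, batch, input_size).
inputs = torch.cat(inputs).view(len(inputs), 1, -1)
hidden = (torch.randn(1, 1, 4), torch.randn(1, 1, 4))  # fresh initial state
out, hidden = lstm(inputs, hidden)

print(out)     # output for every time step, shape (6, 1, 4)
print(hidden)  # final hidden and cell state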
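The example above feeds the sequence to the LSTM in two ways: one step at a time and all at once. As a hedged sketch, the following check suggests that both approaches give the same result when they start from the same initial state. The names h0, c0, step_out, and full_out are introduced here for illustration.

```python
import torch
from torch import nn

torch.manual_seed(1)
lstm = nn.LSTM(input_size=4, hidden_size=4)
inputs = torch.randn(6, 1, 4)  # (seq_len, batch, input_size)

# One shared initial state so the two runs are comparable.
h0 = torch.randn(1, 1, 4)
c0 = torch.randn(1, 1, 4)

# Run 1: one time step per call, carrying the state forward.
hidden = (h0, c0)
step_outputs = []
for t in range(inputs.size(0)):
    out, hidden = lstm(inputs[t:t + 1], hidden)
    step_outputs.append(out)
step_out = torch.cat(step_outputs)

# Run 2: the whole sequence in a single call.
full_out, full_hidden = lstm(inputs, (h0, c0))

print(torch.allclose(step_out, full_out, atol=1e-5))
```

Passing the whole sequence in one call is usually the more convenient form, while the step-by-step form is useful when the next input depends on the previous output, as in text generation.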

Conclusion

It is important to understand how RNNs and LSTMs work, even though both are used less often now because of transformers and other attention-based models. The gating mechanisms are the essential part of LSTM: they let the network keep information for a long time when it remains relevant.