Updated April 6, 2023

Introduction to PyTorch SoftMax

Many machine learning problems have categorical targets, and the Softmax function, available in PyTorch, turns a model's raw scores for those categories into probabilities. It normalizes the neural network's output into a probability distribution over the output classes, following Luce's choice axiom. Softmax is therefore normally used as the activation function of the output layer when the network predicts a multinomial probability distribution.

What is PyTorch Softmax?

Softmax is mostly used in multi-class classification, where each input must be assigned to one of several classes. It accepts arbitrary real values: each value is exponentiated and divided by the sum of all the exponentials, so the results can be read as probabilities. The raw scores, or logits, produced by the last layer are thus turned into class probabilities by the activation function.

PyTorch Softmax Function

The softmax function is defined as

Softmax(x_i) = exp(x_i) / ∑_j exp(x_j)

Each output element lies in the range [0, 1], and the elements sum to 1 along the chosen dimension. In PyTorch the function has the following signature.

torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)
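As a quick check of the definition, the following sketch computes the softmax by hand and compares it with torch.softmax. The tensor values and variable names are chosen purely for illustration.

```python
import torch

# Illustrative scores (logits); the values are arbitrary.
x = torch.tensor([1.0, 2.0, 3.0])

# Direct use of the definition: exp(x_i) / sum_j exp(x_j).
manual = torch.exp(x) / torch.exp(x).sum()

# Built-in version; dim=0 is the only axis of a 1D tensor.
builtin = torch.softmax(x, dim=0)

print(manual)
print(builtin)
print(builtin.sum())
```

Both tensors should agree up to floating-point rounding, and the entries of each should add up to 1.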

The simplest way to apply it is to call torch.softmax() with the dim argument, as shown below.

import torch
a = torch.randn(6, 9, 12)
# dim must lie in [-3, 2] for a 3D tensor; -1 selects the last axis
b = torch.softmax(a, dim=-1)

The dim argument specifies the axis along which Softmax is computed, that is, the axis whose entries sum to 1. Softmax is also available as a module class, as shown below.

import torch.nn as tornn
sftmx = tornn.Softmax(dim=-1)  # a valid axis for the 3D tensor a
b = sftmx(a)

The module form is more verbose, so developers generally use it only when Softmax is treated as a separate layer of a model, where it keeps the code clear.
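A minimal sketch of Softmax used as a separate layer might look as follows. The layer sizes and batch size are hypothetical and chosen only for illustration.

```python
import torch
from torch import nn

# Hypothetical sizes: 8 input features, 3 output classes.
model = nn.Sequential(
    nn.Linear(8, 3),
    nn.Softmax(dim=1),  # normalize over the class dimension
)

batch = torch.randn(4, 8)   # 4 samples
probs = model(batch)        # shape (4, 3)
print(probs.sum(dim=1))     # each row sums to 1 (up to rounding)
```

Here the module form reads naturally because the normalization is one more entry in the list of layers.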

Dimension to use

Softmax takes two parameters: input and dim. The operation is applied to every slice of the input along dim, and the values in each slice sum to 1.

import torch
input = torch.randn(2, 3, 4)
out = torch.softmax(input, dim=2)
total = torch.sum(out, dim=2)  # every entry is 1 (up to rounding)

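For a 4D tensor of shape (a1, a2, a3, a4), applying Softmax along the last dimension is equivalent to viewing the tensor as a matrix of shape (a1*a2*a3, a4) and normalizing each row. For a matrix, the choice of dim decides whether the columns (dim = 0) or the rows (dim = 1) sum to 1.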
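The equivalence between the 4D tensor and its matrix view can be checked with a short sketch. The sizes below are arbitrary, and the variable names are illustrative only.

```python
import torch

a1, a2, a3, a4 = 2, 3, 4, 5          # arbitrary illustrative sizes
x = torch.randn(a1, a2, a3, a4)

# Softmax over the last dimension of the 4D tensor.
direct = torch.softmax(x, dim=-1)

# Same computation on the flattened (a1*a2*a3, a4) matrix, row by row.
flat = torch.softmax(x.reshape(-1, a4), dim=1)
via_matrix = flat.reshape(a1, a2, a3, a4)

print(torch.allclose(direct, via_matrix))
```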

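import torch
from torch import nn
torch.softmax(input, dim=0)  # 2D input: each column sums to 1
torch.softmax(input, dim=1)  # 2D input: each row sums to 1
a1 = nn.Softmax(dim=0)       # module form of the dim=0 call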

With dim = 0, the elements along the zeroth coordinate of the tensor are normalized: the index along that dimension varies while the other indices stay fixed, and each such group of elements is scaled to sum to 1. This is repeated for every combination of the remaining indices. If dim is omitted, PyTorch falls back to an implicit choice that depends on the number of input dimensions: dimension 0 for 1D and 3D inputs, and dimension 1 for 2D and 4D inputs. This implicit choice is deprecated and triggers a warning, so dim should be passed explicitly. For 5D input, which is rare, older releases of the Softmax function threw an error.
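Passing dim explicitly avoids any dependence on the rank of the input. The sketch below applies Softmax over the last axis of tensors from 1D to 5D; the shapes and the helper name last_axis_sums are hypothetical.

```python
import torch

def last_axis_sums(shape):
    """Apply softmax over the last axis and return the sums along it."""
    x = torch.randn(*shape)
    out = torch.softmax(x, dim=-1)   # explicit axis, independent of rank
    return out.sum(dim=-1)

# Arbitrary illustrative shapes, from 1D up to 5D.
for shape in [(5,), (3, 5), (2, 3, 5), (2, 2, 3, 5), (2, 2, 2, 3, 5)]:
    sums = last_axis_sums(shape)
    print(len(shape), torch.allclose(sums, torch.ones_like(sums)))
```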

Source Code

import torch as th
from torch import nn
# Assumes a legacy DGL release that still provides these helpers.
import dgl.function as fn
from dgl.utils import get_ndata_name

__all__ = ['EdgeSoftmax']


class EdgeSoftmax(nn.Module):
    def __init__(self):
        super(EdgeSoftmax, self).__init__()
        # Base names for the temporary graph fields used in forward().
        self._logits_name = "_logs"
        self._max_logits_name = "_max_logs"
        self._normalizer_name = "_normals"

    def forward(self, logs, graph):
        # Pick field names that do not clash with existing node data.
        self._logits_name = get_ndata_name(graph, self._logits_name)
        self._max_logits_name = get_ndata_name(graph, self._max_logits_name)
        self._normalizer_name = get_ndata_name(graph, self._normalizer_name)
        graph.edata[self._logits_name] = logs
        # For each destination node, find the maximum incoming logit.
        graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
                         fn.max(self._logits_name, self._max_logits_name))
        # Subtract that maximum before exponentiating, for numerical stability.
        graph.apply_edges(
            lambda edges: {self._logits_name: th.exp(edges.data[self._logits_name] -
                                                     edges.dst[self._max_logits_name])})
        # Sum the exponentials per destination node to get the normalizer.
        graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
                         fn.sum(self._logits_name, self._normalizer_name))
        # Return the unnormalized scores and the normalizer.
        return graph.edata.pop(self._logits_name), graph.ndata.pop(self._normalizer_name)

    def __repr__(self):
        return 'EdgeSoftmax()'

A similar grouped softmax, taken from torch_geometric.utils, is given below.

from torch_scatter import scatter_max, scatter_add
from .num_nodes import maybe_num_nodes

def softmax(src, index, nodes=None):
    nodes = maybe_num_nodes(index, nodes)
    # Subtract the per-group maximum for numerical stability.
    result = src - scatter_max(src, index, dim=0, dim_size=nodes)[0][index]
    result = result.exp()
    # Divide by the per-group sum; 1e-16 guards against division by zero.
    result = result / (
        scatter_add(result, index, dim=0, dim_size=nodes)[index] + 1e-16)
    return result
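The grouping idea can be tried without torch_scatter. The following is only a sketch in plain PyTorch, not the library's implementation: the function name grouped_softmax is hypothetical, and it shifts by the global maximum rather than the per-group maximum to keep the code short.

```python
import torch

def grouped_softmax(src, index, num_groups):
    """Softmax over the entries of `src` that share the same value in `index`."""
    # Subtracting a constant leaves softmax unchanged; the global max is used
    # here for simplicity instead of the per-group max.
    shifted = (src - src.max()).exp()
    # Sum the exponentials of each group.
    denom = torch.zeros(num_groups, dtype=src.dtype).index_add_(0, index, shifted)
    return shifted / (denom[index] + 1e-16)

src = torch.tensor([1.0, 2.0, 3.0, 4.0])
index = torch.tensor([0, 0, 1, 1])   # two groups of two entries
out = grouped_softmax(src, index, num_groups=2)
print(out)
```

Each group is normalized separately, so the entries belonging to one index value sum to 1. A per-group maximum, as in the library code, is more robust when groups differ widely in scale.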

PyTorch SoftMax example

This example maps relation names to class ids through a dictionary and classifies sentences into the given number of relations using a sentence encoder.

def __init__(self, encoder, numbers, rel2id):
        super().__init__()
        self.encoder = encoder
        self.numbers = numbers
        self.fc = nn.Linear(self.encoder.hidden_size, numbers)
        self.softmax = nn.Softmax(-1)
        self.rel2id = rel2id
        self.id2rel = {}
        self.drop = nn.Dropout()
        for rel, id in rel2id.items():
            self.id2rel[id] = rel
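The constructor above only sets up the layers. A minimal sketch of how such a classification head might be used at inference time follows. The relation names, the hidden size of 16, and the random input are all hypothetical stand-ins, not part of the original model.

```python
import torch
from torch import nn

# Hypothetical stand-ins for the attributes set in __init__.
rel2id = {"no_relation": 0, "founder_of": 1, "born_in": 2}
id2rel = {idx: rel for rel, idx in rel2id.items()}
fc = nn.Linear(16, len(rel2id))      # 16 plays the role of encoder.hidden_size
softmax = nn.Softmax(-1)

rep = torch.randn(1, 16)             # pretend sentence representation
probs = softmax(fc(rep))             # shape (1, 3), one probability per relation
score, pred = probs.max(-1)          # highest probability and its class id
print(id2rel[pred.item()], score.item())
```

The id2rel dictionary turns the predicted class id back into a relation name, which is why the constructor builds it as the inverse of rel2id.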

The next example builds a ResNet-style classifier from convolution, batch normalization, and ReLU layers, and ends with a Softmax over the class dimension.

def __init__(self,block, block_list):
        super(ResNet,self).__init__()
        self.head_conv = nn.Sequential(
            nn.Conv2d(4,32,6,1,4,bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),)
        self.maxpool_1 = nn.MaxPool2d(5,4,3)
        b_ = block.expansion
        self.layer_1 = self._make_layer(block,32,32*b_,block_list[0],1)
        self.layer_2 = self._make_layer(block,32*b_,64*b_,block_list[1],2)
        self.layer_3 = self._make_layer(block,64*b_,128*b_,block_list[2],2)
        self.layer_4 = self._make_layer(block,128*b_,256*b_,block_list[3],2)
        self.avgpool_1 = nn.AdaptiveAvgPool2d((1,1))
        self.fc_1 = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256*b_,1000),
            nn.Softmax(dim = 1),)
        self._initialization()

Softmax can also be combined with nn.ModuleList and a detection layer, as in this SSD detector, where it is created only for the test phase.

def __init__(self, phases, sizes, base, extras, head, numbers):
        super(SSD, self).__init__()
        self.phases = phases
        self.numbers = numbers
        if(sizes==150):
            self.cfg = (coco, voc150)[numbers == 11]
        else:
            self.cfg = (coco, voc512)[numbers == 11]
        self.priorbox = PriorBox(self.cfg)
        # volatile=True was removed in PyTorch 0.4; torch.no_grad() replaces it
        with torch.no_grad():
            self.priors = self.priorbox.forward()
        self.sizes = sizes

        self.vgg = nn.ModuleList(base)
        self.L2Norm = L2Norm(150, 10)
        self.extras = nn.ModuleList(extras)

        self.loc = nn.ModuleList(head[0])
        self.conf = nn.ModuleList(head[1])

        if phases == 'test':
            self.softmax = nn.Softmax(dim=-1)
            self.detect = Detect(numbers, 0, 100, 0.01, 0.30)

When independent slot embeddings are needed, Softmax can be used in a generator such as the one below.

def __init__(self, language, shared_embed, vocab, hidden, drpt, slots, nb_gate):
        super(Generator, self).__init__()
        self.vocab = vocab
        self.language = language
        self.embed = shared_embed
        self.dropout = nn.Dropout(drpt)
        self.gru = nn.GRU(hidden, hidden, dropout=drpt)
        self.nb_gate = nb_gate
        self.hidden = hidden
        self.W_ratio = nn.Linear(3 * hidden, 1)
        self.softmax = nn.Softmax(dim=1)
        self.sigmoid = nn.Sigmoid()
        self.slots = slots

        self.W_gate = nn.Linear(hidden, nb_gate)

        self.slot_w2i = {}
        for slot in self.slots:
            if slot.split("-")[1] not in self.slot_w2i.keys():
                self.slot_w2i[slot.split("-")[1]] = len(self.slot_w2i)
            if slot.split("-")[2] not in self.slot_w2i.keys():
                self.slot_w2i[slot.split("-")[2]] = len(self.slot_w2i)
        self.Slot_embed = nn.Embedding(len(self.slot_w2i), hidden)
        self.Slot_embed.weight.data.normal_(0, 0.1)

This example comes from a Seq2SQL model, which predicts the WHERE clause of a SQL query for a database.

def __init__(self, N_wrd, N_height, N_dpth, max_column_num, max_token_num, gpu):
        super(Seq2SQLCondPredictor, self).__init__()
        print("Seq2SQL where prediction")
        self.N_height = N_height
        self.max_token_num = max_token_num
        self.max_column_num = max_column_num
        self.gpu = gpu

        # hidden_size must be an integer, hence the floor division
        self.cond_lstm = nn.LSTM(input_size=N_wrd, hidden_size=N_height // 2,
                num_layers=N_dpth, batch_first=True,
                dropout=0.3, bidirectional=True)
        self.cond_decoder = nn.LSTM(input_size=self.max_token_num,
                hidden_size=N_height, num_layers=N_dpth,
                batch_first=True, dropout=0.3)

        self.cond_out_g = nn.Linear(N_height, N_height)
        self.cond_out_h = nn.Linear(N_height, N_height)
        self.cond_out = nn.Sequential(nn.Tanh(), nn.Linear(N_height, 1))

        self.softmax = nn.Softmax()  # no dim given, so the deprecated implicit choice applies

Conclusion

A neural network with several layers produces raw output scores that are unbounded and hard to compare directly. Softmax resolves this by normalizing the scores along the chosen dimension so that they sum to one and can be read as probabilities.
