Updated April 6, 2023

## Introduction to PyTorch SoftMax

Many machine learning problems have categorical targets, and the Softmax function lets us turn a network's raw outputs into a probability distribution over those categories in PyTorch. Softmax normalizes the network's output in the spirit of Luce's choice axiom: each class receives a probability proportional to the exponential of its score. It is normally used as the activation function of the output layer of a neural network that predicts a multinomial probability distribution.

### What is PyTorch Softmax?

Softmax is mostly used in multi-class classification problems, where each input must be assigned to one of several classes. It accepts arbitrary real values: each value is exponentiated and then divided by the sum of the exponentials, so the outputs can be read as probabilities. In this way, raw scores (logits) are turned into class-membership probabilities.

### PyTorch Softmax Function

The softmax function is defined as

$$\mathrm{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

Each output element lies in the range [0, 1], and the elements along the chosen dimension sum to 1. In PyTorch, the functional form has the following signature.

`torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)`
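As a quick check of the definition, the following sketch computes Softmax by hand and compares it with the built-in call. The score values are made up for illustration.

```python
import torch
import torch.nn.functional as F

# Arbitrary example scores (logits) for three classes; illustrative values only.
scores = torch.tensor([2.0, 1.0, 0.1])

# Softmax by definition: exponentiate each score, then divide by the sum.
manual = torch.exp(scores) / torch.exp(scores).sum()

# The built-in function, applied along the only axis of the tensor.
builtin = F.softmax(scores, dim=0)

print(manual)
print(builtin)
print(builtin.sum())  # the probabilities add up to 1, up to floating-point rounding
```

The largest score receives the largest probability, but every class keeps a non-zero share.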

The simplest way to use it is to call `torch.softmax()` with the `dim` argument, as shown below.

```
import torch
a = torch.randn(6, 9, 12)
b = torch.softmax(a, dim=-1)  # for a 3-D tensor dim must lie in [-3, 2]; -4 is out of range
```

The `dim` argument selects the axis along which Softmax is computed; the values along that axis sum to 1. Softmax is also available as a module class, as shown below.

```
import torch
import torch.nn as tornn
a = torch.randn(6, 9, 12)
sftmx = tornn.Softmax(dim=-1)  # -4 would be out of range for a 3-D tensor
b = sftmx(a)
```

The module form is more verbose than the function call, so developers typically use it only when Softmax is treated as a layer in its own right, which keeps the model definition clear.
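A minimal sketch of the "Softmax as a layer" case, using `nn.Sequential`. The layer sizes and the batch are arbitrary choices for illustration.

```python
import torch
from torch import nn

# A tiny classifier: 8 input features, 3 output classes (sizes are arbitrary).
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
    nn.Softmax(dim=1),  # normalize across the class axis of a (batch, classes) tensor
)

batch = torch.randn(4, 8)   # 4 random samples
probs = model(batch)        # shape (4, 3)
print(probs.sum(dim=1))     # one sum per sample
```

Here the module form pays off: the normalization step is visible in the model definition instead of being hidden in a `forward` method.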

### Dimension to use

Softmax takes two parameters: `input` and `dim`. The operation is applied to every slice of `input` along `dim`, so that the values in each slice sum to 1.

```
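import torch
input = torch.randn(2, 3, 4)
output = torch.softmax(input, dim = 2)
total = torch.sum(output, dim = 2)  # shape (2, 3); every entry is 1 up to rounding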
```

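For example, applying Softmax along the last dimension of a 4-D tensor of shape (a1, a2, a3, a4) is equivalent to reshaping it into a matrix of shape (a1*a2*a3, a4) and normalizing each row. For a 2-D input, `dim` selects whether the columns (`dim = 0`) or the rows (`dim = 1`) sum to 1.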

```
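import torch
from torch import nn
input = torch.randn(3, 4)
torch.softmax(input, dim = 0)  # each column sums to 1
torch.softmax(input, dim = 1)  # each row sums to 1
a1 = nn.Softmax(dim=0)         # module form of the first call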
```
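The equivalence between Softmax on a 4-D tensor and Softmax on the flattened matrix can be checked directly. This is a small sketch with arbitrary sizes.

```python
import torch

a1, a2, a3, a4 = 2, 3, 4, 5          # arbitrary sizes
x = torch.randn(a1, a2, a3, a4)

# Softmax along the last axis of the 4-D tensor.
direct = torch.softmax(x, dim=-1)

# The same computation on the flattened (a1*a2*a3, a4) matrix, row by row.
flat = torch.softmax(x.reshape(-1, a4), dim=1)
via_matrix = flat.reshape(a1, a2, a3, a4)

print(torch.allclose(direct, via_matrix))
print(direct.sum(dim=-1).shape)      # one sum per row of the flattened matrix
```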

With `dim = 0`, the elements that differ only in their zeroth coordinate are normalized together: for each fixed choice of the remaining coordinates, the values along dimension 0 are rescaled so that they sum to 1. If `dim` is omitted, PyTorch falls back on a deprecated implicit choice and issues a warning: it uses dimension 0 when the input is 0-D, 1-D, or 3-D, and dimension 1 otherwise, which covers 2-D and 4-D inputs. Under this rule a 5-D input is also normalized along dimension 1 rather than raising an error, but the result is rarely what was intended, so it is best to always pass `dim` explicitly.
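The implicit choice of dimension can be observed with a short experiment. This sketch reflects the behavior of recent PyTorch releases, where omitting `dim` triggers a deprecation warning; a future release may change or remove this fallback.

```python
import warnings
import torch
import torch.nn.functional as F

x2 = torch.randn(3, 4)        # 2-D input
x3 = torch.randn(2, 3, 4)     # 3-D input

# Record the deprecation warnings instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit_2d = F.softmax(x2)   # dim omitted on purpose
    implicit_3d = F.softmax(x3)   # dim omitted on purpose

# Compare the implicit results with explicit calls.
same_as_dim1 = torch.allclose(implicit_2d, F.softmax(x2, dim=1))
same_as_dim0 = torch.allclose(implicit_3d, F.softmax(x3, dim=0))
print(same_as_dim1, same_as_dim0, len(caught))
```

Passing `dim` explicitly avoids both the warning and the surprise of a rank-dependent default.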

**Source Code**

```
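import torch as th
from torch import nn
# Assumption: `fn` and `get_ndata_name` belong to the graph library this module
# was taken from (it resembles an early DGL release); adjust the paths as needed.
from dgl import function as fn
from dgl.utils import get_ndata_name

__all__ = ['EdgeSoftmax']

class EdgeSoftmax(nn.Module):
    def __init__(self):
        super(EdgeSoftmax, self).__init__()
        # Field names used to store intermediate values on the graph.
        self._logits_name = "_logs"
        self._max_logits_name = "_max_logs"
        self._normalizer_name = "_normals"

    def forward(self, logs, graph):
        # Pick field names that do not clash with data already on the graph.
        self._logits_name = get_ndata_name(graph, self._logits_name)
        self._max_logits_name = get_ndata_name(graph, self._max_logits_name)
        self._normalizer_name = get_ndata_name(graph, self._normalizer_name)
        graph.edata[self._logits_name] = logs
        # Per destination node, find the maximum incoming logit.
        graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
                         fn.max(self._logits_name, self._max_logits_name))
        # Subtract that maximum before exponentiating, for numerical stability.
        graph.apply_edges(
            lambda edges: {self._logits_name: th.exp(edges.data[self._logits_name] -
                                                     edges.dst[self._max_logits_name])})
        # Per destination node, sum the exponentials to get the normalizer.
        graph.update_all(fn.copy_edge(self._logits_name, self._logits_name),
                         fn.sum(self._logits_name, self._normalizer_name))
        # Return the unnormalized scores together with the per-node normalizer.
        return graph.edata.pop(self._logits_name), graph.ndata.pop(self._normalizer_name)

    def __repr__(self):
        return 'Softmax()'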
```

The softmax utility from `torch_geometric.utils` is given below; it normalizes `src` separately within each group of entries that share the same `index`.

```
from torch_scatter import scatter_max, scatter_add
from .num_nodes import maybe_num_nodes  # package-relative import

def softmax(src, index, nodes=None):
    nodes = maybe_num_nodes(index, nodes)
    # Subtract each group's maximum for numerical stability.
    result = src - scatter_max(src, index, dim=0, dim_size=nodes)[0][index]
    result = result.exp()
    # Divide by the per-group sum; the epsilon guards against division by zero.
    result = result / (
        scatter_add(result, index, dim=0, dim_size=nodes)[index] + 1e-16)
    return result
```
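The same grouped normalization can be sketched in plain PyTorch, without `torch_scatter`. Note the differences from the library code above: the function name `grouped_softmax` and the `num_groups` parameter are made up for this example, and a single global maximum is subtracted instead of a per-group maximum, which gives the same result mathematically but is less robust when groups have very different scales.

```python
import torch

def grouped_softmax(src, index, num_groups):
    """Softmax over the entries of `src` that share the same value in `index`."""
    # Subtracting one global maximum keeps exp() from overflowing; it shifts
    # every group by the same constant, so the normalized result is unchanged.
    shifted = (src - src.max()).exp()
    # Sum the exponentials per group, then read each entry's group total back.
    totals = torch.zeros(num_groups, dtype=src.dtype).index_add_(0, index, shifted)
    return shifted / (totals[index] + 1e-16)

src = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
index = torch.tensor([0, 0, 1, 1, 1])   # entries 0-1 form group 0, entries 2-4 group 1
out = grouped_softmax(src, index, num_groups=2)
print(out)
```

Each group is normalized on its own, so the first two outputs sum to 1 and the last three outputs sum to 1.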

### PyTorch SoftMax example

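This example builds a relation classifier on top of a sentence encoder: a linear layer maps the encoding to one score per relation, Softmax turns the scores into probabilities, and the `rel2id` dictionary is inverted so that predicted ids can be mapped back to relation names.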

```
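from torch import nn

class RelationClassifier(nn.Module):  # hypothetical name; the original shows only __init__
    def __init__(self, encoder, numbers, rel2id):
        super().__init__()
        self.encoder = encoder
        self.numbers = numbers  # number of relation classes
        self.fc = nn.Linear(self.encoder.hidden_size, numbers)
        self.softmax = nn.Softmax(-1)  # normalize over the class axis
        self.rel2id = rel2id
        self.id2rel = {}
        self.drop = nn.Dropout()
        # Invert the mapping so that predicted ids can be turned back into names.
        for rel, id in rel2id.items():
            self.id2rel[id] = rel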
```
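To show how these pieces fit together at prediction time, here is a small self-contained sketch. `ToyEncoder`, the feature size, and the relation labels are all hypothetical stand-ins; a real model would use a trained sentence encoder.

```python
import torch
from torch import nn

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a sentence encoder: 10 features -> hidden vector."""
    def __init__(self, hidden_size=8):
        super().__init__()
        self.hidden_size = hidden_size
        self.proj = nn.Linear(10, hidden_size)

    def forward(self, x):
        return torch.tanh(self.proj(x))

rel2id = {"no_relation": 0, "founder_of": 1, "born_in": 2}   # made-up labels
id2rel = {idx: rel for rel, idx in rel2id.items()}

encoder = ToyEncoder()
fc = nn.Linear(encoder.hidden_size, len(rel2id))
softmax = nn.Softmax(-1)

sentence_features = torch.randn(1, 10)             # one fake "sentence"
probs = softmax(fc(encoder(sentence_features)))    # shape (1, 3)
score, pred = probs.max(-1)                        # best probability and its class id
print(id2rel[pred.item()], score.item())
```

Because the weights are untrained, the predicted relation is arbitrary; the point is only the path from scores to probabilities to a relation name.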

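This ResNet-style constructor stacks convolution, batch-normalization, and ReLU layers, and ends with a fully connected head followed by Softmax over the class dimension.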

```
from torch import nn

class ResNet(nn.Module):
    # `_make_layer` and `_initialization` are helper methods defined elsewhere in the class.
    def __init__(self, block, block_list):
        super(ResNet, self).__init__()
        self.head_conv = nn.Sequential(
            nn.Conv2d(4, 32, 6, 1, 4, bias=False),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
        )
        self.maxpool_1 = nn.MaxPool2d(5, 4, 3)
        b_ = block.expansion
        self.layer_1 = self._make_layer(block, 32, 32 * b_, block_list[0], 1)
        self.layer_2 = self._make_layer(block, 32 * b_, 64 * b_, block_list[1], 2)
        self.layer_3 = self._make_layer(block, 64 * b_, 128 * b_, block_list[2], 2)
        self.layer_4 = self._make_layer(block, 128 * b_, 256 * b_, block_list[3], 2)
        self.avgpool_1 = nn.AdaptiveAvgPool2d((1, 1))
        self.fc_1 = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * b_, 1000),
            nn.Softmax(dim=1),  # probabilities over the 1000 output classes
        )
        self._initialization()
```

Softmax can also sit alongside `nn.ModuleList` containers and a detection layer, as in this SSD-style detector, where it is created only in the test phase.

```
import torch
from torch import nn

class SSD(nn.Module):
    # `coco`, `voc150`, `voc512`, `PriorBox`, `L2Norm`, and `Detect` are defined
    # elsewhere in the detector's code base.
    def __init__(self, phases, sizes, base, extras, head, numbers):
        super(SSD, self).__init__()
        self.phases = phases
        self.numbers = numbers  # number of classes
        if sizes == 150:
            self.cfg = (coco, voc150)[numbers == 11]
        else:
            self.cfg = (coco, voc512)[numbers == 11]
        self.priorbox = PriorBox(self.cfg)
        # `Variable(..., volatile=True)` is obsolete; torch.no_grad() replaces it.
        with torch.no_grad():
            self.priors = self.priorbox.forward()
        self.sizes = sizes
        self.vgg = nn.ModuleList(base)
        self.L2Norm = L2Norm(150, 10)
        self.extras = nn.ModuleList(extras)
        self.loc = nn.ModuleList(head[0])
        self.conf = nn.ModuleList(head[1])
        if phases == 'test':
            self.softmax = nn.Softmax(dim=-1)
            # The original passed an undefined `num_classes`; `numbers` holds that value.
            self.detect = Detect(numbers, 0, 100, 0.01, 0.30)
```

When independent slot embeddings are needed, a constructor like the following can be used; here Softmax sits next to a GRU decoder and a gate classifier.

```
from torch import nn

class Generator(nn.Module):
    def __init__(self, language, shared_embed, vocab, hidden, drpt, slots, nb_gate):
        super(Generator, self).__init__()
        self.vocab = vocab
        self.language = language
        self.embed = shared_embed
        self.dropout = nn.Dropout(drpt)
        # nn.GRU takes `dropout`, not `drpt`; `hidden` is the hidden size.
        self.gru = nn.GRU(hidden, hidden, dropout=drpt)
        self.nb_gate = nb_gate
        self.hidden = hidden
        self.W_ratio = nn.Linear(3 * hidden, 1)
        self.softmax = nn.Softmax(dim=1)
        self.sigmoid = nn.Sigmoid()
        self.slots = slots
        self.W_gate = nn.Linear(hidden, nb_gate)
        # Assign an index to every distinct slot-name part.
        self.slot_w2i = {}
        for slot in self.slots:
            if slot.split("-")[1] not in self.slot_w2i.keys():
                self.slot_w2i[slot.split("-")[1]] = len(self.slot_w2i)
            if slot.split("-")[2] not in self.slot_w2i.keys():
                self.slot_w2i[slot.split("-")[2]] = len(self.slot_w2i)
        self.Slot_embed = nn.Embedding(len(self.slot_w2i), hidden)
        self.Slot_embed.weight.data.normal_(0, 0.1)
```

This example is the WHERE-clause (condition) predictor of a Seq2SQL model, which translates natural-language questions into SQL queries.

```
from torch import nn

class Seq2SQLCondPredictor(nn.Module):
    def __init__(self, N_wrd, N_height, N_dpth, max_column_num, max_token_num, gpu):
        super(Seq2SQLCondPredictor, self).__init__()
        print("Seq2SQL where prediction")
        self.N_height = N_height
        self.max_token_num = max_token_num
        self.max_column_num = max_column_num
        self.gpu = gpu
        # Bidirectional encoder: each direction gets half of the hidden size.
        self.cond_lstm = nn.LSTM(input_size=N_wrd, hidden_size=N_height // 2,
                                 num_layers=N_dpth, batch_first=True,
                                 dropout=0.3, bidirectional=True)
        self.cond_decoder = nn.LSTM(input_size=self.max_token_num,
                                    hidden_size=N_height, num_layers=N_dpth,
                                    batch_first=True, dropout=0.3)
        self.cond_out_g = nn.Linear(N_height, N_height)
        self.cond_out_h = nn.Linear(N_height, N_height)
        self.cond_out = nn.Sequential(nn.Tanh(), nn.Linear(N_height, 1))
        # Assumption: the last axis is intended; the original omitted `dim`,
        # which relies on the deprecated implicit choice.
        self.softmax = nn.Softmax(dim=-1)
```

### Conclusion

In a neural network, the raw outputs of the final layer are unbounded scores that are hard to compare or interpret directly. Softmax solves this by rescaling the values along the chosen dimension so that they are non-negative and sum to 1, which lets them be read as class probabilities.
