PyTorch DataLoader

Introduction to PyTorch DataLoader

PyTorch DataLoader loads data in parallel and collects samples into batches automatically, which makes loading faster and keeps memory usage low. A DataLoader wraps a dataset together with a sampler so that the dataset can be consumed as an iterable. It supports single-process or multi-process loading, depending on the amount of data and the speed required, and works with both map-style and iterable-style datasets, where the loading order can be customized.


What is PyTorch DataLoader?

We can load batched or non-batched data, and batching is done automatically by default. Map-style datasets implement the __getitem__() and __len__() protocols and represent a mapping from indices (or keys) to data samples; given an index, such a dataset reads the corresponding sample, for example from disk. Iterable-style datasets implement the __iter__() protocol and represent a stream of samples; calling iter(dataset) yields the samples, which can likewise be read from files or folders.

The data loading order can also be customized depending on the use case, and dynamic batch sizes with batched reads are possible. Any sampler can be used to decide the order in which samples are drawn; sequential and random samplers are the most common, and the shuffle argument switches between sequential and shuffled order.
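As a minimal sketch of the two dataset styles (the class names and the toy data are made up for illustration), a map-style dataset implements __getitem__() and __len__(), while an iterable-style dataset implements __iter__():

import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader

# Map-style: indexable, with a known length
class SquaresMapDataset(Dataset):
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        return torch.tensor([idx, idx * idx])

# Iterable-style: yields samples as a stream
class SquaresIterableDataset(IterableDataset):
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        for idx in range(self.n):
            yield torch.tensor([idx, idx * idx])

# Both styles can be wrapped in a DataLoader; shuffle applies only to map-style data
map_loader = DataLoader(SquaresMapDataset(8), batch_size=4, shuffle=True)
iter_loader = DataLoader(SquaresIterableDataset(8), batch_size=4)
for batch in map_loader:
    print(batch.shape)   # torch.Size([4, 2])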


Complete guide to PyTorch DataLoader

If the data to be loaded is unstructured, we should be careful to use the proper libraries for loading it. DataLoader helps in loading and iterating over the data, whatever form the data takes, which is why it is used so widely in PyTorch. The first step is to import DataLoader from torch.utils.data.


from torch.utils.data import DataLoader

We pass the dataset, the batch size, and several other arguments, as shown below.

DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    collate_fn=None,
    pin_memory=False,
)

The batch_size argument sets how many samples are loaded per batch, and the shuffle parameter controls whether the data is reshuffled at every epoch. Setting num_workers to a value greater than 0 enables multi-process loading; it gives the number of worker processes used to load the dataset.

num_workers=0 means the data is loaded in the main process, while num_workers=1 spawns a single worker process, which can still be slow. The collate_fn argument controls how individual samples are merged into a batch, which matters in particular for map-style datasets. If we want the loader to place the data in pinned (page-locked) memory so that it can be copied to CUDA devices faster, we can set the pin_memory argument to True.
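As a rough sketch of how these arguments fit together, the example below uses a custom collate_fn that pads variable-length samples into a single batch tensor; the pad_batch function and the toy data are assumptions made for illustration, not part of this article's dataset.

import torch
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence

# Toy map-style dataset: variable-length 1-D tensors
data = [torch.arange(n) for n in (3, 5, 2, 4)]

def pad_batch(samples):
    # Merge a list of samples into one padded batch tensor
    return pad_sequence(samples, batch_first=True, padding_value=0)

loader = DataLoader(
    data,
    batch_size=2,
    shuffle=True,
    num_workers=0,                         # load in the main process
    collate_fn=pad_batch,                  # custom merging of samples into a batch
    pin_memory=torch.cuda.is_available(),  # page-locked memory for faster GPU copies
)

for batch in loader:
    print(batch.shape)   # e.g. torch.Size([2, 5]), depending on the batch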

Custom Text Dataset Class

We can create custom datasets in PyTorch by subclassing Dataset.

import torch
from torch.utils.data import Dataset, DataLoader

We can name the custom dataset class whatever we like; here the class is called TextDataset.

class TextDataset(Dataset):
    def __init__(self, content, titles):
        self.titles = titles
        self.content = content

The dataset stores two variables, titles and content, which are assigned in __init__ as shown above. We then implement __len__ to report the length of the dataset and __getitem__ to return the sample at a given index.

    def __len__(self):
        return len(self.titles)

    def __getitem__(self, idx):
        label = self.titles[idx]
        text = self.content[idx]
        sample = {"Text": text, "Class": label}
        return sample

The code above builds each sample as a dictionary; next, we construct the data the dataset will hold.

content = ['India', 'China', 'SriLanka', 'Nepal', 'Afghanistan']
titles = ['Peninsula', 'Country', 'Island', 'Country', 'Country']

Now we initialize the dataset with the data we have created, using the class defined above.

TextData = TextDataset(content, titles)

The dataset is now ready to use with all the given details; a sketch of iterating over it with a DataLoader follows.
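A minimal sketch of consuming this dataset with a DataLoader (assuming the TextDataset class and the content and titles lists defined above):

from torch.utils.data import DataLoader

text_loader = DataLoader(TextData, batch_size=2, shuffle=True)

for batch in text_loader:
    # Each batch is a dictionary of lists, e.g. {'Text': ['India', 'Nepal'], 'Class': ['Peninsula', 'Country']}
    print(batch['Text'], batch['Class'])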

How to create a PyTorch DataLoader?

First, create a dataset class, then instantiate it and pass it to a DataLoader, as in the outline below.

import torch as T

class Mynewdata(T.utils.data.Dataset):
  # code to load the data goes here (a full version follows below)
  ...

my_datas = Mynewdata("my_train_data.txt")
my_loadr = T.utils.data.DataLoader(my_datas, 10, True)
for (idx, batch) in enumerate(my_loadr):
  ...  # process each batch here

Here we use a batch size of 10 with shuffle=True, so the batches come in random order. The full dataset class looks like this:

import numpy as np
import torch as T

item = T.device("cpu")

class mypeople(T.utils.data.Dataset):
  def __init__(self, src_file, num_rows=None):
    # Assumes a tab-delimited text file whose first 7 columns are
    # predictors and whose last column is an integer class label
    all_data = np.loadtxt(src_file, max_rows=num_rows,
      usecols=range(0, 8), delimiter="\t", dtype=np.float32)
    x_tmp = all_data[:, 0:7]
    y_tmp = all_data[:, 7]
    self.x_data = T.tensor(x_tmp,
      dtype=T.float32).to(item)
    self.y_data = T.tensor(y_tmp,
      dtype=T.long).to(item)

  def __len__(self):
    return len(self.x_data)

  def __getitem__(self, idx):
    if T.is_tensor(idx):
      idx = idx.tolist()
    preds = self.x_data[idx, 0:7]
    pol = self.y_data[idx]
    sample = \
      { 'predictors' : preds, 'political' : pol }
    return sample

Now the dataset and the DataLoader can be created:

train_file = ".\\people_train.txt"
train_datas = mypeople(train_file, num_rows=8)
bat_size = 3
train_loadr = T.utils.data.DataLoader(train_datas,
  batch_size=bat_size, shuffle=True)
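As a sketch of how this loader is consumed (assuming the people_train.txt file and the mypeople class above), each batch is a dictionary keyed by 'predictors' and 'political':

for (batch_idx, batch) in enumerate(train_loadr):
  X = batch['predictors']   # shape: [bat_size, 7]
  Y = batch['political']    # shape: [bat_size]
  print(batch_idx, X.shape, Y.shape)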

PyTorch DataLoader examples

1. We can use built-in datasets with a PyTorch DataLoader. The MNIST dataset of handwritten digits is used here, and the images are normalized with a transform. The dataset is downloaded first, and iter() is then used to pull a batch of images from the loader for further processing.

import torch
import matplotlib.pyplot as plt
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
dataiter = iter(trainloader)
images, labels = next(dataiter)  # dataiter.next() in older PyTorch versions
print(images.shape)
print(labels.shape)
plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r')
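With a batch size of 64, the printed shapes should be torch.Size([64, 1, 28, 28]) for the images and torch.Size([64]) for the labels, since each MNIST image is a single-channel 28x28 tensor.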

2. We can also use a custom dataset with DataLoader; here a small dataset of random numbers is loaded and batched.

import torch
from torch.utils.data import DataLoader

dataset = torch.randint(0, 100, (60,))  # illustrative random-number dataset
loader = DataLoader(dataset, batch_size=15, shuffle=True, num_workers=5)
for i, batch in enumerate(loader):
    print(i, batch)

The batch size is 15 and five worker processes are used, so the loader yields the dataset in batches of 15 samples each.

Conclusion

DataLoader helps in arranging the data well so that it can be analyzed easily with PyTorch. Custom datasets are also easy to create, and they are often the better choice because we can shape the data to our own requirements. These are the fundamentals of using DataLoader in PyTorch.

Recommended Articles

This is a guide to PyTorch DataLoader. Here we discuss how to create a PyTorch DataLoader, along with examples. You may also have a look at the following articles to learn more:

  1. Python Raw String
  2. PySpark GroupBy Agg
  3. exec Python
  4. How to Install Python on Linux