EDUCBA

PyTorch DataLoader

Introduction to PyTorch DataLoader

PyTorch's DataLoader is the utility that feeds data to a model: it wraps a dataset and a sampler and exposes an iterable over the dataset. It collects individual samples into batches automatically and can load them either in the main process or in parallel with several worker processes, which helps keep data loading from becoming the bottleneck during training. It works with both map-style and iterable-style datasets, and the loading order can be customized.

What is PyTorch DataLoader?

DataLoader can load batched or non-batched data; by default, individual samples are collated into batches automatically. A map-style dataset implements the __getitem__() and __len__() protocols and represents a map from indices (or keys) to data samples. For example, accessing dataset[idx] on such a dataset could read the idx-th sample and its label from disk. An iterable-style dataset implements the __iter__() protocol instead and represents an iterable over data samples. Calling iter(dataset) on it returns a stream of samples, which suits data that is read sequentially, whether from disk, a folder, or some other source.
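The difference between the two dataset styles can be sketched as follows. This is a minimal illustration, not code from the original tutorial: the class names SquaresMap and SquaresStream are hypothetical, and the data is just a sequence of squared integers.

```python
import torch
from torch.utils.data import Dataset, IterableDataset, DataLoader


class SquaresMap(Dataset):
    """Map-style (hypothetical example): random access by index."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx * idx)


class SquaresStream(IterableDataset):
    """Iterable-style (hypothetical example): samples come one after another."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        for i in range(self.n):
            yield torch.tensor(i * i)


# Both styles can be passed to DataLoader, which batches the samples.
map_batches = [b.tolist() for b in DataLoader(SquaresMap(5), batch_size=2)]
stream_batches = [b.tolist() for b in DataLoader(SquaresStream(5), batch_size=2)]

print(map_batches)     # [[0, 1], [4, 9], [16]]
print(stream_batches)  # [[0, 1], [4, 9], [16]]
```

Note that only the map-style class needs __len__(); the iterable-style class is simply consumed until it is exhausted.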

With a map-style dataset, the loading order is controlled by a sampler, and users can customize it to suit their needs. A batch_sampler can also be supplied to yield a whole list of indices at a time, which makes dynamic batch sizes possible alongside batched reading of files. Any sampler can be used; the shuffle argument simply selects between the two built-in defaults, a random sampler when shuffle=True and a sequential sampler when shuffle=False. Shuffled mini-batches are what stochastic gradient descent training typically relies on.
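As a rough sketch of how the loading order can be controlled, the snippet below compares sequential loading, shuffled loading, and an explicit batch sampler. The toy data (a list of six integers) and the seed value are assumptions made only for illustration.

```python
import torch
from torch.utils.data import (
    BatchSampler,
    DataLoader,
    RandomSampler,
    SequentialSampler,
)

# Assumption: a plain list of integers stands in for a real map-style dataset.
data = list(range(6))

# shuffle=False: samples are served in index order.
seq_loader = DataLoader(data, batch_size=2, shuffle=False)
sequential = [b.tolist() for b in seq_loader]

# shuffle=True: a random permutation of the indices is drawn each epoch.
# The seeded generator only makes the permutation repeatable.
gen = torch.Generator().manual_seed(0)
rand_loader = DataLoader(data, batch_size=2, shuffle=True, generator=gen)
shuffled = [b.tolist() for b in rand_loader]

# A batch sampler yields a whole list of indices per batch.
batch_sampler = BatchSampler(SequentialSampler(data), batch_size=4, drop_last=False)
custom = [b.tolist() for b in DataLoader(data, batch_sampler=batch_sampler)]

print(type(seq_loader.sampler).__name__)   # SequentialSampler
print(type(rand_loader.sampler).__name__)  # RandomSampler
print(sequential)                          # [[0, 1], [2, 3], [4, 5]]
print(custom)                              # [[0, 1, 2, 3], [4, 5]]
```

The shuffled batches contain the same six values as the sequential ones, only in a different order.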

Complete Guide to PyTorch DataLoader

If the data to be loaded is unstructured, we should take care to use suitable libraries to read it. Whatever form the data takes, DataLoader handles batching and iterating over it, which is why it is so widely used in PyTorch. The first step is to import DataLoader from torch.utils.data.

from torch.utils.data import DataLoader

We then pass the dataset, the batch size, and several other arguments, as given below.

DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    collate_fn=None,
    pin_memory=False,
)

batch_size is the number of samples in each batch (not the total number of training samples), and shuffle controls whether the samples are reshuffled at every epoch. Setting num_workers to a value greater than 0 turns on multi-process loading: the value is the number of worker subprocesses used to load the data.
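To make the role of batch_size concrete, here is a small sketch on synthetic data. The tensors are made up for illustration, and drop_last is an additional DataLoader argument that does not appear in the listing above.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Assumption: 10 synthetic samples with 2 features each, plus integer labels.
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# batch_size is the number of samples per batch, so 10 samples give 3 batches.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)
sizes = [x.shape[0] for x, y in loader]
print(len(loader))  # 3
print(sizes)        # [4, 4, 2]

# drop_last=True discards the final, incomplete batch.
trimmed = DataLoader(dataset, batch_size=4, drop_last=True)
print(len(trimmed))  # 2
```

The last batch is smaller than the others whenever the dataset size is not a multiple of batch_size.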

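num_workers=0 means that the data is loaded in the main process; a positive value such as 1 starts that many separate worker processes. A single worker is not always faster than the main process, because starting the process and passing batches back adds overhead. The collate_fn argument is the function that merges a list of samples into a batch; with map-style datasets it is worth supplying a custom one when the default merging does not fit, for example when samples have different lengths. If the batches are going to be moved to a CUDA device, we can set pin_memory=True. The DataLoader then copies the tensors into pinned (page-locked) memory before returning them, which speeds up the transfer to the GPU.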
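One common reason to write a custom collate_fn is padding variable-length samples. The sketch below is only an illustration under assumed data: the four short integer sequences and the helper name pad_collate are hypothetical.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Assumption: four integer sequences of different lengths act as the dataset.
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4]),
    torch.tensor([5, 6]),
    torch.tensor([7, 8, 9, 10]),
]


def pad_collate(batch):
    """Hypothetical collate function: pad each batch to its longest sequence."""
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)
    return padded, lengths


loader = DataLoader(
    sequences,
    batch_size=2,
    collate_fn=pad_collate,
    # Pinned memory only helps when a CUDA device is present.
    pin_memory=torch.cuda.is_available(),
)
batches = list(loader)

first_padded, first_lengths = batches[0]
print(first_padded.tolist())   # [[1, 2, 3], [4, 0, 0]]
print(first_lengths.tolist())  # [3, 1]
```

The default collate function would fail here because tensors of different lengths cannot be stacked, which is why a custom function is supplied.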

Custom Text Dataset Class

We can create custom datasets using PyTorch.

import torch
from torch.utils.data import Dataset, DataLoader

We can create a dataset of our own by subclassing Dataset. Here the class is called TextDataset.

class TextDataset(Dataset):
    def __init__(self, content, titles):
        self.titles = titles
        self.content = content

The dataset needs two attributes, here titles and content, which are stored in the constructor as given above. With them in place, we can implement __len__() to report the size of the dataset and __getitem__() to return the sample at a given index.

    def __len__(self):
        return len(self.titles)

    def __getitem__(self, idx):
        label = self.titles[idx]
        text = self.content[idx]
        sample = {"Text": text, "Class": label}
        return sample

The code above builds one sample per index. Next, we prepare the data and construct the dataset.

content = ['India', 'China', 'SriLanka', 'Nepal', 'Afghanistan']
titles = ['Peninsula', 'Country', 'Island', 'Country', 'Country']

Now we are initializing the data we have created using the class.

TextData = TextDataset(content, titles)

The data is ready for use with all the given details.
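A short sketch of how such a text dataset might be passed to a DataLoader is given here. The class is repeated so that the snippet runs on its own; the variable names text_data and loader and the batch size of 2 are assumptions for illustration.

```python
from torch.utils.data import Dataset, DataLoader


class TextDataset(Dataset):
    """Same idea as the class above, repeated so the snippet is self-contained."""

    def __init__(self, content, titles):
        self.content = content
        self.titles = titles

    def __len__(self):
        return len(self.titles)

    def __getitem__(self, idx):
        return {"Text": self.content[idx], "Class": self.titles[idx]}


content = ['India', 'China', 'SriLanka', 'Nepal', 'Afghanistan']
titles = ['Peninsula', 'Country', 'Island', 'Country', 'Country']
text_data = TextDataset(content, titles)

# Assumption: batch_size=2 and no shuffling, so the output order is predictable.
loader = DataLoader(text_data, batch_size=2, shuffle=False)
batches = list(loader)
for batch in batches:
    print(batch)

# First batch: {'Text': ['India', 'China'], 'Class': ['Peninsula', 'Country']}
```

Because the samples are dictionaries of strings, the default collate function returns a dictionary whose values are lists of strings rather than tensors.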

How to create a PyTorch DataLoader?

First define a dataset class, then wrap an instance of it in a DataLoader, as in the skeleton below.

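import torch as T

class Mynewdata(T.utils.data.Dataset):
    # code should be written here to load data
    # (implement __init__, __len__ and __getitem__)
    ...

my_datas = Mynewdata("my_train_data.txt")
my_loadr = T.utils.data.DataLoader(my_datas, batch_size=10, shuffle=True)
for (idx, batch) in enumerate(my_loadr):
    ...  # process each batch here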

Here the batch size is 10, and shuffle is set to True so that the samples are served in random order.

import numpy as np
import torch as T

item = T.device("cpu")

class mypeople(T.utils.data.Dataset):
    def __init__(self, src_file, num_rows=None):
        # Assumption: src_file is whitespace-delimited numeric text with
        # 7 predictor columns followed by one integer class label.
        all_xy = np.loadtxt(src_file, max_rows=num_rows,
                            dtype=np.float32, ndmin=2)
        x_tmp = all_xy[:, 0:7]
        y_tmp = all_xy[:, 7]
        self.x_data = T.tensor(x_tmp, dtype=T.float32).to(item)
        self.y_data = T.tensor(y_tmp, dtype=T.long).to(item)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        if T.is_tensor(idx):
            idx = idx.tolist()
        preds = self.x_data[idx, 0:7]
        pol = self.y_data[idx]
        sample = {'predictors': preds, 'political': pol}
        return sample

Now the dataset and the DataLoader can be created as follows.

train_file = ".\\people_train.txt"
train_datas = mypeople(train_file, num_rows=8)
bat_size = 3
train_loadr = T.utils.data.DataLoader(train_datas,
    batch_size=bat_size, shuffle=True)
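Since the training file itself is not included, a file-backed dataset of this kind can be tried end to end with generated data. The sketch below rests on several assumptions that are not from the original: the class name PeopleDataset, a whitespace-delimited layout with 7 predictor columns followed by one class label, and random values written to a temporary file.

```python
import os
import tempfile

import numpy as np
import torch as T


class PeopleDataset(T.utils.data.Dataset):
    """Hypothetical file-backed dataset: 7 predictors, then 1 class label."""

    def __init__(self, src_file, num_rows=None):
        all_xy = np.loadtxt(src_file, max_rows=num_rows,
                            dtype=np.float32, ndmin=2)
        self.x_data = T.tensor(all_xy[:, 0:7], dtype=T.float32)
        self.y_data = T.tensor(all_xy[:, 7], dtype=T.long)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        return {"predictors": self.x_data[idx], "political": self.y_data[idx]}


# Assumption: 10 rows of random predictors with labels in {0, 1, 2}.
rng = np.random.default_rng(0)
rows = np.hstack([rng.random((10, 7)), rng.integers(0, 3, size=(10, 1))])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people_train.txt")
    np.savetxt(path, rows, fmt="%.4f")
    # Only the first 8 rows are read; the data is held in memory afterwards.
    train_ds = PeopleDataset(path, num_rows=8)

train_loader = T.utils.data.DataLoader(train_ds, batch_size=3, shuffle=True)
batch_shapes = [
    (tuple(b["predictors"].shape), tuple(b["political"].shape))
    for b in train_loader
]
print(batch_shapes)  # [((3, 7), (3,)), ((3, 7), (3,)), ((2, 7), (2,))]
```

With 8 rows and a batch size of 3, the loader produces two full batches and one batch of 2, regardless of the shuffled order.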

PyTorch DataLoader Examples

1. We can use built-in datasets with PyTorch DataLoader. The MNIST dataset of handwritten digits is used here; the images are converted to tensors and normalized. Passing download=True fetches the data if it is not already on disk, and iter() together with next() pulls one batch of images and labels from the loader for further processing.

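import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
dataiter = iter(trainloader)
# next(dataiter) replaces the older dataiter.next(), which newer PyTorch releases no longer provide
images, labels = next(dataiter)
print(images.shape)  # torch.Size([64, 1, 28, 28])
print(labels.shape)  # torch.Size([64])
plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r')
plt.show()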

2. We can also use custom datasets with DataLoader. Here dataset stands for any Dataset object, for example one that holds randomly generated numbers, which is then loaded through the DataLoader.

from torch.utils.data import DataLoader

# `dataset` is assumed to be an existing Dataset instance
loader = DataLoader(dataset, batch_size=15, shuffle=True, num_workers=5)
for i, batch in enumerate(loader):
    print(i, batch)

Each batch holds up to 15 samples, and 5 worker processes load the data in parallel. The loop prints the index and contents of every batch; the number of batches is the dataset size divided by 15, rounded up.

Conclusion

DataLoader takes care of batching, shuffling, and parallel loading, which makes data easy to feed into PyTorch models. Custom datasets are straightforward to create, and they are a good choice whenever we need control over how the data is read and transformed. These are the fundamentals of using DataLoader in PyTorch.

© 2022 - EDUCBA. ALL RIGHTS RESERVED. THE CERTIFICATION NAMES ARE THE TRADEMARKS OF THEIR RESPECTIVE OWNERS.
