PyTorch Distributed

Introduction to PyTorch Distributed

PyTorch provides torch.distributed, which lets researchers and developers parallelize their computations across processes and across clusters of machines. It leverages message-passing semantics, so every process can communicate with any other process by exchanging messages. Different communication backends can be used, and the communicating processes do not need to run on the same machine. The processes must, however, be started in parallel, and a cluster coordination tool is needed to launch them.

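As a minimal sketch of this message-passing model (the port number and tensor contents below are arbitrary choices for illustration), two local processes can exchange a tensor with the point-to-point send/recv API:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # Every process joins the same process group before communicating.
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29500'
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    tensor = torch.zeros(1)
    if rank == 0:
        tensor += 1
        dist.send(tensor=tensor, dst=1)  # rank 0 sends the message to rank 1
    else:
        dist.recv(tensor=tensor, src=0)  # rank 1 blocks until the message arrives
    print(f"Rank {rank} has tensor {tensor.item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
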
PyTorch Distributed Overview

torch.distributed has three main components: distributed data-parallel training, RPC-based distributed training, and collective communication. Distributed data-parallel training (DDP) runs multiple training processes in which the model is replicated in each process and each replica is fed a different slice of the input data. DDP keeps the replicas synchronized by communicating gradients, and it overlaps this gradient communication with gradient computation to speed up training.

RPC-based distributed training supports general training structures that do not fit into DDP, such as distributed pipeline parallelism, the parameter-server paradigm, and combinations of DDP with other training paradigms. In addition, RPC manages the lifetime of remote objects and extends the autograd engine across machine boundaries.

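As a rough sketch of the RPC API (the worker names, port, and tensors below are illustrative assumptions, not taken from the original), one process can invoke a function on another and receive the result:

import os
import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp

def run_worker(rank, world_size):
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '29501'
    # Each process joins the RPC group under an arbitrary worker name.
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)

    if rank == 0:
        # Run torch.add on worker1 and block until the result comes back.
        result = rpc.rpc_sync("worker1", torch.add,
                              args=(torch.ones(2), torch.ones(2)))
        print(result)  # tensor([2., 2.])

    # Waits for all outstanding RPCs to finish, then shuts down the RPC agent.
    rpc.shutdown()

if __name__ == "__main__":
    mp.spawn(run_worker, args=(2,), nprocs=2, join=True)
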
Collective communication supports sending tensors across all processes in a group, and it provides both collective communication APIs and P2P (point-to-point) communication APIs. DDP and RPC already handle many training scenarios, but the collective API gives finer-grained control: for example, an application can implement distributed parameter averaging, computing the averages of the model parameters after the backward pass instead of using DDP to communicate gradients. This decouples communication from the model computation and gives fine control over what is communicated, in both DDP-style and RPC-style training.

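For instance, distributed parameter averaging with the collective API could look like the following sketch (average_parameters is a hypothetical helper written for this illustration, not a PyTorch function, and it assumes the process group is already initialized):

import torch
import torch.distributed as dist

def average_parameters(model):
    # Hypothetical helper: replace each parameter by its average across all ranks.
    world_size = dist.get_world_size()
    with torch.no_grad():
        for param in model.parameters():
            # Sum the parameter tensor across every process in the default group,
            # then divide by the number of processes to obtain the average.
            dist.all_reduce(param.data, op=dist.ReduceOp.SUM)
            param.data /= world_size

Called after every optimizer step, such a helper keeps the replicas in sync without relying on DDP's gradient communication.
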
PyTorch Distributed Models

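The code below walks through three DDP patterns: a basic training step with one model replica per process, checkpointing from rank 0 with a barrier so that every rank reloads the same weights, and combining DDP with model parallelism by splitting each replica across two devices.
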
Code:

import os
import sys
import tempfile
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp

from torch.nn.parallel import DistributedDataParallel as DDP
def setup(rank01, world_size):
    # Tell every process where the rank 0 process can be reached so they can rendezvous.
    os.environ['MASTER_ADDR'] = 'localhost'
    os.environ['MASTER_PORT'] = '12675'
    # The gloo backend also works on CPU; nccl is usually preferred when every process owns a GPU.
    dist.init_process_group("gloo", rank=rank01, world_size=world_size)

def cleanup():
    dist.destroy_process_group()
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.net01 = nn.Linear(5, 5)
        self.relu = nn.ReLU()
        self.net02 = nn.Linear(5, 2)

    def forward(self, a):
        return self.net02(self.relu(self.net01(a)))

def demo(rank01, world_size):
    print(f"Running basic DDP on rank01 {rank01}.")
    setup(rank01, world_size)

    # Create a model replica on this rank's GPU and wrap it with DDP.
    model = Model().to(rank01)
    ddp_model = DDP(model, device_ids=[rank01])

    loss_function = nn.MSELoss()
    optimize = optim.SGD(ddp_model.parameters(), lr=0.01)

    # One training step: DDP synchronizes gradients during backward().
    optimize.zero_grad()
    out = ddp_model(torch.randn(10, 5))
    labels = torch.randn(10, 2).to(rank01)
    loss_function(out, labels).backward()
    optimize.step()

    cleanup()

def run_demo(demo_function, world_size):
    mp.spawn(demo_function,
             args=(world_size,),
             nprocs=world_size,
             join=True)
def checkpoint(rank01, world_size):
    print(f"Running checkpoint example on rank01 {rank01}.")
    setup(rank01, world_size)

    model = Model().to(rank01)
    ddp_model = DDP(model, device_ids=[rank01])

    loss_function = nn.MSELoss()
    optimize = optim.SGD(ddp_model.parameters(), lr=0.01)

    CHECKPOINT = tempfile.gettempdir() + "/model.checkpoint"
    if rank01 == 0:
        # Only rank 0 saves the checkpoint; all replicas hold identical parameters.
        torch.save(ddp_model.state_dict(), CHECKPOINT)

    # Wait until rank 0 has finished saving before the other ranks load the file.
    dist.barrier()
    map_location = {'cuda:%d' % 0: 'cuda:%d' % rank01}
    ddp_model.load_state_dict(
        torch.load(CHECKPOINT, map_location=map_location))

    optimize.zero_grad()
    out = ddp_model(torch.randn(10, 5))
    labels = torch.randn(10, 2).to(rank01)
    loss_function(out, labels).backward()
    optimize.step()

    if rank01 == 0:
        os.remove(CHECKPOINT)

    cleanup()
class ModelParallel(nn.Module):
    def __init__(self, dev00, dev01):
        super(ModelParallel, self).__init__()
        self.dev00 = dev00
        self.dev01 = dev01
        self.net01 = torch.nn.Linear(5, 5).to(dev00)
        self.relu = torch.nn.ReLU()
        self.net02 = torch.nn.Linear(5, 5).to(dev01)

    def forward(self, a):
        # Move the activations between the two devices that hold the layers.
        a = a.to(self.dev00)
        a = self.relu(self.net01(a))
        a = a.to(self.dev01)
        return self.net02(a)
def demo_model(rank01, world_size):
    print(f"Running with model parallel example on rank01 {rank01}.")
    setup(rank01, world_size)

    # Each process owns two devices and splits its model replica across them.
    dev00 = (rank01 * 2) % world_size
    dev01 = (rank01 * 2 + 1) % world_size
    mp_model01 = ModelParallel(dev00, dev01)
    # device_ids must not be set when a replica spans multiple devices.
    ddp_model = DDP(mp_model01)

    loss_function = nn.MSELoss()
    optimize = optim.SGD(ddp_model.parameters(), lr=0.01)

    optimize.zero_grad()
    # The forward pass moves the input to dev00; the output lands on dev01.
    out = ddp_model(torch.randn(10, 5))
    labels = torch.randn(10, 5).to(dev01)
    loss_function(out, labels).backward()
    optimize.step()

    cleanup()

PyTorch Distributed Strategy

  • In data-parallel training, the following points should be noted. First, if training speed does not matter and the data and model fit on a single GPU, single-device training is the simplest option. Second, single-machine multi-GPU training uses several GPUs on one machine and speeds up training with relatively few code changes. Third, multi-machine distributed data-parallel training can be used, at the cost of more code, when the application has to scale across machine boundaries. If errors (for example, out-of-memory) are expected, or if resources can join and leave dynamically during training, torch.distributed.elastic avoids these problems.
  • torch.nn.DataParallel uses multiple GPUs on a single machine with the smallest amount of code: only one line needs to be changed (see the sketch after this list). However, the model is replicated on every forward pass and all GPUs are driven from one process, so GIL contention means it usually does not offer the best performance.
  • The RPC package has four main pillars: RPC, RRef, Distributed Autograd, and the Distributed Optimizer. A given function can be run on a remote worker using RPC, while RRef manages the lifetime of the remote objects it creates. Distributed Autograd extends the autograd engine across machine boundaries, and the Distributed Optimizer updates parameters using the gradients computed by the distributed autograd engine.
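
As a minimal sketch of the single-machine DataParallel path mentioned above (the layer sizes and batch shape are placeholders, not from the original example):

import torch
import torch.nn as nn

model = nn.Linear(5, 2)
if torch.cuda.device_count() > 1:
    # The single code change: wrap the model so each forward pass
    # splits the batch across all visible GPUs inside one process.
    model = nn.DataParallel(model)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

inputs = torch.randn(16, 5).to(device)
outputs = model(inputs)  # outputs are gathered back on the default device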

Add the Distributed Initialization

The package must be initialized first so that all processes join the same process group. Whether the distributed package is available at all can be checked with:

torch.distributed.is_available()

We must initialize the process group before any other torch.distributed calls. Either a store is passed together with rank and world_size, or an init_method URL is specified that tells the processes how to discover each other; store and init_method are mutually exclusive. The init_method, rank, and world_size parameters are otherwise optional and describe, respectively, how the process group is initialized, the rank of the current process, and the number of processes in the job.

The backend must be one that PyTorch was built with: build-time configuration determines whether mpi, gloo, and nccl are available. Each process should also use a separate GPU, because sharing the same GPU across processes can create deadlocks. The timeout parameter is optional, with a default of 30 minutes. Connection information is exchanged through the store parameter, a group name can be passed with group_name, and backend-specific process group options are passed through pg_options.
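
A sketch of an explicit initialization call, assuming a TCP rendezvous URL that every process can reach (the host, port, rank, and world size below are placeholders):

from datetime import timedelta
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",                      # must be a backend PyTorch was built with
    init_method="tcp://10.0.0.1:23456",  # placeholder rendezvous URL
    rank=0,                              # rank of the current process
    world_size=4,                        # total number of processes in the job
    timeout=timedelta(minutes=30),       # optional; 30 minutes is the default
)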

Conclusion

Distributed autograd can be enabled in PyTorch Distributed to coordinate communication across all the modules and classes involved in training. The distributed autograd graph captures the functions that take part in the backward pass, and deadlocks are avoided because these functions are managed carefully by autograd. With gradient communication handled during the backward pass, this is the most convenient way to get distributed training done.

Recommended Articles

We hope that this EDUCBA information on “PyTorch Distributed” was beneficial to you. You can view EDUCBA’s recommended articles for more information.

  1. What is PyTorch?
  2. PyTorch Versions
  3. Python List to String
  4. Python Raw String