
Deep Learning Training Examples with PyTorch

This repository provides a practical guide and worked examples for training deep learning models with PyTorch across the main parallelism strategies. Whether you are training on a single GPU or scaling to multi-GPU setups with Distributed Data Parallel (DDP) or Fully Sharded Data Parallel (FSDP), these examples walk you through the setup step by step.


Contents

01. Introduction to Deep Learning

  • Foundational concepts of deep learning and PyTorch.
  • Basics of tensors, datasets, and model building (see the sketch below).
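
As a taste of what this section covers, here is a minimal sketch (not taken from the repository's notebooks; the toy dataset and model are placeholders) showing tensor creation, a small Dataset, and a model built from nn.Module:

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Tensors: the basic data structure PyTorch operates on.
x = torch.randn(4, 3)            # 4 samples, 3 features each
y = (x.sum(dim=1) > 0).float()   # a toy binary label per sample

# A minimal Dataset wrapping in-memory tensors.
class ToyDataset(Dataset):
    def __init__(self, features, labels):
        self.features, self.labels = features, labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

loader = DataLoader(ToyDataset(x, y), batch_size=2, shuffle=True)

# A small fully connected model.
model = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

for features, labels in loader:
    logits = model(features).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    print(loss.item())
```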

02. Single-GPU Training

  • Efficiently training models on a single GPU.
  • Profiling tools and techniques to optimize performance (see the sketch below).
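
A minimal single-GPU training step, assuming a CUDA device is available and using a synthetic batch as a stand-in for real data, might look like the sketch below; the profiling part uses torch.profiler, one of the tools this section discusses:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(512, 10).to(device)          # move parameters onto the GPU
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(64, 512, device=device)   # synthetic batch, already on device
targets = torch.randint(0, 10, (64,), device=device)

# Profile a few training steps to find bottlenecks.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(5):
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```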

03. Multi-GPU Training with Data Parallelism (DP)

  • Scaling models across multiple GPUs using torch.nn.DataParallel.
  • Profiling and optimizing data-parallel workloads (see the sketch below).
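
Wrapping a model in torch.nn.DataParallel is essentially a one-line change; a minimal sketch, assuming at least one CUDA device and using a placeholder model and batch, could look like this:

```python
import torch
from torch import nn

model = nn.Linear(512, 10)

# DataParallel replicates the model on every visible GPU, splits each batch
# along dimension 0, and gathers the outputs back on the default device.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.cuda()

inputs = torch.randn(256, 512).cuda()   # one large batch, split across GPUs
outputs = model(inputs)                 # forward pass runs on all replicas
print(outputs.shape)                    # torch.Size([256, 10])
```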

04. Distributed Data Parallel (DDP) Training

  • Leveraging torch.nn.parallel.DistributedDataParallel for efficient multi-GPU training.
  • Setting up process groups, distributed samplers, and profiling DDP workloads (see the sketch below).
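
The usual DDP recipe is one process per GPU: initialize a process group, give each rank a DistributedSampler so it sees a distinct shard of the data, and wrap the model. The condensed sketch below (model, data, and hyperparameters are placeholders) assumes it is launched with torchrun, which sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables:

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(512, 10).cuda()
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 512), torch.randint(0, 10, (1024,)))
    sampler = DistributedSampler(dataset)        # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                 # reshuffle shards every epoch
        for inputs, targets in loader:
            inputs, targets = inputs.cuda(), targets.cuda()
            optimizer.zero_grad()
            criterion(model(inputs), targets).backward()  # gradients all-reduced here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # e.g. torchrun --nproc_per_node=4 ddp_example.py
```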

05. Fully Sharded Data Parallel (FSDP) Training

  • Memory-efficient training of large models using Fully Sharded Data Parallel (FSDP).
  • Fine-tuning large-scale models such as CodeLlama with gradient checkpointing and parameter sharding (see the sketch below).
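
At its core, FSDP shards parameters, gradients, and optimizer state across ranks instead of replicating them on every GPU. The sketch below is a minimal illustration using the torch.distributed.fsdp API with a placeholder MLP standing in for a large model; it is launched the same way as the DDP example, and gradient checkpointing (applied separately, e.g. via torch.utils.checkpoint) is omitted for brevity:

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in for a large model; FSDP shards its parameters, gradients,
    # and optimizer state across all ranks.
    model = nn.Sequential(
        nn.Linear(1024, 4096), nn.ReLU(),
        nn.Linear(4096, 4096), nn.ReLU(),
        nn.Linear(4096, 1024),
    ).cuda()
    model = FSDP(model)

    # The optimizer is created after wrapping so it sees the sharded parameters.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    inputs = torch.randn(8, 1024, device="cuda")
    loss = model(inputs).pow(2).mean()   # dummy objective just to drive backward()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # e.g. torchrun --nproc_per_node=4 fsdp_example.py
```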

Resources



NOTE

