This repository provides a comprehensive guide and practical examples for training deep learning models using PyTorch across various parallelism strategies. Whether you are working on single-GPU training or scaling to multi-GPU setups with Distributed Data Parallel (DDP) or Fully Sharded Data Parallel (FSDP), these examples will guide you through the process.
- Foundational concepts of deep learning and PyTorch.
- Basics of tensors, datasets, and model building.
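A minimal sketch of those building blocks is shown below; the toy dataset and model are hypothetical and only illustrate the `Dataset`/`DataLoader` and `nn.Module` pattern, not the repository's actual examples.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

# Hypothetical toy dataset: random features with binary labels.
class ToyDataset(Dataset):
    def __init__(self, n_samples=256, n_features=16):
        self.x = torch.randn(n_samples, n_features)
        self.y = torch.randint(0, 2, (n_samples,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# Minimal model built from nn.Module.
class TinyNet(nn.Module):
    def __init__(self, n_features=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.ReLU(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)

loader = DataLoader(ToyDataset(), batch_size=32, shuffle=True)
model = TinyNet()
for x, y in loader:
    logits = model(x)                                   # forward pass on one batch
    loss = nn.functional.cross_entropy(logits, y)
    break                                               # one batch is enough to illustrate
```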
- Efficiently training models on a single GPU.
- Profiling tools and techniques to optimize performance.
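The sketch below shows one way to profile a few single-GPU training steps with `torch.profiler`; the model, optimizer, and synthetic batch are assumptions made for brevity.

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical model and data; the point is the training-step + profiler pattern.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 512, device=device)
y = torch.randint(0, 10, (64,), device=device)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

# Profile a handful of steps to find hotspots on CPU and, if available, GPU.
with profile(activities=activities) as prof:
    for _ in range(5):
        optimizer.zero_grad(set_to_none=True)
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```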
- Scaling models across multiple GPUs using `torch.nn.DataParallel`.
- Profiling and optimizing data parallel workloads.
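A minimal sketch of `torch.nn.DataParallel` (single process, multiple GPUs) follows; the layer sizes and batch shape are illustrative assumptions.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)   # replicates the module on every visible GPU per forward
model = model.to(device)

x = torch.randn(128, 1024, device=device)
out = model(x)                       # batch is scattered across GPUs, outputs gathered back
print(out.shape)                     # torch.Size([128, 10])
```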
- Leveraging `torch.nn.parallel.DistributedDataParallel` for efficient multi-GPU training.
- Setting up process groups, distributed samplers, and profiling DDP workloads.
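Below is a minimal DDP sketch assuming a `torchrun --nproc_per_node=<num_gpus> train_ddp.py` launch (the script name is illustrative); `torchrun` provides the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The toy dataset and model are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy data; DistributedSampler gives each rank a disjoint shard per epoch.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = nn.Linear(32, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    for epoch in range(2):
        sampler.set_epoch(epoch)                     # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad(set_to_none=True)
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()                          # gradients all-reduced across ranks
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```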
- Training large models with memory efficiency using Fully Sharded Data Parallel (FSDP).
- Fine-tuning large-scale models like CodeLlama with gradient checkpointing and parameter sharding.
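A minimal FSDP sketch, also assuming a `torchrun` launch with one process per GPU. The plain `nn.Sequential` stack, layer sizes, and size-based wrap threshold are assumptions for brevity and stand in for the large-model fine-tuning setup described above.

```python
import functools
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(*[nn.Linear(2048, 2048) for _ in range(8)]).cuda(local_rank)

    # Parameters are sharded across ranks; each rank materializes a full layer
    # only around its forward/backward pass, reducing peak memory.
    wrap_policy = functools.partial(size_based_auto_wrap_policy, min_num_params=1_000_000)
    model = FSDP(model, auto_wrap_policy=wrap_policy)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 2048, device=f"cuda:{local_rank}")

    loss = model(x).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```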
- PyTorch Documentation
- Basics_Pytorch
- Data_Parallel
- Distributed_Data_parallel
- Hugging Face Transformers
- PyTorch FSDP Tutorial
- If you are already familiar with deep learning in PyTorch, you can skip 01. Introduction to Deep Learning and go directly to 02. Single-GPU Training.