# Finetune_llama2_Megatron

A LLaMA 2 fine-tuning script built from scratch with PyTorch and the transformers API, supporting four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism. ColossalAI/Megatron/DeepSpeed are deliberately not used; referring to existing code is allowed.
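As a rough illustration of how those four switches could plug into such a script, here is a minimal, hypothetical sketch. The flag names, the checkpoint `meta-llama/Llama-2-7b-hf`, and the overall structure are assumptions rather than this repository's actual interface, and tensor parallelism is omitted for brevity:

```python
# Hypothetical sketch of wiring gradient checkpointing, mixed precision and
# data parallelism into a finetune.py built on PyTorch + transformers.
import argparse
import os

import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import AutoModelForCausalLM


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--gradient-checkpointing", action="store_true")
    parser.add_argument("--mixed-precision", action="store_true")
    parser.add_argument("--data-parallel", action="store_true")
    args = parser.parse_args()

    # torchrun sets LOCAL_RANK for every worker process it spawns.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    if args.data_parallel:
        torch.distributed.init_process_group(backend="nccl")

    # Checkpoint name is an example; any causal LM checkpoint would do.
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").cuda()

    # Gradient checkpointing: trade extra compute for activation memory.
    if args.gradient_checkpointing:
        model.gradient_checkpointing_enable()

    # Data parallelism: all-reduce gradients across ranks after backward.
    if args.data_parallel:
        model = DDP(model, device_ids=[local_rank])

    # Mixed precision: autocast the forward pass and scale the loss.
    scaler = torch.cuda.amp.GradScaler(enabled=args.mixed_precision)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Training loop (schematic):
    #   with torch.cuda.amp.autocast(enabled=args.mixed_precision):
    #       loss = model(**batch).loss
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
    #   optimizer.zero_grad()


if __name__ == "__main__":
    main()
```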

The training loss curve is shown in the figure `img_v3_0284_f9313a8e-9e61-41fc-b35f-5d2e7e991aag`.

Multi-node launch commands (run one on each node):

```bash
# On node 0 (the master, 10.90.1.166):
torchrun --nnodes 2 --node_rank=0 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py

# On node 1:
torchrun --nnodes 2 --node_rank=1 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py
```
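For these commands to work, `finetune.py` has to initialize the process group from the environment variables torchrun exports. A minimal sketch of that setup, not necessarily the repository's actual code:

```python
# Minimal distributed setup compatible with the torchrun commands above.
import os

import torch
import torch.distributed as dist

# torchrun exports RANK, WORLD_SIZE and LOCAL_RANK for each spawned process.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

print(f"rank {dist.get_rank()}/{dist.get_world_size()} running on GPU {local_rank}")
```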