Finetune_llama2_Megatron

A LLaMA 2 fine-tuning script built from scratch with PyTorch and the transformers API, supporting four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism. It does not rely on ColossalAI, Megatron, or DeepSpeed; referring to existing code was allowed. A sketch of how these features can be combined is shown below.
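The following is a minimal sketch of wiring the optional features together with plain PyTorch and transformers APIs. The function names, flags, and checkpoint path are illustrative assumptions, not the repository's actual code, and it assumes torch.distributed has already been initialized (see the launch commands below).

```python
# Hedged sketch: enabling gradient checkpointing, mixed precision, and data parallelism.
# Flag names, structure, and checkpoint path are illustrative assumptions.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from transformers import LlamaForCausalLM

def build_model(use_grad_ckpt=True, use_amp=True, use_ddp=True):
    # Assumes torch.cuda.set_device(local_rank) was called and torch.distributed is initialized.
    model = LlamaForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf").cuda()

    if use_grad_ckpt:
        # Trade compute for memory by recomputing activations during the backward pass.
        model.gradient_checkpointing_enable()

    if use_ddp:
        # Data parallelism: replicate the model and all-reduce gradients across ranks.
        model = DDP(model, device_ids=[torch.cuda.current_device()])

    # GradScaler is a no-op when use_amp is False.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    return model, scaler

def train_step(model, scaler, batch, optimizer, use_amp=True):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = model(**batch).loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```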

Loss curve: see the training loss plot image in the repository.

Multi-node launch commands (run one per node):

torchrun --nnodes 2 --node_rank=0 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py

torchrun --nnodes 2 --node_rank=1 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py
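torchrun sets RANK, WORLD_SIZE, and LOCAL_RANK for every process, and Megatron-style code typically splits that world into tensor-parallel and data-parallel groups. Below is a sketch of how those groups could be built for this 2-node, 8-GPU-per-node launch; the tp_size value and group layout are assumptions, not necessarily what finetune.py does.

```python
# Hedged sketch: building TP and DP process groups from torchrun-provided env vars.
# tp_size and the group layout are illustrative assumptions.
import os
import torch
import torch.distributed as dist

def init_parallel_groups(tp_size=8):
    dist.init_process_group(backend="nccl")  # reads RANK / WORLD_SIZE from torchrun
    rank, world_size = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

    dp_size = world_size // tp_size
    tp_group = dp_group = None

    # Ranks within a node share a TP group; corresponding ranks across nodes form DP groups.
    for i in range(dp_size):
        ranks = list(range(i * tp_size, (i + 1) * tp_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            tp_group = g
    for i in range(tp_size):
        ranks = list(range(i, world_size, tp_size))
        g = dist.new_group(ranks)
        if rank in ranks:
            dp_group = g
    return tp_group, dp_group
```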

About

Uses Megatron-style tensor parallelism (TP) for training.
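For reference, Megatron-style TP shards each linear layer's weight across the TP group and inserts small communication ops around it. The sketch below follows the column-parallel pattern from the Megatron-LM paper; the class names are assumptions and this is an illustration, not the repository's implementation.

```python
# Hedged sketch of a Megatron-style column-parallel linear layer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist

class _CopyToTPRegion(torch.autograd.Function):
    """Identity in forward; all-reduce the input gradient across the TP group in backward."""
    @staticmethod
    def forward(ctx, x, tp_group):
        ctx.tp_group = tp_group
        return x

    @staticmethod
    def backward(ctx, grad_out):
        grad = grad_out.clone()
        dist.all_reduce(grad, group=ctx.tp_group)
        return grad, None

class ColumnParallelLinear(nn.Module):
    """Shards the weight along the output dimension across the TP ranks."""
    def __init__(self, in_features, out_features, tp_group):
        super().__init__()
        self.tp_group = tp_group
        tp_size = dist.get_world_size(group=tp_group)
        assert out_features % tp_size == 0
        # Each rank holds only its shard: [out_features // tp_size, in_features].
        self.weight = nn.Parameter(torch.empty(out_features // tp_size, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        x = _CopyToTPRegion.apply(x, self.tp_group)  # identity fwd, all-reduce bwd
        return F.linear(x, self.weight)              # output is sharded on the last dim
```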
