Build a LLaMA fine-tuning script from scratch using PyTorch and the Hugging Face transformers API, with support for four optional features: gradient checkpointing, mixed precision, data parallelism, and tensor parallelism. Avoid ColossalAI/Megatron/DeepSpeed; referring to existing code is allowed.
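
Below is a minimal sketch of what `finetune.py` could look like, assuming a placeholder checkpoint name, WikiText-2 as a stand-in dataset, and illustrative hyperparameters. It covers gradient checkpointing, mixed precision (autocast + GradScaler), and data parallelism via DDP; tensor parallelism is left out to keep the sketch short (recent PyTorch exposes building blocks for it under `torch.distributed.tensor.parallel`). Note that a full-size 7B checkpoint may not fit in memory with plain DDP and fp32 AdamW states, so use a small LLaMA-style model for testing.

```python
# finetune.py -- minimal sketch; model name, dataset, and hyperparameters are placeholders
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for every process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model_name = "huggyllama/llama-7b"  # placeholder; swap in a small model for testing
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name).cuda(local_rank)

    # Optional feature 1: gradient checkpointing trades recompute for activation memory.
    # The non-reentrant variant plays better with DDP (assumes a recent transformers version).
    model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
    model.config.use_cache = False  # KV cache is incompatible with checkpointing

    # Optional feature 2: data parallelism -- one full model replica per GPU
    model = DDP(model, device_ids=[local_rank])

    # Toy dataset; replace with your own corpus
    raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")

    ds = raw.map(tokenize, batched=True, remove_columns=raw.column_names)
    ds.set_format("torch")
    sampler = DistributedSampler(ds)
    loader = DataLoader(ds, batch_size=1, sampler=sampler)

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    # Optional feature 3: mixed precision via autocast + gradient scaling
    scaler = torch.cuda.amp.GradScaler()

    model.train()
    for epoch in range(1):
        sampler.set_epoch(epoch)
        for step, batch in enumerate(loader):
            input_ids = batch["input_ids"].cuda(local_rank)
            attention_mask = batch["attention_mask"].cuda(local_rank)
            labels = input_ids.masked_fill(attention_mask == 0, -100)  # ignore padding in the loss
            with torch.autocast(device_type="cuda", dtype=torch.float16):
                out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
                loss = out.loss
            scaler.scale(loss).backward()
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()
            if dist.get_rank() == 0 and step % 10 == 0:
                print(f"epoch {epoch} step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```
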
Multi-node launch (two nodes, 8 GPUs each; run one command per node):

```bash
torchrun --nnodes 2 --node_rank=0 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py
torchrun --nnodes 2 --node_rank=1 --master_addr=10.90.1.166 --nproc_per_node=8 finetune.py
```
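
Each torchrun worker inherits the rendezvous settings through environment variables, so the script itself needs no address or port arguments (the default master port 29500 is used since none is passed above). A small sketch of the initialization every rank performs:

```python
# Sketch of how finetune.py picks up the rendezvous info that torchrun provides.
# torchrun exports MASTER_ADDR, MASTER_PORT, RANK, LOCAL_RANK, and WORLD_SIZE
# to every worker; init_process_group reads them via the env:// rendezvous.
import os
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")     # reads MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE
local_rank = int(os.environ["LOCAL_RANK"])  # GPU index on this node, 0..7 here
torch.cuda.set_device(local_rank)
print(f"rank {dist.get_rank()} / world size {dist.get_world_size()} on GPU {local_rank}")
```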