A containerized pipeline for training and fine-tuning LLaMA models using DeepSpeed and Hugging Face's TRL library.
This project provides a streamlined workflow for training LLaMA models with the following features:
- Configurable model architecture and training parameters
- Support for both pre-training and instruction fine-tuning
- Distributed training using DeepSpeed
- Automatic data preprocessing and tokenization
- Hugging Face Hub integration for model hosting
The training pipeline is configured through a JSON configuration file; see `config.json` for the full set of available parameters. Key parameters include (an example configuration is sketched after this list):
- `batch-size`: Training batch size
- `epochs`: Number of training epochs
- `learning-rate`: Model learning rate
- `max-seq-length`: Maximum sequence length
- `input-dataset`: Dataset for pre-training
- `instruct-dataset`: Dataset for instruction fine-tuning
- `output-repo`: Target repository for saving models
- `instruct-finetune-bool`: Toggle between pre-training and instruction fine-tuning
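A minimal sketch of what `config.json` might look like, using the key names from the list above; every value below is a placeholder rather than a recommended setting:

```sh
# hypothetical example: write a config.json with placeholder values
# (adjust datasets, output repo, and hyperparameters to your own setup)
cat > config.json <<'EOF'
{
  "batch-size": 8,
  "epochs": 3,
  "learning-rate": 2e-4,
  "max-seq-length": 2048,
  "input-dataset": "your-org/pretraining-dataset",
  "instruct-dataset": "your-org/instruction-dataset",
  "output-repo": "your-org/llama-output",
  "instruct-finetune-bool": false
}
EOF
```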
To run the pipeline you will need:

- Docker
- An NVIDIA GPU with CUDA support

Python dependencies are managed through `requirements.txt`.
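If you want to experiment with the training code outside the container, one option is to install the same dependencies locally; this assumes the file is named `requirements.txt` at the repository root:

```sh
# optional: install the pipeline's Python dependencies locally
# (containerized runs get these from the Docker image instead)
pip install -r requirements.txt
```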
- Clone this repository: `git clone https://github.com/nroggendorff/train-llama.git && cd train-llama`
- Configure your training parameters in `config.json`
- Build the Docker image: `docker buildx build . -t train-llama`
- Run training (a variant with a mounted config and Hub token is sketched below): `docker run --gpus all train-llama`
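In practice you may want the container to see your edited `config.json` and a Hugging Face token for pushing models to the Hub. A minimal sketch, assuming the training script reads its config from `/app/config.json` and a token from an `HF_TOKEN` environment variable (both of these are assumptions, not documented behavior):

```sh
# hypothetical invocation: mount the local config and pass a Hub token;
# the /app/config.json path and the HF_TOKEN variable name are assumptions
docker run --gpus all \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$(pwd)/config.json:/app/config.json" \
  train-llama
```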
The project includes automated workflows for:
- Building and pushing Docker images
- Triggering training runs on external compute resources
- Managing model versions through pull requests
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- The pipeline uses Git LFS for handling large files
- Models are automatically pushed to the Hugging Face Hub when `push-to-hub` is enabled
- Training can be resumed from checkpoints by adjusting the `init` parameter (a `jq` sketch for editing such config options from the command line follows this list)
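Since these options all live in `config.json`, one way to flip a flag between runs without hand-editing the file is with `jq`. This is only a convenience sketch, shown here for `instruct-finetune-bool`; the same pattern works for `push-to-hub`:

```sh
# hypothetical helper: switch the pipeline to instruction fine-tuning
# by rewriting the flag in config.json (requires jq)
jq '."instruct-finetune-bool" = true' config.json > config.tmp && mv config.tmp config.json
```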