A containerized pipeline for training and fine-tuning LLaMA models using DeepSpeed and Hugging Face's TRL library.
This project provides a streamlined workflow for training LLaMA models with the following features:
- Configurable model architecture and training parameters
- Support for both pre-training and instruction fine-tuning
- Distributed training using DeepSpeed
- Automatic data preprocessing and tokenization
- Hugging Face Hub integration for model hosting
The training pipeline is configured through a JSON configuration file; see `config.json` for the full set of available parameters. Key parameters include (an example configuration is sketched after this list):
- `batch-size`: Training batch size
- `epochs`: Number of training epochs
- `learning-rate`: Model learning rate
- `max-seq-length`: Maximum sequence length
- `input-dataset`: Dataset for pre-training
- `instruct-dataset`: Dataset for instruction fine-tuning
- `output-repo`: Target repository for saving models
- `instruct-finetune-bool`: Toggle between pre-training and instruction fine-tuning
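A minimal sketch of what `config.json` might look like, using the key names from the list above; every value below is a placeholder rather than a recommended setting:

```sh
# hypothetical example: write a config.json with placeholder values
# (adjust datasets, output repo, and hyperparameters to your own setup)
cat > config.json <<'EOF'
{
  "batch-size": 8,
  "epochs": 3,
  "learning-rate": 2e-4,
  "max-seq-length": 2048,
  "input-dataset": "your-org/pretraining-dataset",
  "instruct-dataset": "your-org/instruction-dataset",
  "output-repo": "your-org/llama-output",
  "instruct-finetune-bool": false
}
EOF
```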
To run the pipeline you will need:

- Docker
- An NVIDIA GPU with CUDA support

Python dependencies are managed through `requirements.txt`.
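If you want to experiment with the training code outside the container, one option is to install the same dependencies locally; this assumes the file is named `requirements.txt` at the repository root:

```sh
# optional: install the pipeline's Python dependencies locally
# (containerized runs get these from the Docker image instead)
pip install -r requirements.txt
```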
- Clone this repository: `git clone https://github.com/nroggendorff/train-llama.git && cd train-llama`
- Configure your training parameters in `config.json`
- Build the Docker image: `docker buildx build . -t train-llama`
- Run training (a variant with a mounted config and Hub token is sketched below): `docker run --gpus all train-llama`
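In practice you may want the container to see your edited `config.json` and a Hugging Face token for pushing models to the Hub. A minimal sketch, assuming the training script reads its config from `/app/config.json` and a token from an `HF_TOKEN` environment variable (both of these are assumptions, not documented behavior):

```sh
# hypothetical invocation: mount the local config and pass a Hub token;
# the /app/config.json path and the HF_TOKEN variable name are assumptions
docker run --gpus all \
  -e HF_TOKEN="$HF_TOKEN" \
  -v "$(pwd)/config.json:/app/config.json" \
  train-llama
```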
The project includes automated workflows for:
- Building and pushing Docker images
- Triggering training runs on external compute resources
- Managing model versions through pull requests
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- The pipeline uses Git LFS for handling large files
- Models are automatically pushed to the Hugging Face Hub when `push-to-hub` is enabled
- Training can be resumed from checkpoints by adjusting the `init` parameter (a `jq` sketch for editing such config options from the command line follows this list)
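Since these options all live in `config.json`, one way to flip a flag between runs without hand-editing the file is with `jq`. This is only a convenience sketch, shown here for `instruct-finetune-bool`; the same pattern works for `push-to-hub`:

```sh
# hypothetical helper: switch the pipeline to instruction fine-tuning
# by rewriting the flag in config.json (requires jq)
jq '."instruct-finetune-bool" = true' config.json > config.tmp && mv config.tmp config.json
```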