LaikaLLM logo

Documentation Hugging Face Pytorch WandB Docker

LaikaLLM

[Documentation] [Sample Experiments]

LaikaLLM is a framework for researchers that helps set up a repeatable, reproducible, and replicable protocol for training and evaluating multitask LLMs for recommendation!

Features:

  • Two different model families implemented at the time of writing (T5 and GPT2)
  • Fully vectorized Ranking (NDCG, MAP, HitRate, ...) and Error (RMSE, MAE) metrics (a minimal sketch of the idea follows this list)
  • Fully integrated with the WandB monitoring service
  • Full use of the transformers and datasets libraries
  • Easy to use (via .yaml configuration or the Python API)
  • Fast (designed to run on consumer GPUs)
  • Fully modular and easily extensible!
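
As a taste of what "fully vectorized" means in practice, below is a minimal NumPy sketch of a hit@k computation over a batch of users. The function name, shapes, and toy data are purely illustrative and are not LaikaLLM's actual metric API (see src/evaluate/metrics for the real implementations):

import numpy as np

def hit_at_k(predictions: np.ndarray, targets: np.ndarray, k: int) -> float:
    # predictions: (n_users, n_ranked) ranked item ids; targets: (n_users,) relevant item id per user
    top_k = predictions[:, :k]                      # keep only the first k ranked items
    hits = (top_k == targets[:, None]).any(axis=1)  # vectorized membership test, one boolean per user
    return float(hits.mean())                       # fraction of users with a hit in the top-k

preds = np.array([[3, 7, 1], [9, 2, 5]])
gold = np.array([7, 5])
print(hit_at_k(preds, gold, k=2))  # 0.5: only the first user's item appears in the top-2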

The goal of LaikaLLM is to be the starting point, a hub, for all developers who want to evaluate the capabilities of LLMs in the recommender system domain, with a keen eye on DevOps best practices!

Want a glimpse of LaikaLLM? This is an example configuration that runs the whole experiment pipeline, from data pre-processing to evaluation:

exp_name: to_the_moon
device: cuda:0
random_seed: 42

data:
  AmazonDataset:
    dataset_name: toys

model:
  T5Rec:
    name_or_path: "google/flan-t5-base"
  n_epochs: 10
  train_batch_size: 32
  train_tasks:
    - SequentialSideInfoTask
    - RatingPredictionTask

eval:
  eval_batch_size: 16
  eval_tasks:
    SequentialSideInfoTask:
      - hit@1
      - hit@5
      - map@5
    RatingPredictionTask:
      - rmse

The whole pipeline can then be executed by simply invoking python laikaLLM.py -c config.yml!

If you want a full view of how experiments are visualized in WandB, and more, head over to sample_experiments!

Motivation

The adoption of LLMs in the recommender system domain is a new research area, so it is difficult to find pre-made, well-built software designed specifically for LLMs.

With LaikaLLM, the idea is to fill that gap, or at least "start the conversation" about the importance of developing accountable experiment pipelines.

Installation

Via Docker Image

Simply pull the latest LaikaLLM Docker image, which takes care of every preliminary step needed to run the project, including setting PYTHONHASHSEED and CUBLAS_WORKSPACE_CONFIG for reproducibility purposes.

From source

LaikaLLM requires Python 3.10 or later, and all packages needed are listed in requirements.txt

  • Torch with CUDA 11.7 is pinned as a requirement for reproducibility purposes, but feel free to change the CUDA version to whichever is most appropriate for your use case!

To install LaikaLLM:

  1. Clone this repository and change the working directory:
git clone https://github.com/Silleellie/LaikaLLM.git
cd LaikaLLM
  2. Install the requirements:
pip install -r requirements.txt
  3. Start experimenting!
  • Use LaikaLLM via the Python API or via .yaml config!

NOTE: It is highly recommended to set the following environment variables to obtain 100% reproducible results for your experiments:

export PYTHONHASHSEED=42
export CUBLAS_WORKSPACE_CONFIG=:16:8

You can check useful info about the above environment variables here and here
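
If you launch experiments through the Python API rather than the Docker image, a small guard like the one below (not part of LaikaLLM, just an illustrative snippet) can verify that the variables are set before anything touches CUDA. Note that PYTHONHASHSEED must be exported before the interpreter starts, so from inside the process it can only be checked, not set:

import os
import warnings

# expected values for fully reproducible runs (assumption: the values suggested above)
EXPECTED = {"PYTHONHASHSEED": "42", "CUBLAS_WORKSPACE_CONFIG": ":16:8"}

for var, expected in EXPECTED.items():
    actual = os.environ.get(var)
    if actual != expected:
        warnings.warn(f"{var}={actual!r} (expected {expected!r}): results may not be fully reproducible")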

Usage

Note: when using LaikaLLM, the working directory should be set to the root of the repository!

LaikaLLM can be used in two different ways:

  • .yaml config
  • Python API

Both use cases follow the data-model-evaluate logic, both in the code and project structure and in the actual usage of LaikaLLM.

The documentation contains extensive examples for both use cases; what follows is a small example of the same experiment using the .yaml config and the Python API.

In this simple experiment, we will:

  1. Use the toys Amazon Dataset and add 'item' and 'user' prefixes to all item and user ids
  2. Train the distilgpt2 model on the SequentialSideInfoTask
  3. Evaluate results using hit@10 and hit@5

Yaml config

  • Define your custom params.yml:

    exp_name: simple_exp
    device: cuda:0
    random_seed: 42
    
    data:
      AmazonDataset:
        dataset_name: toys
        add_prefix_items_users: true
    
    model:
      GPT2Rec:
        name_or_path: "distilgpt2"
      n_epochs: 10
      train_batch_size: 8
      train_tasks:
        - SequentialSideInfoTask
    
    eval:
      eval_batch_size: 4
      eval_tasks:
        SequentialSideInfoTask:
          - hit@10
          - hit@5
  • After defining the above params.yml, simply execute the experiment with python laikaLLM.py -c params.yml

    • The trained model and the evaluation results will be saved into models and reports/metrics, respectively
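
Before launching, you can also sanity-check the config programmatically with PyYAML (an illustrative snippet, not a LaikaLLM command; it assumes PyYAML is available, which is reasonable since LaikaLLM itself parses .yaml configs):

import yaml  # PyYAML

with open("params.yml", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# make sure the three phases are all configured before running laikaLLM.py
missing = [section for section in ("data", "model", "eval") if section not in cfg]
assert not missing, f"params.yml is missing the sections: {missing}"
print(cfg["model"])  # e.g. inspect the model hyperparameters that will be used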

Python API

from src.data.datasets.amazon_dataset import AmazonDataset
from src.data.tasks.tasks import SequentialSideInfoTask
from src.evaluate.evaluator import RecEvaluator
from src.evaluate.metrics.ranking_metrics import Hit
from src.model.models.gpt import GPT2Rec
from src.model.trainer import RecTrainer

if __name__ == "__main__":
    
    # data phase
    ds = AmazonDataset("toys", add_prefix_items_users=True)
    
    ds_splits = ds.get_hf_datasets()
    
    train_split = ds_splits["train"]
    val_split = ds_splits["validation"]
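    # note: val_split is extracted for completeness but not used in this minimal example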
    test_split = ds_splits["test"]
    
    # model phase
    model = GPT2Rec("distilgpt2",
                    training_tasks_str=["SequentialSideInfoTask"],
                    all_unique_labels=list(ds.all_items))
    
    trainer = RecTrainer(model,
                         n_epochs=10,
                         batch_size=8,
                         train_sampling_fn=ds.sample_train_sequence,
                         output_dir="models/simple_experiment")
    
    trainer.train(train_split)
    
    # eval phase
    evaluator = RecEvaluator(model, eval_batch_size=4)
    
    evaluator.evaluate_suite(test_split,
                             tasks_to_evaluate={SequentialSideInfoTask(): [Hit(k=10), Hit(k=5)]},
                             output_dir="reports/metrics/simple_experiment")

Credits

A heartfelt "thank you" to the P5 authors, whose work inspired the idea of this repository, and who made available a preprocessed version of the Amazon Dataset that this project uses as a starting point for further manipulation.

Yes, the cute logo is A.I. generated. So thank you DALL-E 3!

Project Organization

├── 📁 data                          <- Directory containing all data generated/used
│   ├── 📁 processed                     <- The final, canonical data sets used for training/validation/evaluation
│   └── 📁 raw                           <- The original, immutable data dump
│
├── 📁 mkdocs                        <- Directory containing source code for the online documentation
│
├── 📁 models                        <- Directory where trained and serialized models will be stored
│
├── 📁 reports                       <- Where metrics will be stored after performing the evaluation phase
│   └── 📁 metrics
│
├── 📁 sample_experiments            <- Config and results of multiple experiment runs made with LaikaLLM
│
├── 📁 src                           <- Source code of the project
│   ├── 📁 data                          <- All scripts related to datasets and tasks
│   │   ├── 📁 datasets                  <- All datasets implemented
│   │   ├── 📁 tasks                     <- All tasks implemented
│   │   ├── 📄 abstract_dataset.py       <- The interface that all datasets should implement
│   │   ├── 📄 abstract_task.py          <- The interface that all tasks should implement
│   │   └── 📄 main.py                   <- Script used to perform the data phase when using LaikaLLM via .yaml
│   │
│   ├── 📁 evaluate                  <- Scripts to evaluate the trained models
│   │   ├── 📁 metrics                   <- Scripts containing different metrics to evaluate the predictions generated
│   │   ├── 📄 abstract_metric.py        <- The interface that all metrics should implement
│   │   ├── 📄 evaluator.py              <- Script containing the Evaluator class used for performing the eval phase
│   │   └── 📄 main.py                   <- Script used to perform the eval phase when using LaikaLLM via .yaml
│   │
│   ├── 📁 model                     <- Scripts to define and train models
│   │   ├── 📁 models                    <- Scripts containing all the models implemented
│   │   ├── 📄 abstract_model.py         <- The interface that all models should implement
│   │   ├── 📄 main.py                   <- Script used to perform the train phase when using LaikaLLM via .yaml
│   │   └── 📄 trainer.py                <- Script containing the Trainer class used for performing the train phase
│   │
│   ├── 📄 __init__.py               <- Makes src a Python module
│   ├── 📄 utils.py                  <- Contains utils functions for the project
│   └── 📄 yml_parse.py              <- Script responsible for coordinating the parsing of the .yaml file
│
├── 📁 tests                         <- Package containing all tests for the source code
│
├── 📄 laikaLLM.py                   <- Script to invoke via command line to use LaikaLLM via .yaml
├── 📄 LICENSE                       <- MIT License
├── 📄 params.yml                    <- The example .yaml config to get started with LaikaLLM
├── 📄 README.md                     <- The top-level README for developers using this project
└── 📄 requirements.txt              <- The requirements file for reproducing the environment (src package)

Project based on the cookiecutter data science project template. #cookiecutterdatascience