Train Large Text Prediction Models with Lightning


Use Lightning to train large language models for text generation, with as many parameters as you want (up to billions!).

You can do this:

using multiple GPUs
across multiple machines
with the most advanced and efficient training techniques
on your own data
all without any infrastructure hassle!

All handled easily with the Lightning Apps framework.

Prediction Example

A typical example of text prediction could look like this:

Prompt: Please be aware of the

Prediction: situation
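
To make the task concrete, here is a minimal sketch of how raw text can be turned into (prompt, next word) training pairs. The helper `make_word_pairs` is hypothetical and is not the implementation of `WordDataset` used in the Run section below. It only assumes whitespace tokenization and a context window of 5 words, which matches the 5-word prompt in the example above.

```python
def make_word_pairs(text: str, block_size: int) -> list[tuple[list[str], str]]:
    """Split text into (context words, next word) pairs.

    Hypothetical helper for illustration only; the real WordDataset
    may tokenize and encode the text differently.
    """
    words = text.split()  # naive whitespace tokenization (assumption)
    pairs = []
    for i in range(len(words) - block_size):
        context = words[i : i + block_size]  # the prompt: block_size words
        target = words[i + block_size]       # the word the model should predict
        pairs.append((context, target))
    return pairs


pairs = make_word_pairs("Please be aware of the situation at hand", block_size=5)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

Each pair slides the window forward by one word, so a text of N words yields N - 5 training examples under these assumptions.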

Run

To run, paste the following code snippet into a file named app.py:

#! pip install git+https://github.com/Lightning-AI/lightning-LLMs git+https://github.com/Lightning-AI/LAI-Text-Prediction-Component
#! curl https://cs.stanford.edu/people/karpathy/char-rnn/shakespeare_input.txt --create-dirs -o ${HOME}/data/shakespeare/input.txt -C -


import lightning as L
import os, torch
from lightning_gpt import models
from lit_llms.tensorboard import (
    DriveTensorBoardLogger,
    MultiNodeLightningTrainerWithTensorboard,
)

from lai_textpred import default_callbacks, gpt_20b, WordDataset, error_if_local


class WordPrediction(L.LightningWork):
    def __init__(self, *args, tb_drive, **kwargs):
        super().__init__(*args, **kwargs)
        self.tensorboard_drive = tb_drive

    def run(self):
        error_if_local()

        # -------------------
        # CONFIGURE YOUR DATA
        # -------------------
        with open(os.path.expanduser("~/data/shakespeare/input.txt")) as f:
            text = f.read()
        train_dataset = WordDataset(text, 5)
        train_loader = torch.utils.data.DataLoader(
            train_dataset, batch_size=160, num_workers=4, shuffle=True
        )

        # --------------------
        # CONFIGURE YOUR MODEL
        # --------------------
        model = models.DeepSpeedMinGPT(
            vocab_size=train_dataset.vocab_size,
            block_size=int(train_dataset.block_size),
            fused_adam=False,
            model_type=None,
            **gpt_20b,
        )

        # -----------------
        # RUN YOUR TRAINING
        # -----------------
        trainer = L.Trainer(
            max_epochs=2,
            limit_train_batches=250,
            precision=16,
            strategy="deepspeed_stage_3_offload",
            callbacks=default_callbacks(),
            log_every_n_steps=5,
            logger=DriveTensorBoardLogger(save_dir=".", drive=self.tensorboard_drive),
        )
        trainer.fit(model, train_loader)


app = L.LightningApp(
    MultiNodeLightningTrainerWithTensorboard(
        WordPrediction,
        num_nodes=3,
        cloud_compute=L.CloudCompute("gpu-fast-multi"),
    )
)
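
The app above trains with `precision=16` and `strategy="deepspeed_stage_3_offload"` on 3 nodes. A rough back-of-the-envelope estimate shows why sharding and offloading matter at this scale. The sketch below assumes that `gpt_20b` means roughly 20 billion parameters (inferred from the name only) and uses the common rule of thumb of 16 bytes per parameter for mixed-precision Adam training. The GPU count per node is a hypothetical placeholder, and activation memory is ignored.

```python
def memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory in GB (1e9 bytes) for num_params values of the given size."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 20e9      # assumption: "gpt_20b" is about 20 billion parameters
FP16_WEIGHTS = 2       # bytes per parameter for fp16 weights
FP16_GRADS = 2         # bytes per parameter for fp16 gradients
ADAM_STATES = 12       # fp32 master weights + momentum + variance (4 bytes each)

NUM_NODES = 3          # from the app above
GPUS_PER_NODE = 4      # hypothetical: depends on the machine type you select

weights_gb = memory_gb(NUM_PARAMS, FP16_WEIGHTS)
total_gb = memory_gb(NUM_PARAMS, FP16_WEIGHTS + FP16_GRADS + ADAM_STATES)
per_gpu_gb = total_gb / (NUM_NODES * GPUS_PER_NODE)

print(f"fp16 weights alone:        {weights_gb:.0f} GB")
print(f"weights + grads + Adam:    {total_gb:.0f} GB")
print(f"per GPU if evenly sharded: {per_gpu_gb:.1f} GB")
```

Under these assumptions the model state alone is far larger than a single GPU's memory, which is the case that stage 3 sharding with CPU offload is designed for.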

Running on the cloud

lightning run app app.py --cloud

Don't want to use the public cloud? Contact us at [email protected] for early access to run on your private cluster (BYOC)!

Running locally

This example is optimized for the cloud, so it cannot be run locally. For a similar app that does run locally, please refer to our text classification example.

Lightning-Universe/Text-Prediction_component
