[Logo]

The official implementation of "Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models" published at NeurIPS 2024.

[Preprint]

Overview

[Product screenshot]

This project provides the source code of LM-GC, the first LLM-powered gradient compressor.

Here are the key takeaways:

  • We demonstrate that large language models (LLMs) hold significant potential as prior models for gradients, a concept that has been widely applied to other modalities but not to gradients.
  • We introduce a novel serialization method that converts IEEE 754 floating points into hexadecimal text, enabling LLMs to comprehend gradients and achieve state-of-the-art lossless compression (a minimal sketch follows this list).
  • Our LLM-based prior model could unlock new applications for gradients similar to those in other modalities, such as super-resolution, denoising, generation, and more.
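As a rough, illustrative sketch of the serialization idea (not the repository's exact implementation), the snippet below reinterprets a toy float32 gradient tensor as its raw IEEE 754 bytes and renders them as hexadecimal text that an LLM tokenizer can consume:

import numpy as np

# A toy "gradient" tensor; any float32 array works the same way.
grads = np.array([0.0123, -0.4567, 1e-5], dtype=np.float32)

# Reinterpret the IEEE 754 float32 bit patterns as raw bytes (4 bytes per value).
raw_bytes = grads.tobytes()

# Render every byte as two hexadecimal characters, e.g. b'\x3c' -> "3c".
hex_text = raw_bytes.hex()
print(hex_text)  # a plain hex string ready to be tokenized by an LLM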

If you find the project interesting, don't forget to star and cite our work:

@article{wang2024language,
  title={Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models},
  author={Wang, Hui-Po and Fritz, Mario},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}

Getting Started

Prerequisites

  • torch ≥ 2.1.2
  • transformers ≥ 4.40.1
  • torchac
  • flash-attn ≥ 2.5.8 (for NVIDIA GPUs; install via pip install flash-attn --no-build-isolation)

or

  • install via pip
    pip install -r requirements.txt

After setting up a HuggingFace access token here, the codebase will download the language models automatically from HuggingFace, except for LLAMA2. See More LLMs for more information.
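If you prefer to set the token programmatically rather than via huggingface-cli login, a minimal sketch with the official huggingface_hub package looks like this (the token value is a placeholder):

from huggingface_hub import login

# Paste your personal access token (huggingface.co -> Settings -> Access Tokens).
login(token="hf_xxxxxxxxxxxxxxxxxxxxxxxx")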

Quickstart

We provide a quick demo below; please refer to Usage for detailed instructions.

cd scripts
# compress gradients of a ConvNet trained on TinyImageNet using TinyLLAMA
bash pipeline.sh 

Usage

Reproducing the experiments in the paper takes three steps: (1) training neural networks to collect gradients, (2) serializing and tokenizing the raw gradients, and (3) running LLMs and arithmetic coding (LM-GC).

1. Gradient collection

This step trains a network (e.g., a ConvNet on TinyImageNet in the following example) and collects gradients for compression later. See scripts/run_collect.sh for more details.

DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
for i in 0 1 2
do
    python -u train_and_collect_grad.py -cfg settings/gradient_collection/$DATASET-$ARCH.yaml --tag $i --grad-interval 400 --download
done
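Conceptually, this step boils down to running ordinary training iterations and dumping the flattened gradients to disk at a fixed interval. The sketch below illustrates the idea with generic PyTorch code; the model, data, and file layout are placeholders rather than the repository's actual pipeline.

import os
import torch

# Placeholder model and synthetic data; the repo trains ConvNet/VGG16/ResNet18/ViT
# on MNIST/CIFAR-10/TinyImageNet instead.
model = torch.nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()

os.makedirs("grads", exist_ok=True)
grad_interval = 400  # mirrors --grad-interval above

for step in range(1, 1201):
    x = torch.randn(64, 32)
    y = torch.randint(0, 10, (64,))

    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()

    if step % grad_interval == 0:
        # Flatten all parameter gradients into one float32 vector and save it.
        flat = torch.cat([p.grad.reshape(-1) for p in model.parameters()])
        torch.save(flat, f"grads/step_{step:06d}.pt")

    optimizer.step()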

2. Serialization and tokenization

For convenience, we preprocess the data before running arithmetic coding: in this step, the raw gradients are serialized and tokenized, producing three preprocessed datasets (one per repetition of the loop below). See scripts/serialization.sh for more details.

NUM_SUBSAMPLE=10
DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
TYPE="grad"
COMPRESSOR="tinyllama" # llama2-7b # openllama3b
SEP="hex-none" # hex-space # hex-comma+space # iso # hex-semicolon
BPG=4 # 8
for i in 1 2 3
do
  python -u tokenize_dataset.py --cfg settings/compression/cifar10-$SEP.yaml \
    --data-path exps/$DATASET-$ARCH/0/grads/ --bytes-per-group $BPG \
    --compressor $COMPRESSOR --exhaustive-listing --num-subsample $NUM_SUBSAMPLE \
    --output-name $ARCH-$DATASET-$COMPRESSOR-$SEP-$NUM_SUBSAMPLE-$TYPE-$BPG-$i 
done
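To make --bytes-per-group and the separator setting more concrete, here is a small, hypothetical illustration of how the same raw bytes could be rendered under different choices; the exact formatting used by tokenize_dataset.py may differ.

import numpy as np

grads = np.array([0.0123, -0.4567], dtype=np.float32)
raw = grads.tobytes()  # 8 bytes in total (4 bytes per float32 value)

def serialize(raw_bytes, bytes_per_group=4, sep=""):
    # Hex-encode the bytes and join fixed-size groups with a separator.
    groups = [raw_bytes[i:i + bytes_per_group].hex()
              for i in range(0, len(raw_bytes), bytes_per_group)]
    return sep.join(groups)

print(serialize(raw, bytes_per_group=4, sep=""))    # like "hex-none": one continuous hex string
print(serialize(raw, bytes_per_group=4, sep=" "))   # like "hex-space": groups separated by spaces
print(serialize(raw, bytes_per_group=8, sep=", "))  # like "hex-comma+space" with 8-byte groups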

3. Run compression

The processed data from the previous step is divided into several disjoint windows. By default, the LLM sees a window of 2048 tokens (including one BOS token) at a time. The experiments are repeated three times. See scripts/compress.sh for more details.

export HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
NUM_SUBSAMPLE=10
DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
TYPE="grad"
COMPRESSOR="tinyllama" # llama2-7b # openllama3b
SEP="hex-none" # hex-space # hex-comma+space # iso # hex-semicolon
BATCHSIZE=4 # depending on your GPUs
BPG=4 # 8
for i in 1 2 3
do
  python -u compress.py -cfg settings/compression/cifar10-$SEP.yaml --compressor $COMPRESSOR --dataset tokenized_dataset \
    --data-path ./tokenized_datasets/$ARCH-$DATASET-$COMPRESSOR-$SEP-$NUM_SUBSAMPLE-$TYPE-$BPG-$i.pkl --batch-size $BATCHSIZE
done
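Under the hood, LM-GC couples the LLM's next-token probabilities with arithmetic coding (via torchac). The sketch below shows the general pattern for one window of tokens; it uses a placeholder model and input, and omits details such as windowing, batching, and the exact BOS handling in compress.py.

import torch
import torchac
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; the paper uses e.g. TinyLLAMA, OpenLLaMA-3B, and LLAMA2-7B.
name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

# One window of serialized gradient text (placeholder hex string).
ids = tokenizer("3a4f00125bc3...", return_tensors="pt").input_ids  # (1, T)

with torch.no_grad():
    logits = model(ids).logits  # (1, T, vocab)

# Predictive distribution for token t+1 given tokens up to t.
probs = torch.softmax(logits[0, :-1].float(), dim=-1)                     # (T-1, vocab)
cdf = torch.cumsum(probs, dim=-1)
cdf = torch.cat([torch.zeros_like(cdf[:, :1]), cdf], dim=-1).clamp(0, 1)  # (T-1, vocab+1)

# torchac expects int16 symbols; TinyLLAMA's 32k vocabulary fits.
symbols = ids[0, 1:].to(torch.int16).cpu()
byte_stream = torchac.encode_float_cdf(cdf.cpu(), symbols, check_input_bounds=True)
print(f"{len(byte_stream)} bytes for {symbols.numel()} tokens")

# Lossless check: decoding with the same CDFs recovers the tokens exactly.
decoded = torchac.decode_float_cdf(cdf.cpu(), byte_stream)
assert torch.equal(decoded, symbols)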

Options

More LLMs

More models to compress

Ablation study

  • Bytes per group
  • Context window size

TO-DO

  • Prepare pipeline.sh
  • Sanity check
  • How to add more LLMs
  • Provide a runnable encode/decode example
  • Baseline codec

License

Distributed under the MIT License. See the LICENSE file for more information.

Acknowledgments

This project is partially built upon DeepMind's work, and the README template comes from makeread.me.
