The official implementation of "Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models" published at NeurIPS 2024.
[Preprint]
This project provides the source code of LM-GC, the first LLM-powered gradient compressor.
Here are the key takeaways:
- We demonstrate that large language models (LLMs) hold significant potential as prior models for gradients, a concept that has been widely applied to other modalities but rarely to gradients.
- We introduce a novel serialization method that converts IEEE 754 floating points into hexadecimal format, enabling LLMs to comprehend gradients and achieve state-of-the-art lossless gradient compression (see the sketch after this list).
- Our LLM-based prior model could unlock new applications for gradients similar to those in other modalities, such as super-resolution, denoising, generation, and more.
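As a rough illustration of the serialization idea, here is a minimal Python sketch (assumptions: float32 gradients, the machine's native little-endian byte order, and a configurable separator; the repo's own serialization code is the reference implementation):

```python
# Minimal sketch: IEEE 754 float32 -> raw bytes -> hexadecimal groups.
import numpy as np

def serialize_to_hex(grads: np.ndarray, bytes_per_group: int = 4, sep: str = " ") -> str:
    raw = grads.astype(np.float32).tobytes()   # IEEE 754 bytes, native endianness
    groups = [raw[i:i + bytes_per_group] for i in range(0, len(raw), bytes_per_group)]
    return sep.join(g.hex() for g in groups)   # two hex characters per byte

print(serialize_to_hex(np.array([1.0, 3.14])))
# -> '0000803f c3f54840' on a little-endian machine
```

The group size and separator correspond to the `BPG` (`--bytes-per-group`) and `SEP` options used in the scripts below.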
If you find the project interesting, don't forget to star and cite our work:
```bibtex
@article{wang2024language,
  title={Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models},
  author={Wang, Hui-Po and Fritz, Mario},
  journal={Advances in Neural Information Processing Systems},
  year={2024}
}
```
- torch ≥ 2.1.2
- transformers ≥ 4.40.1
- torchac
- flash-attn ≥ 2.5.8 (NVIDIA GPUs only), installed via

```bash
pip install flash-attn --no-build-isolation
```

Alternatively, install all dependencies at once via pip:

```bash
pip install -r requirements.txt
```
After setting up a Hugging Face access token here (e.g., via `huggingface-cli login`), the codebase will download language models automatically from Hugging Face, except for LLAMA2. See More LLMs for more information.
We provide a quick demo here. Please refer to Usage for detailed instructions.
```bash
cd scripts
# compress gradients of a ConvNet trained on TinyImageNet using TinyLLAMA
bash pipeline.sh
```
It takes three steps to reproduce the experiments in the paper: (1) training neural networks to collect gradients, (2) serializing and tokenizing the raw gradients, and (3) running LLMs with arithmetic coding (LM-GC).
This step trains a network (e.g., a ConvNet on TinyImageNet in the following example) and collects gradients for later compression. See scripts/run_collect.sh for more details.
```bash
DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
for i in 0 1 2
do
    python -u train_and_collect_grad.py -cfg settings/gradient_collection/$DATASET-$ARCH.yaml --tag $i --grad-interval 400 --download
done
```
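Conceptually, step (1) boils down to the following hedged sketch (the tiny linear model and output filename are stand-ins; the actual entry point is train_and_collect_grad.py): every `--grad-interval` steps, the current gradient is flattened and written to disk.

```python
# Minimal sketch of gradient collection (assumed flow, not the repo's code).
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                          # stand-in for the ConvNet
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(1, 801):
    x, y = torch.randn(32, 8), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    if step % 400 == 0:                          # mirrors --grad-interval 400
        flat = torch.cat([p.grad.flatten() for p in model.parameters()])
        torch.save(flat.cpu(), f"grad_step{step}.pt")  # hypothetical filename
    opt.step()
```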
For convenience, we preprocess the data before running arithmetic encoding: in this step the raw gradients are serialized and tokenized, producing three preprocessed datasets (one per repetition). See scripts/serialization.sh for more details.
```bash
NUM_SUBSAMPLE=10
DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
TYPE="grad"
COMPRESSOR="tinyllama" # llama2-7b # openllama3b
SEP="hex-none" # hex-space # hex-comma+space # iso # hex-semicolon
BPG=4 # 8
for i in 1 2 3
do
    python -u tokenize_dataset.py --cfg settings/compression/cifar10-$SEP.yaml \
        --data-path exps/$DATASET-$ARCH/0/grads/ --bytes-per-group $BPG \
        --compressor $COMPRESSOR --exhaustive-listing --num-subsample $NUM_SUBSAMPLE \
        --output-name $ARCH-$DATASET-$COMPRESSOR-$SEP-$NUM_SUBSAMPLE-$TYPE-$BPG-$i
done
```
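To see how the separator choice affects the token budget, here is a hedged sketch (the TinyLlama checkpoint name and hex strings are assumptions) that tokenizes a short serialized snippet the way step (2) would:

```python
# Sketch: count the tokens an LLM spends on a serialized gradient snippet.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # assumed checkpoint
hex_space = "0000803f c3f54840"   # SEP="hex-space", BPG=4
hex_none = "0000803fc3f54840"     # SEP="hex-none"
for text in (hex_space, hex_none):
    ids = tok(text, add_special_tokens=False)["input_ids"]
    print(f"{text!r}: {len(ids)} tokens")
```

Fewer tokens for the same bytes means fewer symbols for the arithmetic coder to encode, while the separator also changes how predictable each token is; this is the trade-off the `SEP` and `BPG` options expose.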
The processed data from the previous step is then divided into several disjoint windows. By default, the LLM sees 2048 tokens (including one BOS token) at a time. The experiments are repeated three times. See scripts/compress.sh for more details.
```bash
export HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1
NUM_SUBSAMPLE=10
DATASET='tinyimagenet' # cifar10 # mnist
ARCH="convnet" # vgg16 # resnet18 # vit
TYPE="grad"
COMPRESSOR="tinyllama" # llama2-7b # openllama3b
SEP="hex-none" # hex-space # hex-comma+space # iso # hex-semicolon
BATCHSIZE=4 # depending on your GPUs
BPG=4 # 8
for i in 1 2 3
do
    python -u compress.py -cfg settings/compression/cifar10-$SEP.yaml --compressor $COMPRESSOR --dataset tokenized_dataset \
        --data-path ./tokenized_datasets/$ARCH-$DATASET-$COMPRESSOR-$SEP-$NUM_SUBSAMPLE-$TYPE-$BPG-$i.pkl --batch-size $BATCHSIZE
done
```
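At its core, step (3) turns the LLM's next-token distribution into a CDF and hands it to an arithmetic coder. Below is a hedged, self-contained sketch of that idea using torchac (the checkpoint name and input string are assumptions; compress.py remains the reference implementation):

```python
# Sketch: losslessly encode/decode a token sequence with LLM probabilities + torchac.
import torch
import torchac
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = torch.tensor([tok("0000803fc3f54840", add_special_tokens=False)["input_ids"]])
with torch.no_grad():
    probs = torch.softmax(model(ids).logits[0], dim=-1)        # (T, vocab)

# Build per-position CDFs: prepend 0 so each row has vocab+1 boundaries.
cdf = torch.cumsum(probs, dim=-1).clamp(max=1.0)
cdf = torch.cat([torch.zeros(cdf.size(0), 1), cdf], dim=-1).cpu()

# Encode token t+1 under the model's prediction at position t.
syms = ids[0, 1:].to(torch.int16).cpu()
stream = torchac.encode_float_cdf(cdf[:-1], syms, needs_normalization=True)
print(f"{len(syms)} tokens -> {len(stream)} bytes")

# Decoding with the same CDFs recovers the tokens exactly (lossless).
decoded = torchac.decode_float_cdf(cdf[:-1], stream)
assert torch.equal(decoded, syms)
```

Note that a real decoder regenerates the same CDFs by running the LLM autoregressively over the already-decoded prefix; reusing the encoder's CDFs here merely demonstrates that the coding step itself is lossless.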
- Bytes per group
- Context window size
- prepare pipeline.sh
- sanity check
- how to add more LLMs
- Provide a runnable encode/decode example
- Baseline codec
Distributed under the MIT License. See MIT License for more information.
This project is partially built upon DeepMind's work, and the README file template comes from makeread.me.