🚀 LLM Foundry v0.4.0
LLM Foundry is an efficient codebase for training, evaluating, and deploying Large Language Models (LLMs) and serves as the foundation for the MPT-7B and MPT-30B models.
In addition to the usual bug fixes and performance improvements, we've added lots of new features!
New Features
Automatic sequence packing (#683)
You can now specify `packing_ratio: auto` under your finetuning dataset config to automatically profile and select a good packing ratio, efficiently packing your sequences together on the fly during finetuning. This can dramatically reduce the amount of compute wasted on padding tokens.
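For example, a minimal sketch of a finetuning dataset config using automatic packing (the dataset name, split, and loader settings here are illustrative placeholders, not from this release):

```yaml
train_loader:
  name: finetuning
  dataset:
    hf_name: my-org/my-finetuning-dataset  # hypothetical dataset name
    split: train
    max_seq_len: 2048
    packing_ratio: auto  # profile candidate ratios and pick a good one automatically
    shuffle: true
  drop_last: true
  num_workers: 8
```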
Flash Attention 2 (#651, #666, #672)
We now support using Flash Attention 2 both in MPT and in any model that supports Flash Attention 2 via the Transformers library. See the training instructions to learn how to use the different versions of Flash Attention.
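As a hedged sketch (key names follow the MPT and HF model configs in this repo; consult the training instructions for the authoritative options), enabling Flash Attention looks roughly like:

```yaml
# MPT models: select the flash attention implementation
model:
  name: mpt_causal_lm
  attn_config:
    attn_impl: flash  # uses Flash Attention 2 when a flash-attn 2.x package is installed

# Hugging Face models: assumes the model supports Flash Attention 2 via Transformers
# model:
#   name: hf_causal_lm
#   pretrained_model_name_or_path: codellama/CodeLlama-7b-hf
#   use_flash_attention_2: true
```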
New PyTorch, Composer, Streaming, and Transformers versions (#648, #672, #736)
As always, we've updated to new versions of the core dependencies of LLM Foundry, bringing better performance, new features, and support for new models (Code Llama and Mistral in particular).
Easy Databricks model deployment (#618)
We've made it much easier to go from a training run to a served model using Databricks model serving. To make use of this feature, you need to specify both an `MLFlowLogger` and a `HuggingFaceCheckpointer` for your run.

The `MLFlowLogger` should have a Unity Catalog model registry prefix in the form of `catalog.schema`. This specifies where to register your models. For example,
```yaml
loggers:
  mlflow:
    experiment_name: /Users/[email protected]/my_experiment_name
    tracking_uri: databricks
    model_registry_prefix: catalog.schema
    model_registry_uri: databricks-uc
```
The `HuggingFaceCheckpointer` should specify the name you want to register the model under. For example,
```yaml
callbacks:
  hf_checkpointer:
    save_interval: 1ep # Save Hugging Face formatted checkpoints each epoch
    save_folder: s3://bucket/path/to/my/checkpoints
    mlflow_registered_model_name: my_model_name # Final model will be registered to catalog.schema.my_model_name
```
MPT model configurations
We've added a few new options when training with the MPT architecture in LLM Foundry; a sketch of the corresponding config keys follows the list.
- Rotary embeddings (#675)
- (Un)Tied word embeddings (#728)
- Fine grained activation checkpointing (#720)
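A hedged sketch of where these options live in an MPT model config (key names follow this repo's MPT config conventions, but exact accepted values may vary across versions; the values shown are illustrative):

```yaml
model:
  name: mpt_causal_lm
  tie_word_embeddings: false  # untie the input and output embeddings (#728)
  attn_config:
    rope: true                # enable rotary position embeddings (#675)
  # fine grained activation checkpointing (#720): checkpoint only selected
  # sublayers; pair with activation_checkpointing: true in your fsdp_config.
  # Accepted values depend on the release.
  activation_checkpointing_target: grouped_query_attention
```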
Evaluation Improvements
We've released v0.1 of our Eval Gauntlet (#674, #748)! This adds many new benchmarks, chain-of-thought prompting, and a new safety category. Check out the README for full details!
In addition, we've made a few improvements to our evaluation options, with more to come!
- Allow specifying multiple evaluation datasets to compute cross entropy and perplexity on during training (#603); see the sketch after this list
- Easier versions of the HumanEval dataset, which can be useful for comparing smaller models (#645)
- More options for averaging the results of the Eval Gauntlet (#640)
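A hedged sketch of specifying multiple evaluation datasets in a training YAML (the labels, remote paths, and splits are hypothetical; the list-of-loaders shape with a `label` per dataset follows #603):

```yaml
eval_loader:
  - label: c4-val  # each label distinguishes this dataset's metrics in the logs
    name: text
    dataset:
      remote: s3://bucket/c4
      split: val
      shuffle: false
  - label: wiki-val
    name: text
    dataset:
      remote: s3://bucket/wiki
      split: val
      shuffle: false
```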
New pretraining benchmarks (#543)
Added H100 profiling results to our benchmarking table.
Quality of life improvements
- Improved `Generate` callback with more logging options. Use the `Generate` callback to log generations from your model over the course of training. (#631)
- Count the number of tokens during training excluding padding tokens. Previously this count included padding tokens. (#676)
- Use the PyTorch profiler to profile your training runs. (#678)
- A convenience script for using the much faster Hugging Face `snapshot_download` to download models from the Hugging Face Hub. (#708)
- New AWS-specific Docker images with LLM Foundry dependencies pre-installed. (#731)
Experimental features
Inverse square root learning rate scheduler (#657)
We've added experimental support for the inverse square root learning rate scheduler.
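A hedged sketch of enabling it in a training YAML (the scheduler name and parameter names follow this repo's scheduler builder as of this release; see #657 for the authoritative spelling, and treat the values as illustrative):

```yaml
scheduler:
  name: inv_sqrt_with_warmup
  t_warmup: 500ba        # linear warmup for 500 batches
  t_scale: 500ba         # time scale of the inverse square root decay
  t_cooldown: 0ba        # optional linear cooldown at the end of training
  alpha_f_decay: 0.0     # final LR multiplier reached during decay
  alpha_f_cooldown: 0.0  # final LR multiplier reached during cooldown
```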
Breaking changes
Updated Streaming defaults (#723)
We've upgraded to the latest Streaming version, including vastly improved default settings for partitioning and shuffling. This means that if you were using the defaults, you will get different results after upgrading. The new defaults should be more performant for the large majority of use cases. See the Streaming release notes for more details.
Removed support for PrefixLM for Bloom and OPT models (#704)
We occasionally remove unused experimental parts of the code base to focus on new features and better support existing ones; in this release, we've removed support for PrefixLM applied to Bloom and OPT models.
What's Changed
- Multi eval dataset logging by @snarayan21 in #603
- Merge release 0.3.0 back to main by @dakinggg in #635
- Add tmp path retention policy by @j316chuck in #641
- Add flag to disable train metrics by @mvpatel2000 in #642
- Update pins to latest version that were missed by @dakinggg in #646
- Fix overriding of rope_scaling config by @dakinggg in #644
- Add 2.1 images to docker workflow and tests by @dakinggg in #648
- Fixes to lion8b test for torch 2.1 by @dakinggg in #649
- Only log "changing autoresume" when actually changing by @aspfohl in #653
- Fix lion8b error correction with torch 2.1 by @dblalock in #656
- Clean up processes between distributed gpu tests by @j316chuck in #660
- Revert "Clean up processes between distributed gpu tests (#660)" by @j316chuck in #662
- Switch ordering of foundry gpu tests by @j316chuck in #665
- Change batch size on coding tasks to 1 to avoid OOM by @bmosaicml in #654
- Add images with flash attention 2 by @dakinggg in #651
- Fix yaml change by @dakinggg in #667
- Revert actions change by @dakinggg in #668
- Inverse Square Root LR Schedule by @mansheej in #657
- Add test suite for flash attention 2 by @dakinggg in #666
- Adding Simplified Coding Tasks by @mcarbin in #645
- Fix typo in image name by @dakinggg in #669
- Point to composer.callback.Generate by @aspfohl in #631
- Do not update past_key_values in place by @irenedea in #652
- Fix small typos in the eval readme by @maxisawesome in #671
- Convert to DataSpec and add token counts that include padding by @dakinggg in #676
- Add support for automatically registering models to UC at the end of training by @dakinggg in #618
- add `load_strict_model_weights` as an optional config parameter by @AllenHW in #655
- Small changes to HF repo update script by @dakinggg in #680
- Add profiler support in llm foundry by @j316chuck in #678
- Update_pretrain_benchmarks by @crinard in #543
- add |---| to render tables correctly by @crinard in #686
- Adding Mosaic logger + logging data validated event by @jjanezhang in #670
- Tiktoken wrapper add_eos_token option by @rajammanabrolu in #681
- Attempt to fix flaky test by @dakinggg in #688
- Allow flash attention 2 and upgrade to transformers 4.34.1 by @dakinggg in #672
- Fix mlflow model logging bug by @dakinggg in #692
- Add fixtures by @irenedea in #673
- Make default for cuda_load_lazy false by @irenedea in #694
- Update README.md by @j316chuck in #693
- Pad tiktoken vocab so that additional_special_tokens works by @dakinggg in #695
- Remove live logs to be consistent with Composer by @mvpatel2000 in #698
- Change gauntlet avging by @bmosaicml in #640
- Remove prefixlm support for OPT and Bloom by @dakinggg in #704
- Fix attention patch compatibility for llama2 by @irenedea in #705
- Add test coverage for lion and lion8b checkpoint interop by @dblalock in #679
- Improvement in README.md and TUTORIAL.md by @tmsagarofficial in #699
- Make TiktokenTokenizerWrapper picklable by @irenedea in #700
- Add num_proc to map and filter calls by @dakinggg in #706
- Fix HF local module copy contention with a meta init on local rank 0 by @dakinggg in #710
- Add support for auto packing ratio by @irenedea in #683
- Remove HumanEval tasks from ICL eval by @tbarton16 in #715
- Allow logging metadata by @dakinggg in #714
- Run HF dataset processing on local rank 0 first by @dakinggg in #716
- Add Hugging Face model download script by @jerrychen109 in #708
- Adding support for Rotary Position Embeddings by @ShashankMosaicML in #675
- Add databricks dependency by @irenedea in #717
- Set persistent_workers = False for packing profiling by @dakinggg in #718
- raise timeout for GPU tests by @mvpatel2000 in #719
- change default overwrite to True by @dakinggg in #724
- Attempt to fix a very occasional hang in datasets map/filter by @dakinggg in #725
- Add Unity Catalog support to HF checkpointer by @dakinggg in #721
- Combine filters into one, to avoid datasets error by @dakinggg in #729
- Fix logging verbosity in HF model download script and repair symlinks by @jerrychen109 in #727
- Gate the dist calls in build_tokenizer by @dakinggg in #732
- Create AWS docker image for fine tuning by @j316chuck in #731
- Make TiktokenTokenizerWrapper compatible with convert_composer_to_hf.py by @irenedea in #730
- Enable `tie_word_embeddings` config setting to enable / disable weight tied embeddings by @vchiley in #728
- add act checkpoint at sub layer level by @cli99 in #720
- Better defaults for StreamingDataset subclasses by @snarayan21 in #723
- Rename log message by @b-chu in #734
- Remove tokenizer_name field by @dakinggg in #735
- Fix pairwise attention comparison in test by @sashaDoubov in #737
- Fix passed metadata to mlflow logging by @wenfeiy-db in #713
- HF script explicitly casts precision by @mvpatel2000 in #741
- Bump to composer 0.17 by @dakinggg in #736
- Patch os cpu count to avoid extra multiprocessing inside pytest which sometimes hangs by @dakinggg in #745
- Reenable tests that were accidentally disabled by @dakinggg in #746
- Gauntlet v0.1 by @bmosaicml in #674
- Remove extra test suite by @dakinggg in #743
- Fix typo in workflow file by @dakinggg in #750
- Fix 1.13 tests by @dakinggg in #751
- Pin Chat format to TiktokenTokenizerWrapper by @rajammanabrolu in #752
- Catch exception raised in hf prep properly by @j316chuck in #749
- Gauntlet v0.1.0 yaml fixes by @bmosaicml in #748
- Fix flash attention GQA bug to use the dynamic size of the key/value tensors - used for eval/inference by @sashaDoubov in #756
New Contributors
- @mansheej made their first contribution in #657
- @mcarbin made their first contribution in #645
- @maxisawesome made their first contribution in #671
- @AllenHW made their first contribution in #655
- @crinard made their first contribution in #543
- @jjanezhang made their first contribution in #670
- @rajammanabrolu made their first contribution in #681
- @tmsagarofficial made their first contribution in #699
- @tbarton16 made their first contribution in #715
- @ShashankMosaicML made their first contribution in #675
- @cli99 made their first contribution in #720
- @b-chu made their first contribution in #734
- @wenfeiy-db made their first contribution in #713
Full Changelog: v0.3.0...v0.4.0