# 🚀 LLM Foundry v0.8.0
## New Features
### Megablocks support (#1102)
Support for training optimized MoE models at large scale. Check out the MegaBlocks documentation for more information on building state-of-the-art MoE models.
### Expanded Registries (#1080, #1093, #1094, #1095, #1096, #1165)
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.
Check out the README for detailed instructions and code examples!
### Support for ShareGPT chat format (#1098)
We now support the ShareGPT format for finetuning.
## Breaking Changes and Deprecations
We have updated the minimum supported PyTorch version to 2.3 (#1152).
### In-Context Learning Code Evaluation (#1181)
We've removed the `code_evaluation` task from the allowed in-context learning task types, and we've deleted the `InContextLearningCodeEvaluationDataset` and `InContextLearningCodeEvalAccuracy` classes.
### Question-Answering
We've removed the `question_answering` task type. Please use the `generation_task_with_answers` task instead.
## What's Changed
- Update README by @hanlint in #1069
- Expose more exception attributes by @jjanezhang in #1071
- Output eval logging batch by @maxisawesome in #961
- Add expandable segments flag by @dakinggg in #1075
- Check the user provided eos / bos token id against the tokenizer eos / bos token id by @ShashankMosaicML in #1039
- Triton RMSNorm by @josejg in #1050
- Fix tiktoken vocab size by @dakinggg in #1081
- Doing the loss reduction in foundry instead of in the loss functions. by @ShashankMosaicML in #1079
- Decrease log verbosity with no bias by @mvpatel2000 in #1082
- Upgrade hf chat by @j316chuck in #1061
- Fixes for streaming and auto packing by @dakinggg in #1083
- Background mlflow model registration by @irenedea in #1078
- Update README.md to include DBRX blog under "Latest News" by @lupesko in #1085
- Decrease transformers file size for mlflow by @dakinggg in #1087
- log packing ratio progress by @milocress in #1070
- Bump HF version by @b-chu in #1091
- Fix typo in expandable_segments by @mammothb in #1088
- Bump transformers to 4.39.3 by @dakinggg in #1086
- Fix yaml typo by @dakinggg in #1092
- Fix for overriding nested configs by @dakinggg in #1089
- cleaned up HF/MPT conversion test by @milocress in #1048
- Update yamls for 0.7.0 by @dakinggg in #1097
- Norms registry by @dakinggg in #1080
- fixing evaluator microbatch size by @ShashankMosaicML in #1100
- Updating the streaming version in setup.py by @ShashankMosaicML in #1103
- MegaBlocks release by @mvpatel2000 in #1102
- Remove torch compile from GLU by @josejg in #1101
- Update config_moe_args.py by @vchiley in #1104
- Add remote code option to allow execution of DBRX tokenizer by @b-chu in #1106
- Fix overwriting FP8 act ckpt flag in the train script by @cli99 in #1107
- Support ShareGPT chat format by @samhavens in #1098
- FC layer registry by @dakinggg in #1093
- Attention layer registry by @dakinggg in #1094
- Dbrx finetune yaml requires save folder specified to enable autoresume by @mvpatel2000 in #1108
- Revert "Update config_moe_args.py" by @vchiley in #1111
- rm new_group todo by @vchiley in #1112
- Migrate ICL classes to foundry by @bmosaicml in #936
- FFN layer registry by @dakinggg in #1095
- Param init registry by @dakinggg in #1096
- Add missing init file by @dakinggg in #1113
- Update tests to not rely on mistral by @dakinggg in #1117
- Bump transformers to 4.40 by @dakinggg in #1118
- Add `.json` to SUPPORTED_EXTENSIONS by @eitanturok in #1114
- Add option for subclasses to convert model and tokenizer in hf checkpointer by @dakinggg in #1121
- Bump Composer to 0.21.3 by @b-chu in #1122
- catch misconfigured hf dataset by @milocress in #1123
- Pin mlflow by @dakinggg in #1124
- Change main to a dev version by @dakinggg in #1126
- Fix deprecation versions by @dakinggg in #1129
- Clean up the publicly exported API by @dakinggg in #1128
- Fix HF checkpointer + mlflow bugs by @dakinggg in #1125
- Update JSONL sources in eval README by @emmanuel-ferdman in #1110
- Mlflow datasets by @KuuCi in #1119
- Strict key checking for dataset by @b-chu in #1131
- First initialize dist with gloo by @dakinggg in #1133
- Fix saving of generation_config for Llama-3 by @eldarkurtic in #1134
- Bump datasets version by @dakinggg in #1138
- Revert "First initialize dist with gloo (#1133)" by @dakinggg in #1139
- Barrier immediately after initialize dist with logs by @dakinggg in #1140
- Add new FT instructions by @b-chu in #1143
- Upgrade ci-testing by @mvpatel2000 in #1145
- Fix typos in callbacks with configs by @dakinggg in #1146
- Remove olmo as a dependency by @snarayan21 in #1148
- build inner model by @milocress in #1147
- Fix `DatasetConstants.splits` default value to protect against dictionary overwriting by @ivan-kud in #1144
- Bump flash attention version by @dakinggg in #1150
- Torch 2.3 part 1 - build the images by @dakinggg in #1149
- Torch 2.3 upgrade Part 2 - CI by @dakinggg in #1151
- Comment out 2.3 tests by @dakinggg in #1155
- Fix yaml lint by @dakinggg in #1156
- Move sentencepiece import by @aspfohl in #1157
- Bump composer version to 0.22.0 by @snarayan21 in #1160
- Uncomment GPU tests by @milocress in #1162
- Depend on coverage by @milocress in #1163
- fix dep group in torch 2.3 ci by @dakinggg in #1164
- Bump min torch version to 2.3.0 by @dakinggg in #1152
- Add line splitting and other linting by @b-chu in #1161
- refactoring dataloader into registries. by @ShashankMosaicML in #1165
- Migrate eval output logging to foundry by @maxisawesome in #1166
- Fix import and mocking by @dakinggg in #1169
- Minor fix to `llmfoundry.data.utils.get_text_collator` by @ShashankMosaicML in #1170
- Fix config access for DBRX by @dakinggg in #1177
## New Contributors
- @lupesko made their first contribution in #1085
- @mammothb made their first contribution in #1088
- @eitanturok made their first contribution in #1114
- @emmanuel-ferdman made their first contribution in #1110
- @ivan-kud made their first contribution in #1144
Full Changelog: v0.7.0...v0.8.0