# 🚀 LLM Foundry v0.8.0
## New Features
### Megablocks support (#1102)
Support for training optimized MoE models at large scale. Check out the MegaBlocks documentation for more information on building state-of-the-art MoE models.
### Expanded Registries (#1080, #1093, #1094, #1095, #1096, #1165)
We've expanded support for registries to include dataloaders, FFN layers, attention layers, norms, and parameter initialization functions.
Check out the README for detailed instructions and code examples!
### Support for ShareGPT chat format (#1098)
We now support the ShareGPT format for finetuning.
## Breaking Changes and Deprecations
We have updated the minimum supported PyTorch version to 2.3 (#1152).
### In-Context Learning Code Evaluation (#1181)
We've removed the `code_evaluation` task from the allowed in-context learning task types, and we've deleted the `InContextLearningCodeEvaluationDataset` and `InContextLearningCodeEvalAccuracy` classes.
### Question-Answering
We've removed the `question_answering` task type. Please use the `generation_task_with_answers` task instead.
## What's Changed
- Update README by @hanlint in #1069
- Expose more exception attributes by @jjanezhang in #1071
- Output eval logging batch by @maxisawesome in #961
- Add expandable segments flag by @dakinggg in #1075
- Check the user provided eos / bos token id against the tokenizer eos / bos token id by @ShashankMosaicML in #1039
- Triton RMSNorm by @josejg in #1050
- Fix tiktoken vocab size by @dakinggg in #1081
- Doing the loss reduction in foundry instead of in the loss functions. by @ShashankMosaicML in #1079
- Decrease log verbosity with no bias by @mvpatel2000 in #1082
- Upgrade hf chat by @j316chuck in #1061
- Fixes for streaming and auto packing by @dakinggg in #1083
- Background mlflow model registration by @irenedea in #1078
- Update README.md to include DBRX blog under "Latest News" by @lupesko in #1085
- Decrease transformers file size for mlflow by @dakinggg in #1087
- log packing ratio progress by @milocress in #1070
- Bump HF version by @b-chu in #1091
- Fix typo in expandable_segments by @mammothb in #1088
- Bump transformers to 4.39.3 by @dakinggg in #1086
- Fix yaml typo by @dakinggg in #1092
- Fix for overriding nested configs by @dakinggg in #1089
- cleaned up HF/MPT conversion test by @milocress in #1048
- Update yamls for 0.7.0 by @dakinggg in #1097
- Norms registry by @dakinggg in #1080
- fixing evaluator microbatch size by @ShashankMosaicML in #1100
- Updating the streaming version in setup.py by @ShashankMosaicML in #1103
- MegaBlocks release by @mvpatel2000 in #1102
- Remove torch compile from GLU by @josejg in #1101
- Update config_moe_args.py by @vchiley in #1104
- Add remote code option to allow execution of DBRX tokenizer by @b-chu in #1106
- Fix overwriting FP8 act ckpt flag in the train script by @cli99 in #1107
- Support ShareGPT chat format by @samhavens in #1098
- FC layer registry by @dakinggg in #1093
- Attention layer registry by @dakinggg in #1094
- Dbrx finetune yaml requires save folder specified to enable autoresume by @mvpatel2000 in #1108
- Revert "Update config_moe_args.py" by @vchiley in #1111
- rm new_group todo by @vchiley in #1112
- Migrate ICL classes to foundry by @bmosaicml in #936
- FFN layer registry by @dakinggg in #1095
- Param init registry by @dakinggg in #1096
- Add missing init file by @dakinggg in #1113
- Update tests to not rely on mistral by @dakinggg in #1117
- Bump transformers to 4.40 by @dakinggg in #1118
- Add `.json` to SUPPORTED_EXTENSIONS by @eitanturok in #1114
- Add option for subclasses to convert model and tokenizer in hf checkpointer by @dakinggg in #1121
- Bump Composer to 0.21.3 by @b-chu in #1122
- catch misconfigured hf dataset by @milocress in #1123
- Pin mlflow by @dakinggg in #1124
- Change main to a dev version by @dakinggg in #1126
- Fix deprecation versions by @dakinggg in #1129
- Clean up the publicly exported API by @dakinggg in #1128
- Fix HF checkpointer + mlflow bugs by @dakinggg in #1125
- Update JSONL sources in eval README by @emmanuel-ferdman in #1110
- Mlflow datasets by @KuuCi in #1119
- Strict key checking for dataset by @b-chu in #1131
- First initialize dist with gloo by @dakinggg in #1133
- Fix saving of generation_config for Llama-3 by @eldarkurtic in #1134
- Bump datasets version by @dakinggg in #1138
- Revert "First initialize dist with gloo (#1133)" by @dakinggg in #1139
- Barrier immediately after initialize dist with logs by @dakinggg in #1140
- Add new FT instructions by @b-chu in #1143
- Upgrade ci-testing by @mvpatel2000 in #1145
- Fix typos in callbacks with configs by @dakinggg in #1146
- Remove olmo as a dependency by @snarayan21 in #1148
- build inner model by @milocress in #1147
- Fix `DatasetConstants.splits` default value to protect against dictionary overwriting by @ivan-kud in #1144
- Bump flash attention version by @dakinggg in #1150
- Torch 2.3 part 1 - build the images by @dakinggg in #1149
- Torch 2.3 upgrade Part 2 - CI by @dakinggg in #1151
- Comment out 2.3 tests by @dakinggg in #1155
- Fix yaml lint by @dakinggg in #1156
- Move sentencepiece import by @aspfohl in #1157
- Bump composer version to 0.22.0 by @snarayan21 in #1160
- Uncomment GPU tests by @milocress in #1162
- Depend on coverage by @milocress in #1163
- fix dep group in torch 2.3 ci by @dakinggg in #1164
- Bump min torch version to 2.3.0 by @dakinggg in #1152
- Add line splitting and other linting by @b-chu in #1161
- refactoring dataloader into registries. by @ShashankMosaicML in #1165
- Migrate eval output logging to foundry by @maxisawesome in #1166
- Fix import and mocking by @dakinggg in #1169
- Minor fix to `llmfoundry.data.utils.get_text_collator` by @ShashankMosaicML in #1170
- Fix config access for DBRX by @dakinggg in #1177
## New Contributors
- @lupesko made their first contribution in #1085
- @mammothb made their first contribution in #1088
- @eitanturok made their first contribution in #1114
- @emmanuel-ferdman made their first contribution in #1110
- @ivan-kud made their first contribution in #1144
Full Changelog: v0.7.0...v0.8.0