What's new
Added 🎉
- Added `key_mapping` argument to `olmo_core.distributed.checkpoint.load_model_and_optim_state()` for loading checkpoints with different key names (see the sketch after this list).
- Added `load_key_mapping` field to the trainer, same idea as the new `key_mapping` argument above.
- Added an implementation of nGPT called `NormalizedTransformer`.
- Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
- Added an API for scaling RoPE embeddings.
- Added a `ModelLadder` API.
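A minimal sketch of how the new `key_mapping` argument could be used; the toy model, checkpoint path, and key names below are illustrative assumptions, not taken from OLMo-core:

```python
import torch.nn as nn

from olmo_core.distributed.checkpoint import load_model_and_optim_state

# Toy stand-in for a model whose parameter names differ from the names
# stored in an older checkpoint.
model = nn.Sequential(nn.Linear(8, 8))

# Hypothetical mapping between the current model's key names and the key
# names found in the checkpoint; the exact direction and keys here are
# assumptions made for illustration.
key_mapping = {
    "0.weight": "proj.weight",
    "0.bias": "proj.bias",
}

# Placeholder checkpoint directory.
load_model_and_optim_state("/path/to/checkpoint", model, key_mapping=key_mapping)
```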
Changed ⚠️
- The `w_out` and `norm` top-level children of the `Transformer` model are now wrapped together in an `lm_head` module. Training scripts will have backwards compatibility with older checkpoints due to the `load_key_mapping` explained above (see the sketch after this list).
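As a hedged illustration of that backwards-compatibility path, a mapping like the one below could translate between the old top-level `w_out` / `norm` keys and the new `lm_head` layout. The mapping direction and exact key names are assumptions, and where the `load_key_mapping` field gets set depends on your training script:

```python
# Assumed key names: translate between the pre-1.7 top-level `w_out` / `norm`
# parameters and the new `lm_head` wrapper when resuming from an older
# checkpoint. The mapping direction shown here is an illustrative guess.
load_key_mapping = {
    "lm_head.w_out.weight": "w_out.weight",
    "lm_head.norm.weight": "norm.weight",
}
```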
Fixed ✅
- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompilation during evals or variable sequence length / batch size training (see the sketch after this list). This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
- Fixed a bug where we were always getting dolma2 tokenized validation data when generating a config with `DataMix.v3_small_ppl_validation`.
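The dynamic-input fix follows the general `torch.compile()` dynamic-shape pattern. Below is a self-contained sketch (not OLMo-core's actual code) using PyTorch's public hint for marking a dimension as dynamic:

```python
import torch
import torch.nn as nn

# Compile a toy model, then hint that the sequence dimension is dynamic so
# changing sequence lengths (e.g. during evals) don't trigger recompiles.
model = torch.compile(nn.Linear(16, 16))

for seq_len in (128, 256, 512):
    x = torch.randn(2, seq_len, 16)
    torch._dynamo.mark_dynamic(x, 1)  # dim 1 = sequence length
    model(x)
```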
Commits
62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script