What's new
Added 🎉
- Added `key_mapping` argument to `olmo_core.distributed.checkpoint.load_model_and_optim_state()` for loading checkpoints with different key names (see the sketch after this list).
- Added `load_key_mapping` field to the trainer, same idea as the new `key_mapping` argument above.
- Added an implementation of nGPT called `NormalizedTransformer`.
- Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
- Added an API for scaling RoPE embeddings.
- Added a `ModelLadder` API.
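A minimal sketch of how the new `key_mapping` argument could be used; the toy model, checkpoint path, and key names below are illustrative assumptions, not taken from OLMo-core:

```python
import torch.nn as nn

from olmo_core.distributed.checkpoint import load_model_and_optim_state

# Toy stand-in for a model whose parameter names differ from the names
# stored in an older checkpoint.
model = nn.Sequential(nn.Linear(8, 8))

# Hypothetical mapping between the current model's key names and the key
# names found in the checkpoint; the exact direction and keys here are
# assumptions made for illustration.
key_mapping = {
    "0.weight": "proj.weight",
    "0.bias": "proj.bias",
}

# Placeholder checkpoint directory.
load_model_and_optim_state("/path/to/checkpoint", model, key_mapping=key_mapping)
```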
Changed ⚠️
- The `w_out` and `norm` top-level children of the `Transformer` model are now wrapped together in an `lm_head` module. Training scripts will have backwards compatibility with older checkpoints due to the `load_key_mapping` explained above (see the sketch after this list).
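As a hedged illustration of that backwards-compatibility path, a mapping like the one below could translate between the old top-level `w_out` / `norm` keys and the new `lm_head` layout. The mapping direction and exact key names are assumptions, and where the `load_key_mapping` field gets set depends on your training script:

```python
# Assumed key names: translate between the pre-1.7 top-level `w_out` / `norm`
# parameters and the new `lm_head` wrapper when resuming from an older
# checkpoint. The mapping direction shown here is an illustrative guess.
load_key_mapping = {
    "lm_head.w_out.weight": "w_out.weight",
    "lm_head.norm.weight": "norm.weight",
}
```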
Fixed ✅
- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompilation during evals or variable sequence length / batch size training (see the sketch after this list). This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
- Fixed a bug where we were always getting dolma2 tokenized validation data when generating a config with `DataMix.v3_small_ppl_validation`.
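The dynamic-input fix follows the general `torch.compile()` dynamic-shape pattern. Below is a self-contained sketch (not OLMo-core's actual code) using PyTorch's public hint for marking a dimension as dynamic:

```python
import torch
import torch.nn as nn

# Compile a toy model, then hint that the sequence dimension is dynamic so
# changing sequence lengths (e.g. during evals) don't trigger recompiles.
model = torch.compile(nn.Linear(16, 16))

for seq_len in (128, 256, 512):
    x = torch.randn(2, seq_len, 16)
    torch._dynamo.mark_dynamic(x, 1)  # dim 1 = sequence length
    model(x)
```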
Commits
62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script