
Releases: allenai/OLMo-core

v1.7.0

27 Nov 22:31

What's new

Added 🎉

  • Added a key_mapping argument to olmo_core.distributed.checkpoint.load_model_and_optim_state()
    for loading checkpoints whose state dict keys differ from the current model's (see the sketch
    after this list).
  • Added a load_key_mapping field to the trainer, which serves the same purpose as the new
    key_mapping argument above.
  • Added an implementation of nGPT called NormalizedTransformer.
  • Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
  • Added an API for scaling RoPE embeddings.
  • Added a ModelLadder API.
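
A minimal sketch of the new key_mapping argument, assuming the mapping runs from checkpoint keys to current-model keys and that the other arguments shown match the actual signature. The example keys reflect the lm_head restructuring described under "Changed" below; model and optimizer stand in for your own objects:

```python
from olmo_core.distributed.checkpoint import load_model_and_optim_state

# Assumed direction: old checkpoint key -> key in the current model.
key_mapping = {
    "w_out.weight": "lm_head.w_out.weight",
    "norm.weight": "lm_head.norm.weight",
}

load_model_and_optim_state(
    "/path/to/old/checkpoint",  # local directory or remote URL
    model,                      # your Transformer instance
    optim=optimizer,            # optional, per the assumed signature
    key_mapping=key_mapping,
)
```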

Changed ⚠️

  • The w_out and norm top-level children of the Transformer model are now wrapped together in an
    lm_head module. Training scripts remain backwards compatible with older checkpoints thanks to
    the load_key_mapping mechanism explained above.

Fixed ✅

  • (Optimization) Model input sizes are now marked as dynamic for torch.compile() to avoid
    recompilation during evals or training with variable sequence lengths / batch sizes (see the
    sketch after this list). This doesn't appear to hurt throughput.
  • Made HTTPS and GCS IO functions more robust.
  • Fixed a bug where we were always getting dolma2-tokenized validation data when generating a
    config with DataMix.v3_small_ppl_validation.
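
The dynamic-input fix uses PyTorch's dynamic-shape marking. A minimal sketch of the underlying technique (OLMo-core applies this to model inputs internally; the tensor and dimension here are just illustrative):

```python
import torch

# Mark the sequence dimension (dim 1) as dynamic so torch.compile()
# traces one graph with a symbolic size instead of recompiling for
# every new sequence length.
input_ids = torch.zeros(2, 1024, dtype=torch.long)
torch._dynamo.mark_dynamic(input_ids, 1)
```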

Commits

62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script

v1.6.3

15 Nov 19:14

What's new

Added 🎉

  • Added the olmo_core.distributed.checkpoint.get_checkpoint_metadata() function (see the sketch
    after this list).
  • (BETA) Added a flag to compile the optimizer step. So far this has only been tested with
    AdamW and may not work with other optimizers.
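
A usage sketch, under the assumption that the function takes the path (or URL) of a checkpoint directory and returns its metadata:

```python
from olmo_core.distributed.checkpoint import get_checkpoint_metadata

# Assumption: accepts a local path or remote URL to a checkpoint directory.
metadata = get_checkpoint_metadata("/path/to/checkpoint")
print(metadata)
```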

Fixed ✅

  • Old ephemeral checkpoints won't be removed until after the latest ephemeral checkpoint is saved successfully.
  • Made GCS uploads more robust.
  • Fixed single-node training on Google Augusta cluster.
  • Weights drawn from numpy.random.dirichlet() do not always sum to exactly 1.0, so domain
    weight validation now allows a small tolerance (see the sketch below).
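
A self-contained illustration of the rounding issue and the tolerant check (the exact tolerance OLMo-core uses may differ from NumPy's defaults):

```python
import numpy as np

# Dirichlet samples are normalized in floating point, so the sum can land
# slightly off 1.0 (e.g. 0.9999999999999999).
weights = np.random.dirichlet(np.ones(5))

# Strict equality would be fragile; compare with a tolerance instead.
assert np.isclose(weights.sum(), 1.0)
```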

Commits

9c52bea (chore) prepare for release v1.6.3
ad5e9e5 Upgrade flash-attn to v2.7.0 (#104)
b9e9193 [beta] Enable compiling optimizer step (tested with AdamW) (#103)
fdbb76e Use allclose for comparing sum of small numbers (#102)
3284742 make GCS uploads more robust (#101)
63b3f43 Update isort requirement from <5.13,>=5.12 to >=5.12,<5.14 (#93)
dcbd988 update docs and theme version
6615ba9 Bump actions/download-artifact from 3 to 4 (#100)
2e2b35b Add function to get checkpoint metadata
c0e47cc clean up Dockerfile (#99)
6300bc7 replace printing table with logging table (#98)
e522886 Don't prematurely delete old ephemeral checkpoints (#97)
dea10fd Bump actions/upload-artifact from 3 to 4 (#90)
c2fe2db skip another test when creds missing
3ea9fa2 Bump softprops/action-gh-release from 1 to 2 (#87)
5a5c17f Bump actions/checkout from 3 to 4 (#91)
9c99b9c skip some tests when missing relevant credentials (#96)
53efa8c Bump actions/setup-python from 4 to 5 (#88)
d548d3b Bump actions/cache from 3 to 4 (#86)
ab80395 add depandabot config

v1.6.2

08 Nov 18:29

What's new

Added 🎉

  • Added an option to disable the GarbageCollectorCallback. You usually wouldn't want to do
    this, but I needed it for an experiment showing how important that callback is.

Fixed ✅

  • Fixed a bug where some default callbacks could be added twice if given a different name by the user.
  • Fixed a bug where some Trainer bookkeeping tasks might not complete before .fit() returns.

Commits

2384472 (chore) prepare for release v1.6.2
f721fa1 Ensure all bookkeeping tasks complete (#85)
26a2c63 Some callback improvements (#84)

v1.6.1

07 Nov 00:28

What's new

Added 🎉

  • Added retries field to BeakerLaunchConfig.
  • Allow running on Augusta cluster with existing train scripts.
  • Added the olmo_core.utils.logging_configured() function to check whether logging has been
    configured (see the sketch after this list).
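
A minimal sketch of the intended use, assuming the function simply reports whether logging setup has already run:

```python
import logging

from olmo_core.utils import logging_configured

# Avoid clobbering an existing setup: only configure logging ourselves
# if it hasn't been configured yet.
if not logging_configured():
    logging.basicConfig(level=logging.INFO)
```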

Fixed ✅

  • Fixed a potential distributed deadlock bug when training without a separate CPU-only bookkeeping backend.
  • Removed some unnecessary host-device syncs in olmo_core.distributed.utils.
  • Added an async_bookkeeping field to Trainer and TrainerConfig to toggle async bookkeeping.

Commits

cae88f5 (chore) prepare for release v1.6.1
83db5f7 Some fixes/improvements around synchronous bookkeeping operations (#83)
c435c94 increase timeout for CI checks
4a56200 update cluster list (#82)
e27ba74 Update throughput numbers, add logging_configured() util function (#81)
bec0a3c Allow running on Augusta cluster (#80)
c7c3a5a Set env vars for Augusta cluster
b9351e2 Add retries field to BeakerLaunchConfig (#79)

v1.6.0

01 Nov 21:51

What's new

Added 🎉

  • Added an option to compile the trainer's loss function (Trainer.compile_loss).
  • Added SourceMixtureDataset for composing a training mixture based on ratios of source
    datasets (see the sketch after this list).
  • Added NumpyFSLDatasetMixture for constructing a NumpyDatasetBase from a SourceMixtureDataset. Note this is only supported for FSL datasets.
  • Added tests for SourceMixture* and NumpyFSLDatasetMixture.
  • Added DownstreamEvaluatorCallbackConfig class for running in-loop downstream eval via OLMo-in-loop-evals.
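
To illustrate the idea behind ratio-based mixing (a toy stand-in, not the SourceMixtureDataset API): each example is drawn from one of several sources with probability proportional to its configured ratio.

```python
import random

# Toy version of ratio-based source mixing; OLMo-core's implementation
# composes numpy-backed fixed-sequence-length datasets instead.
def sample_source(ratios: dict[str, float]) -> str:
    sources = list(ratios)
    return random.choices(sources, weights=[ratios[s] for s in sources], k=1)[0]

mixture = {"web": 0.7, "code": 0.2, "books": 0.1}
counts = {name: 0 for name in mixture}
for _ in range(10_000):
    counts[sample_source(mixture)] += 1
print(counts)  # roughly proportional to the ratios
```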

Changed ⚠️

  • Moved some types into olmo_core.data.types to avoid circular dependencies.

Fixed ✅

  • Made the GCS client more robust by automatically retrying timeout errors for most operations.

Commits

29e1276 (chore) prepare for release v1.6.0
da39e97 Add note about optional dependencies
81b1249 Missed _bust_index_cache in one spot (#78)
00d34f6 Add option to compile loss function, move logits FP32 casting into loss function (#77)
4928f82 Adds mixing loader for FSL datasets (#70)
ecb0686 Allow stopping the experiment on keyboard int
41400c4 Add Llama 8B config (#76)
282c120 Update Docker build (#75)
55d261e Make GCS client more robust (#74)
3fe59b6 Add a callback for downstream evals, update Docker builds (#73)
ecd523e include release chore commit in release notes

v1.5.0

23 Oct 17:36

What's new

Added 🎉

  • Added Google Cloud support for list_directory() and clear_directory().
  • Added CometCallback for logging training runs to Comet.ml.
  • Added a DataMixBase class to allow extending to new data mix groups.
  • Added support for MoE-based models.
  • Added method DataLoaderBase.get_mock_batch().
  • Trainer now starts with a dry-run of a fake batch created by DataLoaderBase.get_mock_batch().
  • Added Callback.pre_backward(), .pre_eval_batch(), and .post_eval_batch() methods (see the
    sketch after this list).
  • Added Trainer.model_forward(), .get_losses(), and .eval_batch() methods.
  • Added a new TransformerActivationCheckpointingMode, "selected_ops" (requires torch 2.5 or newer).
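
A minimal sketch of a custom callback using the new hooks. Only the hook names come from this release; the import path and the signatures are assumptions for illustration:

```python
from olmo_core.train.callbacks import Callback  # import path is an assumption

class TimingCallback(Callback):
    """Toy callback exercising the new hooks (signatures assumed)."""

    def pre_backward(self, *args, **kwargs):
        # Runs just before the backward pass of a training batch.
        ...

    def pre_eval_batch(self, *args, **kwargs):
        # Runs before each in-loop evaluation batch.
        ...

    def post_eval_batch(self, *args, **kwargs):
        # Runs after each in-loop evaluation batch.
        ...
```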

Changed ⚠️

  • BeakerLaunchConfig.setup_steps should now include steps to clone your repo (which it does by
    default). This change adds support for private repos.

Fixed ✅

  • prepare_cli_environment() now calls add_cached_path_clients().
  • Removed an unnecessary host-device sync.

Commits

984eb26 Update README.md
0f0d282 Update README.md
310866e Add FP8 numbers for the 13B
425f7db Add "selected_ops" transformer AC mode (#71)
d90292e Move transformer config components to its own submodule
4d3b231 Add support for MoE models with megablocks (#60)
6e32043 Add Google Cloud support for more io functions (#69)
5af60ba Avoid an unnecessary host-device sync when created initial loss tensors (#68)
ad4c8bb Switch to comet callback in official train scripts
d90f5da Add comet API key to launch config
0c75ef6 Do a dry-run batch before starting training (#67)
71bc5c8 Add save_state_dict function
9c25aed Update the Comet.ml callback (#66)
6e4ee4e Add BaseDataMix class (#65)
54a74c3 Add a Comet.ml trainer callback (#64)
9ba0e63 Update base image with newer torch and flash-attn versions (#63)
97172fc avoid omegaconf interpolation in setup steps
48892ee include clone commands in setup steps (#62)

v1.4.0

02 Oct 18:22

What's new

Changed ⚠️

  • Updated the default layer norm epsilon for OLMo models from 1e-5 to 1e-6 to match the latest
    model.
  • Renamed FSLDataLoader to NumpyFSLDataLoader.
  • Renamed VSLDataLoader to NumpyVSLDataLoader.
  • The trainer now takes a data_loader: DataLoaderBase instead of a dataset: NumpyDatasetBase.

Commits

55343dd fix loading training state dict
b921299 Allow unknown number of batches with data loaders
87f1e89 fix restarts for custom data loader
767c550 Add example of custom data loader
6237f7d Trainer now takes a data loader instead of a dataset (#59)
f6fc369 update default LN eps to match latest OLMo model (#58)
db522d1 allow loading via pickling
7d26589 make VSL curr config more flexible

v1.3.2

27 Sep 22:19

What's new

Added 🎉

  • Added Config.validate(), Config.replace(), and Config.apply() methods (see the sketch after
    this list).
  • The trainer now records sequence length as a metric.
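
A hedged usage sketch: only the method names come from this release, and both the import path and the semantics shown are assumptions inferred from those names:

```python
from olmo_core.config import Config  # import path is an assumption

# Hypothetical subclass for illustration; real Config subclasses in
# OLMo-core may be declared differently.
class MyConfig(Config):
    lr: float = 1e-4
    seed: int = 0

cfg = MyConfig()
cfg.validate()                # assumption: checks the config for errors
cfg2 = cfg.replace(seed=42)   # assumption: returns a copy with overrides
```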

Fixed ✅

  • Ensured additional cached-path clients are added in the process pool workers used by some
    dataset preparation methods.
  • Fixed the label_mask tensor created by NumpyPaddedFSLDataset.
  • Removed redundant warning messages about CUDA alloc retries.
  • Fixed a non-deterministic deadlock bug with async checkpointing.

Commits

a0a680d keep old beaker images instead of deleting them
25a71f4 Minor training improvements (#57)
f32d0bf Improve formatting of throughput table in readme (#56)
f7f3709 Add some more Config methods
916ecf9 Fix label mask tensor created by NumpyPaddedFSLDataset (#55)
b29838a Ensure additional scheme clients are added in worker procs

v1.3.1

26 Sep 18:33

What's new

Fixed ✅

  • Fixed the names given to logged evaluator metrics.

Commits

40057ff fix eval metric name

v1.3.0

26 Sep 16:05

What's new

Added 🎉

  • Added torchao to the Docker/Beaker images.
  • Added support for torchao float8 training via the Float8HandlerCallback.
  • Added Callback.post_attach() method.

Commits

8c03ca8 Add support for float8 training via torchao (#54)
2f253f8 Minor updates to precision settings for official configs, add torchao to Docker/Beaker image (#53)
7e3ddd4 increase batch size for 13B