Releases: allenai/OLMo-core
v1.7.0
What's new
Added 🎉
- Added `key_mapping` argument to `olmo_core.distributed.checkpoint.load_model_and_optim_state()` for loading checkpoints with different key names (see the sketch below).
- Added `load_key_mapping` field to the trainer, same idea as the new `key_mapping` argument above.
- Added an implementation of nGPT called `NormalizedTransformer`.
- Added an example showing how to convert a HuggingFace Llama 3.2 checkpoint into the right format for OLMo-core.
- Added an API for scaling RoPE embeddings.
- Added a `ModelLadder` API.
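
For example, here is a minimal sketch of loading a pre-1.7.0 checkpoint into the reorganized model via `key_mapping`. The keys mirror the `lm_head` change noted under "Changed" below; the argument order and the direction of the mapping are assumptions.

```python
# Sketch: load an older checkpoint whose keys predate the `lm_head`
# wrapper introduced in this release. The mapping direction (old key ->
# new key) and the positional arguments are assumptions.
from olmo_core.distributed.checkpoint import load_model_and_optim_state

key_mapping = {
    "w_out.weight": "lm_head.w_out.weight",
    "norm.weight": "lm_head.norm.weight",
}

load_model_and_optim_state(
    "s3://my-bucket/old-checkpoint",  # hypothetical checkpoint location
    model,                            # assumes `model` is already built
    key_mapping=key_mapping,
)
```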
Changed ⚠️
- The `w_out` and `norm` top-level children of the `Transformer` model are now wrapped together in an `lm_head` module. Training scripts remain backwards compatible with older checkpoints thanks to the `load_key_mapping` field explained above.
Fixed ✅
- (Optimization) Mark model input sizes as dynamic for `torch.compile()` to avoid recompiles during evals or variable sequence/batch size training (see the sketch below). This doesn't seem to hurt throughput.
- Made HTTPS and GCS IO functions more robust.
- Fixed a bug where we always got dolma2-tokenized validation data when generating a config with `DataMix.v3_small_ppl_validation`.
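
The dynamic-shape marking works along these lines. This standalone sketch uses PyTorch's `torch._dynamo.mark_dynamic()` directly; it is not OLMo-core's actual code.

```python
import torch

model = torch.compile(torch.nn.Linear(128, 128))

def forward(batch: torch.Tensor) -> torch.Tensor:
    # Tell the compiler this dim may vary so a different batch size
    # doesn't trigger a recompile.
    torch._dynamo.mark_dynamic(batch, 0)
    return model(batch)

print(forward(torch.randn(4, 128)).shape)
print(forward(torch.randn(9, 128)).shape)  # no recompile for the new size
```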
Commits
62d2c9e (chore) prepare for release v1.7.0
cb77039 mark model ladder as a beta feature
08c8073 Adapt conversion script to work with OLMo2 models (#116)
8e716b5 Add model ladder building blocks (#114)
1647f78 Add some more tests for nGPT (#113)
37e0e88 improve docs
d68d47a Make nn configs more flexible (#112)
0bcc840 RoPE scaling, document how to convert HuggingFace checkpoints (#111)
7655a3b Add template variable to ppl validation file manifest (#110)
ca44cf4 Implement nGPT (#108)
c47df7c make IO functions more robust (#109)
4f2c8ef Update README.md
57b38ad Mark model input as dynamically sized (#105)
776e235 remove duplicate script
v1.6.3
What's new
Added 🎉
- Added `olmo_core.distributed.checkpoint.get_checkpoint_metadata()` function (see the sketch below).
- (BETA) Added a flag to compile the optimizer step. So far only tested with AdamW; it may not work with other optimizers.
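
A usage sketch for the new metadata function; what the returned object exposes is an assumption, so it is only printed here.

```python
# Sketch: inspect a checkpoint without loading any weights.
from olmo_core.distributed.checkpoint import get_checkpoint_metadata

metadata = get_checkpoint_metadata("/path/to/checkpoint")
print(metadata)  # e.g. see which keys/shapes the checkpoint contains
```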
Fixed ✅
- Old ephemeral checkpoints won't be removed until after the latest ephemeral checkpoint is saved successfully.
- Made GCS uploads more robust.
- Fixed single-node training on Google Augusta cluster.
- `numpy.random.dirichlet()` does not always sum to exactly 1.0, so allow for a small tolerance when validating domain weights.
Commits
9c52bea (chore) prepare for release v1.6.3
ad5e9e5 Upgrade flash-attn to v2.7.0 (#104)
b9e9193 [beta] Enable compiling optimizer step (tested with AdamW) (#103)
fdbb76e Use allclose for comparing sum of small numbers (#102)
3284742 make GCS uploads more robust (#101)
63b3f43 Update isort requirement from <5.13,>=5.12 to >=5.12,<5.14 (#93)
dcbd988 update docs and theme version
6615ba9 Bump actions/download-artifact from 3 to 4 (#100)
2e2b35b Add function to get checkpoint metadata
c0e47cc clean up Dockerfile (#99)
6300bc7 replace printing table with logging table (#98)
e522886 Don't prematurely delete old ephemeral checkpoints (#97)
dea10fd Bump actions/upload-artifact from 3 to 4 (#90)
c2fe2db skip another test when creds missing
3ea9fa2 Bump softprops/action-gh-release from 1 to 2 (#87)
5a5c17f Bump actions/checkout from 3 to 4 (#91)
9c99b9c skip some tests when missing relevant credentials (#96)
53efa8c Bump actions/setup-python from 4 to 5 (#88)
d548d3b Bump actions/cache from 3 to 4 (#86)
ab80395 add depandabot config
v1.6.2
What's new
Added 🎉
- Added an option to disable `GarbageCollectorCallback`; not that you'd usually want to do this, but I needed to run an experiment to show how important that callback is.
Fixed ✅
- Fixed a bug where some default callbacks could be added twice if given a different name by the user.
- Fixed a bug where some `Trainer` bookkeeping tasks may not complete before `.fit()` returns.
Commits
2384472 (chore) prepare for release v1.6.2
f721fa1 Ensure all bookkeeping tasks complete (#85)
26a2c63 Some callback improvements (#84)
v1.6.1
What's new
Added 🎉
- Added `retries` field to `BeakerLaunchConfig`.
- Allow running on Augusta cluster with existing train scripts.
- Added `olmo_core.utils.logging_configured()` function to check if logging has been configured (see the sketch below).
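
A sketch of the intended use: guard logging setup so it only happens once. The configuring call shown is just stdlib `logging`, not a specific OLMo-core entrypoint.

```python
import logging

from olmo_core.utils import logging_configured

# Only configure logging if the application hasn't already done so.
if not logging_configured():
    logging.basicConfig(level=logging.INFO)
```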
Fixed ✅
- Fixed a potential distributed deadlock bug when training without a separate CPU-only bookkeeping backend.
- Removed some unnecessary host-device syncs in `olmo_core.distributed.utils`.
- Added `Trainer(Config).async_bookkeeping` field to toggle async bookkeeping.
Commits
cae88f5 (chore) prepare for release v1.6.1
83db5f7 Some fixes/improvements around synchronous bookkeeping operations (#83)
c435c94 increase timeout for CI checks
4a56200 update cluster list (#82)
e27ba74 Update throughput numbers, add `logging_configured()` util function (#81)
bec0a3c Allow running on Augusta cluster (#80)
c7c3a5a Set env vars for Augusta cluster
b9351e2 Add `retries` field to `BeakerLaunchConfig` (#79)
v1.6.0
What's new
Added 🎉
- Added option to compile the trainer's loss function (`Trainer.compile_loss`).
- Added `SourceMixtureDataset` for composing a training mixture based on ratios of source datasets (see the sketch below).
- Added `NumpyFSLDatasetMixture` for constructing a `NumpyDatasetBase` from a `SourceMixtureDataset`. Note this is only supported for FSL datasets.
- Added tests for `SourceMixture*` and `NumpyFSLDatasetMixture`.
- Added `DownstreamEvaluatorCallbackConfig` class for running in-loop downstream eval via OLMo-in-loop-evals.
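
A rough sketch of how these pieces compose; the class names come from this release, but the import path and construction details are assumptions rather than the actual API.

```python
from olmo_core.data import NumpyFSLDatasetMixture, SourceMixtureDataset  # path assumed

# 1. Build a SourceMixtureDataset that assigns each source dataset a
#    target ratio of the final training mixture.
mixture: SourceMixtureDataset = ...

# 2. Wrap it in a NumpyFSLDatasetMixture, which behaves like any other
#    NumpyDatasetBase (fixed sequence length only).
dataset: NumpyFSLDatasetMixture = ...
```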
Changed ⚠️
- Moved some types into `olmo_core.data.types` to avoid some circular dependencies.
Fixed ✅
- Made GCS client more robust by automatically retrying timeout errors for most operations.
Commits
29e1276 (chore) prepare for release v1.6.0
da39e97 Add note about optional dependencies
81b1249 Missed _bust_index_cache in one spot (#78)
00d34f6 Add option to compile loss function, move logits FP32 casting into loss function (#77)
4928f82 Adds mixing loader for FSL datasets (#70)
ecb0686 Allow stopping the experiment on keyboard int
41400c4 Add Llama 8B config (#76)
282c120 Update Docker build (#75)
55d261e Make GCS client more robust (#74)
3fe59b6 Add a callback for downstream evals, update Docker builds (#73)
ecd523e include release chore commit in release notes
v1.5.0
What's new
Added 🎉
- Added Google Cloud support for `list_directory()` and `clear_directory()`.
- Added `CometCallback` for logging training runs to Comet.ml.
- Added `DataMixBase` class, to allow extending to new data mix groups.
- Added support for MoE-based models.
- Added method `DataLoaderBase.get_mock_batch()`.
- Trainer now starts with a dry-run of a fake batch created by `DataLoaderBase.get_mock_batch()`.
- Added `Callback.pre_backward()`, `.pre_eval_batch()`, and `.post_eval_batch()` methods (see the sketch below).
- Added `Trainer.model_forward()`, `.get_losses()`, and `.eval_batch()` methods.
- Added a new `TransformerActivationCheckpointingMode`, "selected_ops" (requires torch 2.5 or newer).
Changed ⚠️
- `BeakerLaunchConfig.setup_steps` should now include steps to clone your repo (which it will by default). This change allows support for private repos.
Fixed ✅
- `prepare_cli_environment()` now calls `add_cached_path_clients()`.
- Removed an unnecessary host-device sync.
Commits
984eb26 Update README.md
0f0d282 Update README.md
310866e Add FP8 numbers for the 13B
425f7db Add "selected_ops" transformer AC mode (#71)
d90292e Move transformer config components to its own submodule
4d3b231 Add support for MoE models with megablocks (#60)
6e32043 Add Google Cloud support for more `io` functions (#69)
5af60ba Avoid an unnecessary host-device sync when created initial loss tensors (#68)
ad4c8bb Switch to comet callback in official train scripts
d90f5da Add comet API key to launch config
0c75ef6 Do a dry-run batch before starting training (#67)
71bc5c8 Add `save_state_dict` function
9c25aed Update the Comet.ml callback (#66)
6e4ee4e Add BaseDataMix class (#65)
54a74c3 Add a Comet.ml trainer callback (#64)
9ba0e63 Update base image with newer torch and flash-attn versions (#63)
97172fc avoid omegaconf interpolation in setup steps
48892ee include clone commands in setup steps (#62)
v1.4.0
What's new
Changed ⚠️
- Updated default layer norm epsilon for OLMo models from `1e-5` to `1e-6` to match the latest model.
- Renamed `FSLDataLoader` to `NumpyFSLDataLoader`.
- Renamed `VSLDataLoader` to `NumpyVSLDataLoader`.
- The trainer now takes a `data_loader: DataLoaderBase` instead of a `dataset: NumpyDatasetBase` (see the sketch below).
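
A comment-only sketch of the construction change; the surrounding builder names are assumptions.

```python
# Before v1.4.0: the trainer consumed a dataset directly, e.g.
#   trainer = trainer_config.build(model, optim, dataset=dataset)
#
# From v1.4.0: build a data loader first (note the renames, e.g.
# FSLDataLoader -> NumpyFSLDataLoader) and hand that to the trainer:
#   trainer = trainer_config.build(model, optim, data_loader=data_loader)
```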
Commits
55343dd fix loading training state dict
b921299 Allow unknown number of batches with data loaders
87f1e89 fix restarts for custom data loader
767c550 Add example of custom data loader
6237f7d Trainer now takes a data loader instead of a dataset (#59)
f6fc369 update default LN eps to match latest OLMo model (#58)
db522d1 allow loading via pickling
7d26589 make VSL curr config more flexible
v1.3.2
What's new
Added 🎉
- Added `Config.validate()`, `Config.replace()`, and `Config.apply()` methods (see the sketch below).
- Trainer now records sequence length as a metric.
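
A minimal sketch of the new `Config` helpers. The import path and the example fields are hypothetical; only the three method names come from this release.

```python
from dataclasses import dataclass

from olmo_core.config import Config  # path assumed

@dataclass
class OptimSettings(Config):  # hypothetical config class
    lr: float = 1e-4
    weight_decay: float = 0.0

cfg = OptimSettings()
cfg.validate()                # sanity-check the fields
tuned = cfg.replace(lr=3e-4)  # copy with one field changed (assumed semantics)
```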
Fixed ✅
- Ensure additional cached-path clients are added in the process pool workers from some dataset preparation methods.
- Fixed `label_mask` tensor created by `NumpyPaddedFSLDataset`.
- Removed redundant warning messages about CUDA alloc retries.
- Fixed non-deterministic deadlock bug with async checkpointing.
Commits
a0a680d keep old beaker images instead of deleting them
25a71f4 Minor training improvements (#57)
f32d0bf Improve formatting of throughput table in readme (#56)
f7f3709 Add some more `Config` methods
916ecf9 Fix label mask tensor created by `NumpyPaddedFSLDataset` (#55)
b29838a Ensure additional scheme clients are added in worker procs
v1.3.1
v1.3.0
What's new
Added 🎉
- Added `torchao` to the Docker/Beaker images.
- Added support for `torchao` `float8` training via the `Float8HandlerCallback` (see the sketch below).
- Added `Callback.post_attach()` method.
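
A hypothetical sketch of turning on float8 training; the import path and the way callbacks attach to the trainer are assumptions.

```python
from olmo_core.train.callbacks import Float8HandlerCallback  # path assumed

callbacks = {"float8_handler": Float8HandlerCallback()}
# ...pass `callbacks` along when constructing the trainer.
```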
Commits
8c03ca8 Add support for float8 training via torchao (#54)
2f253f8 Minor updates to precision settings for official configs, add torchao to Docker/Beaker image (#53)
7e3ddd4 increase batch size for 13B