Is it possible to store checkpoints in an external storage such as S3? #359

hr0nix · 2023-06-18T12:02:25Z

I wasn't able to find the answers to my questions in the docs, so I'll just ask here:

What storage types other than the local filesystem are supported by Orbax? For instance, can I use S3?
Is it possible to add my own storage type somehow?

Thanks!

cpgaffney1 · 2023-06-20T15:50:31Z

We support a Google-internal distributed file system as well as Google Cloud Storage. No idea whether you would run into any issues with S3, but you could give it a try.

Depending on what issues you encounter, if any, implementing your own TypeHandlers and AggregateHandler would probably be the best approach to customize serialization / deserialization logic if you need to. See here: https://orbax.readthedocs.io/en/latest/api_reference/checkpoint.html. Once implemented, you just register the handlers to start using them.

hr0nix · 2023-06-28T17:00:33Z

One way to support a large number of various filesystems would be to use fsspec for reading/writing weight files. Is that something the orbax/jax team might consider?

andylolu2 · 2024-09-23T22:16:14Z

A temporary workaround is to save to a temp directory and copy the saved content to the remote file system, though this wouldn't work so easily with the checkpoint manager (e.g., keeping only the last n checkpoints).
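For anyone wanting to try this, here is a minimal sketch of the save-then-copy workaround with hand-rolled "keep last n" retention. It is not Orbax API: `save_then_copy`, `save_fn`, and `remote_root` are hypothetical names, and `shutil.copytree` only stands in for whatever recursive upload your storage client provides.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def save_then_copy(
    save_fn: Callable[[Path], None],
    remote_root: Path,
    step: int,
    keep_last_n: int = 3,
) -> Path:
    """Save a checkpoint locally, copy it to remote_root, then prune old steps."""
    remote_root.mkdir(parents=True, exist_ok=True)
    dest = remote_root / str(step)

    with tempfile.TemporaryDirectory() as tmp:
        local_dir = Path(tmp) / str(step)
        # Assumption: save_fn creates local_dir and writes the checkpoint into it.
        save_fn(local_dir)
        # Stand-in for an upload; swap in your storage client's recursive copy.
        shutil.copytree(local_dir, dest)

    # Retention is reimplemented by hand, because the checkpoint manager
    # would only ever see the temporary directory.
    steps = sorted(int(p.name) for p in remote_root.iterdir() if p.name.isdigit())
    for old in steps[: max(len(steps) - keep_last_n, 0)]:
        shutil.rmtree(remote_root / str(old))
    return dest
```

The pruning step is the part the checkpoint manager would normally handle, which is why this stays a workaround rather than a real fix.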

cpgaffney1 · 2024-09-25T09:37:44Z

There's a recent change to offer better support for this problem. Previously S3 would not work correctly because atomic rename was not supported, but alternative atomicity logic can be configured using checkpoint/orbax/checkpoint/path/atomicity.py.
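To illustrate why atomic rename matters and what an alternative can look like, below is a sketch of one generic technique for stores without atomic rename: write the payload first and a commit marker last, and have readers ignore any checkpoint that lacks the marker. This is not taken from atomicity.py and may differ from what Orbax does; the marker name and all function names are hypothetical, and a local directory stands in for the object store.

```python
from pathlib import Path
from typing import Dict, Optional

COMMIT_MARKER = "COMMITTED"  # hypothetical marker name, chosen for this sketch


def write_checkpoint(ckpt_dir: Path, files: Dict[str, bytes]) -> None:
    """Write all payload files first, then the marker as the final step."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for name, data in files.items():
        (ckpt_dir / name).write_bytes(data)
    # The marker is written only after every payload write has returned,
    # so an interrupted save leaves a directory without a marker.
    (ckpt_dir / COMMIT_MARKER).write_bytes(b"")


def is_committed(ckpt_dir: Path) -> bool:
    """A checkpoint counts as complete only if its marker exists."""
    return (ckpt_dir / COMMIT_MARKER).exists()


def latest_committed(root: Path) -> Optional[Path]:
    """Return the highest-numbered step directory that has a marker, if any."""
    steps = [p for p in root.iterdir() if p.name.isdigit() and is_committed(p)]
    return max(steps, key=lambda p: int(p.name), default=None)
```

With rename-based atomicity the final directory name appears only once the save is complete; with a marker, the directory is visible early, so every reader has to check `is_committed` before trusting it.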

selamw1 added the type:feature (New feature or request) and checkpoint labels · Feb 11, 2025
