Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FSSpec support for TorchSnapshot? #102

Open
reyoung opened this issue Oct 17, 2022 · 2 comments · May be fixed by #114
Open

FSSpec support for TorchSnapshot? #102

reyoung opened this issue Oct 17, 2022 · 2 comments · May be fixed by #114

Comments

@reyoung
Copy link
Contributor

reyoung commented Oct 17, 2022

🚀 The feature

Use fsspec as TorchSnapshot's backend.

Motivation, pitch

FSSpec is the FileSystem abstraction standard of Python in fact. It supports many backends like s3, gcs, webdav and supports asyncio feature, transparent compression and so on.

FSSpec is also widely adopted by Torch-related communities, like huggingface/datasets uses fsspec as the storage backend.

Both Python users and Torch users are familiar with fsspec , it could be better the TorchSnapshot can adopt fsspec as the storage backend.

Alternatives

There are two major alternatives:

  1. Writing the storage backend abstraction in the TorchSnapshot repo, however
  • there are so many fs backends that may need to be implemented, and each implementation should contain unit tests. It is hard to write and test them all in TorchSnapshot.
  • it is hard to write and merge in-house storage to TorchSnapshot. For example, some companies may have customized distributed file systems for saving checkpoints. Should they check in their implementations to TorchSnapshot? fsspec's plugin can be an individual package, like oss for Alibaba cloud storage.
  1. Use PyFilesystem. I think fsspec is used by more projects and people. The fsspec have documentation about why PyFilesystem is bad. Quoted as

    It might have been conceivable to reuse code in pyfilesystems, which has an established interface and several implementations of its own. However, it supports none of the critical features for cloud and parallel access, and would not be easy to coerce. Following on the success of s3fs and gcsfs, and their use within Dask, it seemed best to have an interface as close to those as possible. See a discussion on the topic.

Additional context

No response

@yifuwang
Copy link
Contributor

Hi @reyoung, thanks for the detailed description and the thorough research! I think it's a completely reasonable feature request.

For context on why we didn't take a hard dependency on fsspec from the beginning:

  • The I/O operations needed by TorchSnapshot are simple and resemble typical put/get APIs provided by cloud storage APIs. The FS abstraction is a superset of what's needed and would be an unnecessary indirection.
  • Relying on fsspec for major cloud storage integration would give us less control over the experience we offer to our users due to the dependency on fsspec plugins.

That said, it makes perfect sense for TorchSnapshot to support fsspec if users wish to use it. My thinking is that we can use fsspec as a fallback of the first-party plugins:

  • webdav://... would dispatch to fsspec's webdav implementation
  • s3://... would dispatch to TorchSnapshot's first party plugin
  • fsspec-s3://... would dispatch to fsspec's s3 implementation

What do you think?

@reyoung
Copy link
Contributor Author

reyoung commented Oct 20, 2022

That said, it makes perfect sense for TorchSnapshot to support fsspec if users wish to use it

@yifuwang

That's cool!

It seems we can just append

elif protocol.startswith('fsspec-'):
  from torchsnapshot.storage_plugins.fsspec import FSSpecPlugin

after here.

But how to pass some parameters into the StoragePlugin? I found s3 use URL and local config file to read AWS tokens. But in actual production environment, we cannot relay on user's local config files.

Maybe we can add **kwargs to function url_to_storage_plugin_in_event_loop, and snapshot's take can take a parameter called storage_kwargs:Optional[Dict[str, Any]]=None to let user passing more args to StoragePlugin?

The changes like #108 .

@shicheng0829 shicheng0829 linked a pull request Oct 21, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants