-
-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to RemoteStore for fsspec URIs #2198
Default to RemoteStore for fsspec URIs #2198
Conversation
src/zarr/store/common.py
Outdated
elif isinstance(store_like, str): | ||
return StorePath(await LocalStore.open(root=Path(store_like), mode=mode or "r")) | ||
try: | ||
fs, path = fsspec.url_to_fs(store_like, **storage_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: this constructs the actual filesystem class instance, but we have to go back to a URL (and reconstruct the filesystem instance) for the RemoteStore
interface. It'd be nice to avoid that.
src/zarr/store/common.py
Outdated
try: | ||
fs, path = fsspec.url_to_fs(store_like, **storage_options) | ||
except Exception: | ||
# not sure what to do here, but I don't want this to fail... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bad, but in my experience fsspec.url_to_fs
can run pretty much arbitrary code, depending on what packages you have installed and what URI you pass. I really don't want this to fail and cause issues for people who just want a local path. maybe we'd always get an fsspec LocalFileSystem for a plain, non-fsspec URI string, but I'm not sure.
src/zarr/store/common.py
Outdated
else: | ||
raise TypeError | ||
|
||
if path is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I forgot to revert this. It's not used yet.
We call make_store_path
multiple times in a single call by the user (e.g. save_array
calls make_store_path
and passes the result to AsyncArray.create
which also calls make_store_path
). I was thinking about consolidating all the store / path stuff in a single spot, but it ended up being a bit much and I wasn't sure what the tolerance was for expanding the API of methods like AsyncArray.create
even more. Do we want users to do
AsyncArray.create(store=StorePath(RemoteStore(..., **storage_options), path))
or
AsyncArray.create(store="s3://bucket/path.zarr", storage_options=...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have much opinion on the actual change, but is it worth adding warnings everywhere if storage_options
is passed by the user (ie not None), but they haven't passed an fsspec URL so they aren't used?
elif isinstance(store_like, Path): | ||
return StorePath(await LocalStore.open(root=store_like, mode=mode or "r")) | ||
result = StorePath(await LocalStore.open(root=store_like, mode=mode or "r")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can pass **storage_options
to LocalStore
as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be missing something, but I don't think that'll work. LocalStore.open
will call LocalStore.__init__
, which just takes root
and mode
, which are passed as regular args here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. I was thinking auto_mkdir
would be passed through but if that's not the case, let's not get distracted here.
src/zarr/store/common.py
Outdated
elif isinstance(store_like, str): | ||
return StorePath(await LocalStore.open(root=Path(store_like), mode=mode or "r")) | ||
storage_options = storage_options or {} | ||
fs, _ = fsspec.url_to_fs(store_like, **storage_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few thoughts here:
- It would be nice to avoid creating the
FileSystem
if possible. A lighter approach could attempt to parse the string and look for://
(and notfile://
). - If we do have to create filesystem, it would be nice to reuse it when creating the
RemoteStore
. Something likeRemoteStore.from_filesystem(...)
could be a nice pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See below for the way we used to do this. May be worth replicating parts of this
Line 170 in e1d98cd
if "://" in store or "::" in store: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just pushed that refactor. It's a bit bigger than I'd like, but the basic version is that RemoteStores
's __init__
matches what it uses internally: you can only given it an async filesystem.
There's a from_url
and from_path
that handles the other stuff that was previously in __init__
.
I removed RemoteStore._url
. I think that that's safe to remove, but I need to actually think through how it interacted with store.path
…re' into fix/auto-remote-store
* v3: chore: update pre-commit hooks (zarr-developers#2222) fix: validate v3 dtypes when loading/creating v3 metadata (zarr-developers#2209) fix typo in store integration test (zarr-developers#2223) Basic Zarr-python 2.x compatibility changes (zarr-developers#2098) Make Group.arrays, groups compatible with v2 (zarr-developers#2213) Typing fixes to test_indexing (zarr-developers#2193) Default to RemoteStore for fsspec URIs (zarr-developers#2198) Make MemoryStore serialiazable (zarr-developers#2204) [v3] Implement Group methods for empty, full, ones, and zeros (zarr-developers#2210) implement `store.list_prefix` and `store._set_many` (zarr-developers#2064) Fixed codec for v2 data with no fill value (zarr-developers#2207)
This allows for specifying a RemoteStore by an fsspec URI + storage options, as a convenience (and backwards compatibility shim) for creating a
RemoteStore
:Closes #2195