zarr.core.Array.chunk_store changes type after zarr=2.13.3 (after PR 1304) #1362
Comments
Hi @valeriupredoi - thanks for reporting this potential bug! It looks like #1304 may have caused more problems than it fixed. 🤦 We'd love to be able to act on this, but I would really like to first encourage you to make your bug report more minimal. It would be great if you could remove the references to kerchunk, local file paths, etc. in your example. Here is a great guide for how to do this.
@rabernat cheers muchly! Absolutely, will do that in a sec - for now we've pinned zarr so no rush at our end, but I will try to minimize the test so you guys can reproduce the issue w/o extra dependencies and stuff 🍺
@rabernat I have now edited the minimal test case - my apologies, I didn't know Zarr is so flexible that it really doesn't need any data whatsoever (all can be mocked), with the functional types and structure still preserved - well done! 🍻
Hi @valeriupredoi - thanks for updating your example! However, it is still not reproducible. When I tried to run your code, I got the following error:
hi @rabernat apols, I mentioned in the description but it's not clear - you'll have to create an empty |
I tried this:

    import json

    import fsspec
    import zarr

    def open_zarr_group():
        # Write an empty reference file, then open it through the reference filesystem.
        with open("test.json", mode="w") as fp:
            json.dump({}, fp)
        url = fsspec.filesystem("reference", fo="test.json")
        mapper = url.get_mapper("")
        zarr_group = zarr.open_group(mapper)
        print(zarr_group)
        print(zarr_group.chunk_store)

    open_zarr_group()

And got the following error:
Does this example really need the reference filesystem? Here is one without it:

    import fsspec
    import zarr

    fs = fsspec.filesystem("file")
    mapper = fs.get_mapper("tmp.zarr")
    zarr_group = zarr.open_group(mapper)
    print(zarr_group)
    # <zarr.hierarchy.Group '/'>
    print(zarr_group.chunk_store)
    # <zarr.storage.FSStore object at 0x7fe3b15dbfd0>

Does this reproduce the problem?
Indeed that should work! I tried the code in your previous comment (whereby the json file is constructed with the context manager and that worked for me...), but that's true, I don't think one needs the ref FS |
Ok, glad we found a truly minimum reproducer! 😊 Now, can you help me understand what exactly is the problem? The Can you explain why the |
and a really nice proof Zarr supports Null testing nicely 😁
We are treating and using |
Ok I understand. Thanks for re-explaining! In general,
Can you explain why you were using the |
@rabernat that's a very good question - as I understand it, we use it to get the mapping of the input netCDF4 files we input onto the Zarr format, so a full image of the data and metadata as Zarr would see it:
where ie the FS reference for the file. My colleague @bnlawrence can tell you more since he's implemented this method, but that should be the gist of it. If there is any way to grab this dictionary from FSStore, then we'd gladly use that, especially if that was a public function 👍 |
You should be able to access the underlying fsspec filesystem object via the Looking at the source code is probably helpful: Line 1278 in 5ece3e6
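For readers hitting the same type change, here is a minimal sketch of a helper that returns the wrapped mapping from either kind of store without depending on a private attribute of one specific class. It rests on assumptions that should be checked against the installed zarr version: that a zarr 2.x KVStore keeps its wrapped mapping in `_mutable_mapping`, and that an FSStore keeps its fsspec mapper in `map`. The classes `KVStoreLike` and `FSStoreLike` and the function `underlying_mapping` are hypothetical stand-ins for illustration, not part of zarr.

```python
# Sketch only: KVStoreLike and FSStoreLike are hypothetical stand-ins, not zarr classes.
# Assumption: zarr 2.x KVStore stores its wrapped mapping in `_mutable_mapping`,
# and FSStore stores its fsspec mapper in `map`.


def underlying_mapping(store):
    """Return the mapping wrapped by a store, whichever wrapper type it is."""
    for attr in ("_mutable_mapping", "map"):
        inner = getattr(store, attr, None)
        if inner is not None:
            return inner
    # Neither attribute is present: treat the object itself as the mapping.
    return store


class KVStoreLike:
    """Stand-in that mimics the assumed KVStore attribute layout."""

    def __init__(self, mapping):
        self._mutable_mapping = mapping


class FSStoreLike:
    """Stand-in that mimics the assumed FSStore attribute layout."""

    def __init__(self, mapping):
        self.map = mapping


refs = {".zgroup": b'{"zarr_format": 2}'}
same_from_kv = underlying_mapping(KVStoreLike(refs)) is refs
same_from_fs = underlying_mapping(FSStoreLike(refs)) is refs
print(same_from_kv, same_from_fs)
```

With a real group, the call would be `underlying_mapping(zarr_group.chunk_store)`. Private attributes can change between releases, so a public accessor is preferable wherever one exists.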
👋 @bnlawrence! Lovely to see you around these parts. I'd love to learn what you are up to and how we might collaborate around Zarr / kerchunk / etc. |
cheers @rabernat - I'll have a look, many thanks for your help so far 🍺 |
whoa! Proves out in the new configuration one doesn't need to access the
Cheers, @rabernat 🍺 I'll keep this open if you'd still need it for bookkeeping and/or @bnlawrence's reply 😁 |
We can close the issue but feel free to keep the conversation going! |
Ah sorry @rabernat, @valeriupredoi has just pointed this thread out to me - I have to admit to mostly silencing my GitHub notifications as I can't cope with the information onslaught. For what it's worth, this is part of building a tool to allow the file system (or the S3 server) to do basic reductions on chunks rather than do full chunk loads (to avoid data movement). We have four pieces in flight at the moment: a Python client application (cf-python of course), some Python middleware (that uses this piece of the zarr ecosystem), a POSIX server, and an S3 proxy server ... but until we put them all together (due in Q2/Q3 this year), there's not much to show or talk about. We absolutely intend the middleware to be reusable by any client application/library (including xarray), but until we have it working enough (i.e. with the storage clients as well) to sell the idea, we're not talking much :-)
Very interesting. I would check out https://zarr.dev/zeps/draft/ZEP0005.html, by @hailiangzhang and colleagues at NASA Goddard. It aims to accomplish a very similar thing via a Zarr spec extension. Discussion at zarr-developers/zarr-specs#205 |
Thanks for referring to my proposal @rabernat ! |
Our approach is to push the computation of partial sums etc into the storage itself (which has plenty of compute for such tasks, given that it can do erasure coding etc). What is probably in common with our two approaches is how we would want to tell Dask when it can avoid having to do these calculations itself. @valeriupredoi Perhaps you or @davidhassell could see if you can join that call? If not, maybe we can set something specific up for a conversation - it is a bit niche :-) |
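As a toy illustration of why per-chunk partial reductions are enough for some statistics, the sketch below computes a global mean from per-chunk (sum, count) pairs, so only two numbers per chunk would need to leave the storage. It is a generic example, not the implementation of either project discussed here; `chunk_partial` and `combine_mean` are hypothetical names, and plain Python lists stand in for stored chunks.

```python
# Toy sketch: the lists stand in for chunks held by the storage layer.
# chunk_partial and combine_mean are hypothetical names used for illustration.


def chunk_partial(chunk):
    """Reduce one chunk to a (sum, count) pair, as the storage side might."""
    return sum(chunk), len(chunk)


def combine_mean(partials):
    """Combine per-chunk (sum, count) pairs into a global mean on the client."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
partials = [chunk_partial(c) for c in chunks]
mean = combine_mean(partials)
print(partials, mean)
```

The same pattern works for any reduction that can be assembled from per-chunk pieces, such as sum, min, max, or count. A median, for example, cannot be combined this way.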
Thanks for your info @bnlawrence ! Yes, there may be something in common between our approaches (at least it looks like we are trying to achieve similar goal:) Do you happen to have any links/resources describing more about your approach? |
Hey guys, just a heads up that the chunk_store object that results from calling the chunk_store() method on a Zarr array changes type for objects (Zarr arrays) created from loading files with fsspec after zarr=2.13.3, i.e. after PR #1304 was merged:
Before the PR, chunk_store (see below for reproducible code) is KVStore: <fsspec.mapping.FSMap object at 0x7feba08afee0> at 0x7feba08aff40>
After the PR, the chunk_store object is <zarr.storage.FSStore object at 0x7fd3ea52bf40>
This causes issues when one wants to call properties of the KVStore object like _mutable_mapping - if this is intended, please provide both a mention in the release notes (as a breaking change) and a solution for how to still be able to use _mutable_mapping; if it's a bug - then you know what to do 😁 🍺
Here's my minimal code - like super minimal 😁 - and kudos to Zarr for preserving structure in such a Null case
where test.json can be an empty JSON file, e.g. {}
Let me know what I can do to help BTW, and many thanks for all your good work! 🍻