Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty containers in zarr chunk checks #5526

Merged
merged 4 commits into from
Jan 27, 2022
Merged

Handle empty containers in zarr chunk checks #5526

merged 4 commits into from
Jan 27, 2022

Conversation

chrisroat
Copy link
Contributor

Continuation of #5019, which closed during the master/main switch.

@chrisroat
Copy link
Contributor Author

@jhamman Ready for review, though not urgent.

Mostly, I'm curious if this the right way to go about solving the linked issue, or if there is something deeper in xarray to update.

@jhamman
Copy link
Member

jhamman commented Jun 28, 2021

Thanks @chrisroat for the PR. This looks great. I do think we should add an entry in whats-new.rst though. Once that is done, I'll merge this.

@chrisroat
Copy link
Contributor Author

In my original PR, I wrote:

In putting this together, I noted that open_zarr to re-read the data triggers this bug, while open_dataset(..., engine='zarr') does not. I'm not sure if my proposed fix is a band-aid, or if something in open_zarr is the real culprit.

I just want to be sure we think this is the correct fix, and there won't be any unintended consequences. I'm really not familiar with the interaction of chunks between zarr, dask, and xarray. I don't feel 100% sure this is the correct solution, though it fixes the immediate solution.

@jhamman jhamman requested a review from rabernat June 29, 2021 20:04
@jhamman
Copy link
Member

jhamman commented Jun 29, 2021

Thanks @chrisroat - let's see if @rabernat or @shoyer has an opinion here then.

@chrisroat
Copy link
Contributor Author

Is this change still of potential value?

Copy link
Contributor

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisroat - thanks so much for your contribution! Sorry that I never reviewed this PR until now.

LGTM.

@@ -93,7 +93,7 @@ def _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name, safe_chunks):

# if there are no chunks in encoding and the variable data is a numpy
# array, then we let zarr use its own heuristics to pick the chunks
if var_chunks is None and enc_chunks is None:
if not var_chunks and not enc_chunks:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding, could you explain the difference in behavior between var_chunks is None and not var_chunks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I hit this when var_chunks was an empty list instead of None:

>>> var_chunks=[]
>>> not var_chunks
True
>>> var_chunks is None
False
>>> 

I do not know whey the use case I mention above uses an empty list (while the code was designed to expect None). I am not sure if there would be undesired knock-on effects of this change. Thoughts?

I can still add a whats-new if that is desired.

@Illviljan Illviljan added the plan to merge Final call for comments label Jan 23, 2022
@dcherian
Copy link
Contributor

Thanks @chrisroat Sorry this took so long to get merged

@dcherian dcherian merged commit 9235548 into pydata:main Jan 27, 2022
@chrisroat chrisroat deleted the zarr-readwrite-bug branch January 30, 2022 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

write/read to zarr subtly changes array with non-dim coord
6 participants