Consistent use of 'Zarr format 2 or 3' #2645

Merged
merged 6 commits into from
Jan 6, 2025

Conversation

normanrz
Member

@normanrz normanrz commented Jan 4, 2025

I went through the docstrings, docs and user-facing error messages to make consistent use of "Zarr format 2 or 3" instead of v2 and v3. I think this is less confusing, because it clearly refers to the Zarr format spec version. Otherwise, it would be ambiguous whether the format version or library version is meant.

@normanrz normanrz self-assigned this Jan 4, 2025
@normanrz normanrz requested review from jhamman and dstansby January 4, 2025 14:10
@normanrz normanrz added the documentation Improvements to the documentation label Jan 4, 2025
@normanrz normanrz added this to the 3.0.0 milestone Jan 4, 2025
@dstansby
Contributor

dstansby commented Jan 5, 2025

I think "zarr format 2/3" works in some places, when talking about the spec, but when talking about data I think it's a bit odd. How about "zarr v2 data" or "zarr v3 data" when talking about actual data? I left an example suggestion for this below to show what I mean.

@normanrz
Member Author

normanrz commented Jan 5, 2025

> I think "zarr format 2/3" works in some places, when talking about the spec, but when talking about data I think it's a bit odd. How about "zarr v2 data" or "zarr v3 data" when talking about actual data?

Well, the whole point of this PR is to make it consistent.

To me, "v2 data" or "v3 data" doesn't make any sense, because the data are n-dimensional arrays. Only through encoding in the codec pipeline do they become "Zarr format {2,3} arrays". That is probably another inaccuracy that needs fixing.

Contributor

@dstansby dstansby left a comment

That makes sense - still not a huge fan of format, but it's an improvement and I don't have any better suggestions!

I left a couple of minor unrelated suggestions that I found when reviewing, feel free to take or leave.

src/zarr/api/synchronous.py (outdated, resolved)
src/zarr/api/synchronous.py (outdated, resolved)
src/zarr/core/array.py (outdated, resolved)
@d-v-b
Contributor

d-v-b commented Jan 5, 2025

"zarr format 2" reads less elegantly than "Zarr v2". If we are consistent about always referring to the library as zarr-python, is there still a lot of ambiguity?

And because it's sensible for someone to say "I saved my data using zarr v2", I think it's also sensible for someone to say "zarr v2 data", because they are denoting the stored representation of their n-dimensional arrays + groups.

@normanrz
Member Author

normanrz commented Jan 5, 2025

> "zarr format 2" reads less elegantly than "Zarr v2".

I agree that it sounds more elegant. But the kwarg for specifying the "version" is zarr_format. In the interest of clarity and consistency, I think "Zarr format" is the best choice.
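To illustrate the consistency argument, here is a minimal sketch of how user-facing messages could be built from the same integer that is passed as `zarr_format`. This is not zarr-python's actual code: the helper names `describe_format` and `unsupported_message`, and the message wording, are assumptions made up for this example.

```python
def describe_format(zarr_format: int) -> str:
    """Return a user-facing label for a Zarr format version.

    Hypothetical helper for illustration only; not part of zarr-python.
    """
    if zarr_format not in (2, 3):
        # The error names the kwarg and the format, never "v2"/"v3".
        raise ValueError(
            f"Invalid zarr_format: {zarr_format!r}. Expected Zarr format 2 or 3."
        )
    return f"Zarr format {zarr_format}"


def unsupported_message(feature: str, zarr_format: int) -> str:
    """Build an error message that reuses the same label everywhere."""
    return f"{feature} is not supported for {describe_format(zarr_format)} arrays."


if __name__ == "__main__":
    print(describe_format(3))
    print(unsupported_message("Feature X", 2))
```

Routing every message through one helper keeps the wording in docstrings and errors aligned with the `zarr_format` keyword, which is the kind of consistency this PR aims for.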

Co-authored-by: David Stansby <[email protected]>
@d-v-b
Contributor

d-v-b commented Jan 5, 2025

Using the keyword argument "zarr_format" makes sense in the context of a function for creating data, but to me that's pretty separate from the context of documentation about the library / format. For the docs, we should use the same language we expect people who save their data in zarr to use. I personally would say things like "I saved my data in zarr v2 and v3", (or simply "zarr 2" and "zarr 3"), definitely not "I saved my data in zarr format 2 and zarr format 3".

I agree that we need to avoid confusion between Zarr (the format) and zarr-python (the library). I think consistently using the tokens "Zarr" and zarr-python to refer to these two things, along with contextual cues as needed, is a better solution than inventing new ways to talk about formatted data. We could even have a section of the docs that says "In order to disambiguate between Zarr the format and zarr-python the library, we always refer to them like this..."

@normanrz
Member Author

normanrz commented Jan 5, 2025

To be fair, most of the changes I made were in docstrings, in the context of creating arrays.

> We could even have a section of the docs that says "In order to disambiguate between Zarr the format and zarr-python the library, we always refer to them like this..."

I assume most users pop in and out of the docs through Google searches or direct links. I don't think it would help to have this section if nobody would find it.

@d-v-b
Contributor

d-v-b commented Jan 5, 2025

Even if the language doesn't please everyone, +1 to making things consistent. We can always tweak it later.

@normanrz normanrz enabled auto-merge (squash) January 6, 2025 05:41
@normanrz normanrz merged commit 5c6267e into main Jan 6, 2025
33 checks passed
@dstansby dstansby deleted the docs/zarr-format branch January 6, 2025 10:11
Labels
documentation Improvements to the documentation
Projects
Status: Done
4 participants