User attributes not being saved to file #2079

Closed
dstansby opened this issue Aug 12, 2024 · 9 comments
Labels
bug Potential issues with the zarr-python library

Comments

@dstansby
Contributor

Zarr version

2.18.2

Numcodecs version

0.12.1

Python Version

3.11

Operating System

macOS

Installation

conda

Description

I'm trying to save user attributes to file, but they don't seem to be saved when calling zarr.save_array.

Steps to reproduce

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256)
arr.attrs["attr"] = "value"
zarr.save_array("test_attr_arr.zarr", arr)

The resulting zarr array on disk does not have a .zattrs file, which I would expect it to have.

Additional output

No response

@dstansby dstansby added the bug Potential issues with the zarr-python library label Aug 12, 2024
@jhamman
Member

jhamman commented Aug 12, 2024

I think this is actually the expected behavior. Without save_array, the attrs are saved to the array's own (in-memory) store:

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256)
arr.attrs["attr"] = "value"
print(arr.store[".zattrs"])
# b'{\n    "attr": "value"\n}'

But save_array creates a new array: it treats the arr argument as a NumPy-like array and does not look for attrs. So your last line creates a new array, without the attributes, in a new store.
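To make the distinction concrete, here is a toy model in plain Python, with no zarr involved. It assumes, as in zarr v2, that a store is a mapping from keys such as `.zarray` and `.zattrs` to JSON bytes; `ToyArray` and `toy_save_array` are hypothetical stand-ins, not zarr APIs. Because the toy save function reads only the shape, dtype and values of its input, the attributes stored alongside the in-memory array never reach the new store.

```python
import json


class ToyArray:
    """Hypothetical in-memory array: metadata and attrs live in its own dict store."""

    def __init__(self, shape, dtype, fill):
        self.shape = shape
        self.dtype = dtype
        self.values = [fill] * shape[0]  # 1-D toy data only
        self.attrs = {}
        self.store = {".zarray": json.dumps({"shape": shape, "dtype": dtype}).encode()}

    def set_attr(self, key, value):
        # Attributes are written to this array's own store under ".zattrs".
        self.attrs[key] = value
        self.store[".zattrs"] = json.dumps(self.attrs).encode()


def toy_save_array(dest_store, arr):
    """Hypothetical save: creates a *new* array from shape, dtype and values only."""
    dest_store[".zarray"] = json.dumps({"shape": arr.shape, "dtype": arr.dtype}).encode()
    dest_store["0"] = json.dumps(arr.values).encode()  # a single toy chunk
    # arr.attrs is never read, so no ".zattrs" key is written.


arr = ToyArray(shape=[4], dtype="|u1", fill=1)
arr.set_attr("attr", "value")

on_disk = {}  # stands in for the directory store that save_array would create
toy_save_array(on_disk, arr)
print(sorted(arr.store))  # ['.zarray', '.zattrs']
print(sorted(on_disk))    # ['.zarray', '0']
```

The attributes exist, but only in the original array's store; the saved copy starts from scratch.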

@dstansby
Contributor Author

🤔 so how do I save the attributes to disk?

@jhamman
Member

jhamman commented Aug 12, 2024

When you create an array, you need to associate a store with it (by default, as in your example, an in-memory store is used). This works 👇

import numpy as np
import zarr

arr = zarr.ones((512, 512, 512), dtype=np.uint8, chunks=256, store="test_attr_arr.zarr")
arr.attrs["attr"] = "value"

@jhamman
Member

jhamman commented Aug 13, 2024

Going to take the ❤️ as permission to close this. Feel free to reopen if you run into more gotchas!

(btw, you'll be able to do this all in one call in v3)

import numpy as np
import zarr

arr = zarr.ones(
    shape=(512, 512, 512),
    dtype=np.uint8,
    chunks=256,
    store="test_attr_arr.zarr",
    attributes={"attr": "value"},
)

@jhamman jhamman closed this as completed Aug 13, 2024
@dstansby
Contributor Author

dstansby commented Aug 13, 2024

What I really want to do is work with an array in memory (including adding some attributes), and then save it in one go to disk. So I think it's worth at least documenting how to do that workflow somewhere?

@d-v-b
Contributor

d-v-b commented Aug 13, 2024

I think this would be equivalent to copying the memory store to a directory store. Is there not a copy_store routine in v2? With the v2 MutableMapping API, you might be able to do local_store.update(memory_store), although users should probably see something a bit more familiar.
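A minimal sketch of the copy-the-whole-store idea in plain Python. It assumes, as in the zarr v2 store model, that both stores are `MutableMapping`s of string keys to bytes; `ToyDirectoryStore` is a hypothetical, stripped-down stand-in for a directory-backed store (one file per key), not zarr's own `DirectoryStore`, and the store contents are toy bytes rather than real zarr metadata and compressed chunks. Because `MutableMapping.update` copies every key, `.zattrs` travels along with `.zarray` and the chunk data.

```python
import os
import tempfile
from collections.abc import MutableMapping


class ToyDirectoryStore(MutableMapping):
    """Hypothetical directory-backed store: each key is a file under `root`."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # "/"-separated keys map to nested files under root.
        return os.path.join(self.root, *key.split("/"))

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        # Walk the tree and yield keys relative to root, with "/" separators.
        for dirpath, _, filenames in os.walk(self.root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), self.root)
                yield rel.replace(os.sep, "/")

    def __len__(self):
        return sum(1 for _ in self)


# A toy in-memory store holding metadata, attributes and one uncompressed chunk.
memory_store = {
    ".zarray": b'{"shape": [4], "chunks": [4], "dtype": "|u1"}',
    ".zattrs": b'{"attr": "value"}',
    "0": b"\x01\x01\x01\x01",
}

local_store = ToyDirectoryStore(os.path.join(tempfile.mkdtemp(), "test_attr_arr.zarr"))
local_store.update(memory_store)  # copies every key, including ".zattrs"
print(sorted(local_store))  # ['.zarray', '.zattrs', '0']
```

In zarr v2 itself, `zarr.copy_store(source, dest)` provides this key-by-key copy without a hand-written store class.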

@dstansby
Copy link
Contributor Author

Yep, there is a copy_store function.

Clearly my mental model of how zarr (or at least save_array) works wasn't very good, so I might suggest some improvements to the save_array docstring explaining the difference between save_array and copy_store.

@d-v-b
Contributor

d-v-b commented Aug 13, 2024

A few thoughts about this workflow:

  • It's good to make it dead-simple for users to save numpy arrays to zarr. If zarr v2 allowed specifying attrs at array creation time, your first attempt would have worked, @dstansby: save_array forwards **kwargs to _create_array under the hood, so attrs could travel that route, but only if _create_array accepted attrs, which it does not. A million years ago I had a PR to fix this, but I think with the v3 API Joe showed, we don't have this problem any more.
  • There are other arrays to consider. In particular, dask arrays come with chunks, and xarray.DataArrays come with optional chunks AND attrs. I think we should have a single entry point for fluently converting generic chunked-array-with-attrs-like objects into Zarr arrays. Over in pydantic-zarr, I implemented a from_array function that takes an array-like input (i.e., shape and dtype are required) and checks whether that input has an attrs attribute, a chunks attribute, a filters attribute, and so on for all the zarr array attributes, to create the resulting (model) zarr array (this attribute inference can be overridden with a concrete value). I think we should do something similar in zarr-python. This gets a bit more complicated for array-like objects that use different "syntax" for the same semantics; e.g., the dask array chunks attribute is an explicit list of chunk sizes. We might need to define this variation via protocols and dispatch on the shape of incoming array-like objects. Work to be done for sure, but I think this is something a lot of users would appreciate.
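A rough sketch of the attribute-inference idea described above, in plain Python. Everything here is hypothetical: `infer_array_spec`, `normalize_chunks`, the chosen field names and the fake array classes are illustrative assumptions, not pydantic-zarr's `from_array` or any zarr-python API. The sketch requires `shape` and `dtype`, picks up `attrs` and `chunks` when the input has them, lets explicit keyword arguments override inferred values, and converts dask-style chunks (a tuple of per-dimension chunk-size tuples) into a regular chunk shape when possible.

```python
from dataclasses import dataclass, field


def normalize_chunks(chunks, shape):
    """Convert dask-style chunks, e.g. ((256, 256), (256, 44)), into a regular
    chunk shape such as (256, 256). Plain tuples of ints pass through."""
    if chunks is None:
        return tuple(shape)  # assumption: one chunk covering the whole array
    if all(isinstance(c, int) for c in chunks):
        return tuple(chunks)
    regular = []
    for dim_chunks in chunks:
        first = dim_chunks[0]
        # A regular grid: all chunks equal, except possibly a smaller last one.
        if any(c != first for c in dim_chunks[:-1]) or dim_chunks[-1] > first:
            raise ValueError(f"irregular chunks {dim_chunks} cannot map to a regular grid")
        regular.append(first)
    return tuple(regular)


def infer_array_spec(array_like, **overrides):
    """Hypothetical: build array-creation settings from an array-like's own properties."""
    spec = {
        "shape": tuple(array_like.shape),  # required
        "dtype": str(array_like.dtype),    # required
        "attrs": dict(getattr(array_like, "attrs", None) or {}),
        "chunks": getattr(array_like, "chunks", None),
    }
    spec.update(overrides)  # explicit values win over inferred ones
    spec["chunks"] = normalize_chunks(spec["chunks"], spec["shape"])
    return spec


@dataclass
class FakeDaskLike:
    """Stand-in for a dask array: explicit per-dimension chunk sizes, no attrs."""
    shape: tuple
    dtype: str
    chunks: tuple


@dataclass
class FakeDataArrayLike:
    """Stand-in for an xarray.DataArray-like object carrying attrs."""
    shape: tuple
    dtype: str
    attrs: dict = field(default_factory=dict)


dask_like = FakeDaskLike(shape=(512, 300), dtype="uint8", chunks=((256, 256), (256, 44)))
print(infer_array_spec(dask_like))
# {'shape': (512, 300), 'dtype': 'uint8', 'attrs': {}, 'chunks': (256, 256)}

xr_like = FakeDataArrayLike(shape=(4,), dtype="float64", attrs={"units": "m"})
print(infer_array_spec(xr_like, chunks=(2,)))
# {'shape': (4,), 'dtype': 'float64', 'attrs': {'units': 'm'}, 'chunks': (2,)}
```

Dispatching on attributes with `getattr` keeps the entry point duck-typed; a real implementation would likely need per-library protocols for cases (such as irregular dask chunks) that this sketch simply rejects.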

@d-v-b
Contributor

d-v-b commented Aug 13, 2024

This issue made me think about array creation routines, I wrote up some ideas here: #2083
