N5 attributes.json lacks n5 key for version #200

Open
dchen116 opened this issue Oct 7, 2024 · 5 comments

dchen116 commented Oct 7, 2024

N5 datasets saved by tensorstore do not include a top-level n5 key for the n5 version.
Here is a minimal working example of the problem.

Commands:

$ pixi run python .\tensorstore_n5_issue.py
Traceback (most recent call last):
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 202, in _load_metadata_nosync
    meta_bytes = self._store[mkey]
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\n5.py", line 376, in __getitem__
    value = array_metadata_to_zarr(self._load_n5_attrs(key_new), top_level=top_level)
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\n5.py", line 657, in array_metadata_to_zarr
    array_metadata.pop("n5")
KeyError: 'n5'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\User\tensorstore_n5_issue.py", line 64, in <module>
    main(apply_fix)
  File "C:\User\tensorstore_n5_issue.py", line 60, in main
    n5_read_and_checksum_array(store_path)
  File "C:\User\tensorstore_n5_issue.py", line 35, in n5_read_and_checksum_array
    zarr.open(store=n5_store, mode='r')
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\convenience.py", line 133, in open
    return open_array(_store, mode=mode, **kwargs)
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\creation.py", line 689, in open_array
    z = Array(
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 170, in __init__
    self._load_metadata()
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 193, in _load_metadata
    self._load_metadata_nosync()
  File "C:\User\.pixi\envs\default\lib\site-packages\zarr\core.py", line 204, in _load_metadata_nosync
    raise ArrayNotFoundError(self._path) from e
zarr.errors.ArrayNotFoundError: array not found at path %r' ''
$ pixi run python .\tensorstore_n5_issue.py --fix
C:\User\AppData\Local\Temp\tmpysaajast
Added 'n5': '4.0.0' to C:\User\AppData\Local\Temp\tmpysaajast\attributes.json

tensorstore_n5_issue.py:

import numpy as np
import tempfile
import tensorstore as ts
import zarr
import os
import sys
import json

def ts_create_n5_test(n5_path):
    chunk_shape = (16, 16)
    data = np.arange(np.prod(chunk_shape)).reshape(chunk_shape)

    # Set up the basic N5 store specification
    n5_store_spec = {
        'driver': 'n5',
        'kvstore': {
            'driver': 'file',
            'path': n5_path
        },
        'metadata': {
            'dimensions': list(data.shape),
            'blockSize': list(chunk_shape),
            'dataType': data.dtype.name,
            'compression': {
                'type': 'raw'
            }
        }
    }

    n5_store = ts.open(n5_store_spec, create=True, delete_existing=True).result()
    n5_store.write(data).result()

def n5_read_and_checksum_array(store_path):
    n5_store = zarr.N5FSStore(store_path)
    zarr.open(store=n5_store, mode='r')

# Function to load and fix the attributes.json metadata
def fix_attributes_json(store_path):
    # Define the path to attributes.json
    attributes_json_path = os.path.join(store_path, "attributes.json")

    # Load the content of attributes.json
    with open(attributes_json_path, "r") as file:
        attributes_data = json.load(file)

    attributes_data["n5"] = "4.0.0"
    print(f"Added 'n5': '4.0.0' to {attributes_json_path}")

    # Write the modified data back to attributes.json
    with open(attributes_json_path, "w") as file:
        json.dump(attributes_data, file, indent=4)

def main(apply_fix):
    store_path = tempfile.mkdtemp()
    print(store_path)
    ts_create_n5_test(store_path)
    if apply_fix:
        fix_attributes_json(store_path)
    n5_read_and_checksum_array(store_path)

if __name__ == '__main__':
    apply_fix = len(sys.argv) > 1 and sys.argv[1] == "--fix"
    main(apply_fix)

pixi.toml:

[project]
authors = ["Diyi Chen <[email protected]>"]
channels = ["conda-forge"]
description = "Demonstrate issue when saving N5 datasets using Tensorstore"
name = "tensorstore-n5-issue"
platforms = ["win-64"]
version = "0.1.0"

[tasks]

[dependencies]
python = "3.10.*"
numpy = ">=2.0.1,<3"
zarr = ">=2.18.2,<3"
numcodecs = ">=0.12.1,<0.13"
fsspec = ">=2024.9.0,<2025"

[pypi-dependencies]
tensorstore = ">=0.1.64, <0.2"

laramiel (Collaborator) commented Oct 9, 2024

We should add that. Just a note that according to the Java code, the "n5" version attribute doesn't need to be set:

https://github.com/saalfeldlab/n5/blob/b8d92d5b25ae08c96527f831104ede732553c8e3/src/main/java/org/janelia/saalfeldlab/n5/N5Reader.java#L212

If no version is specified or the version string does not conform to the SemVer format, 0.0.0 will be returned. For incomplete versions, such as 1.2, the missing elements are filled with 0, i.e. 1.2.0 in this case.
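
A minimal Python sketch of the lenient behavior described in that quote (an illustration only, not the actual Java implementation in N5Reader): a missing or non-SemVer "n5" attribute is treated as 0.0.0, and an incomplete version such as "1.2" has its missing elements filled with 0.

import re

def parse_n5_version(attrs: dict) -> tuple[int, int, int]:
    # Hypothetical helper, just to illustrate the fallback rules quoted above.
    version = attrs.get("n5", "")  # absent attribute behaves like an empty string
    match = re.fullmatch(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?.*", version)
    if not match:                  # missing or non-SemVer -> 0.0.0
        return (0, 0, 0)
    major, minor, patch = (int(g) if g else 0 for g in match.groups())
    return (major, minor, patch)   # "1.2" -> (1, 2, 0), "4.0.0" -> (4, 0, 0)

assert parse_n5_version({}) == (0, 0, 0)
assert parse_n5_version({"n5": "1.2"}) == (1, 2, 0)
assert parse_n5_version({"n5": "4.0.0"}) == (4, 0, 0)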

mkitti commented Oct 17, 2024

The n5 specification, item 3, clearly states the following.

The version of this specification is 4.0.0 and is stored in the "n5" attribute of the root group "/".

https://github.com/saalfeldlab/n5/blob/9a4fc3fe6678ad3f7304d0518add5f481b88e41a/README.md?plain=1#L25

While it is true that the reference implementation can read datasets from prior versions and regard the absence of an n5 attribute as pre-1.0.0, is it the intention of tensorstore to only implement n5 prior to v1.0.0 (ca. 2018)?
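
For reference, this is roughly what the root attributes.json from the example script looks like after running with --fix, i.e. the array metadata written by tensorstore plus the "n5" version key added by fix_attributes_json (field order and the exact dataType may differ depending on platform and NumPy defaults):

{
    "dimensions": [16, 16],
    "blockSize": [16, 16],
    "dataType": "int64",
    "compression": {
        "type": "raw"
    },
    "n5": "4.0.0"
}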

jbms (Collaborator) commented Oct 18, 2024

zarr-python doesn't actually do anything with the version number, so it would be better to fix it to not fail if it is missing. That way it can also read older n5 datasets prior to the introduction of the n5 attribute.

Note that tensorstore doesn't distinguish between root vs non-root so we would need to just always write the attribute when creating an array.

In fact it is not clear to me what purpose the version number serves, nor what changes, if any, were made from one version to the next. As far as I can see, the only change made to the specification is the introduction of the version number itself.
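
To illustrate the zarr-python side of this, the failing call shown in the traceback above is array_metadata.pop("n5") in zarr/n5.py; a sketch of the tolerant behavior being suggested (not an actual zarr-python patch) is simply to treat the key as optional:

def strip_n5_version(array_metadata: dict) -> dict:
    # Hypothetical helper for illustration only: drop the optional "n5" version
    # key if present, instead of raising KeyError when it is absent, so datasets
    # written without the key (e.g. by tensorstore) still open.
    array_metadata.pop("n5", None)
    return array_metadata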

laramiel (Collaborator) commented

I agree; I tried to see what the deltas were between any of the "versions", and from what I can tell, up to roughly version 3.3 the README.md stated that the version number was always 1.0.0.

d-v-b commented Oct 20, 2024

zarr-python doesn't actually do anything with the version number, so it would be better to fix it to not fail if it is missing.

yep, I think failing when the n5 key is missing is a zarr-python bug that we should fix.
