Upgrade validation methods #1911

Merged
stephprince merged 46 commits into release-3.0.0 from upgrade-validator
Jan 3, 2025

Conversation

stephprince (Contributor) commented May 29, 2024

Motivation

Addresses several issues summarized in #1808. This is a breaking change for the next major release.

This PR also modifies the validate method so that it

  1. accepts a single path as input (the CLI still accepts multiple paths)
  2. no longer returns a status code; instead, it returns errors if the function fails while performing validation (the CLI still returns an exit code).
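
The split described above can be sketched as follows, assuming the Python function takes one path and hands back a list of errors while a CLI-style wrapper loops over several paths and derives the exit code itself. `run_cli` and `fake_validate` are hypothetical names used only for illustration; this is not pynwb's implementation.

```python
from typing import Callable, Iterable, List


def run_cli(paths: Iterable[str], validate_fn: Callable[[str], List[str]]) -> int:
    """Validate each path separately and return a process exit code."""
    exit_code = 0
    for path in paths:
        errors = validate_fn(path)  # one path per call, a list of errors back
        for err in errors:
            print(f"{path}: {err}")
        if errors:
            exit_code = 1  # any validation error makes the wrapper fail
    return exit_code


def fake_validate(path: str) -> List[str]:
    """Hypothetical stand-in validator: flags paths containing 'bad'."""
    return ["missing required dataset"] if "bad" in path else []
```

Keeping the exit code in the wrapper means library callers only have to check whether the returned list is empty.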

TODO

  • test with nwbinspector (as of now should only require changes to this line)
  • check test coverage
  • finish testing with ZarrIO with file path validation (bump to later PR)
  • when publishing the next release on conda-forge, the recipe file should also be updated to add the pynwb-validate entry point

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

src/pynwb/validate.py (outdated, resolved)

codecov bot commented May 29, 2024

Codecov Report

Attention: Patch coverage is 96.77419% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.69%. Comparing base (e47cd5a) to head (146d0e1).
Report is 1 commit behind head on release-3.0.0.

Files with missing lines    Patch %   Lines
src/pynwb/__init__.py       78.57%    2 Missing and 1 partial ⚠️
src/pynwb/validation.py     98.55%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                Coverage Diff                @@
##           release-3.0.0    #1911      +/-   ##
=================================================
- Coverage          92.69%   92.69%   -0.01%     
=================================================
  Files                 27       28       +1     
  Lines               2684     2710      +26     
  Branches             706      709       +3     
=================================================
+ Hits                2488     2512      +24     
- Misses               127      128       +1     
- Partials              69       70       +1     
Flag          Coverage Δ
integration   73.35% <96.77%> (+0.25%) ⬆️
unit          83.13% <44.35%> (-0.44%) ⬇️


@rly rly added this to the 3.0 milestone Oct 10, 2024
@stephprince stephprince marked this pull request as ready for review December 17, 2024 00:20
@stephprince stephprince requested a review from rly December 17, 2024 00:20
stephprince (Contributor Author) commented Dec 17, 2024

@rly this should be ready for review

There are a couple remaining TODO items related to validation support for NWBZarr files using the file path that could potentially be bumped to a separate PR:

  1. I started to try to implement support for NWBZarr file validation by allowing the ZarrIO.load_namespaces method to accept a ZarrIO.file object as input, but ran into this issue - [Bug]: .specloc not saved in consolidated metadata hdmf-dev/hdmf-zarr#243.
  2. We would also want to rename either the ZarrIO.file or the HDF5IO._file property so that naming is consistent across backends:
    # TODO update HDF5IO to have .file property to make consistent with ZarrIO
    # then update input arguments here
    namespace_dependencies = io.load_namespaces(namespace_catalog=catalog,
                                                file=io._file)
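
Until the property names are aligned, one way to paper over the difference is a small accessor that tries both attributes. This is only a sketch: `get_backend_file` and the two `Fake*` classes are hypothetical and stand in for the real IO classes, which are not imported here.

```python
def get_backend_file(io):
    """Return the open file handle from an IO object, whichever attribute holds it."""
    # Per the TODO above: ZarrIO exposes `.file`, HDF5IO keeps its handle in `._file`.
    for attr in ("file", "_file"):
        handle = getattr(io, attr, None)
        if handle is not None:  # avoid truthiness checks on file-like handles
            return handle
    raise AttributeError(f"{type(io).__name__} has neither .file nor ._file")


class FakeZarrIO:  # hypothetical stand-in, not hdmf_zarr's ZarrIO
    file = "zarr-handle"


class FakeHDF5IO:  # hypothetical stand-in, not hdmf's HDF5IO
    _file = "hdf5-handle"
```

A shared `.file` property on both backends would make a helper like this unnecessary, which is the point of the TODO.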

CHANGELOG.md (outdated, resolved)
CHANGELOG.md (outdated, resolved)
docs/source/validation.rst (outdated, resolved)
src/pynwb/__init__.py (outdated, resolved)
docs/source/validation.rst (outdated, resolved)
test.py (outdated)
@@ -169,7 +169,7 @@ def validate_nwbs():
  is_family_nwb_file = False
  try:
      with pynwb.NWBHDF5IO(nwb, mode='r') as io:
-         errors = pynwb.validate(io)
+         errors = validate(io, use_cached_namespaces=False) # previously io did not validate against cached namespaces
Contributor

Suggested change
- errors = validate(io, use_cached_namespaces=False) # previously io did not validate against cached namespaces
+ errors = validate(io, use_cached_namespaces=False)
+ errors.extend(validate(io, use_cached_namespaces=True))
  1. I think the original comment here might not make sense in the future
  2. The example NWB files are all generated using the version of NWB being tested, and because pynwb caches the spec by default, there should be no difference between using cached namespaces and not.
  3. Since the pynwb.validate and pynwb-validate should be the same now, we don't really need this test anymore since we have the pynwb-validate test below. But since this validate_nwbs() function is super conservative in its testing of every combination, then for consistency, I suggest we validate with both use_cached_namespaces=True and use_cached_namespaces=False.
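
A minimal sketch of validating with both settings and merging the results, assuming the validator returns a flat list of errors. `validate_both` and `fake_validate` are hypothetical helpers for illustration; in test.py the real `validate` function would be passed in.

```python
def validate_both(io, validate_fn):
    """Run validate_fn with and without cached namespaces and merge the errors."""
    errors = []
    for use_cached in (False, True):
        # extend keeps `errors` a flat list; append would nest one list inside another
        errors.extend(validate_fn(io, use_cached_namespaces=use_cached))
    return errors


def fake_validate(io, use_cached_namespaces):
    """Hypothetical validator: only reports a problem when the cache is used."""
    return ["cached spec mismatch"] if use_cached_namespaces else []
```

With a flat list, a plain `if errors:` check still means "at least one problem was found" regardless of which setting produced it.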

stephprince (Contributor Author) commented Dec 23, 2024

If I set use_cached_namespaces=True here, I get several errors when running the validate_nwbs() section of test.py. I had added use_cached_namespaces=False to match the previous test behavior, but if there should be no difference between using the cached namespaces and not in this particular case, then maybe these errors indicate another issue?

You can reproduce this by running test.py; the errors look like the traceback below. Maybe the mylab extension generation needs to be updated?

2024-12-23 10:12:02,969 - INFO - Validating with pynwb.validate method.
Traceback (most recent call last):
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 280, in get_validator
    return self.__validators[dt]
           ~~~~~~~~~~~~~~~~~^^^^
KeyError: 'NWBFile'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/smprince/Documents/code/pynwb/test.py", line 172, in validate_nwbs
    errors = validate(io, use_cached_namespaces=True)  # previously io did not validate against cached namespaces
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 672, in func_call
    return func(**pargs)
           ^^^^^^^^^^^^^
  File "/Users/smprince/Documents/code/pynwb/src/pynwb/validation.py", line 191, in validate
    validation_errors += _validate_helper(io=io, namespace=validation_namespace)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/Documents/code/pynwb/src/pynwb/validation.py", line 19, in _validate_helper
    return validator.validate(builder)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 298, in validate
    validator = self.get_validator(dt)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/utils.py", line 668, in func_call
    return func(args[0], **pargs)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/smprince/anaconda3/envs/pynwb/lib/python3.11/site-packages/hdmf/validate/validator.py", line 283, in get_validator
    raise ValueError(msg)
ValueError: data type 'NWBFile' not found in namespace mylab
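
The shape of this traceback (a KeyError followed by a ValueError raised while handling it) can be reproduced with a toy lookup. `ToyValidatorMap` is a hypothetical class written for illustration and is not hdmf's code; only the error message format is taken from the traceback above.

```python
class ToyValidatorMap:
    """Toy registry mapping data type names to validators for one namespace."""

    def __init__(self, namespace, validators):
        self.namespace = namespace
        self._validators = dict(validators)

    def get_validator(self, data_type):
        try:
            return self._validators[data_type]
        except KeyError:
            # The KeyError is converted to a ValueError naming the namespace,
            # which yields the "During handling of the above exception" chain.
            raise ValueError(
                f"data type '{data_type}' not found in namespace {self.namespace}"
            )
```

Calling `get_validator("NWBFile")` on a map that only holds the types registered for mylab raises the same kind of chained error.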

stephprince (Contributor Author) commented Dec 23, 2024

I think these lines:

ns_builder.include_type("ElectricalSeries", namespace="core")

ns_builder.include_type("NWBDataInterface", namespace="core")

could be updated to:
ns_builder.include_namespace("core")

to fix these errors. I think these changes more closely match the latest version of ndx-template create_extension_spec.py file. However, should ns_builder.include_type still work without including the entire namespace?
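
The difference between the two calls can be pictured with a deliberately simplified model, assuming a namespace only "sees" the types it explicitly includes. `ToyNamespace` and `CORE_TYPES` are hypothetical; the real builder also resolves inheritance and other details that this sketch ignores.

```python
# Illustrative subset of core type names, not the full core namespace
CORE_TYPES = {"NWBFile", "NWBDataInterface", "ElectricalSeries", "TimeSeries"}


class ToyNamespace:
    """Simplified model: a namespace sees only what it explicitly includes."""

    def __init__(self, name):
        self.name = name
        self.visible = set()

    def include_type(self, data_type, namespace="core"):
        # `namespace` is accepted for parity with the calls above but unused here
        self.visible.add(data_type)  # pulls in a single type

    def include_namespace(self, namespace):
        if namespace == "core":
            self.visible |= CORE_TYPES  # pulls in every type of the namespace


narrow = ToyNamespace("mylab")
narrow.include_type("ElectricalSeries", namespace="core")
narrow.include_type("NWBDataInterface", namespace="core")

wide = ToyNamespace("mylab")
wide.include_namespace("core")
```

Under this model `NWBFile` is missing from `narrow.visible` but present in `wide.visible`, which is consistent with the error going away after the change.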

Contributor

Interesting. For now, let's update that to ns_builder.include_namespace("core").

We need to think through what it means to run the pynwb validator on a file, validating against a particular non-core namespace. Should the core namespace always be included during validation, regardless of whether the extension includes the core namespace, and the choices are either use the core namespace cached in the file or the one installed with pynwb? I think so...

I think we eventually want to move toward not even allowing validation against a particular namespace. Either the file is valid given its cached namespaces (or the namespaces installed by pynwb and loaded by the user), or the file is not. Otherwise, we run into weird issues such as hdmf-dev/hdmf#608 and hdmf-dev/hdmf#525.

I would say let's make the above change for now, and iterate on these other ideas in a separate PR which does not need to make it in pynwb 3.0.0rc1.

Contributor Author

> Should the core namespace always be included during validation, regardless of whether the extension includes the core namespace, and the choices are either use the core namespace cached in the file or the one installed with pynwb? I think so...

I think so too? But agreed it would definitely be helpful to compile and discuss all these different scenarios to get a better idea of the end goal validation behavior

> I would say let's make the above change for now, and iterate on these other ideas in a separate PR which does not need to make it in pynwb 3.0.0rc1

Sounds good!

test.py (resolved)
src/pynwb/validation.py (outdated, resolved)
CHANGELOG.md (resolved)
@stephprince stephprince changed the base branch from dev to release-3.0.0 January 2, 2025 17:36
@stephprince stephprince requested a review from rly January 2, 2025 21:25
@stephprince stephprince merged commit f02e61b into release-3.0.0 Jan 3, 2025
24 of 25 checks passed
@stephprince stephprince deleted the upgrade-validator branch January 3, 2025 17:02