Validation finds unexpected elements #542

Merged

Conversation

dsleiter
Contributor

Motivation

The validator now returns warnings on any extra fields that it finds.

How to test the behavior?

Simple example:

from hdmf.build import GroupBuilder
from hdmf.spec import GroupSpec, SpecCatalog, SpecNamespace
from hdmf.validate import ValidatorMap

# Register a spec for data type 'Foo' that defines no attributes of its own
spec_catalog = SpecCatalog()
group_spec = GroupSpec('A test group spec with no attributes', data_type_def='Foo')
spec_catalog.register_spec(group_spec, 'test.yaml')
namespace = SpecNamespace('a test namespace', 'test_core', [{'source': 'test.yaml'}],
                          version='0.1.0', catalog=spec_catalog)
vmap = ValidatorMap(namespace)

# Build a 'Foo' group carrying an attribute that is not in the spec;
# the validator should report the extra field as a warning
builder = GroupBuilder('foo', attributes={'data_type': 'Foo', 'extra_field': 'bar'})
result = vmap.validate(builder)
print(result)
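With this change, the printed result should include a warning-level result flagging extra_field as an unexpected attribute on the Foo group (the exact message format may differ).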

Review and run all new unit tests for full coverage.

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

* Ref hdmf-dev#288
* Starting a new PR but taking appropriate changes from the above PR
* Ref NeurodataWithoutBorders/pynwb#942
* GroupValidator returns ExtraFieldWarning for extra groups, datasets, and links
* Includes unit tests covering the new warning
* Attribute validation WIP
* Ref hdmf-dev#288
* Starting a new PR but taking appropriate changes from the above PR
* Ref NeurodataWithoutBorders/pynwb#942
* Temporary solution for handling reserved attributes, will be fixed in later commit
* Includes unit tests for attribute validation
…ation type issue

* Ref hdmf-dev#288
* Starting a new PR but taking appropriate changes from the above PR
* Ref NeurodataWithoutBorders/pynwb#942
* Validator now skips validation of reserved attributes provided by the spec
* Discovered and resolved an issue where the validator was validating a group
  child based on the parent group's spec instead of the actual type of the builder.
  This is important when an inheriting class has additional attributes.
* Added a regression test case for this issue
Contributor

@ajtritt ajtritt left a comment

Code looks great, thanks for taking this on. A few change requests. There's only one real issue (see the SpecMatch object); the rest are trivial docstring requests.

@dsleiter
Contributor Author

@ajtritt thanks for the review! Good suggestions.

In working through this, I began wondering if it could be beneficial to introduce an AttributeBuilder class. I don't know if this would help or hurt across other parts of hdmf, but in the validator here I think it would help with two things:

  1. it would provide a separation between attributes coming from the user API layer (containers) and attributes coming from the data translation layer (builders), so there would be no need to define "reserved" attributes
  2. attribute names and values would be paired together, and the class could provide functionality such as builder location and access to an attribute's parent

If you think this could be beneficial, I can create an issue to discuss it in more detail.
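For illustration only, a minimal sketch of what such a (hypothetical) AttributeBuilder could look like; the class name, fields, and location property are assumptions, not existing hdmf API:

class AttributeBuilder:
    """Hypothetical builder pairing an attribute name with its value and its owner."""

    def __init__(self, name, value, parent=None):
        self.name = name      # attribute name, e.g. 'resolution'
        self.value = value    # attribute value from the data translation layer
        self.parent = parent  # the GroupBuilder/DatasetBuilder that owns the attribute

    @property
    def location(self):
        """Dotted path of the attribute within the builder hierarchy."""
        if self.parent is None:
            return self.name
        return '{}.{}'.format(self.parent.name, self.name)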

@dsleiter dsleiter requested a review from ajtritt March 22, 2021 06:25
@dsleiter
Contributor Author

@ajtritt @rly I merged in changes from dev, but unfortunately now we have failing tests.

The cause seems to be a combination of three things:

  1. the new EnumData, which is part of the hdmf-experimental namespace (see Integrate experimental namespace #545)
  2. the existing validator can't handle multiple namespaces (were you working on something related to this recently @rly?)
  3. because EnumData is not part of the hdmf-common namespace, the validator doesn't know that it inherits from VectorData and thinks that it is an extra field when validating a DynamicTable containing an EnumData, so as of this PR it now returns a warning

I think the ideal solution is to make the ValidatorMap aware of multiple namespaces, but that is probably better handled as a separate issue. Do you both agree?

Until that issue is complete, how would you feel about adjusting H5RoundTripMixin.validate to only fail on validation errors and pass / warn on validation warnings? Or do you have any other suggestions?

@rly
Contributor

rly commented Mar 30, 2021

Nice job hunting down the source of the issue.

Until that issue is complete, how would you feel about adjusting H5RoundTripMixin.validate to only fail on validation errors and pass / warn on validation warnings? Or do you have any other suggestions?

Yes, I think that is a great idea.

@rly
Contributor

rly commented Mar 30, 2021

the existing validator can't handle multiple namespaces (were you working on something related to this recently @rly?)

I was, but it is not yet ready. If you want, you can take a stab at it, but I won't be able to get to it until at least a week post-hackathon.

* Responding to: hdmf-dev#542 (comment)
* Create severity level constants for warnings and errors
@dsleiter
Contributor Author

dsleiter commented Apr 6, 2021

Until that issue is complete, how would you feel about adjusting H5RoundTripMixin.validate to only fail on validation errors and pass / warn on validation warnings?

I updated the H5RoundTripMixin to only fail on validation errors and warn on validation warnings.

I'm not entirely certain the warning inside the unit test is valuable (it might just add noise), so let me know if you'd prefer it without the warning and I'll update.
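A rough sketch of that kind of split (not the actual H5RoundTripMixin code; how warnings are told apart from errors is left as a caller-supplied predicate because the exact API is not shown here):

import warnings

def check_validation_results(results, is_warning):
    """Fail only on validation errors; surface validation warnings as Python warnings.

    results: the list returned by ValidatorMap.validate()
    is_warning: caller-supplied predicate, e.g. an isinstance check against the
    new warning class or a severity comparison (an assumption for illustration)
    """
    errors = [r for r in results if not is_warning(r)]
    for r in results:
        if is_warning(r):
            warnings.warn(str(r))  # report but do not fail the test
    if errors:
        raise AssertionError('\n'.join(str(e) for e in errors))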

@dsleiter
Contributor Author

dsleiter commented Apr 6, 2021

I see that many pynwb tests failed due to the new validation warning. Most of them look like they'll be fixed by adding neurodata_type to the reserved attributes, but there might be other extra field warnings beyond that.

As a longer-term strategy (once we fix validation of multiple namespaces), do we want tests to fail on validator warnings or only on errors?

@t-b
Contributor

t-b commented Apr 13, 2021

I see that many pynwb tests failed due to the new validation warning. Most of them look like they'll be fixed by adding neurodata_type to the reserved attributes, but there might be other extra field warnings beyond that.

Do you have some examples about pynwb validation warnings due to this PR here?

@dsleiter
Contributor Author

@t-b the errors I was referring to are from the ci/circleci: pynwb-dev-pythons38 job here: https://app.circleci.com/pipelines/github/hdmf-dev/hdmf/2866/workflows/bb203ca4-2953-4094-8247-c80247e99006/jobs/33759

Many errors were happening because the validator was not recognizing neurodata_type as the type_key for pynwb. Thanks for your question, though: in describing the cause, I figured out how to fix the issue. It is now resolved via this commit.

There are a few remaining pynwb tests failing, but most don't appear to have anything to do with this PR. There are 4 failing tests related to extra field warnings, all of them from the integration tests in hdf5.test_misc.TestUnitsIO, but I think these are actually valid warnings:

  ERROR in test test_get_obs_intervals (hdf5.test_misc.TestUnitsIO):
  Traceback (most recent call last):
    File "/home/circleci/project/pynwb/tests/integration/hdf5/test_misc.py", line 63, in test_get_obs_intervals
      ut = self.roundtripContainer()
    File "/home/circleci/project/pynwb/src/pynwb/testing/testh5io.py", line 101, in roundtripContainer
      self.validate()
    File "/home/circleci/project/pynwb/src/pynwb/testing/testh5io.py", line 172, in validate
      raise Exception(err)
  Exception: VectorData (acquisition/UnitsTest/spike_times.resolution): encountered extra field

The schema for Units has a dataset named spike_times which is of type VectorData. spike_times has an additional attribute defined: resolution.
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/e13c24ca24d6d17fa4ce8f3e6f40bd8894432f83/core/nwb.misc.yaml#L177-L200
However, when the validator runs, it compares the spike_times builder against the spec for VectorData, which does not include resolution, and therefore raises the warning.

Should specs that only define neurodata_type_inc be allowed to add fields to a data type?

If not, then I think either spike_times needs to define a new type via neurodata_type_def, or resolution needs to be moved somewhere else.

Do you agree with my evaluation?
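For reference, a simplified sketch of the pattern in question, written with hdmf.spec classes rather than the nwb-schema YAML (the UnitsLike type and doc strings are made-up placeholders, not the real Units spec):

from hdmf.spec import AttributeSpec, DatasetSpec, GroupSpec

# An inner dataset that only *includes* VectorData (data_type_inc) but adds an
# attribute of its own. When the validator checks the spike_times builder against
# the VectorData spec, the added attribute looks like an extra field.
spike_times = DatasetSpec(doc='Spike times for each unit',
                          name='spike_times',
                          data_type_inc='VectorData',
                          attributes=[AttributeSpec('resolution',
                                                    'Smallest possible difference between two spike times',
                                                    'float64')])
units_like = GroupSpec(doc='A simplified Units-like table',
                       data_type_def='UnitsLike',
                       datasets=[spike_times])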

@t-b
Contributor

t-b commented Apr 14, 2021

@t-b the errors I was referring to are from the ci/circleci: pynwb-dev-pythons38 job here: https://app.circleci.com/pipelines/github/hdmf-dev/hdmf/2866/workflows/bb203ca4-2953-4094-8247-c80247e99006/jobs/33759

There were many errors happening because the validator was not recognizing neurodata_type as the type_key for pynwb, but thanks for your question, as in describing the cause I determined how to fix the issue. It is now resolved via this commit.

Okay, question answered ;)

The schema for Units has a dataset named spike_times which is of type VectorData. spike_times has an additional attribute defined: resolution.
https://github.com/NeurodataWithoutBorders/nwb-schema/blob/e13c24ca24d6d17fa4ce8f3e6f40bd8894432f83/core/nwb.misc.yaml#L177-L200
However, when the validator runs, it compares the spike_times builder against the spec for VectorData, which does not include resolution, and therefore raises the warning.

Should specs that only define neurodata_type_inc be allowed to add fields to a data type?

If not, then I think either spike_times needs to define a new type via neurodata_type_def, or resolution needs to be moved somewhere else.

Do you agree with my evaluation?

Yes, I agree. Specifically, to the question:

Should specs that only define neurodata_type_inc be allowed to add fields to a data type?

No. This is also my understanding when rereading https://schema-language.readthedocs.io/en/latest/specification_language_description.html#group-specification-keys.

I don't know what the fallout from changing something that deep would be, though. If we fix it, I think we should already complain when parsing the spec YAML files.

@rly Any thoughts?

@dsleiter
Contributor Author

No. This is also my understanding when rereading https://schema-language.readthedocs.io/en/latest/specification_language_description.html#group-specification-keys.

Ok, then assuming the spec for Units does violate the schema language, I would propose to:

  1. create a bug issue to correct the Units spec (and associated container and object mapper)
  2. update the failing pynwb tests to only fail for validation errors and raise a warning on validation warnings, as part of NeurodataWithoutBorders/pynwb#1178 (Validation finds unexpected elements)

That way we can be unblocked from including the new warnings in hdmf and pynwb and discuss how to fix the spec separately.

If we fix that I think we want to complain already when parsing the spec YAML files.

This seems like a great idea to me!
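As a sketch of what such a parse-time complaint could look like (hypothetical; hdmf does not currently do this, and the check below only covers the inner datasets of a group spec):

def find_fields_added_to_included_types(group_spec):
    """Return messages for inner datasets that only include a type but add attributes."""
    problems = []
    for dataset in group_spec.datasets:
        only_includes = dataset.data_type_inc is not None and dataset.data_type_def is None
        if only_includes and dataset.attributes:
            problems.append('%s adds attributes to included type %s without defining a new type'
                            % (dataset.name or dataset.data_type_inc, dataset.data_type_inc))
    return problems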

@t-b
Contributor

t-b commented Apr 15, 2021

@dsleiter Sounds good!

The failing pynwb tests are validating old NWB files without a cached spec. So yes, making the error currently encountered into a warning sounds like a good approach.

EDIT:
I also just tested this branch against NWBv2 files created by MIES (https://github.com/AllenInstitute/MIES), and validation fails as expected due to additional fields. Nice!

ajtritt
ajtritt previously approved these changes Apr 21, 2021
Contributor

@ajtritt ajtritt left a comment

Looks good to me, nice work!

Also try to trigger GH Action for codecov...
@rly
Contributor

rly commented Apr 21, 2021

@dsleiter I'm reviewing this and so far, so good. For this PR and the future, could you please enable GitHub Actions in your fork of HDMF if it is not already enabled? I know we've had a lot of back and forth regarding how to trigger CI on forks, with varying success. I think doing this would allow the codecov CI run, but I am not 100% sure. I might still need to update the workflow or do something like what is mentioned here: https://securitylab.github.com/research/github-actions-preventing-pwn-requests

@rly
Contributor

rly commented Apr 21, 2021

@dsleiter Oh, never mind. It looks like GitHub Actions does run and uploads to https://app.codecov.io/gh/agencyenterprise/hdmf and probably for security reasons, codecov does not comment in this PR and the checks are not shown here. I might look into this more, but it's not that important.

@rly
Contributor

rly commented Apr 21, 2021

Looks all good to me! Thanks @dsleiter!

@rly rly merged commit 67cffec into hdmf-dev:dev Apr 21, 2021
@rly
Contributor

rly commented Apr 21, 2021

Should specs that only define neurodata_type_inc be allowed to add fields to a data type?

If not, then I think either spike_times needs to define a new type via neurodata_type_def, or resolution needs to be moved somewhere else.

I'll look into this before we release the next version of nwb-schema.

@dsleiter
Contributor Author

Thanks for the reviews and merge @ajtritt and @rly!

@rly RE: codecov - nice to see that the GitHub Actions are working and that the report is at least uploaded to our codecov. It looks like the comment on the PR is made by a codecov bot - do you have that GitHub app (https://github.com/apps/codecov) configured for this repo? I will see if we can get it installed on agencyenterprise/hdmf, in which case maybe it'll post from our codecov to any of my PRs here.

@t-b
Contributor

t-b commented Apr 21, 2021

@dsleiter Yes, nice work. Especially thanks for picking it up where I left it.

rly added a commit that referenced this pull request Apr 22, 2021
rly added a commit that referenced this pull request Apr 22, 2021
rly added a commit that referenced this pull request Apr 22, 2021
dsleiter added a commit to agencyenterprise/hdmf that referenced this pull request May 13, 2021
* Fix hdmf-dev#585
* Ref hdmf-dev#542
* Builder can now be validated against more than one spec in order to validate against additional
  fields added to inner data types
* Also make validation Errors comparable as a way to remove duplicates that can sometimes be generated
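
For illustration, one way to make validation results comparable so duplicates can be dropped (a hypothetical stand-in, not the actual hdmf Error class):

class ComparableIssue:
    """Hypothetical validation result whose identity is its (name, reason, location)."""

    def __init__(self, name, reason, location=None):
        self.name = name
        self.reason = reason
        self.location = location

    def _key(self):
        return (self.name, self.reason, self.location)

    def __eq__(self, other):
        return isinstance(other, ComparableIssue) and self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

def deduplicate(results):
    """Drop duplicate results while preserving order."""
    return list(dict.fromkeys(results))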
rly added a commit that referenced this pull request Jun 25, 2021
…ecs (#609)

* Validate builders against both top level data type specs and inner specs

* Fix #585
* Ref #542
* Builder can now be validated against more than one spec in order to validate against additional
  fields added to inner data types
* Also make validation Errors comparable as a way to remove duplicates that can sometimes be generated

* Update changelog

* Ref #585

* Fix pynwb validation errors related to reference and compound data types

* Ref #585
* This is just a workaround for checking the data_type of
  BuilderH5ReferenceDataset and BuilderH5TableDataset objects
* Plan to add unit tests after some discussion to validate the approach

* Remove validator reference to H5-specific classes and add unit tests

* use ReferenceResolver instead of referencing BuilderH5ReferenceDataset or BuilderH5TableDataset
* Fix #585

* Update tests/unit/validator_tests/test_validate.py

* Update tests/unit/validator_tests/test_validate.py

Co-authored-by: Ryan Ly <[email protected]>
rly added a commit that referenced this pull request Jul 6, 2021