Support Object IDs in NWBFile #991

Merged
merged 25 commits into dev from enh/data_id on Jul 30, 2019

Conversation

ajtritt
Member

@ajtritt ajtritt commented Jul 1, 2019

Motivation

Fix #989

Checklist

  • Have you checked our Contributing document?
  • Have you ensured the PR description clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using #XXX notation, where XXX is the issue number?

@ajtritt ajtritt requested review from rly and bendichter July 1, 2019 21:51
@ajtritt ajtritt requested a review from lawrence-mbf July 1, 2019 21:53
@codecov

codecov bot commented Jul 29, 2019

Codecov Report

Merging #991 into dev will increase coverage by 0.07%.
The diff coverage is 60.86%.

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #991      +/-   ##
==========================================
+ Coverage   72.29%   72.37%   +0.07%     
==========================================
  Files          36       36              
  Lines        2718     2722       +4     
  Branches      513      514       +1     
==========================================
+ Hits         1965     1970       +5     
+ Misses        632      627       -5     
- Partials      121      125       +4
Impacted Files       Coverage Δ
src/pynwb/spec.py    86.31% <ø> (ø) ⬆️
src/pynwb/file.py    76.4% <50%> (-1.2%) ⬇️
src/pynwb/core.py    71.92% <69.23%> (+0.54%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6bd603f...4852325.

@ajtritt ajtritt merged commit 4cd273e into dev Jul 30, 2019
@rly rly deleted the enh/data_id branch July 31, 2019 00:58
@sgratiy

sgratiy commented Aug 16, 2019

@oruebel
The object_id changes when re-creating an NWB file from the same data. This is problematic for regression testing, where you want to create an NWB file from test data and compare it to an existing reference NWB file. I do the comparison with h5diff, and it fails because of the differences in object_id; h5diff does not have an option to exclude specific attributes when comparing files.
Is there a way to opt out of saving object_id into the NWB file?

@oruebel
Contributor

oruebel commented Aug 19, 2019

@sgratiy I'm not sure it's a good idea to give folks the option to change object_id. The intent of object_id is to provide a unique identifier for data. The object_id is not a hash of the data: objects generated at different times (i.e., even if you make a copy) are, and should be, considered different for the purpose of the object_id. I am afraid that allowing folks to set object_id would cause more trouble than benefit.

If I understand your use case correctly, what you are doing is:

  1. Save a file as a reference
  2. In your test, create a new file with the same data
  3. Run h5diff between the file created in the test and the one you stored as a reference

I think for this specific use case of regression testing it would be best to fix the issue in the regression test itself. A few possible solutions:

  • You could ignore any diffs for object_id in the test, since differences in object_id are intentional. h5diff has an --exclude-path option, but I'm not sure whether it works for attributes.

  • You could implement a custom diff function using h5py (e.g., as a static method on the HDF5IO backend in HDMF) with an option to ignore object_id; a rough sketch of this idea appears after this list.

  • You could overwrite the object_id value after you write the file and before running h5diff. Using h5py this would probably look something like this:

    import h5py

    def overwrite_object_id(name, h5obj):
        if 'object_id' in h5obj.attrs:
            h5obj.attrs['object_id'] = 'Undefined'

    # open in read/write mode ('r+'); mode 'w' would truncate the existing file
    with h5py.File('mfile', 'r+') as f:
        overwrite_object_id('/', f)          # visititems() does not visit the root group
        f.visititems(overwrite_object_id)

  • You could monkey patch the object_id property of the Container class in HDMF (https://github.com/hdmf-dev/hdmf/blob/e13fc0bbfcdb927fba291db5a9de2798036ea6ac/src/hdmf/container.py#L35-L39) in your regression tests so that it returns some fixed value, with something like:

    from hdmf import Container

    def fixed_property(self):
        return "Undefined"

    Container.object_id = property(fixed_property)

  • https://github.com/NeurodataWithoutBorders/diff could be useful. I don't know the tool and I don't think it is being actively maintained, but it may provide you with a starting point for what you need.
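
As a rough sketch of the custom h5py-based diff mentioned above (the function name h5_equal_ignoring_object_id, the numpy dependency, and the choice to compare only attribute values and dataset contents are illustrative assumptions, not HDMF or pynwb API), it could look something like:

    import h5py
    import numpy as np

    def h5_equal_ignoring_object_id(path_a, path_b):
        """Return True if two HDF5 files match, ignoring 'object_id' attributes.

        Simplified: reads datasets fully into memory and does not handle
        HDF5 object references, links, or the root group's attributes.
        """
        def collect(path):
            items = {}
            def visit(name, obj):
                attrs = {k: v for k, v in obj.attrs.items() if k != 'object_id'}
                data = obj[()] if isinstance(obj, h5py.Dataset) else None
                items[name] = (attrs, data)
            with h5py.File(path, 'r') as f:
                f.visititems(visit)
            return items

        a, b = collect(path_a), collect(path_b)
        if a.keys() != b.keys():
            return False
        for name in a:
            (attrs_a, data_a), (attrs_b, data_b) = a[name], b[name]
            if attrs_a.keys() != attrs_b.keys():
                return False
            if any(not np.array_equal(attrs_a[k], attrs_b[k]) for k in attrs_a):
                return False
            if not np.array_equal(data_a, data_b):
                return False
        return True

A regression test could then assert h5_equal_ignoring_object_id(reference_path, generated_path) instead of shelling out to h5diff.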

@sgratiy

sgratiy commented Aug 20, 2019

@oruebel Yes, you correctly interpreted my use case. The --exclude-path option in h5diff does not apply to attributes. Thanks for the detailed response and for outlining possible solutions; I truly appreciate it. I have decided to go with the monkey-patching suggestion, as it was the easiest to implement.
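
For example, applied during test setup (a sketch only; the fixture name and the use of pytest's monkeypatch are assumptions, not something from this thread):

    import pytest
    from hdmf import Container

    @pytest.fixture
    def fixed_object_id(monkeypatch):
        # Replace the object_id property for the duration of a test so that
        # files written while the fixture is active always carry the same value.
        monkeypatch.setattr(Container, 'object_id', property(lambda self: 'Undefined'))

    def test_roundtrip(fixed_object_id):
        ...  # write the NWB file here and compare it against the reference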

This solution will work for me, but I am still not quite sure about your statement:

The intent of object_id is to provide a unique identifier for data.

By "data" I understand the actual data that is stored in the file and could be used by scientists. In that case, the current object_id is not a unique identifier for the data. Instead, it is an identifier for an object created in memory. I am not sure that a uuid would be as useful as a data hash.

@oruebel
Contributor

oruebel commented Aug 20, 2019

Instead, it is an identifier for an object created in memory.

It identifies a specific object in a specific file.

I am not sure that a uuid would be as useful as a data hash.

The problem with a data hash is that you can easily have different data with the same hash. Also, generating a good hash for large data is time-consuming. A data hash and an object_id serve different purposes.
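
To make the distinction concrete (a minimal sketch that only illustrates the general behavior of UUIDs versus content hashes, not pynwb internals):

    import hashlib
    import uuid

    import numpy as np

    data = np.arange(1_000_000, dtype='int64')

    # Identity: a fresh UUID for every created object, so re-creating the
    # "same" data yields a different identifier each time.
    assert str(uuid.uuid4()) != str(uuid.uuid4())

    # Content: a hash over the bytes is identical for identical data, but
    # computing it requires reading every byte.
    assert (hashlib.sha256(data.tobytes()).hexdigest()
            == hashlib.sha256(data.tobytes()).hexdigest())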
