Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache DataStub data #448

Closed
wants to merge 9 commits into from
Closed

cache DataStub data #448

wants to merge 9 commits into from

Conversation

lawrence-mbf
Copy link
Collaborator

Fixes #412

Motivation

Add caching to data stubs.

How to test the behavior?

% dataStub being some dataStub.
dims = dataStub.dims; % should be much faster on an external or network drive due to caching behavior.

Checklist

  • Have you ensured the PR description clearly describes the problem and solutions?
  • Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?
  • If this PR fixes an issue, is the first line of the PR description fix #XX where XX is the issue number?

@lawrence-mbf lawrence-mbf requested a review from bendichter July 25, 2022 20:36
@lawrence-mbf lawrence-mbf changed the title cache data stub data cache DataStub data Jul 25, 2022
@codecov
Copy link

codecov bot commented Jul 25, 2022

Codecov Report

Merging #448 (43c95cf) into master (d43160c) will increase coverage by 0.03%.
The diff coverage is 89.83%.

@@            Coverage Diff             @@
##           master     #448      +/-   ##
==========================================
+ Coverage   85.92%   85.96%   +0.03%     
==========================================
  Files         126      126              
  Lines        5216     5236      +20     
==========================================
+ Hits         4482     4501      +19     
- Misses        734      735       +1     
Impacted Files Coverage Δ
+types/+untyped/DataStub.m 92.00% <87.75%> (+0.09%) ⬆️
+io/parseDataset.m 94.11% <100.00%> (+0.36%) ⬆️
+types/+untyped/+datapipe/BoundPipe.m 77.53% <100.00%> (ø)
+types/+untyped/ExternalLink.m 95.65% <100.00%> (+0.12%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@lawrence-mbf lawrence-mbf marked this pull request as draft July 25, 2022 20:48
Since these are also cached.
- datastub constructor can now take dataset and file ids.
- modify parse dataset to use new datastub constructor
Originally took two arguments. Now keyword arguments in External Links
@lawrence-mbf lawrence-mbf marked this pull request as ready for review July 27, 2022 16:42
better organize and isolate control of file id lifetimes when using data stub caching.
H5F.reopen does not create a new id in older versions of MATLAB. Defaulting back into legacy
behavior with ids.
@bendichter
Copy link
Contributor

@lawrence-mbf is this ready to merge?

@lawrence-mbf
Copy link
Collaborator Author

@bendichter I don't think so. There was still some instability in the last hackathon that caused full MATLAB crashes and the "solution" in the last commit only really resolved it at the high level. Unfortunately, I narrowed this down to a MATLAB bug present in versions older than R2021a where H5F.close() would close all file IDs, even if you've cloned it properly.

To reproduce:

filename = 'test.nwb';
datasetPath = '/file_create_date';

fid = H5F.open(filename, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
sourceDid = H5D.open(fid, datasetPath);
sourceGid = H5G.open(fid, '/general');

% this can occur with both H5F.open and H5F.reopen
fidReopened = H5F.open(filename, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
% fidReopened = H5F.reopen(fid);
didReopened = H5D.open(fidReopened, datasetPath);

% The error only occurs when ALL HDF5 IDs are closed including the source file id.
% not closing any of the below ids keeps it "alive" and does not cause a crash.
H5D.close(sourceDid);
H5G.close(sourceGid);
H5F.close(fid);

data = H5D.read(didReopened); % MATLAB crashes here.

@bendichter
Copy link
Contributor

I see. Is there any way we could file this with Mathworks?

@lawrence-mbf
Copy link
Collaborator Author

I suppose you could but it is technically fixed in R2021a and later. You could also bring it up in the working group I suppose.

@bendichter
Copy link
Contributor

I think they still maintain patches for older versions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache File and Dataset IDs for DataStubs
2 participants