Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dandi export and read #956

Merged
merged 33 commits into from
Jun 6, 2024
Merged

Dandi export and read #956

merged 33 commits into from
Jun 6, 2024

Conversation

samuelbray32
Copy link
Collaborator

@samuelbray32 samuelbray32 commented May 7, 2024

Description

Partial steps for #861

  • Step (3)

    • makes new table DandiPath
      • tracks translations between analysis file name and renamed path in a dandiset
      • has method to stream data from dandi
    • makes function DandiPath.compile_dandiset()
      • Uses dandi api functions to get dandi-compliant names for files
      • organizes and uploads files to dandi
      • populates DandiPath
    • makes admin function Export.prepare_files_for_export()
  • Step (4)

    • adds new fallback to fetch_nwb which streams file from dandi if available

Checklist:

  • This PR should be accompanied by a release: unsure
  • If release, I have updated the CITATION.cff
  • This PR makes edits to table definitions: no
  • N/A If table edits, I have included an alter snippet for release notes.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes
  • Change the dandi url's from development to production server

@edeno
Copy link
Collaborator

edeno commented May 8, 2024

I know this is still a WIP, but we should also decide whether to make dandi a dependency or an optional dependency.

definition = """
-> Export.File
---
dandiset_id: str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the dandiset_id the same for all files in a given Export? If so, I might avoid some redundancy by having DandiPath be a master table with one part per Export.File

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, the dandiset ID is common for everything in export. However I could see a future case where an analysis file is used in multiple papers and rather than re-uploading a new copy of the file to dandi you would point to it's existing version in a different dandiset. I might lean towards leaving the flexibility for this in the table structure

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does dandi handle this case? Presumably, they wouldn't want to store two copies of the same data file across dandisets. But, if the name is contextually determined by other files, the name would have to differ. I wouldn't be surprised if they had cross-dandiset soft-link tool, but I don't know for sure

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have different names on Dandi, but they would share the same object_id for the file. I haven't tested if that blocks upload.

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a quick first pass to keep up to speed

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@samuelbray32 samuelbray32 marked this pull request as ready for review May 20, 2024 21:17
@edeno edeno requested a review from CBroz1 May 20, 2024 21:18
@samuelbray32
Copy link
Collaborator Author

samuelbray32 commented May 20, 2024

I'm leaving the functions pointing to the Dandi dev server for now until the other parts of the code are reviewed.

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, @samuelbray32. I had some comments, but I think the heaviest lifting would be migrating it all to common_dandi to revise the dependency order. I'm happy to help migrating to a more object-oriented approach so the helpers can share properties

# make a temp dir with symbolic links to the export files
source_files = (self.File() & key).fetch("file_path")
paper_dir = f"{export_dir}/{paper_id}"
os.makedirs(paper_dir, exist_ok=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If replacing os with pathlib, you can pathlib_dir.mkdir(parents=True, exist_ok=True)

translations = [
{
**(
self.File() & key & f"file_path LIKE '%{t['filename']}'"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My file_like helper would work here i think

@samuelbray32
Copy link
Collaborator Author

Thanks @CBroz1. I can work on moving the functionality into dandi_common. For another common comment, dandi is already a dependency in pynwb so it shouldn't cause issues for import

@rly
Copy link
Collaborator

rly commented May 22, 2024

FYI dandi is not a dependency of pynwb.

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple quick comments. I can do a full review later today

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting some minor changes



def validate_dandiset(
folder, min_severity="ERROR", ignore_external_files=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read a recommendation recently that suggested setting default min_severity values to w/e the lowest is, like warn. The theory being that seeing warnings makes it easier to deal with the errors when they arise. Do you anticipate warnings at this stage that should be hidden from the user?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary use of this function is to check for errors that would prevent upload to Dandi, so the default min_severity is set to the level that would do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense - we can keep as is. If we can find a way to display warnings without interrupting validation, I think that would make maintenance easier - suggestions over time might push users to address warnings before they potentially become errors in future dandi versions

filtered_results = [
i
for i in filtered_results
if not i.message.startswith("Path is not inside")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how I would handle warnings we expect to see given our setup

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, unsure what the recommedation is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment was in ref to the above suggestion to relax min_severity, and then silence items we expect by the same list comprehension means

Copy link
Member

@CBroz1 CBroz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded to some comments re dandi warnings. Approved with or without addressing

@edeno edeno merged commit 3e7d35a into master Jun 6, 2024
7 checks passed
@edeno edeno deleted the dandi_export branch June 6, 2024 18:30
@samuelbray32 samuelbray32 mentioned this pull request Jun 6, 2024
7 tasks
edeno pushed a commit that referenced this pull request Jun 18, 2024
* Give UUID to artifact interval

* Add ability to set smoothing sigma in get_firing_rate (#994)

* add option to set spike smoothing sigma

* update changelog

* Add docstrings to SortedSpikesGroup and Decoding methods (#996)

* Add docstrings

* update changelog

* fix spelling

---------

Co-authored-by: Samuel Bray <[email protected]>

* Add Common Errors doc (#997)

* Add Common Errors

* Update changelog

* Mua notebook (#998)

* documented some of mua notebook

* mua notebook documented

* documented some of mua notebook

* synced py script

* Dandi export and read (#956)

* compile exported files, download dandiset, and organize

* add function to translate files into dandi-compatible names

* add table to store dandi name translation and steps to populate

* add dandiset validation

* add function to fetch nwb from dandi

* add function to change obj_id of nwb_file

* add dandi upload call and fix circular import

* debug dandi file streaming

* fix circular import

* resolve dandi-streamed files with fetch_nwb

* implement review comments

* add admin tools to fix common dandi discrepencies

* implement tool to cleanup common dandi errors

* add dandi export to tutorial

* fix linting

* update changelog

* fix spelling

* style changes from review

* reorganize function locations

* fix circular import

* make dandi dependency optional in imports

* store dandi instance of data in DandiPath

* resolve case of pre-existing dandi entries for export

* cleanup bugs from refactor

* update notebook

* Apply suggestions from code review

Co-authored-by: Chris Broz <[email protected]>

* add requested changes from review

* make method check_admin_privilege in LabMember

---------

Co-authored-by: Chris Broz <[email protected]>

* Minor fixes (#999)

* give analysis nwb new uuid when created

* fix function argument

* update changelog

* Fix bug in change in analysis_file object_id (#1004)

* fix bug in change in analysis_file_object_id

* update changelog

* Remove classes for usused tables (#1003)

* #976

* Remove notebook reference

* Non-daemon parallel populate (#1001)

* initial non daemon parallel commit

* resolve namespace and pickling errors

* fix linting

* update changelog

* implement review comments

* add parallel_make flag to spikesorting recording tables

* fix multiprocessing spawn error on mac

* move propert

---------

Co-authored-by: Samuel Bray <[email protected]>

* Update pipeline column for IntervalList

---------

Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Samuel Bray <[email protected]>
Co-authored-by: Chris Broz <[email protected]>
Co-authored-by: Denisse Morales-Rodriguez <[email protected]>
Co-authored-by: Samuel Bray <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants