Allow recompute via _make_file func #1093

Draft
wants to merge 29 commits into base: master

Commits
9192573
WIP: remove AnalysisNwbfileLog
CBroz1 Sep 6, 2024
27f0004
WIP: recompute
CBroz1 Sep 6, 2024
743502d
WIP: recompute 2
CBroz1 Sep 11, 2024
9d23949
WIP: recompute 3
CBroz1 Sep 12, 2024
39f07bf
WIP: recompute 4
CBroz1 Sep 12, 2024
1b38818
WIP: recompute 5, electrodes object
CBroz1 Sep 18, 2024
282d553
WIP: recompute 6, add file hash
CBroz1 Sep 19, 2024
94168de
WIP: recompute 7
CBroz1 Sep 20, 2024
b553f77
Merge branch 'master' of https://github.com/LorenFrankLab/spyglass in…
CBroz1 Sep 20, 2024
a594786
✅ : recompute
CBroz1 Sep 20, 2024
df1800e
w
CBroz1 Oct 21, 2024
6d0df07
Handle groups and links
CBroz1 Oct 21, 2024
1587997
Remove debug
CBroz1 Oct 22, 2024
1ed831e
Add directory hasher
CBroz1 Nov 12, 2024
7547fe2
Merge branch 'rcp' of https://github.com/CBroz1/spyglass into rcp
CBroz1 Nov 13, 2024
23799f8
Merge branch 'master' of https://github.com/LorenFrankLab/spyglass in…
CBroz1 Nov 13, 2024
d0011bf
Update directory hasher
CBroz1 Nov 13, 2024
ad7c74a
WIP: update hasher
CBroz1 Jan 8, 2025
558f38b
WIP: fetch upstream, resolve conflicts
CBroz1 Jan 8, 2025
54a3ca1
WIP: error specificity
CBroz1 Jan 9, 2025
1e41698
Add tables for recompute processing
CBroz1 Feb 4, 2025
ae52aed
WIP: incorporate feedback
CBroz1 Feb 21, 2025
0795fa6
WIP: fetch upstream
CBroz1 Mar 3, 2025
2e89070
WIP: enforce environment restriction
CBroz1 Mar 3, 2025
8c172aa
WIP: fetch upstream
CBroz1 Mar 3, 2025
bfe49d1
WIP: typo
CBroz1 Mar 3, 2025
72f8a25
WIP: add tests
CBroz1 Mar 4, 2025
9c27d87
WIP: start add V0 hasher
CBroz1 Mar 5, 2025
8835794
See details
CBroz1 Mar 6, 2025
27 changes: 26 additions & 1 deletion CHANGELOG.md
@@ -1,6 +1,27 @@
# Change Log

## [0.5.5] (Unreleased)
## \[0.5.5\] (Unreleased)

### Release Notes

<!-- Running draft to be removed immediately prior to release. -->

<!-- When altering tables, import all foreign key references. -->

```python
import datajoint as dj
from spyglass.spikesorting.v1 import recording as v1rec # noqa
from spyglass.spikesorting.v0 import spikesorting_recording as v0rec # noqa
from spyglass.linearization.v1.main import * # noqa

dj.FreeTable(dj.conn(), "common_nwbfile.analysis_nwbfile_log").drop()
dj.FreeTable(dj.conn(), "common_session.session_group").drop()
TrackGraph.alter() # Add edge map parameter
v0rec.SpikeSortingRecording().alter()
v0rec.SpikeSortingRecording().update_ids()
v1rec.SpikeSortingRecording().alter()
v1rec.SpikeSortingRecording().update_ids()
```

### Infrastructure

@@ -10,6 +31,7 @@
- Improve cron job documentation and script #1226, #1241
- Update export process to include `~external` tables #1239
- Only add merge parts to `source_class_dict` if present in codebase #1237
- Add recompute ability for `SpikeSortingRecording` #1093

### Pipelines

@@ -34,6 +56,7 @@
#1108, #1172, #1187
- Add docstrings to all public methods #1076
- Update DataJoint to 0.14.2 #1081
- Remove `AnalysisNwbfileLog` #1093
- Allow restriction based on parent keys in `Merge.fetch_nwb()` #1086, #1126
- Import `datajoint.dependencies.unite_master_parts` -> `topo_sort` #1116,
#1137, #1162
@@ -45,6 +68,7 @@
- Update DataJoint install and password instructions #1131
- Fix dandi upload process for nwb's with video or linked objects #1095, #1151
- Minor docs fixes #1145
- Add Nwb hashing tool #1093
- Test fixes
- Remove stored hashes from pytests #1152
- Remove mambaforge from tests #1153
@@ -104,6 +128,7 @@
- Fix bug in `get_group_by_shank` #1096
- Fix bug in `_compute_metric` #1099
- Fix bug in `insert_curation` returned key #1114
- Add fields to `SpikeSortingRecording` to allow recompute #1093
- Fix handling of waveform extraction sparse parameter #1132
- Limit Artifact detection intervals to valid times #1196

1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -75,6 +75,7 @@ nav:
- Merge Tables: Features/Merge.md
- Export: Features/Export.md
- Centralized Code: Features/Mixin.md
- Recompute: Features/Recompute.md
- For Developers:
- Overview: ForDevelopers/index.md
- How to Contribute: ForDevelopers/Contribute.md
55 changes: 55 additions & 0 deletions docs/src/Features/Recompute.md
@@ -0,0 +1,55 @@
# Recompute

## Why

Some analysis files generated by Spyglass are unlikely to ever be accessed
again. Those generated by `SpikeSortingRecording` tables were identified as
taking up tens of terabytes of space while being seldom accessed after their
first generation. By recomputing these files on demand, we can save significant
server space at the cost of roughly ten minutes of recompute time per file,
paid only in the unlikely event a deleted file is needed again.

Spyglass 0.5.5 introduces the ability to delete and later recompute both files
newly generated after this release and old files generated before it.

## How

`SpikeSortingRecording` has a new `_make_file` method that is called when a
file is accessed but not found on disk. This method regenerates the file and
compares its hash to the hash of the file that was expected. If the hashes
match, the file is saved and returned. If the hashes do not match, the file is
deleted and an error is raised. The steps below describe how to avoid such
errors.

### New files

Newly generated files will automatically record information about their
dependencies and the code that generated them in `RecomputeSelection` tables.
To see the dependencies of a file, query `RecordingRecomputeSelection`:

```python
from spyglass.spikesorting.v1 import recompute as v1_recompute

v1_recompute.RecordingRecomputeSelection()
```

### Old files

To ensure that old files can be replicated before they are deleted, we'll need
to:

1. Update the tables for new fields.
2. Attempt file recompute, and record dependency info for successful attempts.

<!-- TODO: add code snippet. 2 or 3 tables?? -->

```python
from spyglass.spikesorting.v0 import spikesorting_recording as v0_recording
from spyglass.spikesorting.v1 import recording as v1_recording

# Alter tables to include new fields, updating values
v0_recording.SpikeSortingRecording().alter()
v0_recording.SpikeSortingRecording().update_ids()
v1_recording.SpikeSortingRecording().alter()
v1_recording.SpikeSortingRecording().update_ids()
```
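
Once the tables are altered, a driver loop for recompute attempts might look
like the following sketch. The names here are placeholders, not the real
Spyglass API (`recompute_fn` regenerates one file, raising on failure, and
`record_fn` stands in for inserting dependency info into a
`RecomputeSelection` table); consult the `recompute` module for the actual
table methods.

```python
def attempt_recomputes(files, recompute_fn, record_fn):
    """Try recomputing each file; record dependency info only on success."""
    results = {}
    for f in files:
        try:
            recompute_fn(f)
            record_fn(f)  # placeholder for a RecomputeSelection insert
            results[f] = True  # replicable: safe to delete and recompute later
        except Exception:
            results[f] = False  # keep this file; recompute not yet reliable
    return results
```

Files whose recompute attempts fail are left in place, so nothing is deleted
until replicability has been demonstrated.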
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -142,7 +142,7 @@ addopts = [
# "--pdb", # drop into debugger on failure
"-p no:warnings",
# "--no-teardown", # don't teardown the database after tests
"--quiet-spy", # don't show logging from spyglass
# "--quiet-spy", # don't show logging from spyglass
# "--no-dlc", # don't run DLC tests
"--show-capture=no",
"--pdbcls=IPython.terminal.debugger:TerminalPdb", # use ipython debugger
20 changes: 19 additions & 1 deletion src/spyglass/common/common_dandi.py
@@ -54,8 +54,26 @@ class DandiPath(SpyglassMixin, dj.Manual):
dandi_instance = "dandi": varchar(32)
"""

def fetch_file_from_dandi(self, key: dict):
    def key_from_path(self, file_path) -> dict:
        """Return a table key derived from an analysis file path."""
        return {"filename": os.path.basename(file_path)}

    def has_file_path(self, file_path: str) -> bool:
        """Return True if this table has an entry for the given file path."""
        return bool(self & self.key_from_path(file_path))

    def raw_from_path(self, file_path) -> dict:
        """Return a key for the corresponding raw file, dropping the `_` suffix."""
        return {"filename": Path(file_path).name.replace("_.nwb", ".nwb")}

    def has_raw_path(self, file_path: str) -> bool:
        """Return True if this table has an entry for the given raw file path."""
        return bool(self & self.raw_from_path(file_path))

    def fetch_file_from_dandi(
        self, key: dict = None, nwb_file_path: str = None
    ):
        """Fetch the file from Dandi and return the NWB file object."""
        if key is None and nwb_file_path is None:
            raise ValueError("Must provide either key or nwb_file_path")
        key = key or self.key_from_path(nwb_file_path)

        dandiset_id, dandi_path, dandi_instance = (self & key).fetch1(
            "dandiset_id", "dandi_path", "dandi_instance"
        )
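
Standalone, the key-derivation logic of the new helpers behaves like this
sketch, which reproduces the two path-to-key conversions outside the table
class (the path shown is a made-up example):

```python
import os
from pathlib import Path


def key_from_path(file_path) -> dict:
    # Analysis files are keyed by their basename
    return {"filename": os.path.basename(file_path)}


def raw_from_path(file_path) -> dict:
    # Raw file names drop the trailing "_" that analysis copies carry
    return {"filename": Path(file_path).name.replace("_.nwb", ".nwb")}


print(key_from_path("/stub/path/sess1_.nwb"))  # {'filename': 'sess1_.nwb'}
print(raw_from_path("/stub/path/sess1_.nwb"))  # {'filename': 'sess1.nwb'}
```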
8 changes: 2 additions & 6 deletions src/spyglass/common/common_ephys.py
@@ -481,7 +481,7 @@ def make(self, key):
"""
# get the NWB object with the data; FIX: change to fetch with
# additional infrastructure
lfp_file_name = AnalysisNwbfile().create(key["nwb_file_name"]) # logged
lfp_file_name = AnalysisNwbfile().create(key["nwb_file_name"])

rawdata = Raw().nwb_object(key)
sampling_rate, interval_list_name = (Raw() & key).fetch1(
@@ -571,7 +571,6 @@ def make(self, key):
},
replace=True,
)
AnalysisNwbfile().log(key, table=self.full_table_name)
self.insert1(key)

def nwb_object(self, key):
@@ -766,9 +765,7 @@ def make(self, key):
6. Adds resulting interval list to IntervalList table.
"""
# create the analysis nwb file to store the results.
lfp_band_file_name = AnalysisNwbfile().create( # logged
key["nwb_file_name"]
)
lfp_band_file_name = AnalysisNwbfile().create(key["nwb_file_name"])

# get the NWB object with the lfp data;
# FIX: change to fetch with additional infrastructure
Expand Down Expand Up @@ -964,7 +961,6 @@ def make(self, key):
"previously saved lfp band times do not match current times"
)

AnalysisNwbfile().log(lfp_band_file_name, table=self.full_table_name)
self.insert1(key)

def fetch1_dataframe(self, *attrs, **kwargs) -> pd.DataFrame: