Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent delete orphans #1002

Merged
merged 21 commits into from
Jun 26, 2024
Merged

Prevent delete orphans #1002

merged 21 commits into from
Jun 26, 2024

Conversation

CBroz1
Copy link
Member

@CBroz1 CBroz1 commented Jun 8, 2024

Description

Changes for #951 ...

  • Tables now capture 'part masters' for cued deletion related to DJ 151 - these are tables with fks to tables other than master within the descendants of a given table
  • A RestrGraph object is now used to cascade a restriction down all at once. This removes the need for TableChains independently moving from a given node to each merge part individually.
  • delete_downstream_merge has been renamed to delete_downstream_parts to reflect expanded scope

Changes for #948 ...

  • Using the same RestrGraph as above proved difficult when it came to ignoring cross-pipeline intervals. A position interval should not be deleted when position data is pared with another modality being deleted.
  • Instead, IntervalList is given its own nightly_cleanup func to mirror the others, subtracting all child tables. This clears all orphans, every time, at the cost of building that table list each time. Future work could cache this list somewhere or demote this operation to running nightly, rather than at every delete call
  • The delete func also includes deletes of external files and orphaned external refs. Less costly, but could similarly be demoted to a daily call via cronjob

Added tests

This pr adds coverage for ...

  • Exports
  • RestrGraph
  • Long distance restriction table banning
  • is_merge_table
Coverage report
---------- coverage: platform linux, python 3.9.19-final-0 -----------
Name                                                       Stmts   Miss  Cover   Missing
----------------------------------------------------------------------------------------
src/spyglass/common/common_behav.py                          259     38    85%   47-51, 84, 198, 226, 312-316, 320-349, 352, 393-394, 429, 440, 451-457, 487, 524-525, 544, 614-616, 619-620, 646-650
src/spyglass/common/common_dandi.py                           94     94     0%   1-256
src/spyglass/common/common_device.py                         202     18    91%   90, 122, 288, 391-396, 408, 540, 594-597, 658-662, 704-720
src/spyglass/common/common_dio.py                             52      3    94%   41-45, 129
src/spyglass/common/common_ephys.py                          317    181    43%   74, 111, 176-185, 210-252, 287-292, 295, 309, 381-385, 455-546, 550-559, 562-563, 614-696, 711-900, 903-904
src/spyglass/common/common_filter.py                         188     19    90%   23-24, 92-96, 99-102, 138-139, 338-341, 379, 390-398, 472
src/spyglass/common/common_interval.py                       195      5    97%   52-53, 159, 529, 539
src/spyglass/common/common_lab.py                             95      8    92%   58-59, 143-144, 247-248, 271-272
src/spyglass/common/common_nwbfile.py                        249     87    65%   211, 225, 242-243, 285-305, 349-352, 359, 366, 442-517, 544-608, 625-644, 662-663, 677, 681-686, 764, 771
src/spyglass/common/common_position.py                       269     29    89%   33-35, 190, 359-360, 367-368, 629-630, 701-767, 775-776
src/spyglass/common/common_region.py                          14      0   100%
src/spyglass/common/common_ripple.py                         122     83    32%   24-28, 49-52, 74-95, 112-123, 141-177, 181, 184, 188-262, 272-284, 297-316, 324-354
src/spyglass/common/common_sensors.py                         24      2    92%   36-37
src/spyglass/common/common_session.py                        105     13    88%   105, 150-154, 162, 174-177, 261-267
src/spyglass/common/common_subject.py                         25      5    80%   25-26, 34-37
src/spyglass/common/common_task.py                            82     16    80%   34-40, 120-121, 145, 178-185
src/spyglass/common/common_usage.py                          173     38    78%   85, 106-107, 165, 174, 182, 217, 228-232, 240-244, 271, 284-288, 317, 374, 429-433, 437-448
src/spyglass/common/errors.py                                  2      0   100%
src/spyglass/common/populate_all_common.py                    47     12    74%   40-46, 85-88, 157-159, 162-168
src/spyglass/common/prepopulate/prepopulate.py                44     31    30%   19, 24-74, 85-94
src/spyglass/common/signal_processing.py                      14     14     0%   1-53
src/spyglass/lfp/analysis/v1/lfp_band.py                     136     10    93%   108, 241, 243, 263, 277-281, 357-361, 431, 459
src/spyglass/lfp/lfp_electrode.py                             24      0   100%
src/spyglass/lfp/lfp_imported.py                              12      0   100%
src/spyglass/lfp/lfp_merge.py                                 22      0   100%
src/spyglass/lfp/v1/lfp.py                                    66      4    94%   120-124, 183-184
src/spyglass/lfp/v1/lfp_artifact.py                           55     25    55%   128-207
src/spyglass/lfp/v1/lfp_artifact_MAD_detection.py             37     29    22%   40-59, 77, 102, 124-132, 152-171
src/spyglass/lfp/v1/lfp_artifact_difference_detection.py      79     69    13%   71-223, 252-279
src/spyglass/linearization/merge.py                           14      0   100%
src/spyglass/linearization/v0/main.py                         48     24    50%   60-62, 69-70, 84-88, 122-187, 190
src/spyglass/linearization/v1/main.py                         54      9    83%   54-56, 63-64, 78-82, 187
src/spyglass/position/position_merge.py                       87     50    43%   106-107, 110-123, 139-262
src/spyglass/position/v1/dlc_decorators.py                    18      5    72%   11, 16-18, 22
src/spyglass/position/v1/dlc_reader.py                       110     24    78%   24, 38, 44-45, 51, 57-58, 61, 70, 74, 80-81, 135-137, 146, 149-162, 214, 218
src/spyglass/position/v1/dlc_utils.py                        492    265    46%   58, 61, 69, 72, 97-100, 104, 149-161, 232-235, 239-241, 246, 259, 280, 293-305, 310-316, 328-341, 356-373, 394, 405, 414, 490, 497-498, 540, 558-571, 604-611, 621-622, 651-667, 692-746, 772-782, 839, 847-849, 860-866, 889, 943, 945, 949-957, 1028-1029, 1036-1037, 1041-1299
src/spyglass/position/v1/position_dlc_centroid.py            210     11    95%   159, 173, 204-212, 219, 226, 245, 275, 704
src/spyglass/position/v1/position_dlc_cohort.py               44      1    98%   118
src/spyglass/position/v1/position_dlc_model.py               132     29    78%   35-39, 191-194, 196, 208, 281-325, 343
src/spyglass/position/v1/position_dlc_orient.py              120     32    73%   60, 112-132, 143, 220-229, 235, 249-278
src/spyglass/position/v1/position_dlc_pose_estimation.py     136      4    97%   66, 73, 120, 261
src/spyglass/position/v1/position_dlc_position.py            182     34    81%   54, 100, 198-199, 206-220, 354-356, 361, 383, 386, 408, 444-467
src/spyglass/position/v1/position_dlc_project.py             236    108    54%   128-205, 278-303, 316, 347, 361-413, 425, 457, 476-479, 486-489, 514-555, 582, 596
src/spyglass/position/v1/position_dlc_selection.py           140      5    96%   213, 334, 378, 402, 414
src/spyglass/position/v1/position_dlc_training.py             91      5    95%   143-144, 161, 207-210
src/spyglass/position/v1/position_trodes_position.py         174      8    95%   67, 282-283, 361-362, 496, 502-503
src/spyglass/utils/database_settings.py                       99     15    85%   147, 162, 173-175, 179-183, 195, 211-214, 222, 225
src/spyglass/utils/dj_graph.py                               466     26    94%   48-49, 118, 368, 400, 467, 471, 609, 626-628, 643, 696, 892, 911, 917-918, 953, 979, 981, 1071-1073, 1086, 1111, 1133
src/spyglass/utils/dj_helper_fn.py                           159     64    60%   42, 65, 131-141, 153, 253, 262-263, 274, 328-390, 406-413, 433-449, 465-474
src/spyglass/utils/dj_merge_tables.py                        266     68    74%   35, 37, 43-47, 74-75, 91, 153, 179, 235, 241, 323-325, 345, 351-352, 361, 412, 418, 456-462, 483-502, 525, 586, 592, 603, 650, 658, 681-682, 685, 687, 710, 721, 757-758, 768, 778-784, 788-798, 805-808, 825-836
src/spyglass/utils/dj_mixin.py                               331     44    87%   91, 221-232, 241-249, 282-288, 327-328, 384, 400, 453, 503-537, 615, 639-640, 731, 738, 886, 909-910
src/spyglass/utils/logging.py                                 17      2    88%   26-27
src/spyglass/utils/nwb_helper_fn.py                          163     35    79%   52-81, 100, 119, 127-132, 174, 177, 186, 210, 234-235, 279, 297, 407, 520, 537, 545-563
----------------------------------------------------------------------------------------
TOTAL                                                       6792   1701    75%

Minor changes

  • Add data folder from tests to gitignore to ignore all file types, including new h5 and sh files from DLC and export processes
  • Edit docs and notebooks to reflect renamed delete_downstream_parts
  • Fix bug in export process whereby only files from the last analysis were collected
  • RestrGraph:
    • make deepcopy of graph to avoid table load() calls undoing cascade work
    • promote dunder methods up the class heirarchy
    • fix camel and ensure_name methods to handle lists of tables
    • Add warn arg to clean up verbose printouts across diff uses of cascade and get_ft
    • fix bug where masters with multiple parts would halt a cascade because direction could not be inferred from path
    • use graph distance as replacement approximation of 'downstreamness' vs previous chain length
    • Allow graphs to go both up and down - currently unused, holdover from dead end of getting potential interval list orphans via up cascade
  • TableChain
    • Remove protections against merge tables in TableChains. Under new cascade architecture, shouldn't be any issues with ambiguous path
    • Remove redundant all_ft property, inherit instead
    • Fix bug where parent/child reported as unrestricted because cascade does not continue when table is empty - now, all tables in path set empty

Checklist:

  • No. This PR should be accompanied by a release: (yes/no/unsure)
  • N/a If release, I have updated the CITATION.cff
  • No. This PR makes edits to table definitions: (yes/no)
  • N/a If table edits, I have included an alter snippet for release notes.
  • N/a If this PR makes changes to position, I ran the relevant tests locally.
  • I have updated the CHANGELOG.md with PR number and description.
  • I have added/edited docs/notebooks to reflect the changes

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CBroz1 CBroz1 marked this pull request as ready for review June 11, 2024 18:06
@edeno edeno requested review from samuelbray32 and edeno and removed request for samuelbray32 June 11, 2024 18:09
@edeno edeno requested a review from samuelbray32 June 11, 2024 18:27
@samuelbray32
Copy link
Collaborator

Thanks Chris. I'll get into this tomorrow. First question from a quick glance though. It seems like the nightly cleanup of IntervalList removes all orphaned entries.

I know that at least I have used the IntervalList table to store manually created intervals before (aknowledge it isn't best practice, and most those entries should be in a table at this point). I think it would be good to check with the lab if anyone is using the orphan entries before we execute that cleanup

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 12, 2024

I know that at least I have used the IntervalList table to store manually created intervals before

That's a good point. I had assumed existing entries were just a byproduct of the current delete process. Here's a snippet to look at existing orphans...

from spyglass.common import IntervalList, Session

all_orphans = IntervalList().nightly_cleanup(dry_run=True)
exp_sess_tbl = dj.U('nwb_file_name', 'lab_member_name')
orphans_by_experimenter = exp_sess_tbl & (all_orphans * Session * Session.Experimenter)
orphans_no_experimenter = all_orphans * Session - Session.Experimenter

@edeno
Copy link
Collaborator

edeno commented Jun 12, 2024

Is there a way to tie the interval orphan cleanup to ones specific to pipelines?

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 12, 2024

Is there a way to tie the interval orphan cleanup to ones specific to pipelines?

I can cascade the delete restriction up, but, in doing so, I may capture restrictions used across pipelines. Is there something else you had in mind?

@edeno
Copy link
Collaborator

edeno commented Jun 12, 2024

I suppose we don't know how interval lists are generated necessarily unless they have the pipeline column. I was thinking that manually inserted vs. inserted automatically would be a useful distinction.

@@ -262,30 +260,27 @@ def _import_merge_tables(self):
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of how delete_downtream_parts finds part tables, should the Group tables PositionGroup and SortedSpikesGroup be imported here to in order to have them in the graph when found?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these items and renamed the func accordingly

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 14, 2024

The downstream portions of this work would be replaced by datajoint-python #1158

@CBroz1 CBroz1 mentioned this pull request Jun 17, 2024
5 tasks
@CBroz1 CBroz1 requested a review from samuelbray32 June 20, 2024 21:51
@CBroz1 CBroz1 requested a review from samuelbray32 June 24, 2024 18:57
@samuelbray32
Copy link
Collaborator

Sorry I don't know the graphs well enough to comment more specifically:

When I run this on the master branch I'm able to execute successfully.

from spyglass.position.v1 import TrodesPosV1
key = {"nwb_file_name":"minirec20230622_.nwb"}
(TrodesPosV1 & key).delete(key)

but on this one I need to first import RippleTimesV1, LinearizedPositionV1, and MuaEventsV1. Since this list could get pretty long when deleting from further upstream, is there a way to avoid making this required?

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 25, 2024

Since this list could get pretty long when deleting from further upstream, is there a way to avoid making this required?

We can always stack more onto the list of import_part_masters. I was hesitant to do so, not wanting to import too much behind the scenes. This is the list of items that I found often needed importing in testing. I'm not sure what makes these nodes special, but they prevented the errors. The downside of baking them in is that importing just a subset of the package is no longer optional

@samuelbray32
Copy link
Collaborator

samuelbray32 commented Jun 25, 2024

I'm not sure what makes these nodes special, but they prevented the errors.

My guess:

  • Looking at them, they all seem to be either tables immediately downstream of a merge table, or from files not in the __init__.py file of their modules.
  • The current delete implementation requires things to be loaded into the namespace.
    • this was solved for custom tables by stopping cascade when you reach those
    • this is solved for part/merge tables by loading those specifically
      • loading the part/merge tables brings the classes defined by their modules __init__.py files into the namespace
      • these problem classes are either in a different module or not in _init__.py and so don't get loaded in

I think running deletes with this method is going to require large chunks of the package. I would lean towards having that done for the user by the function if it's required, even if it loads more than necessary

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 25, 2024

I've added these potential blockers to the automatic import list

@edeno
Copy link
Collaborator

edeno commented Jun 25, 2024

@samuelbray32 are you satisfied with these changes?

@samuelbray32
Copy link
Collaborator

samuelbray32 commented Jun 26, 2024

I think it works, just one more in case it helps with generalizability. In this line of _get_ft it has a fallback if a node is not returnes byget_node()

if not (ft := self._get_node(table).get("ft")):
            ft = FreeTable(self.connection, table)
            self._set_node(table, "ft", ft)

However get_node() raises an error if it's not already in the graph. which prevents that option in _get_ft()

def _get_node(self, table: Union[str, Table]):
        """Get node from graph."""
        table = self._ensure_name(table)
        table = self._ensure_names(table)
        if not (node := self.graph.nodes.get(table)):
            raise ValueError(
                f"Table {table} not found in graph."
                + "\n\tPlease import this table and rerun"
            )
        return node

I don't have a local database setup to test while the servers are down, but would having an argument in get_node() to ignore this error and return None avoid the import issues?

If not then fine to merge

@CBroz1
Copy link
Member Author

CBroz1 commented Jun 26, 2024

would having an argument in get_node() to ignore this error and return None avoid the import issues?

My first attempt at 'ignore outside nodes' looked something like this, where the check was in the deepest call, fetching/storing node info. I hit an issue in the _get_edge logic because I was looking for edges not in the graph, and decided to keep the _is_out logic to the parent funcs, like bridge_restr. Do you have another use-case in mind where we would want to run get_node in this fail-safe way?

@edeno edeno merged commit b89ea99 into LorenFrankLab:master Jun 26, 2024
7 checks passed
@CBroz1 CBroz1 deleted the ndo branch June 27, 2024 00:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants