Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge duplicate funcs #977

Open
3 tasks
CBroz1 opened this issue May 17, 2024 · 4 comments
Open
3 tasks

Merge duplicate funcs #977

CBroz1 opened this issue May 17, 2024 · 4 comments
Assignees
Labels
enhancement New feature or request

Comments

@CBroz1
Copy link
Member

CBroz1 commented May 17, 2024

Process

While writing pytests, I noticed quite a few funcs that were copies of methods shared across tables, both across different versions of a pipeline and across different pipelines. I put together the following script to map out where funcs are duplicated, ignoring funcs shared via inheritance, and funcs with contextual definitions like insert_default

Script
import importlib
import inspect
import pkgutil
from collections import defaultdict

import datajoint as dj
import ndx_franklab_novela
from hdmf.data_utils import GenericDataChunkIterator
from probeinterface import Probe
from spikeinterface.core import BaseRecordingSegment, BaseSorting
from tqdm import tqdm

import spyglass
from spyglass.utils import SpyglassMixin, _Merge
from spyglass.utils.dj_graph import TableChain

ignored_functions = set(
    dir(dj.Computed)
    + dir(SpyglassMixin)
    + dir(TableChain)
    + dir(_Merge)
    + dir(ndx_franklab_novela)
    + dir(GenericDataChunkIterator)
    + dir(BaseSorting)
    + dir(BaseRecordingSegment)
    + dir(Probe)
    + [
        "insert_default",
        "fetch1_dataframe",
        "fetch_dataframe",
        "_no_transaction_make",
        "nwb_object" "log",
        "cleanup",
        "nightly_cleanup",
        "increment_access",
        "_to_dict",
        "to_dict",
        "_from_dict",
    ]
)
ignored_classes = set(
    [
        "SpyglassMixin",
        "SpyglassMixinPart",
        "Merge",
        "_Merge",
        "TableChain",
        "RestrGraph",
        "AbstractGraph",
    ]
)
all_ignored = ignored_functions | ignored_classes


def load_funcs(package_name=spyglass.__name__):
    function_paths = defaultdict(list)
    for importer, module_name, _ in tqdm(
        pkgutil.walk_packages(spyglass.__path__),
        desc="Loading cache",
        total=127,
    ):
        module = importer.find_module(module_name).load_module(module_name)

        # Iterate through all objects (functions and classes) in the module
        for name, obj in inspect.getmembers(module):
            if (
                getattr(obj, "__module__", "") != module_name
                or name in all_ignored
            ):
                continue
            if inspect.isfunction(obj):
                function_paths[name].append(module_name)
            elif inspect.isclass(obj):
                for name, obj in inspect.getmembers(obj):
                    if (
                        inspect.isfunction(obj)
                        and name not in ignored_functions
                    ):
                        function_paths[name].append(module_name)
    return {k: v for k, v in function_paths.items() if len(v) > 1}


dupe_funcs = load_funcs()
sorted_funcs = sorted(dupe_funcs.keys())
for func in sorted_funcs:
    print("-", func)
    for module in dupe_funcs[func]:
        print("  -", module)
Results
  • _check_artifact_thresholds
    • lfp.v1.lfp_artifact_difference_detection
    • spikesorting.v0.spikesorting_artifact
    • spikesorting.v1.artifact
  • _compute_artifact_chunk
    • spikesorting.v0.spikesorting_artifact
    • spikesorting.v1.artifact
  • _compute_metric
    • spikesorting.v0.spikesorting_curation
    • spikesorting.v1.metric_curation
  • _convert_algorithm_params
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
  • _convert_dict_to_class
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
    • decoding.v1.temp
  • _convert_env_dict
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
    • decoding.v1.temp
  • _convert_environment_to_dict
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
  • _convert_transitions_to_dict
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
  • _get_artifact_times
    • spikesorting.v0.spikesorting_artifact
    • spikesorting.v1.artifact
  • _get_interval_range
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • _get_peak_amplitude
    • decoding.v0.clusterless
    • decoding.v1.waveform_features
  • _get_recording_timestamps
    • spikesorting.v0.spikesorting_recording
    • spikesorting.v1.recording
  • _get_sort_interval_valid_times
    • spikesorting.v0.spikesorting_recording
    • spikesorting.v1.recording
  • _init_artifact_worker
    • spikesorting.v0.spikesorting_artifact
    • spikesorting.v1.artifact
  • _reformat_metrics
    • spikesorting.v0.curation_figurl
    • spikesorting.v1.figurl_curation
  • calculate_position_info
    • common.common_position
    • position.v1.position_trodes_position
  • convert_classes_to_dict
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
  • convert_to_pixels
    • common.common_position
    • position.v1.dlc_utils
    • position.v1.position_trodes_position
  • create_figurl
    • mua.v1.mua
    • ripple.v1.ripple
  • create_group
    • decoding.v1.clusterless
    • decoding.v1.core
    • spikesorting.analysis.v1.group
  • discretize_and_trim
    • decoding.v0.visualization_1D_view
    • decoding.v0.visualization_2D_view
  • fetch_environments
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • fetch_linear_position_info
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • fetch_model
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • fetch_position_info
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • fetch_results
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • fetch_spike_data
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
    • spikesorting.analysis.v1.group
  • fill_nan
    • common.common_position
    • position.v1.dlc_utils
    • position.v1.position_trodes_position
  • generate_pos_components
    • common.common_position
    • position.v1.position_trodes_position
  • get_Kay_ripple_consensus_trace
    • common.common_ripple
    • ripple.v1.ripple
  • get_ahead_behind_distance
    • decoding.v1.clusterless
    • decoding.v1.sorted_spikes
  • get_data_for_multiple_epochs
    • decoding.v0.clusterless
    • decoding.v0.sorted_spikes
  • get_decoding_data_for_epoch
    • decoding.v0.clusterless
    • decoding.v0.sorted_spikes
  • get_networkx_track_graph
    • common.common_position
    • linearization.v0.main
    • linearization.v1.main
  • get_ripple_lfps_and_position_info
    • common.common_ripple
    • ripple.v1.ripple
  • get_time_bins_from_interval
    • decoding.v0.clusterless
    • decoding.v0.sorted_spikes
  • interpolate_to_new_time
    • common.common_ripple
    • ripple.v1.ripple
  • log
    • common.common_nwbfile
    • common.common_nwbfile
  • make_default_decoding_parameters_cpu
    • decoding.v0.clusterless
    • decoding.v0.sorted_spikes
  • make_default_decoding_parameters_gpu
    • decoding.v0.clusterless
    • decoding.v0.sorted_spikes
  • make_video
    • common.common_position
    • position.v1.dlc_utils
    • position.v1.position_trodes_position
  • nwb_object
    • common.common_ephys
    • common.common_ephys
  • plot_ripple
    • common.common_ripple
    • ripple.v1.ripple
  • plot_ripple_consensus_trace
    • common.common_ripple
    • ripple.v1.ripple
  • plot_track_graph
    • common.common_position
    • linearization.v0.main
    • linearization.v1.main
  • plot_track_graph_as_1D
    • common.common_position
    • linearization.v0.main
    • linearization.v1.main
  • populate_from_lock_file
    • lock.file_lock
    • lock.file_lock
  • restore_classes
    • decoding.v0.dj_decoder_conversion
    • decoding.v1.dj_decoder_conversion
  • set_lfp_band_electrodes
    • common.common_ephys
    • lfp.analysis.v1.lfp_band
  • set_lfp_electrodes
    • common.common_ephys
    • common.common_ripple
    • ripple.v1.ripple

Results

There some false positives...

  • Some I'm not catching within inheritance
  • Some coincidental overlap in name for different purposes (e.g., create_group)

But there are also many repetitions across versions of pipelines,

  1. Exact copies (e.g., fill_nan)
  2. Modified copies that could accept an arg to determine if the new functionality should be used (e.g., make_video)
  3. Copies that refer back to the previous version (e.g., generate_pos_components)
  4. Copies that perform similar operations with different foreign key references

Proposed

  • (1) Exact copies should (3) refer to the previous version, or, in the event of import issues, be migrated to a subpackage-level utilities script
  • The original version of (2) modified copies should be replaced with the updated version, including an arg to determine use of new features
  • The original version of (4) items with different references should be edited to extract the core functionality to a private method later versions can use
@CBroz1 CBroz1 added the enhancement New feature or request label May 17, 2024
@edeno
Copy link
Collaborator

edeno commented May 17, 2024

I'm generally in support of this but I think we do want to be careful about making pipelines depend on each other.

@CBroz1
Copy link
Member Author

CBroz1 commented May 17, 2024

Does that mean that any function common to multiple pipelines should be extracted out to the parent folder of all places it is used?

@edeno
Copy link
Collaborator

edeno commented May 17, 2024

I don't have a rule in mind yet. We should just carefully think about how pipelines are supposed to interact and depend on each other.

@CBroz1 CBroz1 self-assigned this May 18, 2024
@CBroz1 CBroz1 mentioned this issue May 21, 2024
6 tasks
@CBroz1
Copy link
Member Author

CBroz1 commented Jul 31, 2024

I took a peek at every duplicate function name and found a lot of overlap.

These are functions that can definitely refer to shared or upstream resources as-is or with minor tweaks

  • _ensure_names
  • _get_interval_range
  • _get_peak_amplitude
  • _get_recording_timestamps
  • _reformat_metrics
  • get_ahead_behind_distance
  • get_firing_rate
  • get_networkx_track_graph
  • get_spike_indicator
  • get_time_bins_from_interval
  • interpolate_to_new_time
  • make_default_decoding_parameters_cpu
  • make_default_decoding_parameters_gpu
  • set_group_by_shank
  • convert_to_pixels
  • discretize_and_trim
  • fill_nan
  • get_Kay_ripple_consensus_trace

These are funcs that would be best resolved by introducing a new
SpyglassParamMixin or SpyglassSelectionMixin class

  • get_default
  • insert_default
  • insert_selection

These are funcs that would need significant refactoring to be resolved

  • make_video
  • plot_ripple
  • plot_ripple_consensus_trace
  • plot_track_graph
  • plot_track_graph_as_1D
  • _compute_metric
  • _get_sort_interval_valid_times
  • get_decoding_data_for_epoch
  • get_recording
  • get_ripple_lfps_and_position_info
  • get_sort_group_info
  • get_sorting
  • set_lfp_band_electrodes
  • set_lfp_electrodes

These funcs are passed to ChunkRecordingExecutor which takes the respective funcs as input with amplitude_thresh renamed to amplitude_thresh_uV. Would require testing to see if reverting name would allow them to function the same.

  • _get_artifact_times - some feature differences
  • _init_artifact_worker
  • _check_artifact_thresholds
  • _compute_artifact_chunk

These are instances that would likely be resolved by #1049

  • _convert_algorithm_params
  • _convert_dict_to_class
  • _convert_env_dict
  • _convert_environment_to_dict
  • _convert_transitions_to_dict
  • _to_dict
  • convert_classes_to_dict
  • restore_classes

@CBroz1 CBroz1 mentioned this issue Jul 31, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants