combine SpikeEventSeries and Clustering #239

bendichter · 2018-10-16T00:15:23Z

2) Feature Request

How would people feel about folding Clustering into SpikeEventSeries by removing Clustering and adding a SpikeEventSeries.cluster field of dtype int? We could make data optional, in case a user only wants to store the times and not the waveforms (this is pretty common).

This would:

- Make the relationship between a spike time and the spike waveform more explicit
- Remove the redundant timestamps that are currently duplicated across the two objects
- Resolve the issue that Clustering should have an electrodes field (add electrodes as optional argument to ecephys.Clustering #194)
- Resolve the issue that Clustering should be a TimeSeries (make Clustering and FeatureExtraction inherit from TimeSeries #112)
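
For concreteness, the proposal might look roughly like this in plain Python. This is only a sketch: `ProposedSpikeEventSeries` and its field layout are hypothetical stand-ins for illustration, not PyNWB classes or the actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProposedSpikeEventSeries:
    """Hypothetical stand-in for the merged type; not a PyNWB class."""

    timestamps: List[float]                    # one spike time per event
    data: Optional[List[List[float]]] = None   # waveforms, optional under the proposal
    cluster: Optional[List[int]] = None        # integer cluster id per spike
    electrodes: Optional[List[int]] = None     # electrode indices (the #194 request)

    def __post_init__(self):
        n = len(self.timestamps)
        # Per-spike fields share the single timestamps vector, so they must align with it.
        for name in ("data", "cluster"):
            value = getattr(self, name)
            if value is not None and len(value) != n:
                raise ValueError(f"{name} must have one entry per timestamp")


# Times and cluster ids only, with no waveforms stored.
sorted_only = ProposedSpikeEventSeries(timestamps=[0.10, 0.25, 0.40], cluster=[0, 1, 0])
```

Because waveforms and cluster ids hang off the same `timestamps` vector, the duplicated timestamps disappear by construction.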

Checklist

Have you ensured the feature or change was not already reported?
Have you included a brief and descriptive title?
Have you included a clear description of the problem you are trying to solve?
Have you included a minimal code snippet that reproduces the issue you are encountering?

ajtritt · 2018-10-16T00:24:18Z

The only thing I am unsure of is making SpikeEventSeries.data optional. If a SpikeEventSeries doesn't have data, then it fails to meet the requirements of a TimeSeries.

bendichter · 2018-10-16T01:13:25Z

Would it be bad form to overwrite this?

i.e.

neurodata_type_def: SpikeEventSeries
datasets:
    - name: data
      quantity: '?'

...

Is that allowed?

ajtritt · 2018-10-16T02:55:46Z

It shouldn’t be. What’s the point of inheritance if you can just toss out everything you inherit?

oruebel · 2018-10-16T05:28:04Z

I agree, making a required field optional should not be allowed in inheritance, if only because it breaks upstream behavior. A TimeSeries without data really isn't a TimeSeries. Do we maybe need a separate type to handle the timestamps-only case?
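
To illustrate the "breaks upstream behavior" point, here is a minimal Python sketch. The classes are toy stand-ins, not the PyNWB implementations, and `OptionalDataSeries` and `num_samples` are hypothetical names. Code written against the base contract fails once a subtype is allowed to drop `data`.

```python
class TimeSeries:
    """Toy base type whose contract says `data` is always present."""

    def __init__(self, data, timestamps):
        if data is None:
            raise ValueError("TimeSeries requires data")
        self.data = data
        self.timestamps = timestamps


class OptionalDataSeries(TimeSeries):
    """Hypothetical subtype that relaxes the contract and allows data=None."""

    def __init__(self, timestamps, data=None):
        # Deliberately skips the base-class check to mimic quantity: '?'.
        self.data = data
        self.timestamps = timestamps


def num_samples(series: TimeSeries) -> int:
    """Upstream code that relies on the base contract."""
    return len(series.data)


ok = num_samples(TimeSeries(data=[1.0, 2.0], timestamps=[0.0, 0.1]))

try:
    num_samples(OptionalDataSeries(timestamps=[0.0, 0.1]))
    broke = False
except TypeError:
    # len(None) fails: the consumer never expected data to be missing.
    broke = True
```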

ajtritt · 2018-10-16T06:00:26Z

What if we just make a new subtype of the base TimeSeries where data stores cluster ids?

bendichter · 2018-10-16T06:34:26Z

@ajtritt yeah that almost works, but I could see cases where a user might want to store waveforms but does not want to cluster them, so I think ideally both would be optional. I like the idea of having a time-only type, @oruebel. I think there are other cases where this would be useful, like instantaneous events.

ajtritt · 2018-10-16T13:56:48Z

Yeah, we would keep SpikeEventSeries, and just add a new type.
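
A rough Python sketch of the "new subtype where data stores cluster ids" idea, under the assumption that the base type requires aligned `data` and `timestamps`. `ClusterIdSeries` is a hypothetical name and the classes are toy stand-ins, not PyNWB types.

```python
from typing import List, Sequence


class TimeSeries:
    """Toy base type: data and timestamps are both required and aligned."""

    def __init__(self, data: Sequence, timestamps: Sequence[float]):
        if len(data) != len(timestamps):
            raise ValueError("data and timestamps must have the same length")
        self.data = list(data)
        self.timestamps = list(timestamps)


class ClusterIdSeries(TimeSeries):
    """Hypothetical subtype: `data` holds one integer cluster id per spike."""

    def __init__(self, cluster_ids: Sequence[int], timestamps: Sequence[float]):
        if not all(isinstance(c, int) for c in cluster_ids):
            raise TypeError("cluster ids must be integers")
        super().__init__(data=cluster_ids, timestamps=timestamps)

    def times_for(self, cluster_id: int) -> List[float]:
        """Return the spike times assigned to one cluster."""
        return [t for t, c in zip(self.timestamps, self.data) if c == cluster_id]


series = ClusterIdSeries(cluster_ids=[0, 1, 0], timestamps=[0.10, 0.25, 0.40])
```

Here the subtype narrows what `data` means without making it optional, so the base contract still holds and SpikeEventSeries can stay as it is.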

tjd2002 · 2018-10-16T14:22:43Z

I'm all for removing 'Clustering'. I've been thinking of Clustering+SpikeEventSeries as being superseded by the new Unit table (for Unit metadata including clustering metrics) and UnitTimes (for spike times of each clustered unit), and expecting that all that older stuff would soon be deprecated. We've been using Unit + UnitTimes to store the output of clustering.

I think this use of Unit + UnitTimes handles the case where there is no per-spike waveform data. So we could keep SpikeEventSeries and add the '.cluster' field as proposed, and then only use it when storing waveforms. I think this would mean we don't have to worry about the no-data case discussed above.
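
As a sketch of how per-unit spike times can be stored without any waveform data, the snippet below assumes a ragged layout: all spike times concatenated unit by unit into one flat vector, plus an index of end offsets. The function names are hypothetical and this is not the PyNWB API.

```python
from typing import Dict, List, Tuple


def to_ragged(spikes_by_unit: Dict[int, List[float]]) -> Tuple[List[int], List[float], List[int]]:
    """Concatenate per-unit spike times and record where each unit's block ends."""
    unit_ids = sorted(spikes_by_unit)
    flat: List[float] = []
    index: List[int] = []
    for unit_id in unit_ids:
        flat.extend(spikes_by_unit[unit_id])
        index.append(len(flat))  # end offset of this unit's block
    return unit_ids, flat, index


def unit_spikes(flat: List[float], index: List[int], i: int) -> List[float]:
    """Read back the spike times of the i-th unit from the flat vector."""
    start = index[i - 1] if i > 0 else 0
    return flat[start:index[i]]


unit_ids, flat, index = to_ragged({0: [0.10, 0.40], 1: [0.25]})
```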

tjd2002 · 2018-10-16T14:35:31Z

This would also resolve #111, which requests additional fields be added to Clustering.

bendichter · 2018-10-16T21:16:20Z

@tjd2002 I'm glad to hear you are using these structures in your lab and they are working for you! It sounds like you are working with a conversion script that converts from some acquisition format to NWB. This is where we'd like labs to be right now, so that's great.

Eventually we would like to work with acquisition groups to automatically save data as NWB files, so no conversion is necessary. I think the Clustering datatype would be useful for this because you can just append to the dataset as you go, whereas the UnitTimes structure might be a bit complicated (I imagine inserting data into the vector would be difficult during streaming and that appending to the end would be much easier, but I could be wrong about this). The "no-data case" discussed above was for cases like streaming live data where you may want to record times and clusters in real time but you might not want to save waveforms. Do you think that use-case might come up?
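
The append-versus-insert concern can be sketched in Python as follows. This assumes a per-unit layout that keeps each unit's spike times contiguous in one flat vector with end offsets; the function names are hypothetical and the real on-disk behavior may differ.

```python
from typing import List


def append_event(times: List[float], clusters: List[int], t: float, cluster_id: int) -> None:
    """Event-log style: a new spike is a plain append to two parallel vectors."""
    times.append(t)
    clusters.append(cluster_id)


def insert_into_ragged(flat: List[float], index: List[int], unit: int, t: float) -> None:
    """Per-unit style: a new spike lands at the end of its unit's block."""
    flat.insert(index[unit], t)         # shifts every later unit's spike times
    for j in range(unit, len(index)):   # and every later end offset moves by one
        index[j] += 1


# Event log: spikes arrive in time order and are simply appended.
times, clusters = [0.10, 0.25], [0, 1]
append_event(times, clusters, 0.40, 0)

# Ragged layout: unit 0 owns flat[0:2], unit 1 owns flat[2:3].
flat, index = [0.10, 0.40, 0.25], [2, 3]
insert_into_ragged(flat, index, 0, 0.50)
```

In the second layout, a spike for an early unit touches everything stored after it, which is the difficulty with streaming raised above.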

tjd2002 · 2018-10-17T18:53:13Z

My issue with the older "Clustering+ClusterWaveforms+SpikeEventSeries" was that it was all tied pretty closely to a particular clustering workflow (I think it was KlustaKwik?), and folks also found it was not likely to be performant for larger numbers of units. I see you have proposed some changes to Clustering to address those limitations, but the Unit metadata table seemed like a better all-around solution (for example, it is more easily extensible).

I think it's very confusing to have two different facilities (Clustering and friends vs. Unit/UnitTimes) for the same type of data. If possible, it would be great to optimize the storage for streaming without requiring this duplication. Would it be possible to just overload the UnitTimes I/O (by creating something like a StreamingUnitTimes that is optimized for appending) to solve the streaming issue, and keep everything in the nice Unit framework?

For EventWaveforms, I like your suggestion elsewhere to add in a '.cluster' property. This seems like it could handle both the streaming and the off-line use cases?
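
One way the StreamingUnitTimes idea could work is sketched below in Python: append spikes to an event log while streaming, then regroup them per unit once acquisition ends. The class body, its method names, and the output layout (flat vector plus end offsets) are assumptions for illustration only; no such type is defined in this thread.

```python
from collections import defaultdict
from typing import List, Tuple


class StreamingUnitTimes:
    """Hypothetical append-optimized buffer; not an existing PyNWB type."""

    def __init__(self) -> None:
        self._units: List[int] = []
        self._times: List[float] = []

    def append(self, unit_id: int, t: float) -> None:
        """Record one spike as it arrives; nothing already stored is moved."""
        self._units.append(unit_id)
        self._times.append(t)

    def finalize(self) -> Tuple[List[int], List[float], List[int]]:
        """Regroup the event log into (unit_ids, flat_times, end_offsets)."""
        grouped = defaultdict(list)
        for unit_id, t in zip(self._units, self._times):
            grouped[unit_id].append(t)
        unit_ids = sorted(grouped)
        flat: List[float] = []
        index: List[int] = []
        for unit_id in unit_ids:
            flat.extend(grouped[unit_id])
            index.append(len(flat))
        return unit_ids, flat, index


stream = StreamingUnitTimes()
for unit_id, t in [(1, 0.10), (0, 0.15), (1, 0.30)]:
    stream.append(unit_id, t)
unit_ids, flat, index = stream.finalize()
```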

tjd2002 · 2019-01-17T02:32:14Z

Conversation seems to be continuing over at #194, in case anyone else is following along.

stephprince · 2024-05-13T17:10:53Z

Closing because Clustering has been deprecated.

bendichter transferred this issue from NeurodataWithoutBorders/pynwb Jan 8, 2019

tjd2002 mentioned this issue on Jan 17, 2019, in "Improve docs for use of new Units table to store spike times." (NeurodataWithoutBorders/pynwb#794, closed)

bendichter mentioned this issue on Jan 17, 2019, in "[WIP] add UnitSeries neurodata_type" (#245, draft)

stephprince closed this as completed May 13, 2024
