combine SpikeEventSeries and Clustering #239

Closed
4 tasks done
bendichter opened this issue Oct 16, 2018 · 13 comments

Comments

@bendichter
Contributor

2) Feature Request

How would people feel about folding Clustering into SpikeEventSeries by removing Clustering and adding a SpikeEventSeries.cluster field of dtype int? We could make data optional, in case a user only wants to store the times and not the waveforms (this is pretty common).

This would:

  1. Make the relationship between a spike time and the spike waveform more explicit
  2. Remove the redundant timestamps that are currently duplicated across the two objects
  3. Resolve the issue that Clustering should have an electrodes field (add electrodes as optional argument to ecephys.Clustering #194)
  4. Resolve the issue that Clustering should be a TimeSeries (make Clustering and FeatureExtraction inherit from TimeSeries #112)
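To make the proposal concrete, here is a minimal sketch of the combined structure as a plain Python class rather than NWB schema. The class name `CombinedSpikeEvents` and its methods are hypothetical. It only illustrates the idea: one shared timestamps vector indexed by spike, an integer cluster id per spike, and waveforms as an optional per-spike field. Because it is a standalone toy class, it sidesteps the question, raised in the comments, of whether a TimeSeries subtype may make data optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CombinedSpikeEvents:
    """Hypothetical per-spike record combining SpikeEventSeries and Clustering.

    Every field is indexed by spike, so timestamps[i], cluster[i] and, when
    present, data[i] all describe the same spike.
    """

    timestamps: List[float]                               # spike times in seconds, stored once
    cluster: List[int]                                    # integer cluster id for each spike
    electrodes: List[int] = field(default_factory=list)   # electrodes the spikes were detected on
    data: Optional[List[List[float]]] = None              # optional waveform samples, one list per spike

    def __post_init__(self) -> None:
        n = len(self.timestamps)
        if len(self.cluster) != n:
            raise ValueError("expected exactly one cluster id per spike")
        if self.data is not None and len(self.data) != n:
            raise ValueError("expected exactly one waveform per spike")

    def spike_times(self, cluster_id: int) -> List[float]:
        """Return the times of all spikes assigned to cluster_id."""
        return [t for t, c in zip(self.timestamps, self.cluster) if c == cluster_id]

    def waveforms(self, cluster_id: int) -> List[List[float]]:
        """Return the waveforms of one cluster; requires data to be present."""
        if self.data is None:
            raise ValueError("no waveforms stored")
        return [w for w, c in zip(self.data, self.cluster) if c == cluster_id]
```

For example, `CombinedSpikeEvents(timestamps=[0.10, 0.25, 0.40], cluster=[1, 2, 1]).spike_times(1)` returns `[0.1, 0.4]` without any waveforms being stored, and each spike's time lives in exactly one place.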

Checklist

  • Have you ensured the feature or change was not already reported?
  • Have you included a brief and descriptive title?
  • Have you included a clear description of the problem you are trying to solve?
  • Have you included a minimal code snippet that reproduces the issue you are encountering?
@ajtritt
Member

ajtritt commented Oct 16, 2018

The only thing I am unsure of is making SpikeEventSeries.data optional. If a SpikeEventSeries doesn't have data, then it fails to meet the requirements of a TimeSeries.

@bendichter
Contributor Author

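Would it be bad form to override this in the subtype?

For example:

groups:
- neurodata_type_def: SpikeEventSeries
  neurodata_type_inc: ElectricalSeries
  datasets:
  - name: data
    quantity: '?'  # make the inherited data dataset optional

...

Is that allowed?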

@ajtritt
Member

ajtritt commented Oct 16, 2018

It shouldn’t be. What’s the point of inheritance if you can just toss out everything you inherit?

@oruebel
Contributor

oruebel commented Oct 16, 2018

I agree, making a required field optional should not be allowed in inheritance, if only because it breaks upstream behavior. A TimeSeries without data really isn't a TimeSeries. Do we maybe need a separate type to handle the timestamps-only case?
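A toy Python illustration of why relaxing a required field breaks upstream behavior. The class and function names are hypothetical stand-ins, not the NWB or PyNWB classes: generic code written against the base type assumes data is always present, so a subtype that allows data to be missing makes that code fail even though the subtype still passes an `isinstance` check.

```python
from typing import List, Optional


class TimeSeries:
    """Toy base type: every instance carries data aligned with timestamps."""

    def __init__(self, data: List[float], timestamps: List[float]) -> None:
        if len(data) != len(timestamps):
            raise ValueError("data and timestamps must have the same length")
        self.data = data
        self.timestamps = timestamps


class TimesOnlySeries(TimeSeries):
    """Toy subtype that relaxes the inherited requirement: data may be None."""

    def __init__(self, timestamps: List[float], data: Optional[List[float]] = None) -> None:
        # Skips the base-class check, so the base-class guarantee no longer holds.
        self.data = data
        self.timestamps = timestamps


def mean_value(series: TimeSeries) -> float:
    """Generic code written against TimeSeries; it relies on data being present."""
    return sum(series.data) / len(series.data)
```

`mean_value(TimeSeries([1.0, 3.0], [0.0, 0.1]))` returns `2.0`, but `mean_value(TimesOnlySeries([0.1, 0.2]))` raises a `TypeError` because `data` is `None`. A separate timestamps-only type avoids this, since code that expects a TimeSeries never receives one without data.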

@ajtritt
Member

ajtritt commented Oct 16, 2018

What if we just make a new subtype of the base TimeSeries where data stores cluster ids?

@bendichter
Contributor Author

@ajtritt yeah, that almost works, but I could see cases where a user might want to store waveforms without clustering them, so I think ideally both would be optional. I like the idea of having a time-only type, @oruebel. I think there are other cases where this would be useful, like instantaneous events.

@ajtritt
Member

ajtritt commented Oct 16, 2018

Yeah, we would keep SpikeEventSeries, and just add a new type.

@tjd2002
Contributor

tjd2002 commented Oct 16, 2018

I'm all for removing 'Clustering'. I've been thinking of Clustering+SpikeEventSeries as being superseded by the new Unit table (for Unit metadata including clustering metrics) and UnitTimes (for spike times of each clustered unit), and expecting that all that older stuff would soon be deprecated. We've been using Unit + UnitTimes to store the output of clustering.

I think this use of Unit + UnitTimes handles the case where there is no per-spike waveform data. So we could keep SpikeEventSeries and add the '.cluster' field as proposed, and then only use it when storing waveforms. I think this would mean we don't have to worry about the no-data case discussed above.

@tjd2002
Contributor

tjd2002 commented Oct 16, 2018

This would also resolve #111, which requests additional fields be added to Clustering.

@bendichter
Contributor Author

@tjd2002 I'm glad to hear you are using these structures in your lab and they are working for you! It sounds like you are working with a conversion script that goes from some acquisition format and converts to NWB. This is where we'd like labs to be right now, so that's great.

Eventually we would like to work with acquisition groups to automatically save data as NWB files, so no conversion is necessary. I think the Clustering datatype would be useful for this because you can just append to the dataset as you go, whereas the UnitTimes structure might be a bit complicated (I imagine inserting data into the vector would be difficult during streaming and that appending to the end would be much easier, but I could be wrong about this). The "no-data case" discussed above was for cases like streaming live data where you may want to record times and clusters in real time but you might not want to save waveforms. Do you think that use-case might come up?
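The streaming concern can be illustrated with a toy in-memory model, assuming per-unit spike times are stored as one flat vector plus an end offset per unit (so unit k's spikes occupy the block ending at `index[k]`). The function names are hypothetical. Adding a spike to an early unit in that layout moves every later unit's spikes and rewrites every later offset, while a per-spike layout such as Clustering only appends. How much this costs on disk would depend on the storage backend; the sketch only shows the data movement.

```python
from typing import List


def append_spike_ragged(spike_times: List[float], index: List[int], k: int, t: float) -> int:
    """Add spike time t to the end of unit k's block in a flat per-unit layout.

    index[k] is the end offset of unit k's block. Returns how many stored
    spike times had to move to make room, as a rough measure of the cost.
    """
    pos = index[k]
    moved = len(spike_times) - pos       # every later unit's spikes shift right by one
    spike_times.insert(pos, t)
    for j in range(k, len(index)):       # unit k's end offset and all later offsets grow by one
        index[j] += 1
    return moved


def append_spike_flat(times: List[float], clusters: List[int], t: float, c: int) -> int:
    """Per-spike layout: the new spike is written at the end and nothing moves."""
    times.append(t)
    clusters.append(c)
    return 0
```

With two units stored as `spike_times = [0.1, 0.4, 0.25, 0.5]` and `index = [2, 4]`, adding a spike at 0.6 s to the first unit moves two stored values and updates both offsets to `[3, 5]`.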

@tjd2002
Contributor

tjd2002 commented Oct 17, 2018

My issue with the older "Clustering+ClusterWaveforms+SpikeEventSeries" was that it was all tied pretty closely to a particular clustering workflow (I think it was KlustaKwik?), and folks also found it was not likely to be performant for larger numbers of units. I see you have proposed some changes to Clustering to address those limitations, but the Unit metadata table seemed like a better all-around solution (more easily extensible, e.g.).

I think it's very confusing to have 2 different facilities (Clustering and friends vs. Unit/UnitTimes) for the same type of data. If possible it would be great to optimize the storage for streaming without requiring this duplication. Would it be possible to just overload the UnitTimes I/O (by creating something like StreamingUnitTimes which is optimized for appending?) to solve the streaming issue, and keep everything in the nice Unit framework?

For EventWaveforms, I like your suggestion elsewhere to add a '.cluster' property. It seems like this could handle both the streaming and the offline use cases?
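A minimal sketch of the StreamingUnitTimes idea from the comment above, assuming per-unit spike times end up stored as one flat vector plus an end offset per unit. The class and method names are hypothetical, not PyNWB API. The design choice is to buffer each unit's spikes separately during acquisition, so every incoming spike is a cheap append, and to build the per-unit layout only once when the stream closes; the trade-off is that the buffers stay in memory until then.

```python
from typing import Dict, List, Tuple


class StreamingUnitTimes:
    """Hypothetical append-optimized writer for per-unit spike times."""

    def __init__(self) -> None:
        self._buffers: Dict[int, List[float]] = {}   # unit id -> spike times seen so far

    def add_spike(self, unit_id: int, t: float) -> None:
        """Record one spike as it arrives; nothing already stored has to move."""
        self._buffers.setdefault(unit_id, []).append(t)

    def finalize(self) -> Tuple[List[int], List[float], List[int]]:
        """Build (unit_ids, spike_times, index) once the stream is closed.

        The spikes of unit_ids[k] are spike_times[start:index[k]], where start
        is 0 for k == 0 and index[k - 1] otherwise.
        """
        unit_ids = sorted(self._buffers)
        spike_times: List[float] = []
        index: List[int] = []
        for u in unit_ids:
            spike_times.extend(self._buffers[u])   # assumes spikes arrive in time order
            index.append(len(spike_times))         # end offset of this unit's block
        return unit_ids, spike_times, index
```

Feeding it spikes `(1, 0.1), (2, 0.25), (1, 0.4), (2, 0.5)` as `(unit_id, time)` pairs and calling `finalize()` gives unit ids `[1, 2]`, spike times `[0.1, 0.4, 0.25, 0.5]` and offsets `[2, 4]`.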

@bendichter bendichter transferred this issue from NeurodataWithoutBorders/pynwb Jan 8, 2019
@tjd2002
Contributor

tjd2002 commented Jan 17, 2019

Conversation seems to be continuing over at #194, in case anyone else is following along.

@stephprince
Contributor

Closing because Clustering has been deprecated.


5 participants