
FIX: Replace very inefficient discrete _get_trial #403

Merged · 12 commits into dev · Jan 18, 2023

Conversation

KatharineShapcott
Contributor

On my data this gives a ×20 speedup.

Author Guidelines

  • Is the change set < 600 lines?
  • Was the code checked for memory leaks/performance bottlenecks?
  • Is the code running locally and on the ESI cluster?
  • Is the code running on all supported platforms?

Reviewer Checklist

  • Are testing routines present?
  • Do parallel loops have a set length and correct termination conditions?
  • Do objects in the global package namespace perform proper parsing of their input?
  • Do code-blocks provide novel functionality, i.e., no re-factoring using builtin/external packages possible?
  • Code layout
    • Is the code PEP8 compliant?
    • Does the code adhere to agreed-upon naming conventions?
    • Are keywords clearly named and easy to understand?
    • No commented-out code?
  • Are all docstrings complete and accurate?
  • Is the CHANGELOG.md up to date?

@KatharineShapcott
Contributor Author

My assumption is that most spike/event data will be sorted like ours is and therefore only part of the file needs to be loaded from disk, hence the dramatic speedup. But it doesn't require sorted data to work.
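The intuition can be sketched with a toy example (all names and numbers here are hypothetical, not the actual Syncopy code): on a sorted sample column, two binary searches replace a full-array scan, and the resulting contiguous slice lets an on-disk backend read only the rows belonging to the trial.

```python
import numpy as np

# Toy data: a sorted spike-sample column plus one channel column.
rng = np.random.default_rng(0)
samples = np.sort(rng.integers(0, 1000, size=500))
data = np.column_stack([samples, rng.integers(0, 4, size=500)])

trial_start, trial_stop = 200, 400  # hypothetical trial bounds, half-open

# Old approach: a boolean mask touches every row (a full read for on-disk data).
masked = data[(data[:, 0] >= trial_start) & (data[:, 0] < trial_stop)]

# Sorted approach: two binary searches yield a contiguous slice, so only
# the rows inside the trial need to be loaded.
lo, hi = np.searchsorted(data[:, 0], [trial_start, trial_stop])
sliced = data[lo:hi]

assert np.array_equal(masked, sliced)
```

The two results agree even when the data contain duplicate sample values, since `searchsorted` with the default `side='left'` picks the first index with `sample >= bound`.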

@KatharineShapcott KatharineShapcott marked this pull request as draft December 30, 2022 09:23
@codecov

codecov bot commented Dec 30, 2022

Codecov Report

Base: 68.22% // Head: 67.64% // Decreases project coverage by 0.58% ⚠️

Coverage data is based on head (ff23719) compared to base (5816ae2).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head ff23719 differs from pull request most recent head 6d6dce7. Consider uploading reports for the commit 6d6dce7 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##              dev     #403      +/-   ##
==========================================
- Coverage   68.22%   67.64%   -0.58%     
==========================================
  Files          80       80              
  Lines        9602     9600       -2     
  Branches     1993     1995       +2     
==========================================
- Hits         6551     6494      -57     
- Misses       2527     2583      +56     
+ Partials      524      523       -1     
Impacted Files Coverage Δ
syncopy/datatype/discrete_data.py 67.28% <100.00%> (-12.42%) ⬇️
syncopy/datatype/methods/definetrial.py 83.50% <100.00%> (-0.41%) ⬇️
syncopy/specest/freqanalysis.py 61.70% <0.00%> (-3.65%) ⬇️
syncopy/plotting/_helpers.py 85.91% <0.00%> (-2.82%) ⬇️
syncopy/nwanalysis/wilson_sf.py 92.75% <0.00%> (-1.45%) ⬇️
syncopy/plotting/mp_plotting.py 44.66% <0.00%> (-1.34%) ⬇️
syncopy/specest/compRoutines.py 89.41% <0.00%> (-0.69%) ⬇️
syncopy/datatype/base_data.py 76.82% <0.00%> (-0.30%) ⬇️
syncopy/datatype/util.py 71.64% <0.00%> (+1.49%) ⬆️



@KatharineShapcott
Contributor Author

While we're at it, I noticed that printing is very slow (although a lot faster after this speedup) since many different attributes are actually calling _get_trial under the hood.

sample
817 ms ± 6.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
time
401 ms ± 5.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
trials
384 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
trialtime
496 ms ± 4.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
trialinfo
392 ms ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

What if I make a _get_trial specifically for the SpikeData class that I can assume is sorted. Then storing a searchsorted call in a private variable _trialslice would be very fast e.g.

def _get_trial(self, trialno):
    # Cache each trial's [start, stop) spike-row indices on first access;
    # searchsorted is valid because the sample column is assumed sorted.
    if not hasattr(self, '_trialslice'):
        bounds = np.searchsorted(self._data[:, self.dimord.index("sample")],
                                 self.sampleinfo.ravel())
        self._trialslice = bounds.reshape(self.sampleinfo.shape)
    idx = slice(self._trialslice[trialno, 0], self._trialslice[trialno, 1])
    return self._data[idx, :]

@tensionhead
Contributor

Puh, I am not sure I can follow you here... so when I do this for some synthetic data spd (syncopy.tests.synth_data.poisson_noise) I get:

In [19]: spd.sampleinfo
Out[19]: 
array([[      0.,  180609.],
       [ 200000.,  396256.],
       [ 400000.,  582885.],
       [ 600000.,  783863.],
       [ 800000.,  996461.],
       [1000000., 1184701.],
       [1200000., 1392257.],
       [1400000., 1592449.],
       [1600000., 1783830.],
       [1800000., 1999999.]])

and after your proposed operation:

In [17]: trialslice.reshape(spd.sampleinfo.shape)
Out[17]: 
array([[     0,  17986],
       [ 19928,  39816],
       [ 40191,  58440],
       [ 60106,  78473],
       [ 80111,  99810],
       [100169, 118774],
       [120309, 139543],
       [140340, 159565],
       [160378, 178619],
       [180300, 200000]])

and I am not sure what this is exactly; it looks like sub-intervals of the 1st trial?!

@KatharineShapcott
Contributor Author

Yes, definitely confusing compared to continuous data... trialslice is not in samples anymore but in units of number of spikes. So what this tells me is that your first trial contains 17,986 spikes, while your second trial contains 19,888 spikes, and the first spike that falls within that trial is at index 19,928, etc.

The problem with this solution is that if you change the trialdefinition or use selectdata you need to update trialslice at the same time. Are you around for a meeting about this at some point?
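That mapping can be illustrated with a toy example (hypothetical numbers, not the data above): sampleinfo holds trial bounds in sample units, and searchsorted converts them into spike-row indices, so each trialslice row is a [start, stop) pair counted in spikes.

```python
import numpy as np

# Sorted spike samples and trial boundaries in sample units (toy values).
samples = np.array([0, 5, 9, 12, 20, 21, 30])
sampleinfo = np.array([[0, 10], [10, 25], [25, 35]])

# Convert sample-unit bounds into row indices into the spike array.
trialslice = np.searchsorted(samples, sampleinfo.ravel()).reshape(sampleinfo.shape)

print(trialslice)
# [[0 3]
#  [3 6]
#  [6 7]]  -> trial 0 spans rows 0:3 (3 spikes), trial 1 rows 3:6, trial 2 rows 6:7
```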

@KatharineShapcott
Contributor Author

Okay, I think I get why this crashed now. With the previous version of .trials it was impossible to return an empty array; it would just be skipped over. So when you tried to select trials[0] here you would really get trials[1]. That's fine, but the problem seems to be that if you use selectdata to select that empty trial and there is no data at all, then everything gets set to None, including the trialdefinition.

spikes.trials[0]
Out[36]: array([], shape=(0, 3), dtype=int64)

In [37]: spikes.trialdefinition
Out[37]: 
array([[   0.,   10.,    0.],
       [  10.,  100.,    0.],
       [ 200.,  500.,    0.],
       [ 600.,  700.,    0.],
       [ 800., 1000.,    0.]])

In [38]: selected.trialdefinition
Out[38]: array(None, dtype=object)

Which causes the following error

selected.trials[0]

SyNCoPy encountered an error in 

/gs/home/shapcottk/.conda/envs/syncopyoe/lib/python3.8/site-packages/IPython/core/interactiveshell.py, line 3361 in run_code
        exec(code_obj, self.user_global_ns, self.user_ns)

--------------------------------------------------------------------------------
Abbreviated traceback:

<ipython-input-40-391b5f5f858d>, line 1 in <cell line: 1>
        selected.trials[0]

Use `import traceback; import sys; traceback.print_tb(sys.last_traceback)` for full error traceback.

TypeError: 'NoneType' object is not subscriptable

@tensionhead
Contributor

Mmh, so in the tests we select empty trials, that's somewhat surprising! I would consider this an edge case.. There's loads of CI output so I can't really see what was happening though. Shall I look into this or you think you can get it to work?

@KatharineShapcott
Contributor Author

Yeah, it's kind of an edge case, but since the test data contain very few spikes it occurs quite often in the tests, I think.

I tried to avoid it by returning the empty arrays instead of None, let's see what happens now.
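A minimal sketch of the idea (hypothetical helper, not the actual Syncopy code): an empty trial comes back as a (0, ncols) array, which stays subscriptable where None would raise a TypeError.

```python
import numpy as np

def get_trial_rows(data, trialslice, trialno):
    """Return the rows of one trial; empty trials yield a (0, ncols) array."""
    start, stop = trialslice[trialno]
    return data[start:stop, :]

data = np.arange(12).reshape(4, 3)
trialslice = np.array([[0, 2], [2, 2], [2, 4]])  # middle trial holds no spikes

empty = get_trial_rows(data, trialslice, 1)
assert empty.shape == (0, 3)   # an empty array, not None
assert empty[0:1].size == 0    # still indexable, no TypeError
```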

@KatharineShapcott
Contributor Author

Hmm, one error seems to be due to this, I don't know why that's suddenly a problem?

FAILED test_discretedata.py::TestEventData::test_ed_dataselection - syncopy.shared.errors.SPYTypeError: Wrong type of value slice(-2, None, None) for key 'eventid': expected serializable data type, e.g. floats, lists, tuples, ... found slice

slice(-2, None) # negative-start slice

eventidSelections = [
            [0, 0, 1],  # preserve repetition, don't convert to slice
            range(0, 2),  # narrow range
            slice(-2, None)  # negative-start slice
        ]

@tensionhead
Contributor

Mmh, we just deprecated slice as a valid selection parameter with #407, as slices are not serializable and hence the cfg can't be saved to JSON. But you branched out before that got merged today, no?!
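To illustrate the serialization point (toy example; n_events is a hypothetical event count): a slice cannot be JSON-serialized, but the equivalent explicit index list can, which is what a saved cfg needs.

```python
import json

selection = slice(-2, None)  # the deprecated selection style

try:
    json.dumps({"eventid": selection})
except TypeError:
    # Materialize the slice into an explicit index list for a known length.
    n_events = 5  # hypothetical number of events
    selection = list(range(n_events))[selection]

assert selection == [3, 4]
assert json.dumps({"eventid": selection}) == '{"eventid": [3, 4]}'
```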

@KatharineShapcott
Contributor Author

Yes but something is weird, the line numbers in the tests don't match the ones in my branch :/

@tensionhead
Contributor

ok, I will try to merge dev into your branch 🤞

@tensionhead
Contributor

tensionhead commented Jan 5, 2023

So dev itself got repaired; I can't really tell atm what is happening here, tbh. Maybe @dfsp-spirit has an idea?!

EDIT: or we take the most straightforward answer: the changes actually broke something, as now locally I get:

	T1.test_general()
/xxx/syncopy/syncopy/tests/test_selectdata.py, line 450 in test_general
	assert np.array_equal(selected.trials[tk],

particularly line 443 looks like a good candidate: propArr = np.unique(selected.data[:, propIdx]).astype(np.intp)

Changes to be committed:
	modified:   syncopy/datatype/discrete_data.py
@KatharineShapcott
Contributor Author

I don't think we need to revert removing the unique from sample; the tests passed with that change before:

[screenshot: passing CI test results]

@KatharineShapcott
Contributor Author

KatharineShapcott commented Jan 17, 2023

@tensionhead The one that's failing is a test_ed_dataselection within the test_discretedata. Can we simply remove that test?

assert np.array_equal(obj.trials[trialno][selector.time[tk], :],

@KatharineShapcott
Contributor Author

KatharineShapcott commented Jan 17, 2023

I'm pretty sure it's failing because of the relative indexing thing. trialno is 3 and tk is 0 when it fails

@tensionhead
Contributor

tensionhead commented Jan 17, 2023

Phew.. right, we also have selection tests within the individual data class tests 😅 Yup, I think it is safe to remove; this whole EventData probably needs an overhaul anyway.. the event id indexing is so weird.

@KatharineShapcott KatharineShapcott added the Performance Improve the number crunching label Jan 18, 2023
@KatharineShapcott KatharineShapcott marked this pull request as ready for review January 18, 2023 07:05
@tensionhead left a comment


Looks good, nice small surgical changes! I just have 1 minor question, see below.

EDIT: Looks like the original failing test just evaporated with the re-write of the test_selectdata.py 🙂

Review comments (resolved) on:
  • syncopy/datatype/discrete_data.py
  • syncopy/datatype/methods/definetrial.py
@KatharineShapcott
Contributor Author

@tensionhead Should I merge in the other speedups?

@tensionhead
Contributor

Yeah, I was wondering myself, but if the next CI run passes I think we are good.

@tensionhead left a comment


Great, together with #415 we hopefully made significant performance gains for DiscreteData.

@tensionhead tensionhead merged commit ce5707c into dev Jan 18, 2023
@tensionhead tensionhead deleted the discrete-speedup branch March 31, 2023 10:56
Labels
Performance Improve the number crunching