FIX: Replace very inefficient discrete _get_trial #403
On my data this gives a 20x speedup.
My assumption is that most spike/event data will be sorted like ours is and therefore only part of the file needs to be loaded from disk, hence the dramatic speedup. But it doesn't require sorted data to work. |
Codecov Report
Base: 68.22% // Head: 67.64% // Decreases project coverage by
Additional details and impacted files
@@ Coverage Diff @@
## dev #403 +/- ##
==========================================
- Coverage 68.22% 67.64% -0.58%
==========================================
Files 80 80
Lines 9602 9600 -2
Branches 1993 1995 +2
==========================================
- Hits 6551 6494 -57
- Misses 2527 2583 +56
+ Partials 524 523 -1
|
While we're at it, I noticed that printing is very slow (although a lot faster after this speedup) since many different attributes are actually calling
What if I make a
|
Phew, I am not sure I can follow you here. When I do this for some synthetic data:
In [19]: spd.sampleinfo
Out[19]:
array([[ 0., 180609.],
[ 200000., 396256.],
[ 400000., 582885.],
[ 600000., 783863.],
[ 800000., 996461.],
[1000000., 1184701.],
[1200000., 1392257.],
[1400000., 1592449.],
[1600000., 1783830.],
[1800000., 1999999.]])
and after your proposed operation:
In [17]: trialslice.reshape(spd.sampleinfo.shape)
Out[17]:
array([[ 0, 17986],
[ 19928, 39816],
[ 40191, 58440],
[ 60106, 78473],
[ 80111, 99810],
[100169, 118774],
[120309, 139543],
[140340, 159565],
[160378, 178619],
[180300, 200000]])
which I am not sure what it is exactly; it looks like sub-intervals of the 1st trial?! |
Yes definitely confusing compared to continuous data... trialslice is not samples anymore but is in units of number of spikes. So what this tells me is that your first trial contains 17,986 spikes, while your second trial contains 19,888 spikes and the first spike that falls within that trial is at index 19,928 etc. The problem with this solution is that if you change the trialdefinition or use selectdata you need to update trialslice at the same time. Are you around for a meeting about this at some point? |
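To make the units concrete, here is a minimal sketch of how such spike-index slices could be derived from a trial definition. It assumes the sample column is sorted and that each trial covers the half-open sample range [start, stop). The helper `trial_slices` and the toy arrays are hypothetical and are not Syncopy's actual implementation.

```python
import numpy as np


def trial_slices(samples, sampleinfo):
    """Map trial boundaries (in samples) to row indices into a sorted spike array.

    Hypothetical helper: `samples` is the sorted sample column of the spike
    data, `sampleinfo` is an (n_trials, 2) array of [start, stop) samples.
    """
    samples = np.asarray(samples)
    sampleinfo = np.asarray(sampleinfo)
    # index of the first spike at or after each trial start
    starts = np.searchsorted(samples, sampleinfo[:, 0], side="left")
    # index of the first spike at or after each trial stop (exclusive end)
    stops = np.searchsorted(samples, sampleinfo[:, 1], side="left")
    return np.column_stack((starts, stops))


# toy data: seven spikes, four trials, the third trial contains no spikes
samples = np.array([0, 3, 7, 12, 15, 30, 31])
sampleinfo = np.array([[0, 10], [10, 20], [20, 25], [25, 40]])

slices = trial_slices(samples, sampleinfo)
spikes_per_trial = slices[:, 1] - slices[:, 0]
```

Each row of `slices` is a pair of spike indices, not samples, so `slices[:, 1] - slices[:, 0]` gives the number of spikes per trial. Because the result depends on both the data and the trial definition, it has to be recomputed whenever either one changes.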
Removed custom .trials property
Okay I think I get why this crashed now. With the previous version of .trials it was impossible to return an empty array, it would just be skipped over. So when you tried to select trials[0] here you would really get trials[1]. That's fine but the problem seems to be that if you try to use selectdata to select that empty trial and there is no data at all then everything gets set to None including the trialdefinition.
Which causes the following error
|
Mmh, so in the tests we select empty trials, that's somewhat surprising! I would consider this an edge case.. There's loads of CI output so I can't really see what was happening though. Shall I look into this or you think you can get it to work? |
Yeah it's kind of an edge case but since the test data contains very few spikes it occurs quite often in the tests I think. I tried to avoid it by returning the empty arrays instead of None, let's see what happens now. |
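The index shift caused by skipping empty trials can be reproduced with a small sketch. The array layout (one row per spike, first column the sample) and the names `data`, `slices`, `skipping` and `keeping` are assumptions for illustration only, not code from this PR.

```python
import numpy as np

# hypothetical spike array: one row per spike, first column is the sample
data = np.array([[0, 1, 0],
                 [3, 0, 0],
                 [30, 1, 0]])

# (start, stop) row indices per trial; the middle trial contains no spikes
slices = [(0, 2), (2, 2), (2, 3)]

# old behaviour as described above: empty trials are silently skipped,
# so list positions no longer match trial numbers
skipping = [data[a:b] for a, b in slices if b > a]

# alternative: always return the slice, which is an empty (0, 3) array
# for a trial without spikes, so positions stay aligned with trial numbers
keeping = [data[a:b] for a, b in slices]
```

With `skipping`, position 1 holds the spikes of the third trial, whereas `keeping[1]` is an empty array that still has the expected number of columns.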
Hmm, one error seems to be due to this, I don't know why that's suddenly a problem?
FAILED test_discretedata.py::TestEventData::test_ed_dataselection - syncopy.shared.errors.SPYTypeError: Wrong type of
syncopy/syncopy/tests/test_discretedata.py Line 533 in ccbff5c
|
Mmh, we just deprecated |
Yes but something is weird, the line numbers in the tests don't match the ones in my branch :/ |
ok, I will try to merge dev into your branch 🤞 |
update from Dev
So
EDIT: or we take the most straightforward answer: the changes actually broke something, as now locally I get:
T1.test_general()
/xxx/syncopy/syncopy/tests/test_selectdata.py, line 450 in test_general
assert np.array_equal(selected.trials[tk],
particularly line 443 looks like a good candidate: |
Changes to be committed: modified: syncopy/datatype/discrete_data.py
@tensionhead The one that's failing is test_ed_dataselection within test_discretedata. Can we simply remove that test?
syncopy/syncopy/tests/test_discretedata.py Line 561 in ba57fd5
|
I'm pretty sure it's failing because of the relative indexing thing. |
Phew.. right we also have selection tests within the individual data class tests 😅 Yup, I think it is safe to remove, this whole |
Looks good, nice small surgical changes! I just have 1 minor question, see below.
EDIT: Looks like the original failing test just evaporated with the re-write of the test_selectdata.py
🙂
@tensionhead Should I merge in the other speedups? |
Yeah was wondering myself, but if the next CI run passes I think we are good |
Great, together with #415 we hopefully made significant performance gains for DiscreteData