Excessive memory consumption of np.nansum
#380
Comments
- the quick fix outlined in #380

Changes to be committed:
  modified: syncopy/shared/computational_routine.py
  modified: syncopy/shared/kwarg_decorators.py
Ok, so the real culprit was this somewhat odd `np.sum([target, res], axis=0)` construction, where first both operands would be put into a list. So I will replace both appearances of that memory hungry operation with a simple `+`.
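For illustration, a minimal sketch of the two constructions being compared here (array shapes are arbitrary and not taken from the actual syncopy code):

```python
import numpy as np

target = np.random.rand(4096, 4096)   # running sum accumulated so far
res = np.random.rand(4096, 4096)      # freshly computed trial result

# old construction: np.sum first stacks both operands into one (2, 4096, 4096)
# array, so the peak memory is roughly 2-3x the size of the operands
summed = np.sum([target, res], axis=0)

# replacement: plain element-wise addition, at most one temporary for the result
summed = target + res
```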
- this old np.sum([target, res], axis=0) construction increased the memory footprint 2-3 fold
- ordinary `+` operator is memory sparing AND can also deal with NaNs
- fixes #380

Changes to be committed:
  modified: syncopy/shared/computational_routine.py
  modified: syncopy/shared/kwarg_decorators.py
Hi! Nice catch! Simply using `+` sounds good. We have tested it (ages ago) but really focused on the parallel computing case (so the above leak in the sequential branch slipped through).
Thanks, I also think that's a nice step forward 🚀
The parallel case was actually no different, the same problematic line was within the IO decorator, which springs into action for the parallel case:

syncopy/syncopy/shared/kwarg_decorators.py, line 659 in 36b7005

So what you probably mean is that only now, due to the cross spectral cFs, this memory blow up became apparent 😉 But yeah, I agree, proper profiling for different use cases will remain a challenging task ahead.
Just summarizing my chat with Gregor here: if we used more in-place operations, the memory usage would stay smaller. In this case you could do:
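Something along those lines, as an illustration (the variable names are mine, not the exact snippet from that chat):

```python
import numpy as np

target = np.random.rand(4096, 4096)   # accumulator, e.g. one trial-sized array
res = np.random.rand(4096, 4096)      # newly computed trial

# accumulate in place: no stacked (2, ...) temporary and no separate result array
np.add(target, res, out=target)       # equivalent to target += res
```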
I did a simple test of the NaN handling, this is how it behaves: if we want the NaNs replaced by 0 we can do the following without wasting much memory, I don't know about time though:
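A memory-sparing way to zero out NaNs, as a sketch of what is meant here (not necessarily the exact code that was tested):

```python
import numpy as np

res = np.random.rand(4096, 4096)
res[::100, ::100] = np.nan            # pretend some samples are NaN

# replace NaNs by 0 without allocating a second full-size float array
np.nan_to_num(res, copy=False)        # in-place substitution (needs numpy >= 1.17)
# alternative: res[np.isnan(res)] = 0  (extra cost is only a boolean mask)
```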
Yes, thx to @KatharineShapcott for coming up with the best solution! One operand has to be in memory, yet the result can be written directly into the dataset, leading to a memory footprint of less than 2 trials for that operation. Note that the 1st memory consumption peak/plateau of every function profiled is the creation of either 2 arrays or 1 array and 1 dataset. As a bonus, at least for this test here it's even slightly faster than the previous winner. It's great for the standard trial-average (that's what it's all about here).
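As a rough sketch of that kind of accumulation (file and dataset names are made up, and the exact form that went into syncopy may differ):

```python
import h5py
import numpy as np

res = np.random.rand(1024, 1024)      # current trial result, the one operand in RAM

with h5py.File("avg_data.h5", "a") as h5f:
    dset = h5f.require_dataset("sum", shape=res.shape, dtype=res.dtype)
    # read the running sum once, add the trial, and write the result straight
    # back into the HDF5 dataset; no stacked (2, ...) temporary as with np.sum([...])
    dset[...] += res
```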
- thx to @KatharineShapcott !
- addresses #380

Changes to be committed:
  modified: syncopy/shared/computational_routine.py
  modified: syncopy/shared/kwarg_decorators.py
I think we can close this with 06a2991
np.nansum
- see #380 for details

Changes to be committed:
  modified: syncopy/datatype/methods/statistics.py
Update
Ok, so it looks like it is not a real leak... we just underestimated how much RAM `np.sum` and `np.nansum` actually use. Here I profiled `np.ndarray` and `h5py.Dataset` creation and summation in the combinations indicated by the function names. Note that a single array has a size of 256 MB, so we expect two arrays to cover slightly more than 500 MB. For the summing operations, total temporary RAM usage goes up to over 1.2 GB.

From this ad-hoc analysis, I conclude we need at least 4 times the memory size of a single trial, and not just 2, which is only true for static memory usage.
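The profiling script itself is not shown here; a rough way to reproduce this kind of measurement (using memory_profiler, which may not be the tool that was actually used) would be:

```python
import numpy as np
from memory_profiler import memory_usage   # pip install memory_profiler

N = 256 * 1024 * 1024 // 8                 # ~256 MB worth of float64 values per array

def arr_arr_nansum():
    a, b = np.random.rand(N), np.random.rand(N)
    return np.nansum([a, b], axis=0)

def arr_arr_plus():
    a, b = np.random.rand(N), np.random.rand(N)
    return a + b

for func in (arr_arr_nansum, arr_arr_plus):
    # memory_usage samples the process RSS while the function runs
    peak = max(memory_usage((func, (), {}), interval=0.05))
    print(f"{func.__name__}: peak memory ~ {peak:.0f} MiB")
```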
Repeating with 1024 MB single-trial size, we see that `np.nansum` needs pretty much one extra trial in memory. So this means that, just by chance, switching from `np.nansum` to `np.sum` kept the memory usage below what the machine I used to find this had to offer (32 GB RAM vs. 8 GB single trial size).

Synopsis
Thanks to @kajal5888 we now have a real-life problem: data with trials of around 8 GB each. When trying to process the data, Syncopy consistently crashed (OOM kill) with `keeptrials=False`, with or without `parallel=True`, and with or without ACME, on a machine with 32 GB of RAM available. So this means that the basic promise, "as long as 2 trials fit into memory you are fine", did not hold up to reality.

@pantaray I guess no one (me inclusive) really tested this promise with seriously sized data before?
The Culprit and Quick Fix
After some searching I found the suspect: `np.nansum`, which gets used here:

syncopy/syncopy/shared/computational_routine.py, line 910 in 36b7005

Here is a relevant issue: pydata/bottleneck#201

Apparently, when combining `np.ndarray` and other datatypes within a `np.nansum([obj1, obj2])` call (they mention `list` and `pd.DataFrame`), there are memory leaks. Switching to `np.sum` for testing in the sequential branch of the CR fixed the OOM killing, and the processing succeeded for said dataset.

What Next
The issue above was of course recognized as being quite serious, but it apparently got fixed already in January 2019:

pydata/bottleneck#202

We need to first check if that fix made it into our environments. If yes, then maybe combining `h5py.Dataset` and `np.ndarray` still has issues; in that case we have to escalate that further.
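A quick way to see what is actually installed in a given environment (whether a specific bottleneck release already contains the fix from pydata/bottleneck#202 would still have to be checked against its changelog):

```python
import numpy as np

print("numpy:", np.__version__)

try:
    import bottleneck as bn
    print("bottleneck:", bn.__version__)
except ImportError:
    print("bottleneck: not installed")
```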