The issue
Using the fantastic memory_profiler package and running a few tests with abagen reveals that, in the best case, the get_expression_data() function uses ~4GiB RAM and, in the worst case, uses up to ~7GiB RAM.
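For reference, a similar measurement can be made without memory_profiler using the stdlib tracemalloc; the builder function below is a throwaway stand-in for the real call, not abagen code:

```python
import tracemalloc

def build_expression_table(n_samples=1000, n_genes=2000):
    # Stand-in for abagen.get_expression_data(): allocate a samples x genes
    # table so there is something measurable to profile.
    return [[0.0] * n_genes for _ in range(n_samples)]

tracemalloc.start()
table = build_expression_table()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak usage: {peak / 1024 ** 2:.1f} MiB")
```

Note that tracemalloc only sees Python-level allocations, so memory_profiler (which samples process RSS) can report higher numbers for code that allocates inside C extensions.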
This isn't great, and it would be nice to optimize it down a bit. Unfortunately, I think at a minimum we're going to be dealing with ~2GiB of RAM, since that's about how much loading and storing the raw microarray + pacall data uses, but beyond that we can definitely try to get things down.
Currently, the worst of the memory usage happens when lr_mirror is set to True. The step that duplicates the samples briefly holds both the original microarray data AND the duplicated copy in memory, leading to huge jumps until the original is removed.
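That spike is easy to reproduce in miniature: the concatenated result is allocated while the original is still referenced, so the transient peak is roughly three copies of the data (the original plus both halves of the duplicate) until the original is dropped. A sketch with plain numpy, not abagen internals:

```python
import tracemalloc
import numpy as np

n_samples, n_genes = 500, 2000
one_copy = n_samples * n_genes * 8  # bytes in one float64 copy of the data

tracemalloc.start()
microarray = np.random.default_rng(0).random((n_samples, n_genes))
# The (2N, G) result exists alongside the original at this point, so the
# transient peak is ~3x one copy of the data ...
mirrored = np.concatenate([microarray, microarray], axis=0)
del microarray  # ... and only falls to ~2x once the original is released
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak was {peak / one_copy:.1f}x one copy of the data")
```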
Proposed solution
I tried to resolve this by adding an optional keyword argument inplace to the samples.mirror_samples function, but that didn't seem to help much (at least not with the current implementation). I'm going to have to dig into the internals a bit more and figure out whether there are stages where we can offload things from memory.
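One reason an inplace flag can't fix this on its own: the mirrored half has to be materialized before anything can be freed, so the original and the doubled table necessarily overlap at the peak. A hypothetical sketch of the mirroring step (this flips the x MNI coordinate and is illustrative only, not abagen's actual implementation):

```python
import pandas as pd

def mirror_samples(annotation):
    """Hypothetical sketch of hemisphere mirroring (not abagen's code).

    The mirrored half must be built before it can replace the original,
    so original + doubled tables coexist at peak; an inplace=True flag
    could only shorten how long the original stays alive afterwards.
    """
    flipped = annotation.copy()
    flipped['mni_x'] = -flipped['mni_x']  # flip the left-right axis
    return pd.concat([annotation, flipped], ignore_index=True)

samples = pd.DataFrame({'mni_x': [-40.0, 32.0], 'mni_y': [10.0, -5.0]})
mirrored = mirror_samples(samples)
print(len(mirrored))  # 4
```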
The flip side is that offloading things from memory might increase runtime (since they need to be loaded back in whenever they're needed), so perhaps this warrants a low_memory parameter or something?
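If a low_memory mode were added (it is only a proposal here), one option would be disk-backed arrays: numpy's memmap keeps the doubled data in a file and pages it in on demand, trading RAM for I/O. A sketch, where mirror_to_disk is a made-up helper name:

```python
import tempfile
import numpy as np

def mirror_to_disk(microarray):
    # Hypothetical low_memory-style path: write the doubled array into a
    # disk-backed memmap so only one full in-RAM copy exists at a time;
    # the OS pages the rest in on demand (slower, but far lighter on RAM).
    n, g = microarray.shape
    tmp = tempfile.NamedTemporaryFile(suffix='.dat', delete=False)
    out = np.memmap(tmp.name, dtype=microarray.dtype, mode='w+',
                    shape=(2 * n, g))
    out[:n] = microarray
    out[n:] = microarray  # a real mirror would also flip coordinates
    out.flush()
    return out

data = np.arange(6, dtype=float).reshape(3, 2)
doubled = mirror_to_disk(data)
print(doubled.shape)  # (6, 2)
```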