The issue
Using the fantastic memory_profiler package and running a few tests with abagen reveals that, in the best case, the get_expression_data() function uses ~4GiB RAM and, in the worst case, uses up to ~7GiB RAM.
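For reference, a similar measurement can be made without memory_profiler using the stdlib tracemalloc; the builder function below is a throwaway stand-in for the real call, not abagen code:

```python
import tracemalloc

def build_expression_table(n_samples=1000, n_genes=2000):
    # Stand-in for abagen.get_expression_data(): allocate a samples x genes
    # table so there is something measurable to profile.
    return [[0.0] * n_genes for _ in range(n_samples)]

tracemalloc.start()
table = build_expression_table()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak usage: {peak / 1024 ** 2:.1f} MiB")
```

Note that tracemalloc only sees Python-level allocations, so memory_profiler (which samples process RSS) can report higher numbers for code that allocates inside C extensions.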
This isn't great, and it would be nice to optimize it down a bit. Unfortunately, I think at a minimum we're going to be dealing with ~2GiB of RAM, since that's about how much loading and storing the raw microarray + pacall data uses, but beyond that we can definitely try to get things down.
Currently, the worst of the memory usage happens when lr_mirror is set to True. The step that duplicates the samples briefly holds both the original microarray data AND the duplicated copy in memory, leading to huge jumps until the original is removed.
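That spike is easy to reproduce in miniature: the concatenated result is allocated while the original is still referenced, so the transient peak is roughly three copies of the data (the original plus both halves of the duplicate) until the original is dropped. A sketch with plain numpy, not abagen internals:

```python
import tracemalloc
import numpy as np

n_samples, n_genes = 500, 2000
one_copy = n_samples * n_genes * 8  # bytes in one float64 copy of the data

tracemalloc.start()
microarray = np.random.default_rng(0).random((n_samples, n_genes))
# The (2N, G) result exists alongside the original at this point, so the
# transient peak is ~3x one copy of the data ...
mirrored = np.concatenate([microarray, microarray], axis=0)
del microarray  # ... and only falls to ~2x once the original is released
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak was {peak / one_copy:.1f}x one copy of the data")
```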
Proposed solution
I tried to resolve this by adding an optional keyword argument inplace to the samples.mirror_samples function, but that didn't seem to help much (at least not with the current implementation). I'm going to have to dig into the internals a bit more and figure out whether there are stages where we can offload things from memory.
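One reason an inplace flag can't fix this on its own: the mirrored half has to be materialized before anything can be freed, so the original and the doubled table necessarily overlap at the peak. A hypothetical sketch of the mirroring step (this flips the x MNI coordinate and is illustrative only, not abagen's actual implementation):

```python
import pandas as pd

def mirror_samples(annotation):
    """Hypothetical sketch of hemisphere mirroring (not abagen's code).

    The mirrored half must be built before it can replace the original,
    so original + doubled tables coexist at peak; an inplace=True flag
    could only shorten how long the original stays alive afterwards.
    """
    flipped = annotation.copy()
    flipped['mni_x'] = -flipped['mni_x']  # flip the left-right axis
    return pd.concat([annotation, flipped], ignore_index=True)

samples = pd.DataFrame({'mni_x': [-40.0, 32.0], 'mni_y': [10.0, -5.0]})
mirrored = mirror_samples(samples)
print(len(mirrored))  # 4
```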
The flip side is that offloading things from memory might increase runtime (since they need to be loaded back in whenever they're needed), so perhaps this warrants a low_memory parameter or something?
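If a low_memory mode were added (it is only a proposal here), one option would be disk-backed arrays: numpy's memmap keeps the doubled data in a file and pages it in on demand, trading RAM for I/O. A sketch, where mirror_to_disk is a made-up helper name:

```python
import tempfile
import numpy as np

def mirror_to_disk(microarray):
    # Hypothetical low_memory-style path: write the doubled array into a
    # disk-backed memmap so only one full in-RAM copy exists at a time;
    # the OS pages the rest in on demand (slower, but far lighter on RAM).
    n, g = microarray.shape
    tmp = tempfile.NamedTemporaryFile(suffix='.dat', delete=False)
    out = np.memmap(tmp.name, dtype=microarray.dtype, mode='w+',
                    shape=(2 * n, g))
    out[:n] = microarray
    out[n:] = microarray  # a real mirror would also flip coordinates
    out.flush()
    return out

data = np.arange(6, dtype=float).reshape(3, 2)
doubled = mirror_to_disk(data)
print(doubled.shape)  # (6, 2)
```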