Memory issues with abagen.get_expression_data #99

Closed
rmarkello opened this issue Sep 11, 2019 · 0 comments · Fixed by #130
Labels
high priority (Things in urgent need of attention), refactor (Not an enhancement, but not a bug)

Comments

@rmarkello
Owner

The issue

Using the fantastic memory_profiler package and running a few tests with abagen reveals that, in the best case, the get_expression_data() function uses ~4GiB RAM and, in the worst case, up to ~7GiB RAM.
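For reference, a minimal sketch of how a peak-memory figure can be measured. It uses the standard-library tracemalloc module as a stand-in for memory_profiler, so the numbers are not directly comparable to the ones above: tracemalloc only counts allocations made through Python's memory allocators, not overall process memory. The helper `peak_memory_mib` and the `allocate` workload are hypothetical names for illustration; in practice the profiled call would be `abagen.get_expression_data()`.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB).

    Hypothetical helper; not part of abagen or memory_profiler.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def allocate(n_bytes):
    # Stand-in workload: allocate a zero-filled buffer of n_bytes
    return bytearray(n_bytes)


result, peak = peak_memory_mib(allocate, 8 * 2**20)
print(f"peak traced memory: {peak:.1f} MiB")
```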

This isn't great, and it would be nice to find ways to bring it down a bit. Unfortunately, I think we're going to be dealing with ~2GiB of RAM at a minimum, given that's about how much loading and storing the raw microarray + pacall data uses, but beyond that we can definitely try to get things down.

Currently, the worst of the memory usage happens when lr_mirror is set to True. The step where we duplicate the samples briefly holds both the original microarray data AND the duplicated microarray data in memory, leading to huge jumps until the original is removed.
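The transient jump described above can be reproduced with a toy example. This is only an illustration, not abagen's code: a bytearray stands in for the microarray data, plain concatenation stands in for the sample duplication step, and tracemalloc (standard library) is used for the measurement.

```python
import tracemalloc

N = 4 * 2**20  # size of the stand-in "original" data, in bytes

tracemalloc.start()

original = bytearray(N)          # ~N bytes traced
mirrored = original + original   # new ~2N buffer while original is still alive
del original                     # original released only after the copy exists

# current: memory still held now; peak: highest point reached while tracing
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"held at end: {current / N:.2f} x N, peak: {peak / N:.2f} x N")
```

The peak sits at roughly three times the original size even though only the doubled data is retained afterwards, which is the same shape of spike as the one seen in the duplication step.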

Proposed solution

I tried to resolve this issue by adding an optional keyword argument inplace to the samples.mirror_samples function, but that didn't seem to help a ton (at least not with the current implementation). I'm going to have to dig into the internals of things a bit more and try to figure out if there are stages where we can offload holding things in memory.

The flip side is that offloading things from memory might increase runtime (due to the need to load them back in whenever they're needed), so perhaps this might warrant a low_memory parameter or something?
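A rough sketch of what the offloading trade-off could look like, assuming intermediate data can be pickled. The `Offloaded` class is a hypothetical helper, not something abagen provides, and a plain list stands in for the real data. Every `load()` re-reads the file from disk, which is where the extra runtime would come from.

```python
import os
import pickle
import tempfile


class Offloaded:
    """Keep an object on disk and reload it on demand (hypothetical helper)."""

    def __init__(self, obj):
        # Write the object to a temporary file; only the path is kept in memory
        fd, self.path = tempfile.mkstemp(suffix=".pkl")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)

    def load(self):
        # Each call pays the cost of reading and unpickling from disk
        with open(self.path, "rb") as f:
            return pickle.load(f)

    def cleanup(self):
        # Remove the temporary file once the data is no longer needed
        os.remove(self.path)


data = list(range(1000))   # stand-in for a large intermediate result
stash = Offloaded(data)
del data                   # memory is only reclaimable once no references remain

restored = stash.load()    # bring the data back when a later stage needs it
stash.cleanup()
```

A low_memory flag could then decide whether a stage keeps its intermediate result in memory or wraps it in something like this.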

@rmarkello added the refactor and high priority labels Sep 11, 2019