Minimize memory usage during posterior generation #248

sjfleming · 2023-08-17T15:02:15Z

In some cases the CPU memory needed to store the full posterior is quite large. One example is a deeply sequenced, multiplexed, multi-donor sample (overloaded) run with PIP-seq, which generates 600+ ambient RNA counts per droplet.

This posterior is very big because the count matrix has many nonzero entries, and many of those entries are not small.

The question is: can we just write the posterior to disk incrementally? Something like the pandas HDF5 table format:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#table-format
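A minimal sketch of what incremental writing could look like with the pandas table format linked above. This is an illustration, not CellBender code: the function names, the `posterior` key, and the column names (`entry`, `noise_count`, `log_prob`) are all hypothetical, and `pd.HDFStore` needs the optional PyTables package installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd  # HDFStore requires the optional PyTables ("tables") package


def write_posterior_in_chunks(chunks, path, key="posterior"):
    """Append each chunk of posterior entries to an on-disk HDF5 table.

    `chunks` yields DataFrames that share the same columns, so only one
    chunk needs to be held in memory at a time.
    """
    n_rows = 0
    with pd.HDFStore(path, mode="w", complevel=5, complib="zlib") as store:
        for chunk in chunks:
            # format="table" is the appendable layout; index=False skips
            # building a PyTables index on every append.
            store.append(key, chunk, format="table", index=False)
            n_rows += len(chunk)
    return n_rows


def fake_posterior_chunks(n_chunks=3, rows_per_chunk=4, seed=0):
    """Stand-in generator for posterior chunks (hypothetical columns)."""
    rng = np.random.default_rng(seed)
    for i in range(n_chunks):
        start = i * rows_per_chunk
        yield pd.DataFrame({
            "entry": np.arange(start, start + rows_per_chunk, dtype=np.int64),
            "noise_count": rng.integers(0, 10, rows_per_chunk).astype(np.int32),
            "log_prob": rng.normal(size=rows_per_chunk).astype(np.float32),
        })


path = os.path.join(tempfile.mkdtemp(), "posterior.h5")
total = write_posterior_in_chunks(fake_posterior_chunks(), path)
stored = pd.read_hdf(path, "posterior")  # read back only to check the result
print(total, len(stored))
```

Peak memory here is roughly one chunk plus HDF5 buffers, at the cost of extra disk I/O and a later read pass over the file.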

sjfleming · 2023-08-28T14:07:02Z

Actually, we really should not have to write it to disk. The posterior h5 (albeit compressed) is under 2 GB.
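A rough back-of-envelope sketch of why keeping it in memory might be feasible, assuming the posterior is held as sparse (row, column, value) triples. The helper names and the entry count below are hypothetical, and this is not necessarily how the posterior is actually stored; the point is only that compact dtypes halve the footprint relative to 64-bit defaults.

```python
import numpy as np


def coo_nbytes(n_entries, index_dtype, value_dtype):
    """Bytes needed for n_entries (row, col, value) triples with these dtypes."""
    idx = np.dtype(index_dtype).itemsize
    val = np.dtype(value_dtype).itemsize
    return n_entries * (2 * idx + val)


def accumulate_chunks(chunks, index_dtype=np.int32, value_dtype=np.float32):
    """Collect per-chunk triples as compact arrays and concatenate once.

    int32 indices are only safe while every index is below 2**31.
    """
    rows, cols, vals = [], [], []
    for r, c, v in chunks:
        rows.append(np.asarray(r, dtype=index_dtype))
        cols.append(np.asarray(c, dtype=index_dtype))
        vals.append(np.asarray(v, dtype=value_dtype))
    return np.concatenate(rows), np.concatenate(cols), np.concatenate(vals)


n = 100_000_000  # hypothetical number of stored posterior entries
wide = coo_nbytes(n, np.int64, np.float64)
compact = coo_nbytes(n, np.int32, np.float32)
print(f"int64/float64: {wide / 1e9:.1f} GB, int32/float32: {compact / 1e9:.1f} GB")
```

The compressed file size is only a lower bound on the in-memory size, so an estimate like this needs the real entry count before drawing conclusions.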

sjfleming mentioned this issue Aug 21, 2023

OOM Killed during batch processing when running cellbender on slurm #251

Open

sjfleming added the enhancement New feature or improvement label Aug 22, 2023

sjfleming self-assigned this Aug 22, 2023

This was referenced Aug 28, 2023

Batch size, CUDA out of memory #67

Closed

Memory-efficient posterior generation #263

Merged

github-project-automation bot added this to remove-background Aug 26, 2024

github-project-automation bot moved this to In Progress in remove-background Aug 26, 2024
