
Restrict parallelization for Flash loader #355

Closed
rettigl opened this issue Mar 7, 2024 · 2 comments

rettigl (Member) commented Mar 7, 2024

The Flash (and future FEL) loader currently converts all files to Parquet in parallel, simultaneously. This can consume huge amounts of memory and leads to very inefficient conversion on systems with few cores and little memory (such as the GitHub Actions workers).

The solution would be to do the conversion in chunks sized by the number of available cores and/or the num_cores config parameter we already define. Maybe that parameter should then be moved from [binning] to [core].
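A minimal sketch of such a bounded scheme, assuming a hypothetical `convert_to_parquet` helper rather than the actual loader API; joblib keeps at most `n_jobs` conversions in flight, so memory use scales with the core count instead of the file count:

```python
import os

from joblib import Parallel, delayed


def convert_to_parquet(path: str) -> str:
    """Placeholder for the per-file HDF5 -> Parquet conversion."""
    return path + ".parquet"


def convert_all(paths: list[str], num_cores: int | None = None) -> list[str]:
    # Cap concurrency at num_cores (or the machine's core count), so only
    # a bounded number of files is converted at any one time.
    n_jobs = max(1, min(len(paths), num_cores or os.cpu_count() or 1))
    return Parallel(n_jobs=n_jobs)(delayed(convert_to_parquet)(p) for p in paths)
```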

rettigl (Member, Author) commented Jul 18, 2024

Closed in #329

rettigl closed this as completed Jul 18, 2024
zain-sohail (Member) commented Jul 29, 2024

With the current scheme, one core is used per dataframe. But since conversion now takes only a few seconds per file, the overheads are far too large, so running multiple files per core is likely optimal.

I tested with 49 files:

| Branch     | 49 workers | 10 workers |
| ---------- | ---------- | ---------- |
| main       | ~200 s     | ~100 s     |
| refactored | ~150 s     | ~30 s      |

With 49 workers, both branches are about as slow as loading serially, or possibly worse. Using just 10 workers instead gives a clear speed-up, so the n_workers parameter should be scaled adaptively. A simple method could be one worker per 3-5 files; see the sketch below.
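A minimal sketch of such a heuristic (the names are illustrative, not actual config parameters):

```python
import os


def adaptive_n_workers(num_files: int, files_per_worker: int = 4,
                       max_workers: int | None = None) -> int:
    # Roughly one worker per 3-5 files, capped at the available core count.
    if max_workers is None:
        max_workers = os.cpu_count() or 1
    wanted = -(-num_files // files_per_worker)  # ceiling division
    return max(1, min(max_workers, wanted))
```

For the 49-file case this gives 13 workers at 4 files per worker, or 10 at 5 files per worker, in line with the 10-worker timing above.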

Refactored code:
[screenshot: timings with the refactored code]

Instead of joblib, I also tried dask.distributed's LocalCluster, which provides a more linear speed-up with the number of cores used.

Here I am reading 49 files. Serially it takes about 196 s; dask.delayed without a client takes 100 s (effectively single-core then).

[screenshot: two different runs using dask.distributed LocalCluster]
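For reference, a minimal sketch of the dask.distributed variant; `convert_file` and the file paths are hypothetical placeholders, not the actual loader API:

```python
import dask
from dask.distributed import Client, LocalCluster


@dask.delayed
def convert_file(path: str) -> str:
    """Placeholder for the per-file conversion work."""
    return path + ".parquet"


if __name__ == "__main__":
    paths = [f"raw/run_{i:03d}.h5" for i in range(49)]  # hypothetical inputs
    with LocalCluster(n_workers=10) as cluster, Client(cluster) as client:
        # Build the task graph lazily, then let the distributed scheduler
        # spread the tasks over the cluster's workers.
        outputs = dask.compute(*(convert_file(p) for p in paths))
```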

In the current code, only one other place uses joblib, so maybe we can switch to dask.distributed altogether.

zain-sohail reopened this Jul 29, 2024
OpenCOMPES locked and limited conversation to collaborators Jan 14, 2025
zain-sohail converted this issue into discussion #550 Jan 14, 2025

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
