The flash (and future FEL) loader currently converts all files to parquet in parallel simultaneously. This can consume huge amounts of memory and lead to very inefficient conversion on systems with few cores and little memory (such as the GitHub Actions workers).
The solution would be to do the conversion in chunks determined by the number of available cores and/or the num_cores config parameter we already define. Maybe this should be moved from [binning] to [core] then.
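A minimal sketch of what such chunked conversion could look like, assuming a per-file `convert_one` callable and a `num_cores` setting (both names are illustrative here, not the actual sed API):

```python
# Hypothetical sketch only: convert_one and num_cores stand in for the real
# loader function and config value.
import os
from joblib import Parallel, delayed

def convert_in_chunks(files, convert_one, num_cores=None):
    """Convert files to parquet in chunks sized by the available cores,
    so at most num_cores conversions are in flight at once."""
    n_cores = num_cores or os.cpu_count() or 1
    for start in range(0, len(files), n_cores):
        chunk = files[start : start + n_cores]
        Parallel(n_jobs=n_cores)(delayed(convert_one)(f) for f in chunk)
```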
With the current scheme, one core is used per dataframe. But since each file now takes only a few seconds, the overheads are far too large. Hence, running multiple files per core is likely optimal.
I have tested with 49 files.
With the main branch, using 49 workers takes about 200 s. Likewise, the refactored branch takes 150 s with 49 workers. This is about the same as loading serially, or possibly worse.
Using just 10 workers instead of 49 gives a speed-up: 100 s with the main branch and 30 s with the refactored one. So there is still a speed-up, but the n_workers parameter should be scaled adaptively.
A simple method could be one worker per 3-5 files, as in the sketch below.
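A rough sketch of that heuristic (`adaptive_n_workers` and `files_per_worker` are hypothetical names, not existing sed parameters):

```python
import os

def adaptive_n_workers(n_files, files_per_worker=4, num_cores=None):
    """Hypothetical heuristic: roughly one worker per 3-5 files,
    capped at the number of available cores."""
    n_cores = num_cores or os.cpu_count() or 1
    wanted = max(1, n_files // files_per_worker)
    return min(n_cores, wanted)

# e.g. 49 files -> 12 workers on a machine with enough cores
```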
Refactored code:
Instead of joblib, I also tried dask.distributed's LocalCluster, which provides a more linear speedup with the number of cores used.
Here I am reading 49 files. Serially it takes about 196 s. Dask delayed without the client takes 100 s (so basically single-core).
Two different runs using dask.distributed LocalCluster.
In the current code, only one other place uses joblib, so maybe we can switch to dask.distributed.
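For reference, a minimal sketch of the dask.distributed approach; `convert_one` here is just a placeholder for the actual per-file conversion in the flash loader, not the real implementation:

```python
# Placeholder sketch: convert_one is not the real flash-loader conversion,
# and pd.read_hdf merely stands in for the per-file reading step.
import pathlib

import dask
import pandas as pd
from dask.distributed import Client, LocalCluster

def convert_one(h5_path, out_dir):
    df = pd.read_hdf(h5_path)  # stand-in for the real per-file loader
    out_path = pathlib.Path(out_dir) / (pathlib.Path(h5_path).stem + ".parquet")
    df.to_parquet(out_path)
    return str(out_path)

def convert_files(files, out_dir, n_workers=10):
    """Run one delayed conversion task per file on a LocalCluster."""
    cluster = LocalCluster(n_workers=n_workers, threads_per_worker=1)
    client = Client(cluster)
    try:
        tasks = [dask.delayed(convert_one)(f, out_dir) for f in files]
        dask.compute(*tasks)  # executes on the cluster via the active client
    finally:
        client.close()
        cluster.close()
```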