Move to Dask Distributed #816
…` option - and docs.
Just some minor stuff, but looking good otherwise.
Looks good overall. Just some nitpicky stuff about docstrings throughout. mkdocs automatically generates doc pages for each function based on the docstrings, so it's good to keep them as complete as possible.
Once you tidy that up it's good to go!
Co-authored-by: Dougal Dobie <[email protected]>
…ast-pipeline into move_to_distributed
Make the pipeline work with Dask Distributed.
This mostly attempts to avoid saving dataframes to Pandas where possible and also avoids loading dataframes from Pandas. Exceptions to this, and the reasons for them, are commented with NOTEs.
Also, where possible, avoid `persist` or pausing the computation with `wait` until the end of each pipeline step (i.e. a `wait` is issued only at the end of `association`, `new_sources`, `forced_extract` and after source statistics in `finalise`). Sometimes `persist` is necessary, and the reasons for this are also commented with a NOTE.
Ready to go now I think.
Still to do: