Move to Dask Distributed #816

Merged
merged 23 commits into from
Feb 26, 2025

@mauch mauch commented Feb 18, 2025

Make the pipeline work with Dask Distributed.

This mostly attempts to avoid converting dataframes to Pandas where possible, and also avoids loading dataframes from Pandas. Exceptions to this and their reasons are commented with NOTEs.

Also, where possible, avoid calling persist or pausing the computation with wait until the end of each pipeline step (i.e. a wait is issued only at the end of association, new_sources, forced_extract, and after source statistics in finalise). Sometimes persist is necessary, and the reasons for this are also commented with a NOTE.

Ready to go now I think.

Still to do:

  • Forced Extract
  • Finalise
  • Tests

@mauch mauch added do not merge Do not merge this PR v2.0 labels Feb 18, 2025
@ddobie ddobie commented Feb 18, 2025

Just some minor stuff, but looking good otherwise.

@mauch mauch marked this pull request as ready for review February 26, 2025 04:28

@ddobie ddobie left a comment

Looks good overall. Just some nitpicky stuff about docstrings throughout. mkdocs automatically generates doc pages for each function based on the docstrings, so it's good to keep them as complete as possible.

Once you tidy that up it's good to go!

@mauch mauch merged commit 724ca7c into v2.0 Feb 26, 2025
4 checks passed
@mauch mauch deleted the move_to_distributed branch February 26, 2025 21:17