Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve UCX documentation and examples #545

Merged
merged 12 commits into from
Mar 10, 2021

Conversation

charlesbluca
Copy link
Member

@charlesbluca charlesbluca commented Mar 5, 2021

Addressing #544, this PR aims to clarify the requirements, configuration, and usage of UCX with Dask-CUDA.

Still a lot to be done:

  • Flesh out hardware/software requirements
  • Rework CLI/Python usage examples
  • Clarify some uncertainties in the Configuration section
  • Add standalone examples of UCX usage

@charlesbluca charlesbluca added 2 - In Progress Currently a work in progress doc Documentation non-breaking Non-breaking change labels Mar 5, 2021
@charlesbluca
Copy link
Member Author

Thinking that the UCX usage sections could probably be simplified a bit if we better fleshed out dask-cuda-worker and dask_cuda.initialize(...) in their own pages.

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments here @charlesbluca , but it's looking good so far, thanks for working on this!

docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Mar 8, 2021

Codecov Report

Merging #545 (444df77) into branch-0.19 (f99a037) will increase coverage by 29.89%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-0.19     #545       +/-   ##
================================================
+ Coverage        62.37%   92.27%   +29.89%     
================================================
  Files               22       16        -6     
  Lines             2517     1605      -912     
================================================
- Hits              1570     1481       -89     
+ Misses             947      124      -823     
Impacted Files Coverage Δ
dask_cuda/initialize.py 92.59% <ø> (+3.70%) ⬆️
dask_cuda/local_cuda_cluster.py 81.39% <ø> (ø)
dask_cuda/explicit_comms/dataframe/shuffle.py 95.89% <0.00%> (-1.37%) ⬇️
dask_cuda/benchmarks/local_cudf_merge.py
dask_cuda/benchmarks/local_cudf_shuffle.py
dask_cuda/benchmarks/utils.py
dask_cuda/_version.py
dask_cuda/benchmarks/local_cupy_map_overlap.py
dask_cuda/benchmarks/local_cupy.py
dask_cuda/utils.py 90.86% <0.00%> (+1.02%) ⬆️
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f99a037...444df77. Read the comment docs.

@charlesbluca
Copy link
Member Author

Thanks for the review @pentschev! Addressed them as best I could.

I reorganized the use cases into a general "Usage" section and consolidated most of the information on client setup to an individual "Client" section that showcases dask_cuda.initialize; feel free to let me know how that looks. My first thought is that it may be nested too deep when it gets to sections on setting up the scheduler + workers using the CLI.

@charlesbluca
Copy link
Member Author

Is there anything we can do with the API reference to make dask_cuda.initialize show up with its arguments? It seems like it's meant for general use, but right now it seems like the only way to see what arguments it takes is to check the source code on GitHub.

@github-actions github-actions bot added the python python code needed label Mar 8, 2021
@charlesbluca
Copy link
Member Author

Made some changes to the API source so that dask_cuda.initialize.initialize should show up properly now; also moved the docstring for the dask_cuda.initialize module to the function itself and added some parameters.

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @charlesbluca , this is looking good so far!

I added a few minor suggestions and replied most of your questions (if I missed any, please let me know).

Since this is getting larger than I initially predicted, what do you say we merge this after my comments are addressed and continue with samples in a follow-up PR? I generally think this is better to avoid getting PRs too large and increasingly more difficult to track changes.

- dask_cuda.initialize

See https://docs.dask.org/en/latest/configuration.html for more information
about Dask configuration.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a note here now to let users know that this is already done for LocalCUDACluster (including the client, if it's instantiated on the same process as the cluster) and dask-cuda-worker, so users don't need to do that again. The common place where users would want to use this is when they instantiate their standalone client connecting to a cluster that has already been started with a combination of dask-scheduler/dask-cuda-workers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, in that case I will emphasize in the docstring that this script is meant for use with the client to mirror the pre-configured CUDA cluster/workers, and that the worker preload script would only need to be used with mainline Dask/Distributed.

docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
docs/source/ucx.rst Outdated Show resolved Hide resolved
@charlesbluca
Copy link
Member Author

Addressed your comments - let me know if I should change anything. If not, I'm happy to merge and handle the examples in a separate PR! Thanks for the help 😄

@charlesbluca charlesbluca marked this pull request as ready for review March 9, 2021 16:59
@charlesbluca charlesbluca requested a review from a team as a code owner March 9, 2021 16:59
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more small change, then I think we can merge it, but we're still blocked by #546.

dask_cuda/initialize.py Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

rerun tests

@jakirkham jakirkham requested a review from pentschev March 10, 2021 03:17
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @charlesbluca for the work on improving docs here!

@pentschev
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 72029bc into rapidsai:branch-0.19 Mar 10, 2021
rapids-bot bot pushed a commit that referenced this pull request Mar 17, 2021
Following up on #545, this PR adds standalone examples of UCX usage to a new `examples/ucx/` directory.

Right now, these are pretty simple - just showing cluster, worker, and client setups for NVLink, InfiniBand, both, or neither. I imagine there are problem some more complicated/niche set up we could showcase here.

cc @pentschev

Authors:
  - Charles Blackmon-Luca (@charlesbluca)

Approvers:
  - Peter Andreas Entschev (@pentschev)

URL: #551
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress doc Documentation non-breaking Non-breaking change python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants