Automatically configure enable_tcp_over_ucx by default when protocol=ucx in LocalCUDACluster #424
Comments
The reason I enforced users to set |
Also, why would you pass |
I agree. I suggested the None to try to closely match the existing behavior, but I prefer what you're describing.
I think this is absolutely a potential risk. Do we know of any examples in which someone sets My thinking is that if someone wants to switch between UCX and TCP, ideally that can be done by one parameter. Keeping two parameters to protect against accidental setting of one parameter adds some complexity on the user side, and the protocol mixup may be something that doesn't happen too much in practice. |
There's actually another culprit, if we make Summarizing:
I do agree that the current behavior is annoying, but the alternative is today to let people think switching to |
To make myself even clearer, from my experience with users over the last year with Dask-CUDA, they DON'T read the documentation regarding UCX, so switching the default to |
I think the general ask here is that we try to reduce the number of configurations for the common case. So when, for example, we have nvlink and/or infiniband set to True, we automatically configure for TCP as well. My understanding is that we are trying to handle the case where UCX can be configured with RDMA/RDMACM, is that correct? |
If that's the case, then you just proved my point about documentation: dask-cuda/dask_cuda/local_cuda_cluster.py Lines 78 to 80 in 302d1b8
dask-cuda/dask_cuda/local_cuda_cluster.py Lines 84 to 86 in 302d1b8
|
Yes, I'm sorry :( Looking again at the code, the minimum needed is:
Because we also will set the protocol correctly when either those variables are set: dask-cuda/dask_cuda/local_cuda_cluster.py Lines 214 to 218 in 302d1b8
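For readers who do not have the linked lines at hand, the protocol inference described above can be sketched roughly as follows. This is a standalone illustration, not the code at 302d1b8: the function name `infer_protocol` is hypothetical, and the exact condition, error type, and message are assumptions.

```python
def infer_protocol(protocol=None, enable_tcp_over_ucx=False,
                   enable_nvlink=False, enable_infiniband=False):
    """Hypothetical helper: choose the comm protocol from the UCX flags.

    Assumption: setting any UCX transport flag implies protocol="ucx".
    """
    if enable_tcp_over_ucx or enable_nvlink or enable_infiniband:
        if protocol is None:
            # No protocol given, but a UCX transport was requested.
            return "ucx"
        if protocol != "ucx":
            # A UCX transport combined with a non-UCX protocol is contradictory.
            raise ValueError(
                "UCX transport flags require protocol='ucx', got %r" % protocol
            )
    return protocol
```

Under these assumptions, setting a single transport flag is enough to select UCX, while an explicit conflicting protocol fails loudly instead of silently falling back.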
Should we change the docs so that users are not overloaded with options in their first introduction to dask-cuda+UCX? |
I don't really understand the question. If one of those options evaluates to |
Since we now support automatic UCX configuration by default as of #792, I believe this issue is no longer relevant. Tentatively closing it; please feel free to reopen if anything is still needed. |
Today, I configure a UCX-enabled cluster with some values for the following four arguments:
In general, the common practice is to enable NVLink, possibly enable InfiniBand, and always enable TCP over UCX. While there are scenarios in which we would want to disable TCP over UCX, by and large we enable it. To reduce the number of parameters that users must tweak to get UCX working, it might be nice to:
- set enable_tcp_over_ucx=True when a user leaves enable_tcp_over_ucx as None and protocol="ucx",
- allow enable_tcp_over_ucx=False if desired, with the corresponding expected behavior.

Another approach might be to instead switch this to simply be disable_tcp_over_ucx and have it otherwise be implicitly set to True, if that is preferred. cc @quasiben for thoughts, as we were discussing this earlier.
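
A minimal sketch of the first proposal, assuming None is used as the "not specified" sentinel. The helper name `resolve_enable_tcp_over_ucx` is hypothetical and not part of Dask-CUDA; it only illustrates the defaulting rule.

```python
def resolve_enable_tcp_over_ucx(protocol, enable_tcp_over_ucx=None):
    """Hypothetical helper: resolve the effective TCP-over-UCX setting.

    None means "the user did not choose", so the value follows the protocol.
    An explicit True or False is always respected.
    """
    if enable_tcp_over_ucx is None:
        # Default to TCP over UCX only when UCX was requested.
        return protocol == "ucx"
    return bool(enable_tcp_over_ucx)
```

With this rule, protocol="ucx" alone would turn TCP over UCX on, and a user who wants it off would still pass enable_tcp_over_ucx=False explicitly.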