[QST] Can dask-cuda-workers work with dask-yarn? #43

Closed
randerzander opened this issue May 7, 2019 · 10 comments
Labels
question Further information is requested

Comments

@randerzander
Contributor

I'm able to use the dask-cuda-worker CLI commands to start one worker per GPU, but is there a way to expose them to a YARN resource manager?

Would this depend on the dask-yarn package?

In my environment, GPU resource consumption isn't actually monitored, but I would still like to register my dask cluster as a YARN application, so that dashboards and other YARN monitoring tools are aware of its existence.

@mrocklin
Contributor

mrocklin commented May 7, 2019

> I'm able to use the dask-cuda-worker CLI commands to start one worker per GPU, but is there a way to expose them to a YARN resource manager?

If you're using YARN for one-worker-per-GPU workloads, then I would probably not use dask-cuda-worker at all, and would instead ask for a single GPU in your YARN resource specification. Then you can use the mainline dask-worker and things should be fine?
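A minimal sketch of this suggestion, assuming a dask-yarn release whose `YarnCluster` accepts a `worker_gpus` argument and a YARN cluster that tracks GPUs as a resource. The helper names, worker count, and memory and vcore values are hypothetical. The packaged Python environment that `YarnCluster` normally needs is assumed to come from the dask-yarn configuration.

```python
def single_gpu_worker_kwargs(n_workers=2, memory="8GiB", vcores=2):
    """Build keyword arguments for a YarnCluster with one GPU per worker container."""
    return {
        "n_workers": n_workers,
        "worker_vcores": vcores,
        "worker_memory": memory,
        # Assumption: the installed dask-yarn supports worker_gpus and
        # the YARN cluster exposes GPUs as a schedulable resource.
        "worker_gpus": 1,
    }


def start_cluster(kwargs):
    """Start the cluster. Only call this on a machine with YARN access."""
    # Imported lazily so the sketch can be loaded without dask-yarn installed.
    from dask_yarn import YarnCluster

    return YarnCluster(**kwargs)


if __name__ == "__main__":
    print(single_gpu_worker_kwargs())
```

Each container then runs the mainline worker and sees the single GPU that YARN allocated to it, so dask-cuda-worker's one-process-per-GPU fan-out is not needed.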

@mrocklin
Contributor

mrocklin commented May 7, 2019

> In my environment, GPU resource consumption isn't actually monitored, but I would still like to register my dask cluster as a YARN application, so that dashboards and other YARN monitoring tools are aware of its existence.

The solution to this is to use Dask-Yarn from the get-go, not to try to shoehorn something custom in. Dask-Yarn handles a lot of annoying auth and connection things that I think you won't want to deal with.

@mrocklin
Contributor

mrocklin commented May 7, 2019

Also cc @jcrist, just in case you're interested

@jcrist
Member

jcrist commented May 7, 2019

I agree with @mrocklin that YARN support is external to this library. I'm not sure what GPU integrations this library provides besides setting CUDA_VISIBLE_DEVICES, but dask-yarn works fine for spinning up workers on YARN, and if they have access to a GPU then they're free to use it. CUDA_VISIBLE_DEVICES won't be set, but you could handle that in user code if needed (if the value is known beforehand, one way would be to use the worker_env parameter to YarnCluster). Criteo is using Skein (the underlying tech behind dask-yarn) to deploy TensorFlow with GPUs on YARN without problems in tf-yarn.
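The two options mentioned above could look roughly like this. It is a sketch, not a recommended setup: the helper names are hypothetical, and `worker_env` hands the same value to every worker, so it only helps when a single fixed device list is right for all containers.

```python
import os


def gpu_worker_env(device_ids):
    """Return a worker_env mapping that exposes only the given GPU ids."""
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in device_ids)}


def pin_gpu_in_user_code(device_id):
    """Alternative: set the variable inside the worker process.

    This must run before any CUDA library is initialized in that process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    # The mapping would be passed as YarnCluster(worker_env=...).
    print(gpu_worker_env([0]))
```

With a running cluster, the second helper could be applied on every worker through something like `client.run(pin_gpu_in_user_code, 0)`, again assuming the same device id is correct everywhere.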

Note that if your cluster does have GPUs tracked as a resource (say each node has 8 GPUs, and you want 1 per worker), you can request a single GPU and the correct CUDA setup calls will be made to ensure your worker process only has access to that single GPU (see https://jcrist.github.io/skein/specification.html#resources and https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html for more information).
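At the Skein level, a single-GPU request might be written as below. This is a sketch that mirrors the layout of the linked specification as a Python dict. The application name, service name, sizes, and the `nvidia-smi` script are illustrative assumptions, and the submit helper assumes `skein.ApplicationSpec.from_dict` and `skein.Client.submit` behave as documented in the installed Skein release.

```python
def gpu_service_spec(instances=8):
    """Build a Skein-style application spec with one GPU per container."""
    return {
        "name": "gpu-sketch",  # hypothetical application name
        "services": {
            "worker": {
                "instances": instances,
                # One GPU per container, next to the usual memory/vcores.
                "resources": {"memory": "8 GiB", "vcores": 2, "gpus": 1},
                # Illustrative command: list the GPUs visible in the container.
                "script": "nvidia-smi",
            }
        },
    }


def submit(spec_dict):
    """Submit the spec. Only call this on a machine with YARN access."""
    # Imported lazily so the sketch can be loaded without skein installed.
    import skein

    spec = skein.ApplicationSpec.from_dict(spec_dict)
    with skein.Client() as client:
        return client.submit(spec)


if __name__ == "__main__":
    print(gpu_service_spec(instances=2))
```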

@pentschev added the question (Further information is requested) label on Jan 8, 2021

@github-actions

This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.

@randerzander
Contributor Author

@quasiben may have some updates here

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@quasiben
Member

With a recent skein update and PR dask/dask-yarn#140, dask-cuda should now be supported on YARN. @ayushdg have you had a chance to test?
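The thread does not show the resulting configuration. One plausible shape, assuming the installed dask-yarn accepts `worker_class` and `worker_gpus` and that dask-cuda exposes `dask_cuda.CUDAWorker`, is sketched below. The helper names and resource sizes are hypothetical, and nothing here is confirmed to be what the referenced PR changed.

```python
def cuda_worker_kwargs(n_workers=2):
    """Build YarnCluster keyword arguments that launch dask-cuda workers."""
    return {
        "n_workers": n_workers,
        # Assumption: dask-yarn lets the worker class be overridden by its
        # import path, and dask-cuda provides this class.
        "worker_class": "dask_cuda.CUDAWorker",
        "worker_gpus": 1,
        "worker_vcores": 4,
        "worker_memory": "24GiB",
    }


def start_cuda_cluster(kwargs):
    """Start the cluster and connect a client. Requires YARN access."""
    # Imported lazily so the sketch can be loaded without these packages.
    from dask.distributed import Client
    from dask_yarn import YarnCluster

    cluster = YarnCluster(**kwargs)
    return cluster, Client(cluster)


if __name__ == "__main__":
    print(cuda_worker_kwargs())
```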

@ayushdg
Member

ayushdg commented Mar 23, 2021

I was able to test and successfully run dask-cudf workflows with the latest release of skein and dask/dask-yarn#140 in a YARN environment (Google Cloud Dataproc).

@quasiben
Member

Thanks @ayushdg! I'll close now.
