Fix DRMAACluster.scale_down and drop Adaptive._retire_workers #85
35da8f6 to 9ffeca3
Pass all of `worker_ids` through a `set` before converting them to a `list` to ensure there are no duplicates.
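As a minimal sketch of the de-duplication step described above (the helper name `unique_worker_ids` and the sample IDs are hypothetical, not taken from dask-drmaa):

```python
def unique_worker_ids(worker_ids):
    """Return the worker IDs as a list with duplicates removed.

    Going through a set drops repeated IDs; note that the resulting
    order is not guaranteed to match the input order.
    """
    return list(set(worker_ids))


# Hypothetical job IDs; "15" appears twice on purpose.
ids = unique_worker_ids(["15", "16", "15"])
```

Since a `set` does not preserve order, callers that care about ordering would need to sort the result or de-duplicate another way.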
As the purpose of `scale_down` is to shut down workers through the scheduler directly, pull code from `stop_workers` to do exactly this with DRMAA. Update `stop_workers` to make use of `scale_down` for this functionality as well. This should help ensure `DRMAACluster` matches the expected `Cluster` specification more closely.
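The relationship between the two methods can be sketched roughly as follows. This is not the dask-drmaa code: `FakeSession`, `SketchCluster`, and the `TERMINATE` constant are stand-ins invented for illustration (the real DRMAA Python bindings expose a `Session.control` call that takes a job ID and an action), and the sketch assumes worker IDs are the DRMAA job IDs.

```python
TERMINATE = "terminate"  # stand-in for the DRMAA terminate action (assumption)


class FakeSession:
    """Hypothetical stand-in for a DRMAA session; records control calls."""

    def __init__(self):
        self.terminated = []

    def control(self, job_id, action):
        self.terminated.append((job_id, action))


class SketchCluster:
    """Illustrative cluster, not the real DRMAACluster."""

    def __init__(self, session):
        self.session = session
        self.workers = {}  # job ID -> metadata

    def scale_down(self, worker_ids):
        # De-duplicate, then terminate each known worker through the session.
        for wid in set(worker_ids):
            if wid in self.workers:
                self.session.control(wid, TERMINATE)
                del self.workers[wid]

    def stop_workers(self, worker_ids):
        # Accept a single ID or an iterable, then delegate to scale_down.
        if isinstance(worker_ids, str):
            worker_ids = [worker_ids]
        self.scale_down(worker_ids)


session = FakeSession()
cluster = SketchCluster(session)
cluster.workers = {"1": {}, "2": {}, "3": {}}
cluster.stop_workers(["1", "1", "3"])
```

Keeping the termination logic in one place means `stop_workers` and any caller of `scale_down` (such as an adaptive policy) take the same path.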
The `_retire_workers` method seems to largely duplicate the one in `distributed`'s `Adaptive`, so drop our implementation. This was tried before, but didn't work due to duplicate `retire_workers` calls to the `Scheduler` in both `Adaptive._retire_workers` and `DRMAACluster.scale_down`. Now that the behavior of `DRMAACluster.scale_down` has been corrected, our implementation of `Adaptive._retire_workers` is no longer needed, so it is dropped here.
d6712d5 to 279c3e5
Have tested this on our cluster and in a couple of test Docker containers. Seems to work well. Only minor thing is we are not seeing logging from this line in
Should add
81955bd to d004a3c
Instead of having coroutines used for `scale_up` and `scale_down`, use regular methods in their place. Move the coroutines into internal spec. This breaks the API of dask-drmaa. However, it is technically more correct given the Cluster API's current expectations.
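A rough sketch of the pattern this commit message describes, with a regular public method wrapping an internal coroutine. It uses `asyncio` purely for illustration, which is an assumption and not necessarily how dask-drmaa drives its coroutines; the class, the `_scale_up` name, and the worker naming are all hypothetical.

```python
import asyncio


class SketchCluster:
    """Illustrative only; not the real DRMAACluster."""

    def __init__(self):
        self.workers = []

    async def _scale_up(self, n):
        # Internal coroutine: add workers until n are present.
        while len(self.workers) < n:
            await asyncio.sleep(0)  # yield control, as real I/O would
            self.workers.append("worker-%d" % len(self.workers))

    def scale_up(self, n):
        # Public regular method: callers do not need to await anything.
        asyncio.run(self._scale_up(n))


cluster = SketchCluster()
cluster.scale_up(3)
```

The trade-off is the one noted in the commit message: callers that previously yielded on `scale_up` would need to change, which is why it counts as an API break.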
d004a3c to 750cd0c
Looks good to me, code changes seem sensible and looked fine when I lightly tested with our cluster. Nice work cleaning it up 😃
Thanks for testing @azjps. Glad to hear it worked. Generally am happy with this as well. Just on the fence about including the last commit. Any thoughts?
This reverts commit 750cd0c.
Fixes #65
Replaces #81
Rewrites `DRMAACluster.scale_down` to use DRMAA to terminate the workers. Also makes use of `DRMAACluster.scale_down` in `DRMAACluster.stop_workers` (where this code was pulled from). With this change, it should now be possible to drop `Adaptive._retire_workers`, which is already present in the `Adaptive` parent class.
cc @azjps