-
-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ray launcher WIP #518
Ray launcher WIP #518
Conversation
This pull request introduces 1 alert when merging 90ff502 into 13d663c - view on LGTM.com new alerts:
|
Great to see a launcher for ray forthcoming! Two brief comments:
|
Yup, I am excited too!
Unfortunately Ray support for remote cluster execution is not transparent.
Based on my answer to 1, I think this is only going to work if you are running from the head node. |
Update pull requests with support for returning JobReturns. I'm currently working on enabling the integration test suite. There's so many failures right now 😅 |
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/hydra/launcher/ray.yaml
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/hydra/launcher/ray.yaml
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/hydra/launcher/ray.yaml
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/hydra/launcher/ray.yaml
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_remote_invoke.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher.py
Outdated
Show resolved
Hide resolved
This pull request introduces 2 alerts when merging ad3d76c into 079c325 - view on LGTM.com new alerts:
|
This pull request introduces 2 alerts when merging ee7f500 into 079c325 - view on LGTM.com new alerts:
|
plugins/hydra_joblib_launcher/hydra_plugins/hydra_joblib_launcher/joblib_launcher.py
Outdated
Show resolved
Hide resolved
Major changes for review:
|
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_aws_config.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_aws_config.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher_util.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_local_launcher.py
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_remote_invoke.py
Outdated
Show resolved
Hide resolved
Major changes for review:
|
This pull request introduces 1 alert when merging d16e454 into 3206dc0 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good progress.
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/__init__.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/__init__.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/__init__.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/__init__.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_aws_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_local_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_remote_invoke.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_aws_launcher.py
Outdated
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/conf/__init__.py
Show resolved
Hide resolved
plugins/hydra_ray_launcher/hydra_plugins/hydra_ray_launcher/ray_launcher_util.py
Outdated
Show resolved
Hide resolved
This pull request introduces 1 alert when merging 70d0564 into 52b6da8 - view on LGTM.com new alerts:
|
tests all passed
7aeab8f
to
4d020e8
Compare
Rebased onto master again and response to omry's comments:
I'm not sure what you meant here, this is how the launcher config looks like now :)
everything combined it is 76m; it doesn't actually download all the dependencies of
Yes, I will follow up on this. I just created #1097. (I thought I already created an issue to track but looks like I forgot to. ) |
This pull request introduces 1 alert when merging 4d020e8 into e98f518 - view on LGTM.com new alerts:
|
roger.
76MB is still large. it can easily take 2-3 minutes to upload.
roger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's time to merge this to master!
me right now: > 🙀 I will keep an eye on the integration tests. I haven yet to update the env variable for Hydra's circleCI. |
Update 10/14
Follow up items:
Update 09/30
A few items I want to follow up separately (will create PR for these)
Update 09/28
Summary of the changes:
1. Address omry's comments.
2. Changes to integration test:
The goal is "No outbound traffic for the test instances." The barrier is the
pip install
andconda create
we need to run while setting up the instance which requires us to open 443 to all outbound traffic.To get around this: for
conda
and dependency packages needed for starting the cluster, I created a base AMI that has everything pre-installed. 2) for Hydra related packages, build the wheels at test time and install the wheels on the instance.The upside is we achieve "no outbound traffic for the instance", the downside is that means we need update AMI when dependencies changes. To help with that I created a script
(
create_ami.py
) to automate building the AMI.It would be good to build nightly AMIs and wheels, that's something I want to work on soon.
output from running `create_ami.py`
Stack trace
4. install conda in circleCI linux docker
Previously we pin the circleCI linux docker image to be python:3.8. however, the image runs on python 3.8.6 which is not yet available to be installed in conda. As a result, the ray launcher tests fails (cloudpickle requires the exact same version of python used on pickle and unpickle side)
To be consistent with how tests are run in MACOS and WIN, I added the miniconda installation for linux machines as well. The installation takes a few secs, so I didn't add cache for it.
Update 09/21
Now that #815 has finally landed. This PR is unblocked finally! This is my priority this week.
TODO items.
In order to upload and install latest wheels in the integration, I want to:
PLUGINS
env variable), build wheels, and scp them all to the ec2 instance.This way, we can remove all outbound traffic of the testing ec2 instances.
Update 09/08
edit: moved the TODO items to the latest update.
Update 09/01
This PR has been blocked by #815. Now that we've figured out a good solution for #815, I will go ahead and get 815 in first and then circle back here.
Also I'm going to create data class for both ray init, ray remote and boto configs (we will only add typing for common boto fields, and the boto config will extend Dict[str, Any])
For the integration tests to work on the latest code, we are going to build wheels and upload to S3 with each integration test run. This will be likely be a a separate PR.
Motivation
This is built on #515, sorry I had to open a new pull requests. I still need to figure out what's a better workflow with forking & syncing.
This PR address some comments from PR515:
Plan to add in next PR(s):
Update 04/10
Add Integration tests
Ray up automatically update cluster so no need for us to provide an option for update.
Now JobReturns are copied back to laptop
refactor _dump_func_params a bit more.
Next:
Have you read the Contributing Guidelines on pull requests?
Yes/No
Test Plan
Integration tests
Run launcher in both local and AWS mode
Run the
Related Issues and PRs
(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)