-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Deep Learning VM (DLVM) images as a base image option for Caliban. #20
base: main
Are you sure you want to change the base?
Conversation
…elp function to list the DLVM types.
…e-auth release issue.
Codecov Report
@@ Coverage Diff @@
## master #20 +/- ##
==========================================
- Coverage 45.67% 44.97% -0.70%
==========================================
Files 17 17
Lines 2728 2777 +49
==========================================
+ Hits 1246 1249 +3
- Misses 1482 1528 +46
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sagravat , this is great! I want to see if we can separate the logic here a bit more. Ideally, we would be able to reuse the existing run_interactive
and run_notebook
commands (I think that's the names), and make this PR into TWO changes.
The first is the --base_image
support, and the second is the GCP notebook submitter.
The second is the GCP scheduler, which I think will work without the DLVM base images.
I'm having trouble parsing the new entrypoint that the second change requires. Is it really different than how we launch jupyterlab normally? If not, I wonder if we could actually enable this by making it easier for folks to install ANY jupyterlab extension.
Happy to chat more. Take a look and let me know if these notes make sense.
"--dlvm", | ||
help="DLVM base image type. Must be one of " | ||
"{}".format(dlvm_types) + ". If supplied, " | ||
"Caliban will skip the build and push steps and use this image tag.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, I'm going to add comments as I read, so this may be clear later! Naively I would have assumed that the DLVM would be a base image, and that you could still install dependencies on top, right? If that is true, can we call this argument --base_image
?
Right now, --image_id
removes any requirement.txt installation; it's truly a flag to skip any build, including even getting your code into the image.
Really, the syntax should be caliban run e2a4af785bdb
...
@@ -486,7 +486,7 @@ def build_job_specs( | |||
experiments=experiments) | |||
|
|||
|
|||
def generate_image_tag(project_id, docker_args, dry_run: bool = False): | |||
def generate_image_tag(project_id, docker_args, dlvm: str = None, dry_run: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we call it --base_image
we can add that key to the generate_docker_args
function here: https://github.com/google/caliban/blob/master/caliban/cli.py#L536
return _dlvm_config(job_mode).get(dlvm_arg) | ||
|
||
|
||
def _dlvm_config(job_mode: JobMode) -> Dict[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, this is good. I think I see now why you have --dlvm
vs --base_image
. What we COULD do is have a prefix for --base_image
, something like --base_image dlvm:pytorch
, that would force a lookup here. That would let this feature give us general base images too.
@@ -391,9 +391,17 @@ def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str: | |||
|
|||
library = "jupyterlab" if lab else "jupyter" | |||
|
|||
return """ | |||
if not dlvm: | |||
return """ | |||
RUN pip install {}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, wait, good catch that we need up date this pip
too! I can do that separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have this --lab
flag. I'm thinking that we should only do this special installation if --lab
is specified. If not, even with a dlvm
base image, we should just install normal jupyter
. wdyt?
RUN pip install {}{} | ||
""".format(library, version_suffix) | ||
else: | ||
return """ | ||
RUN /opt/conda/bin/pip install \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this only work on the deep learning VMs? and does it work on ALL the deep learning VMs?
If it works without the dlvms, maybe it needs its own flag.
@@ -511,7 +526,10 @@ def _dockerfile_template( | |||
if base_image_fn is None: | |||
base_image_fn = base_image_id | |||
|
|||
base_image = base_image_fn(job_mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead pass
base_image_fn = lambda job_mode: _dlvm_id(job_mode, dlvm)
Then we can keep to the API.
@@ -378,7 +378,7 @@ def _credentials_entries(user_id: int, | |||
return ret | |||
|
|||
|
|||
def _notebook_entries(lab: bool = False, version: Optional[str] = None) -> str: | |||
def _notebook_entries(lab: bool = False, version: Optional[str] = None, dlvm: bool = False) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this flag called scheduler
instead? I think it doesn't depend on dlvm
and COULD be a separate flag.
if inject_notebook.value != 'none': | ||
install_lab = inject_notebook == NotebookInstall.lab | ||
if dlvm is None: | ||
dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about remove the if/else
and make the line
dockerfile += _notebook_entries(lab=install_lab, version=jupyter_version, dlvm=bool(dlvm))
|
||
dockerfile += """ | ||
|
||
USER {uid}:{gid} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! Now that we're on python 3.6, we can actually make this
dockerfile += f"""
USER {uid}:{gid}
"""
using f-strings.
This contribution enables Deep Learning VM (DLVM) images from the Google Cloud AI Platform to be used as base images for execution in local, cloud, shell, and notebook modes. The notebook mode includes a Jupyterlab extension widget that allows the user to directly submit the notebook for training on the AI Platform with configurable hardware accelerators like GPUs and TPUs.