Drop all custom docker image requirements #1535

Open
Tracked by #1782
jvstme opened this issue Aug 11, 2024 · 3 comments

Comments

Collaborator

jvstme commented Aug 11, 2024

Current

dstack allows running custom Docker images by specifying them in the image property. However, not all images can be used. These are some of the image requirements:

  • The software in the image should allow running as root
  • The image should have either apt-get or yum
  • The image should have /bin/sh
  • etc.

Proposed

Drop all image requirements and support all valid Docker images, including images built FROM scratch.

Implementation notes

The main source of requirements seems to be the installation and configuration of the OpenSSH server. Possible ways to drop the OpenSSH-related requirements include:

  • Shipping a statically-linked OpenSSH server binary that would allow running without root privileges and would not need a package manager for installation.
  • Using an alternative SSH server implementation in Go, so that the server could be part of the dstack-runner binary.

This issue is stale because it has been open for 30 days with no activity.

Collaborator Author

jvstme commented Oct 22, 2024

Some examples of images that don't work and their respective errors:

  • nvcr.io/nim/meta/llama3-8b-instruct:latest (or any other images with a non-root user) when run on RunPod or Vast.ai - never starts, killed by provisioning timeout
  • prom/prometheus - Error: Distribution not supported
  • fedora - sed: can't read /root/.profile: No such file or directory
  • gcr.io/etcd-development/etcd:v3.4.34 - exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
  • bitnami/thanos - unable to find user root: no matching entries in passwd file

Collaborator

un-def commented Oct 23, 2024

Action Plan (WIP)

Completing the plan would make it possible to (at least):

  • run non-root images on backends where we cannot override the container user (e.g., NIM on RunPod);
  • run non-deb/rpm-based images.

Keep the default image user

  • Get USER from the Docker image, as is already done for ENTRYPOINT and CMD (see JobConfigurator), and store it as JobSpec.user.
  • (Optional) Add a new user property to the run configurations to override the default image user.
  • (Optional) If the user property is set and not equal to the default image user, exclude offers from backends where we cannot override the container user (RunPod, Vast.ai).
  • Start the container as root (if possible) to ensure that both the runner and the SSH server have sufficient permissions.
  • [runner] Execute the job with Cmd.SysProcAttr.Credential.{Uid,Gid} set according to the JobSpec.user.
  • [runner] Put SSH public keys into both USER/user's and root's ~/.ssh/authorized_keys.
  • [CLI] Use USER/user instead of root in ~/.dstack/ssh/config, so that ssh run_name logs in as the default (or overridden) user and ssh root@run_name logs in as root. Do this only if it is possible to log in as that user, that is, the user has a home dir and a proper login shell (not nologin/false).
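
The user-resolution part of the list above could look roughly like the following. This is only a Python sketch for illustration (the runner itself is written in Go), and every name in it (`ImageUser`, `PasswdEntry`, `parse_image_user`, `find_passwd_entry`, `default_ssh_user`, `NOLOGIN_SHELLS`) is hypothetical, not an existing dstack API. It assumes the image's USER value has the usual `user[:group]` form, where either part may be a name or a numeric ID, and that the image's passwd file is available as text.

```python
from dataclasses import dataclass
from typing import Optional

# Shells that mean "this account cannot log in" (assumed list, not exhaustive).
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false", "/usr/bin/false"}


@dataclass(frozen=True)
class ImageUser:
    user: str                    # user name or numeric UID, as written in the image
    group: Optional[str] = None  # group name or numeric GID, if given


@dataclass(frozen=True)
class PasswdEntry:
    name: str
    uid: int
    gid: int
    home: str
    shell: str


def parse_image_user(value: Optional[str]) -> ImageUser:
    """Parse a Docker USER value (`user[:group]`); an unset value means root."""
    if not value:
        return ImageUser("root")
    user, _, group = value.partition(":")
    return ImageUser(user or "root", group or None)


def find_passwd_entry(passwd_text: str, user: str) -> Optional[PasswdEntry]:
    """Look up `user` (a name or a numeric UID) in the text of a passwd file."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed lines
        name, _, uid, gid, _, home, shell = fields
        if not (uid.isdigit() and gid.isdigit()):
            continue
        if user in (name, uid):
            return PasswdEntry(name, int(uid), int(gid), home, shell)
    return None


def default_ssh_user(entry: Optional[PasswdEntry]) -> str:
    """Pick the User for the SSH client config; fall back to root
    when the job user has no passwd entry, no home dir, or a nologin shell."""
    if entry is None or not entry.home or entry.shell in NOLOGIN_SHELLS:
        return "root"
    return entry.name
```

The `uid` and `gid` of the resolved entry are what the runner would pass on as the job's credentials. The fallback to root covers images where USER is a bare UID with no passwd entry.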

Download the runner

  • On shim-enabled instances, download the runner (and the SSH server if OpenSSH is used) with shim.
  • On backends without shim, try all possible tools (GNU Wget, BusyBox Wget, cURL, Python's urllib.request.urlopen, etc.) to download the runner/SSH server and fail if none is available.
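
As a rough illustration of the "try all possible tools" idea, here is a Python sketch that picks the first available downloader and returns the command line to run. The function name `pick_download_command`, the candidate order, and the exact flags are assumptions for the example. On a real backend this logic would more likely live in the container entrypoint script.

```python
import shutil
from typing import Callable, List, Optional

# One-liner used when only a Python interpreter is available.
_PY_DOWNLOAD = (
    "import sys, urllib.request; "
    "urllib.request.urlretrieve(sys.argv[1], sys.argv[2])"
)


def pick_download_command(
    url: str,
    dest: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return argv for the first downloader found in PATH, or None if there is none."""
    candidates = [
        # -q and -O are understood by both GNU Wget and BusyBox wget.
        ("wget", ["wget", "-q", "-O", dest, url]),
        # -f: fail on HTTP errors, -sS: quiet but show errors, -L: follow redirects.
        ("curl", ["curl", "-fsSL", "-o", dest, url]),
        ("python3", ["python3", "-c", _PY_DOWNLOAD, url, dest]),
    ]
    for tool, argv in candidates:
        if which(tool):
            return argv
    return None  # the caller should fail the job with a clear error
```

The `which` parameter exists only so the lookup can be replaced in tests. The default is `shutil.which`, which searches PATH.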

Bring our own SSH server

Options: a statically linked OpenSSH, Dropbear, or a crypto/ssh-based Go implementation embedded into the runner. Yet to be decided.

  • [runner] If the runner is started by root, configure root SSH access. In addition, if JobSpec.user != root, configure non-root SSH access. In any case, JobSpec.user is the default SSH user (that is, JobSpec.user is the User in the SSH client config generated by dstack client).
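
The rule above is small enough to write down as a function. This is a Python sketch with hypothetical names (`SSHAccessPlan`, `plan_ssh_access`). The non-root runner branch is an assumption, since the plan only describes the case where the runner is started by root.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSHAccessPlan:
    users: List[str]   # accounts that get authorized_keys configured
    default_user: str  # User written to the generated SSH client config


def plan_ssh_access(runner_is_root: bool, job_user: str) -> SSHAccessPlan:
    """Decide which accounts get SSH access and which one is the default."""
    users: List[str] = []
    if runner_is_root:
        users.append("root")
    # Assumption: a non-root runner can only configure access for the job user.
    if job_user not in users:
        users.append(job_user)
    # In any case, the job user is the default SSH user.
    return SSHAccessPlan(users=users, default_user=job_user)
```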

(Optional) Images without *nix userland

  • [runner] Bring our own shell and tools (BusyBox).

@un-def un-def self-assigned this Dec 2, 2024
un-def added a commit that referenced this issue Dec 3, 2024
* If not set, use the default user from the image (if it, in turn, is
  not set either, Docker uses `root` as a default value)
* The container user is still set to `root`, as we need root privileges,
  at least to install sshd, but the runner executes the job (shell
  script with `commands` from the run configuration) as `user`.
* If the `user` is not root, it gets its own copy of
  `~/.ssh/authorized_keys` and `~/.ssh/environment`, making it possible
  to `ssh user@run-name` (the default user is still `root`, that is,
  `ssh run-name` logs in as root)
* `~/.ssh/environment` is now generated by the runner, not the outer
  shell script (container entrypoint), and includes all the same
  variables as the job env (including `DSTACK_*` vars and vars from
  the `env` property of the run configuration)

Part-of: #1535
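
For reference, generating `~/.ssh/environment` from the job env can be sketched as below. This is Python for illustration only (the runner is Go), and `render_ssh_environment` is a hypothetical name. The file format is one `NAME=value` pair per line, and sshd only reads it when `PermitUserEnvironment` is enabled. Skipping values that contain newlines is an assumption of this sketch, since they cannot be represented in a line-based file.

```python
from typing import Mapping


def render_ssh_environment(env: Mapping[str, str]) -> str:
    """Render job env vars as the contents of ~/.ssh/environment."""
    lines = []
    for name, value in sorted(env.items()):
        # Skip entries that cannot be written as a single NAME=value line.
        if not name or "=" in name or "\n" in name or "\n" in value:
            continue
        lines.append(f"{name}={value}")
    return "".join(line + "\n" for line in lines)
```

For example, `{"DSTACK_RUN_NAME": "my-run", "HF_TOKEN": "abc"}` renders as two lines, `DSTACK_RUN_NAME=my-run` and `HF_TOKEN=abc`. The variable names here are only examples.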
un-def added a commit that referenced this issue Dec 5, 2024
@un-def un-def removed their assignment Dec 5, 2024