Do not fail if /root/.profile is missing in user-specified Docker image #1086

Closed
jvstme opened this issue Apr 3, 2024 · 9 comments · Fixed by #2151

@jvstme
Collaborator

jvstme commented Apr 3, 2024

Steps to reproduce

Run a configuration with image: fedora:39

> cat hello.dstack.yml
type: task

image: fedora:39
commands:
  - echo Hello

resources:
  cpu: 1..
  memory: 0.3GB..

> dstack run . -f hello.dstack.yml

Expected behaviour

The configuration runs successfully.

Actual behaviour

The run fails.
CLI:

ancient-impala-1 provisioning completed (terminating)
Run failed with error code JobTerminationReason.INTERRUPTED_BY_NO_CAPACITY. Check CLI and server logs for more 
details.

Server logs:

ERROR 2024-04-03T14:53:11.772 dstack._internal.server.background.tasks.process_running_jobs The docker container of the job 'ancient-impala-1-0-0' is not working: exit code: 2, error 
DEBUG 2024-04-03T14:53:11.773 dstack._internal.server.background.tasks.process_running_jobs runner healthcheck: {'state': 'pending', 'container_name': 'ancient-impala-1-0-0', 'status': 'exited', 'running': False, 'oom_killed': False, 'dead': False, 'exit_code': 2, 'error': ''}

shim.log on the cloud instance:

2024/04/03 12:52:22 Pulling image
2024/04/03 12:52:22 Creating container
2024/04/03 12:52:22 Unable to stop the container: Error response from daemon: No such container: ancient-impala-1-0-0
2024/04/03 12:52:22 Unable to remove the container: Error response from daemon: No such container: ancient-impala-1-0-0
2024/04/03 12:52:22 Running container, id=baaf276e0a395f9320d599e84ecbbd9518becd0e712bcba51dbd96b3902802a0
2024/04/03 12:53:11 Container finished successfully, id=baaf276e0a395f9320d599e84ecbbd9518becd0e712bcba51dbd96b3902802a0

Container logs on the cloud instance:

# docker logs baaf276e0a395f9320d599e84ecbbd9518becd0e712bcba51dbd96b3902802a0
/bin/sh: line 1: apt-get: command not found
Fedora 39 - x86_64                               23 MB/s |  89 MB     00:03    
Fedora 39 openh264 (From Cisco) - x86_64        3.7 kB/s | 2.6 kB     00:00    
Fedora 39 - x86_64 - Updates                     20 MB/s |  35 MB     00:01    
Dependencies resolved.
================================================================================
 Package                 Arch        Version                 Repository    Size
================================================================================
Installing:
 openssh-server          x86_64      9.3p1-10.fc39           updates      466 k
Upgrading:
 libblkid                x86_64      2.39.3-6.fc39           updates      117 k
 libmount                x86_64      2.39.3-6.fc39           updates      155 k
 libsmartcols            x86_64      2.39.3-6.fc39           updates       67 k
 libuuid                 x86_64      2.39.3-6.fc39           updates       28 k
 systemd-libs            x86_64      254.10-1.fc39           updates      687 k
 util-linux-core         x86_64      2.39.3-6.fc39           updates      508 k
Installing dependencies:
 dbus                    x86_64      1:1.14.10-1.fc39        fedora       8.1 k
 dbus-broker             x86_64      35-2.fc39               updates      176 k
 dbus-common             noarch      1:1.14.10-1.fc39        fedora        15 k
 device-mapper           x86_64      1.02.197-1.fc39         updates      138 k
 device-mapper-libs      x86_64      1.02.197-1.fc39         updates      176 k
 kmod-libs               x86_64      30-6.fc39               fedora        67 k
 libargon2               x86_64      20190702-3.fc39         fedora        28 k
 libfdisk                x86_64      2.39.3-6.fc39           updates      162 k
 libseccomp              x86_64      2.5.3-6.fc39            fedora        71 k
 libutempter             x86_64      1.2.1-10.fc39           fedora        26 k
 openssh                 x86_64      9.3p1-10.fc39           updates      439 k
 systemd                 x86_64      254.10-1.fc39           updates      4.7 M
 systemd-pam             x86_64      254.10-1.fc39           updates      360 k
 util-linux              x86_64      2.39.3-6.fc39           updates      1.2 M
 xkeyboard-config        noarch      2.40-1.fc39             updates      971 k
Installing weak dependencies:
 cryptsetup-libs         x86_64      2.6.1-3.fc39            fedora       491 k
 diffutils               x86_64      3.10-3.fc39             fedora       398 k
 libbpf                  x86_64      2:1.1.0-4.fc39          fedora       165 k
 libxkbcommon            x86_64      1.6.0-1.fc39            updates      142 k
 qrencode-libs           x86_64      4.1.1-5.fc39            fedora        61 k
 systemd-networkd        x86_64      254.10-1.fc39           updates      647 k
 systemd-resolved        x86_64      254.10-1.fc39           updates      293 k

Transaction Summary
================================================================================
Install  23 Packages
Upgrade   6 Packages

... (cut for brevity) ...                                   

Complete!
sed: can't read /root/.profile: No such file or directory

dstack version

0.17.0

Server logs

No response

Additional information

No response

@jvstme jvstme added the ux label Apr 3, 2024
@peterschmidt85
Contributor

This issue is stale because it has been open for 30 days with no activity.

@peterschmidt85
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

@peterschmidt85 peterschmidt85 closed this as not planned May 18, 2024
@jvstme
Collaborator Author

jvstme commented May 20, 2024

Still relevant

@jvstme jvstme reopened this May 20, 2024
@peterschmidt85
Contributor

This issue is stale because it has been open for 30 days with no activity.

@peterschmidt85
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

@peterschmidt85 peterschmidt85 closed this as not planned Jul 4, 2024
@jvstme jvstme reopened this Oct 22, 2024
@jvstme
Collaborator Author

jvstme commented Oct 22, 2024

Related to #1535, may be fixed there

@github-actions github-actions bot removed the stale label Oct 23, 2024

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Nov 22, 2024
@r4victor
Collaborator

@un-def, did you get a chance to see if it can be fixed now as a part of #1535?

@un-def
Collaborator

un-def commented Dec 23, 2024

It can be easily fixed (don't patch ~/.profile if it does not exist) right now, but I'm not sure how it could be done properly. Currently, we use OpenSSH-specific ~/.ssh/environment to deliver all variables but PATH, and ~/.profile for PATH (since variables from pam_env are applied on top of ~/.ssh/environment, and on Ubuntu there is /etc/environment with something like PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin", which effectively overwrites our PATH).
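
A minimal sketch of the quick fix described above (skip the patch when the file does not exist). The function name `patch_profile_path` and the append-an-export-line approach are assumptions for illustration; the actual command uses sed in a way the logs do not show.

```shell
# Hypothetical helper, not dstack's actual code: append a PATH export to a
# profile file, but only if that file already exists.
patch_profile_path() {
    ppp_profile="$1"   # e.g. /root/.profile
    ppp_path="$2"      # PATH value to deliver to login shells

    # Some images (fedora:39 in this issue) have no such file;
    # skip the patch instead of failing the whole container setup.
    if [ ! -f "$ppp_profile" ]; then
        return 0
    fi

    printf 'export PATH="%s"\n' "$ppp_path" >> "$ppp_profile"
}
```

Called as `patch_profile_path /root/.profile "$PATH"`, this returns 0 whether or not the file is present, so a missing file no longer aborts the run. It does not address the broader question of how to deliver PATH reliably.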

We have plenty of options:

  • Use pam_env's ~/.pam_environment instead of ~/.ssh/environment
  • Don't UsePAM to avoid /etc/environment altogether
  • Bring our own patched OpenSSH or (probably also patched) Dropbear
  • ...and more

The problem is that most options are distro-specific or shell-specific. For example, even ~/.profile doesn't work reliably. From the Bash manual:

When Bash is invoked as an interactive login shell, or as a non-interactive shell with the --login option, it first reads and executes commands from the file /etc/profile, if that file exists. After reading that file, it looks for ~/.bash_profile, ~/.bash_login, and ~/.profile, in that order, and reads and executes commands from the first one that exists and is readable.

That is, if there is at least one of the files with higher precedence, ~/.profile is simply ignored.
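
The precedence rule quoted above can be checked with a small sketch like the following. The function name `first_login_file` is hypothetical; it only mirrors the documented lookup order for the per-user files and does not cover /etc/profile or other shells.

```shell
# Hypothetical helper: print the per-user login file Bash would pick for a
# given home directory, following the order from the Bash manual.
first_login_file() {
    flf_home="$1"
    for flf_name in .bash_profile .bash_login .profile; do
        # Bash uses the first file that exists and is readable.
        if [ -r "$flf_home/$flf_name" ]; then
            printf '%s\n' "$flf_home/$flf_name"
            return 0
        fi
    done
    # None of the three files is present.
    return 1
}
```

If `first_login_file /root` prints anything other than /root/.profile, a PATH line added to ~/.profile would not be picked up by a Bash login shell in that image.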
