Data directory included in chroot causes infinite directory structure #2522
Comments
Yeah, I agree we should do some amount of checking, but this will largely come down to documentation and operator configuration.
We just got hit by this and it rendered our entire Nomad worker cluster useless. This should be up in bold in the documentation.
I too experienced this recently (Nomad v0.8.4). The time it takes is not obvious to a new user, and it takes a bit of "I have seen this symptom before" to understand what is going on, rather than just looking at the info/error messages. Would there be a way to show some sort of progress indicator during the ...
Fixes #2522

Skip embedding client.alloc_dir when building chroot.

If a user configures a Nomad client agent so that the chroot_env will embed the client.alloc_dir, Nomad will happily infinitely recurse while building the chroot until something horrible happens. The best case scenario is the filesystem's path length limit is hit. The worst case scenario is disk space is exhausted.

A bad agent configuration will look something like this:

```hcl
data_dir = "/tmp/nomad-badagent"

client {
  enabled = true

  chroot_env {
    # Note that the source matches the data_dir
    "/tmp/nomad-badagent" = "/ohno"
    # ...
  }
}
```

Note that `/ohno/client` (the state_dir) will still be created but not `/ohno/alloc` (the alloc_dir).

While I cannot think of a good reason why someone would want to embed Nomad's client (and possibly server) directories in chroots, there should be no cause for harm. chroots are only built when Nomad runs as root, and Nomad disables running exec jobs as root by default. Therefore even if client state is copied into chroots, it will be inaccessible to tasks.

Skipping the `data_dir` and `{client,server}.state_dir` is possible, but this PR attempts to implement the minimum viable solution to reduce risk of unintended side effects or bugs.

When running tests as root in a vm without the fix, the following error occurs:

```
=== RUN   TestAllocDir_SkipAllocDir
    alloc_dir_test.go:520:
        Error Trace: alloc_dir_test.go:520
        Error:       Received unexpected error:
                     Couldn't create destination file /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/testtask/nomad/test/testtask/.../nomad/test/testtask/secrets/.nomad-mount: open /tmp/TestAllocDir_SkipAllocDir1457747331/001/nomad/test/.../testtask/secrets/.nomad-mount: file name too long
        Test:        TestAllocDir_SkipAllocDir
--- FAIL: TestAllocDir_SkipAllocDir (22.76s)
```

Also removed unused Copy methods on AllocDir and TaskDir structs.

Thanks to @eveld for not letting me forget about this!
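To illustrate the idea behind the fix, here is a minimal sketch of a tree copy that refuses to descend into one directory. It is not Nomad's actual implementation: `copyTreeSkipping`, `copyFile`, and `demoSkipAllocDir` are hypothetical names, symlinks and special files are ignored for brevity, and the directory layout in the demo is only assumed to resemble the bad configuration (a chroot destination that lives under `<data_dir>/alloc`).

```go
package main

import (
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// copyTreeSkipping copies directories and regular files from src into dst,
// but never descends into skipDir. Symlinks and special files are ignored
// to keep the sketch short.
func copyTreeSkipping(src, dst, skipDir string) error {
	skipDir = filepath.Clean(skipDir)
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() && filepath.Clean(path) == skipDir {
			return fs.SkipDir // do not embed the alloc dir in the chroot
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		switch {
		case d.IsDir():
			return os.MkdirAll(target, 0o755)
		case d.Type().IsRegular():
			return copyFile(path, target)
		default:
			return nil
		}
	})
}

// copyFile copies the contents of src into a newly created file at dst.
func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	if _, err := io.Copy(out, in); err != nil {
		out.Close()
		return err
	}
	return out.Close()
}

// demoSkipAllocDir builds a throwaway tree in which the copy destination sits
// under <data_dir>/alloc, copies the data dir into it while skipping alloc,
// and reports whether client state and the alloc dir ended up in the copy.
func demoSkipAllocDir() (bool, bool, error) {
	dataDir, err := os.MkdirTemp("", "nomad-sketch")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dataDir)

	allocDir := filepath.Join(dataDir, "alloc")
	clientDir := filepath.Join(dataDir, "client")
	dst := filepath.Join(allocDir, "example-alloc", "task", "ohno")
	for _, dir := range []string{clientDir, dst} {
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return false, false, err
		}
	}
	if err := os.WriteFile(filepath.Join(clientDir, "state.db"), []byte("x"), 0o644); err != nil {
		return false, false, err
	}
	if err := copyTreeSkipping(dataDir, dst, allocDir); err != nil {
		return false, false, err
	}
	_, stateErr := os.Stat(filepath.Join(dst, "client", "state.db"))
	_, allocErr := os.Stat(filepath.Join(dst, "alloc"))
	return stateErr == nil, allocErr == nil, nil
}
```

In this sketch the client state is still copied while the alloc directory is not, which mirrors the `/ohno/client` versus `/ohno/alloc` behavior described above. Without the skip, the walk would keep encountering the files it had just written under the destination.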
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
0.5.6
Operating system and Environment details
OS: Centos 7
Consul: 0.7.5
Issue
Job hangs on "Building task directory" when Nomad's data directory is included in the chroot (for example, using /etc/nomad.d/ as a data directory) until either the disk fills up, or memory runs out.
Reproduction steps
Set Nomad's data directory to /etc/nomad.d/, run a Job which utilizes a chroot.
Also, you can simply add your data directory to your "chroot_env" in the client config for Nomad. This will reproduce the same behavior.
Nomad Server logs (if appropriate)
Logs did not produce anything meaningful.
Nomad Client logs (if appropriate)
Setup Failure failed to build task directory for "example": Couldn't create symlink: symlink python2.7.1.gz /etc/nomad.d/alloc/ba334436-96e1-61bf-e6b1-9f8ff3c56a63/example/etc/nomad.d/alloc/ba334436-96e1-61bf-e6b1-9f8ff3c56a63/example/etc/nomad.d/alloc/2bb9b1a0-95c3-c06b-c7c9-6752eda18d2f/example/etc/nomad.d/alloc/2bb9b1a0-95c3-c06b-c7c9-6752eda18d2f/example/etc/nomad.d/alloc/b8318d40-4b90-5a65-90e5-99fdd57ec522/example/usr/share/man/man1/python2.1.gz: no space left on device
Job file (if appropriate)
A lot of this boils down to the fact that using a directory such as /etc to actively write data is typically a no-no. From what I've seen and reproduced, though, the issue is not limited to that: if I add the data directory to my chroot_env, the issue occurs regardless. It would be nice to detect this kind of thing and actively prevent the data directory and the chroot from overlapping. I didn't see any warnings in the server logs, or anything about this in the documentation.
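The kind of detection asked for here could look roughly like the following sketch, which flags any chroot_env source path that equals or contains the data directory. The helper names `isWithin` and `riskyChrootSources` are hypothetical and are not part of Nomad. The check is purely lexical, so it assumes absolute paths and does not resolve symlinks; a real check would probably need to.

```go
package main

import (
	"path/filepath"
	"strings"
)

// isWithin reports whether child is equal to parent or nested below it.
// Both paths are cleaned first; symlinks are NOT resolved.
func isWithin(parent, child string) bool {
	rel, err := filepath.Rel(filepath.Clean(parent), filepath.Clean(child))
	if err != nil {
		return false
	}
	if rel == ".." {
		return false
	}
	return !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// riskyChrootSources returns the chroot_env source paths that would pull
// dataDir into the chroot, i.e. sources that equal or contain dataDir.
func riskyChrootSources(dataDir string, sources []string) []string {
	var risky []string
	for _, src := range sources {
		if isWithin(src, dataDir) {
			risky = append(risky, src)
		}
	}
	return risky
}
```

For example, with a data directory of `/etc/nomad.d` and sources `/bin`, `/etc`, and `/usr`, only `/etc` would be flagged. An agent could log a warning or refuse to start when the returned list is non-empty.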