failing to allocate job in busybox environment #6721
Pulling some interesting bits here so that if this third-party pastebin disappears we don't lose context for the issue. Otherwise marking for investigation.
Jobspec
job "metabase" {
  region = "global"
  datacenters = ["dc1"]
  type = "service"
  group "metabase" {
    count = 1
    task "metabase" {
      driver = "java"
      user = "nobody"
      config {
        jar_path = "metabase.jar"
        jvm_options = ["-Xmx256m","-Xms256m"]
      }
      artifact {
        source = "http://downloads.metabase.com/v0.33.5.1/metabase.jar"
      }
      service {
        name = "metabase"
        port = "http"
        tags = ["java","metabase"]
        check {
          type = "http"
          port = "http"
          path = "/"
          interval = "10s"
          timeout = "2s"
        }
      }
      resources {
        cpu = 100
        memory = 256
        network {
          mbits = 5
          port "http" {
            static = 3000
          }
        }
      }
    }
  }
}
Interesting bits of the logs
start script:
opkg update && opkg install haproxy
/etc/init.d/haproxy start
haproxy config:
defaults
  mode http
frontend stats
  bind *:1936
  stats uri /
  stats show-legends
  no log
frontend http_front
  bind *:8080
  default_backend http_back
backend http_back
  balance roundrobin
  server nomad 127.0.0.1:4646 source 192.168.1.1 |
6721.tar.gz
use the tty to submit the job, then observe the issue either in /tmp/log/nomad.log or by visiting http://192.168.1.1:8080 from your desktop;
here i used the raw_exec driver and nomad in dev mode to avoid rebuilding the openwrt kernel to include cgroups support; if you're wondering how the nomad agent is started look into |
Is there something unusual about that tarball, @turbo-cafe ? I get the following when unpacking it in a virtual machine:
If you could provide a gist with the minimal set of files we need to reproduce, rather than an opaque tarball, that would help us out a lot. |
you'll need to gunzip it first, or |
here's gist of included files https://gist.github.com/turbo-cafe/60ad80343688555bf385ed9000fc7f07 |
Yes, I'm aware. GNU tar does the right thing when it encounters a
Can you provide a link to where this WRT image comes from? I really can't go running arbitrary virtual machines someone is sending us. |
you were right, tar xf works fine; i just unpacked it without trouble and now i'm confused about what on earth might be wrong with the file; just for reference: sha256sum 6721.tar.gz
Makefile:25 uses wget to download the imagebuilder for v19.07-rc1 from downloads.openwrt.org; |
the downloaded file's hash matched the original; to get my peace of mind back i checked the file with https://www.virustotal.com/gui/file/9c5ee005099b6255c0a58813fb4b573f7511fea067734e2c7f67e95db8e43ab8/detection |
Ah, gotcha. Thanks. We'll circle back here once we've had a chance to repro. |
an update about where i am on this issue: i can start metabase.job on mainstream linux (glibc, bash, coreutils); similarly, i can run java -jar on the busybox system just fine, which rules out all moving parts but nomad and busybox; i tend to think of this as a configuration issue, and it would be fine if there were any kind of reference documentation on the environment nomad expects; knowing how to fill the gap between what nomad expects and what busybox has to offer would allow me to move forward with preparing a working environment |
Ok, I was able to replicate the problem with the Makefile (with some tweaks, see below). I was originally thinking this was a problem with the chroot environment, but looking at the code that's not the case. This is happening when Nomad sets up the alloc directory so that we can later chroot into it. So that operation is happening as the Nomad user in the context of the host. The error message, again:
But
root@OpenWrt:/# ls -lah /etc/passwd
-rw-r--r-- 1 root root 242 Nov 22 21:02 /etc/passwd
The error bubbles up from client/allocdir/fs_unix.go#L42, which is calling into the golang stdlib's own
I'm not quite sure what's going on there yet. But there are different code paths in
So my recommendation is to try this with a glibc-linked build and see if that gets this working for you (or at least new errors! 😀 ). And if you'd like to contribute your expertise in #5643 I'm sure it'd be welcome!
re: the Makefile
Just a heads up, the build also required
41c41,42
< GOPATH=$(CURDIR)/target/go go build -x -o target -tags "nonvidia release ui" --ldflags '-linkmode external -extldflags "-static"' github.com/hashicorp/nomad
---
> GOPATH=$(CURDIR)/target/go go build -x -o target/nomad -tags "nonvidia release ui" --ldflags '-linkmode external -extldflags "-static"' github.com/hashicorp/nomad
>
46c47,54
< qemu-system-x86_64 -enable-kvm -cpu host -smp cores=2 -m 4096 -drive file=target/openwrt-19.07.0-rc1-x86-64-combined-ext4.img,id=d0,if=none -device ide-hd,drive=d0,bus=ide.0 -soundhw ac97 -device virtio-net-pci,netdev=lan -netdev tap,id=lan,ifname=tap0,script=no,downscript=no -device virtio-net-pci,netdev=wan -netdev user,id=wan
---
> sudo qemu-system-x86_64 -enable-kvm \
> -cpu host -smp cores=1 -m 4096 -nographic \
> -drive file=target/openwrt-19.07.0-rc1-x86-64-combined-ext4.img,id=d0,if=none,format=raw \
> -device ide-hd,drive=d0,bus=ide.0 \
> -device virtio-net-pci,netdev=lan \
> -netdev tap,id=lan,ifname=tap0,script=no,downscript=no \
> -device virtio-net-pci,netdev=wan \
> -netdev user,id=wan |
the key here is that the nomad binary was linked against glibc; that's the main reason i was able to start metabase.job on my desktop but not on the busybox system; the solution to my problem was to link nomad with musl, after which i was able to start metabase.job; i never bothered with static linking, since openwrt provides musl libc |
Great to hear that worked out! I'd love to hear more about what you're doing -- installing Nomad on OpenWRT is definitely a use case we haven't seen before! |
That's really interesting that you were even able to get this far with a glibc binary on a musl system. I'd also be very curious what you were doing since that shouldn't have worked. |
I've been experimenting with this as well, building a minimal Nomad env. I'm not sure if this is a thread by OP but very handy on linking musl. I managed to get it working with glibc, minimal nsswitch libs, and an /etc/nsswitch.conf specifying only files, but it might be nice to have an alternative user lookup option without dependencies on libnss. @tgross @the-maldridge ? |
Unless you actually need nsswitch functionality, linking fully statically and just using the files database should be enough. |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
in a busybox environment, allocating a job fails with "unknown user nobody" or, similarly, with user root when whitelisted; it makes me think the message in the log file is misleading and the reason must be in the environment; on a guess i installed coreutils over the busybox applets, but that didn't help, and i didn't find any reference document about what specifically nomad expects, e.g. wget to download jars or bash to run jobs;
the issue is reproducible with the raw_exec driver, as well as with exec or java; you might want to skip the job attached below and use any hello-world job with raw_exec, since otherwise you'd have to compile a custom kernel for the openwrt image;
Nomad version
v0.10.1 built with nonvidia tag
Operating system and Environment details
busybox 1.30.1 openwrt 19.07 with custom kernel 4.14.151
Issue
[ERROR] client.alloc_runner: prerun failed: alloc_id=28223f0d-a174-ae8b-272a-64a10780aa72 error="pre-run hook "alloc_dir" failed: user: unknown user nobody"
Logs
https://hastebin.com/dupavitido.bash
Job
https://hastebin.com/jukavazesi.bash
Reproduction steps