Nomad crashes "runtime error: index out of range [0] with length 0" #16863
Bridge mode is not going to work for you since you don't have CNI plugins installed on your host (ref). The network config you posted is deprecated; however, I cannot reproduce this even if I use that deprecated syntax (in resources). Peeking at the code, it seems like you would need a network defined at the group level to pass the check that is panicking. Is that the config you tried after reading #8875? Maybe share a minimally reproducing config without, e.g., volumes and services? Obviously nothing from the user should be able to panic the server, so this is definitely still a bug.
@iluminae thank you. I removed data_dir (/opt/nomad/*) from all servers. It is strange, though, because every Nomad UI page showed no active jobs. Maybe there was some temporary data. Then I restarted all servers, and now Nomad works without crashing. I should add that my first attempt used dynamic ports instead of static ones. Now, after deploying, I see the error about the "localhost" CNI plugin not being available. Before, I didn't get any errors: the deployment was successful, but the network ports behaved as static. Now I've installed the CNI plugins, and it works even with dynamic ports without any errors.
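For readers unfamiliar with the distinction between static and dynamic ports mentioned above, here is a minimal sketch of a group-level `network` block that declares one of each. The group name, port labels, and port numbers are hypothetical and not taken from the reporter's job; as noted earlier in the thread, bridge mode only works once the CNI plugins are installed on the host.

```hcl
# Fragment of a job spec; the group name, port labels, and numbers are illustrative.
group "web" {
  network {
    mode = "bridge"

    # Static port: always binds this exact port on the host.
    port "admin" {
      static = 8080
    }

    # Dynamic port: Nomad picks a free host port and maps it to
    # port 8080 inside the allocation's network namespace.
    port "http" {
      to = 8080
    }
  }
}
```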
Hi @wusikijeronii 👋 Thanks for the report, and I'm sorry that you had such a poor first experience with Nomad; this is definitely not the experience we want to provide our users. I tried to reproduce this problem using the steps you described but was not able to, so I suspect it may have been caused by a bad alloc in state that ended up preventing the client from starting, because restoring the alloc triggered the crash. Even without a reproduction, the fix seemed clear enough from the details you provided, so I opened #16921 to fix this. Until the fix is released I would suggest you make sure all network configuration is defined at the …
Hello. |
I was able to find a way to reproduce this. You need to have a task-level network with bridge mode:

    job "example" {
      group "sleep" {
        task "sleep" {
          driver = "exec"
          config {
            command = "/bin/bash"
            args    = ["-c", "while true; do sleep 1; done"]
          }
          resources {
            network {
              mode = "bridge"
            }
          }
        }
      }
    }

So as long as you avoid this it should be fine. After #16921 is released this will not cause a panic anymore.
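For comparison, here is a sketch of the same job with the `network` block moved from the task's `resources` up to the group level, which is where an earlier comment says the check expects to find it. This is an illustration of the non-deprecated layout, not a configuration confirmed by the maintainers in this thread, and bridge mode still requires the CNI plugins to be installed on the host.

```hcl
job "example" {
  group "sleep" {
    # Network configuration lives at the group level,
    # not inside the task's resources block.
    network {
      mode = "bridge"
    }

    task "sleep" {
      driver = "exec"

      config {
        command = "/bin/bash"
        args    = ["-c", "while true; do sleep 1; done"]
      }
    }
  }
}
```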
Purging the job only deletes the job itself, so related objects, like allocations, evals, deployments, etc., are still kept in state until the garbage collector runs. You can manually trigger the GC with the …
Our company is evaluating Nomad; however, Nomad frequently crashes when a user submits a job.
The main error is:
Job:
Nomad version
I also tested on Nomad v1.5.0
Operating system and Environment details
Full log:
I also tried to run this task on three different machines running OL8 and Ubuntu. Same result.
Nomad config:
Following this issue, I also tried to specify bridge mode in the network configuration,
UPD: I was wrong. I purged all jobs. The same issue occurs even when Nomad is idle.