Nomad segfaults when trying to preempt a docker-based job with lower priority #11342
Comments
Hi @aneutron! Thanks for letting us know. I was able to reproduce and have a fix. Will PR the fix soon.
Hey @notnoop! Thanks a lot for the swift action on your part. Looking forward to building it and continuing to test Nomad. Cheers!
Fix a bug where the scheduler may panic when preemption is enabled. The conditions are somewhat involved: a higher-priority job schedules multiple allocations that preempt multiple other allocations on the same node, due to port/network/device assignments. The cause of the bug is incidental mutation of internal cached data. `RankedNode` computes and caches proposed allocations in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L42-L53, but the scheduler then mutates that list to remove preemptible allocs in https://github.com/hashicorp/nomad/blob/v1.1.6/scheduler/rank.go#L293-L294, and `RemoveAllocs` mutates the slice in place, setting the tail of the cached slice to `nil` and triggering a nil-pointer dereference. I fixed the issue by avoiding the mutation in `RemoveAllocs`; the micro-optimization there doesn't seem necessary. Fixes #11342
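For illustration, here is a minimal, self-contained Go sketch of that aliasing pattern: an in-place removal that nils the tail of a slice corrupts any cached reference to the same backing array, while a copy-on-write variant avoids it. The names (`Alloc`, `removeAllocsInPlace`, `removeAllocs`) are hypothetical stand-ins, not Nomad's actual types:

```go
package main

import "fmt"

type Alloc struct{ ID string }

// removeAllocsInPlace mimics the problematic micro-optimization: it filters
// the slice in place and nils out the tail so removed elements can be
// garbage-collected. Any other holder of the same backing array now sees nils.
func removeAllocsInPlace(allocs []*Alloc, remove map[string]bool) []*Alloc {
	n := 0
	for _, a := range allocs {
		if !remove[a.ID] {
			allocs[n] = a
			n++
		}
	}
	for i := n; i < len(allocs); i++ {
		allocs[i] = nil // the tail of the shared backing array is now nil
	}
	return allocs[:n]
}

// removeAllocs sketches the idea behind the fix: build a fresh slice, so
// callers that cached the original slice are unaffected.
func removeAllocs(allocs []*Alloc, remove map[string]bool) []*Alloc {
	out := make([]*Alloc, 0, len(allocs))
	for _, a := range allocs {
		if !remove[a.ID] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	// Imagine `cached` is the proposed-allocations slice a RankedNode computed.
	cached := []*Alloc{{"a"}, {"b"}, {"c"}}

	// The scheduler later removes the preemptible alloc "a" from the same slice.
	_ = removeAllocsInPlace(cached, map[string]bool{"a": true})

	// The cache still has length 3, but its tail was nil'ed out, so iterating
	// it dereferences a nil *Alloc and panics on the last element.
	for _, a := range cached {
		fmt.Println(a.ID)
	}
}
```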
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.1.6 (b83d623fb5ff475d5e40df21e9e7a61834071078)
Operating system and Environment details
Issue
Hi,
First of all, thanks for the amazing product that's Nomad. I'm currently in the process of PoC-ing Nomad for a use case at our company, and it involves running jobs that use GPUs.
Since this is a PoC, I'm only running Nomad in dev mode. I'm using the default scheduler with preemption enabled for all job types.
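A server config enabling preemption for all scheduler types looks roughly like the sketch below, using the `default_scheduler_config` server stanza (the reporter's actual `server-config.hcl` isn't shown in this issue, so treat the exact contents as an assumption):

```hcl
# server-config.hcl (sketch): enable preemption for all scheduler types.
server {
  enabled = true

  default_scheduler_config {
    preemption_config {
      system_scheduler_enabled  = true
      service_scheduler_enabled = true
      batch_scheduler_enabled   = true
    }
  }
}
```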
My test scenario was the following:
Instead, what happened is that as soon as I tried to run Job 2, the server/client segfaulted (due to a panic).
I successfully reproduced the error at least five times, using different GPU requirement configurations but with the same overall idea (multiple single-GPU jobs, one multi-GPU job).
The jobs schedule fine on their own, but as soon as I schedule the higher-priority job where the lower-priority job is already deployed, Nomad crashes.
Reproduction steps
nomad agent -dev -bind 0.0.0.0 -plugin-dir=./plugins -config=./server-config.hcl -log-level=WARN
Then the steps to reproduce are as follows:
Expected Result
Actual Result
Server / Client segfaults.
Job file (if appropriate)
This is the file for Job 1:
The second job is identical except for the job name and the count (both for the group and for the GPU).
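The original job file isn't reproduced here; for illustration only, a docker job requesting a single GPU via the nvidia device plugin has roughly this shape (the job name, image, priority, and counts below are hypothetical):

```hcl
job "gpu-job-1" {
  datacenters = ["dc1"]
  priority    = 50 # Job 2 would use a higher priority

  group "gpu-group" {
    count = 1 # Job 2 raises this, along with the GPU count

    task "gpu-task" {
      driver = "docker"

      config {
        image = "nvidia/cuda:11.0-base" # hypothetical image
      }

      resources {
        device "nvidia/gpu" {
          count = 1
        }
      }
    }
  }
}
```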
Nomad Server logs (if appropriate)
with this stack trace:
Nomad Client logs (if appropriate)
(See above)