CNI: use check command when restoring from restart #24658
Your reproduction steps say:
Is the podman driver actually affected by it? I thought that one would not use a pause container and would always let Nomad configure the network namespaces on its own.
You know what, I assumed that it used pause containers too (I have hardly ever run it myself), but that's not the case: https://github.com/hashicorp/nomad-driver-podman/blob/main/driver.go#L87. So yeah, just Docker here. Will edit the repro steps.
Luckily no,
I tested this using an AWS environment with 1 server and 1 client instance. The client instance has CNI v1.3.0 plugins installed.
JobSpec:
job "example" {
  group "nginx" {
    network {
      mode = "bridge"
      port "http" {
        to = 80
      }
    }
    task "nginx" {
      driver = "docker"
      config {
        image = "nginx:latest"
        ports = ["http"]
      }
      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
I ran through the steps using a build from main (f4529485563924462dbdccdd1b4cacbd9d68616e). When I rebooted the instance, the allocation was restarted, but the networking was broken and I was unable to perform a simple curl to NGINX.
I then tested this patch (9daf267e33bd502d4a386dc1dd6d4a728894c194).
I performed the same steps as detailed in the PR and in the previous test. When the instance was rebooted, the allocation failed and a new allocation was scheduled in its place.
Failed allocation events:
2024-12-13T08:59:18Z Terminated Exit Code: 0
2024-12-13T08:59:18Z Setup Failure failed to setup alloc: pre-run hook "network" failed: failed to configure networking for alloc: Interface name nomad not found
2024-12-13T08:57:08Z Started Task started by client
2024-12-13T08:57:08Z Driver Downloading image
2024-12-13T08:57:07Z Task Setup Building Task Directory
2024-12-13T08:57:07Z Received Task received by client
Client logs around time of failure marking:
2024-12-13T08:59:18.385Z [ERROR] allocrunner/alloc_runner.go:403: client.alloc_runner: prerun failed: alloc_id=8f3b3d5a-906a-7f53-8d72-e9c782bb4dee error="pre-run hook \"network\" failed: failed to configure networking for alloc: Interface name nomad not found"
2024-12-13T08:59:18.385Z [INFO] taskrunner/task_runner.go:1468: client.alloc_runner.task_runner: Task event: alloc_id=8f3b3d5a-906a-7f53-8d72-e9c782bb4dee task=nginx type="Setup Failure" msg="failed to setup alloc: pre-run hook \"network\" failed: failed to configure networking for alloc: Interface name nomad not found" failed=true
2024-12-13T08:59:18.387Z [INFO] docker/handle.go:229: client.driver_mgr.docker: stopped container: container_id=9d6cc3a56a51596885f36fec7c4d74ea1002e8727c1fc213851ac42fed53e8cb driver=docker
I've updated this PR with a retry: if the check fails the first time, we attempt to recreate the network namespace once. This should reduce the cases where, after a reboot, the allocation cannot be restored and gets failed. I've also added tests covering that logic.
LGTM! I left two very minor and ignorable comments.
I ran the same test outlined in my previous review. When the client instance rebooted and the Nomad process started, the allocation was restarted and the task process was still reachable.
Task events:
Recent Events:
Time Type Description
2025-01-07T09:12:56Z Started Task started by client
2025-01-07T09:12:55Z Driver Downloading image
2025-01-07T09:12:40Z Restarting Task restarting in 15.75160817s
2025-01-07T09:12:40Z Terminated Exit Code: 0
2025-01-07T09:11:12Z Started Task started by client
2025-01-07T09:11:07Z Driver Downloading image
2025-01-07T09:11:07Z Task Setup Building Task Directory
2025-01-07T09:11:06Z Received Task received by client
Networking confirmation:
$ curl 10.0.1.193:29665
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
@tgross Any ETA on Nomad 1.9.5? Bridge networking seems to be unstable when nodes restart or parts of jobs are updated. iptables does not seem to be updated, because I experience issues with containers not being able to communicate with IPs on the subnet that the host is on. When stopping a job and starting it again, network communication works again. Seeing stuff in the logs like:
Indicates that something is wrong / not in sync with the CNI network setup. This is on Fedora CoreOS 41.20241215.3.0
Hi @ahjohannessen; v1.9.5 is expected to be released within the next 1-2 weeks.
@jrasell Ok, thanks for the heads up. Is there some CI build with this change that one can try out?
@jrasell Think I found it, https://github.com/hashicorp/nomad/actions/runs/12679755285
When the Nomad client restarts and restores allocations, the network namespace for an allocation may exist but no longer be correctly configured. For example, if the host is rebooted and the task was a Docker task using a pause container, the network namespace may be recreated by the docker daemon.
When we restore an allocation, use the CNI "check" command to verify that any existing network namespace matches the expected configuration. This requires CNI plugins of at least version 1.2.0, because a bug in older plugin versions would cause the check to fail, so we consult the plugin fingerprint before performing the check.
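A minimal sketch of that version gate, for illustration only. The names `supportsCheck`, `parseVersion`, and `minCheckVersion` are hypothetical and are not taken from the Nomad source; the sketch also assumes the fingerprint is available as a map of plugin name to a plain `MAJOR.MINOR.PATCH` string (optionally prefixed with `v`), and it conservatively treats anything it cannot parse as unsupported.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minCheckVersion is the minimum plugin version cited in the PR description.
var minCheckVersion = [3]int{1, 2, 0}

// parseVersion turns "v1.3.0" or "1.3.0" into numeric components.
// Hypothetical helper: pre-release suffixes are rejected rather than parsed.
func parseVersion(s string) ([3]int, error) {
	var out [3]int
	trimmed := strings.TrimPrefix(strings.TrimSpace(s), "v")
	parts := strings.SplitN(trimmed, ".", 3)
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", s, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a sorts strictly before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// supportsCheck reports whether every fingerprinted plugin is new enough
// to run "check". An empty or unparseable fingerprint counts as unsupported.
func supportsCheck(fingerprint map[string]string) bool {
	if len(fingerprint) == 0 {
		return false
	}
	for _, v := range fingerprint {
		got, err := parseVersion(v)
		if err != nil || less(got, minCheckVersion) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(supportsCheck(map[string]string{"bridge": "v1.3.0"}))
	fmt.Println(supportsCheck(map[string]string{"bridge": "v1.1.1"}))
}
```

Skipping the check when the gate returns false keeps clients with older plugins on the previous restore behavior instead of failing allocations on a known plugin bug.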
If the check fails, destroy the network namespace and try to recreate it from scratch once. If that fails in the second pass, fail the restore so that the allocation can be recreated (rather than silently having networking fail).
This should fix the gap left by #24650 for Docker task drivers and any other drivers with the `MustInitiateNetwork` capability.
Fixes: #24292
Ref: #24650
Ref: https://hashicorp.atlassian.net/browse/NET-11869
Testing & Reproduction steps
Run a cluster on a set of VMs, with at least one client. This can't be a combined server+client because we need to reboot the hosts. You should probably set server.heartbeat_grace = "5m" to give yourself time to work.