generated /etc/hosts not shared between tasks #10809
Comments
Hi @kainoaseto, this does look like an unfortunate breaking change which we should document better so folks don't get caught out by it. Do you have a minimal example job that exercises the behavior and one that shows the workaround? That might help us fix up the upgrade guide here.
Hi @tgross, thanks for getting back to me on this! Sure thing, this job just runs two tasks. I've uploaded the job file here: https://github.com/kainoaseto/nomad-jobs/blob/main/network-namespace/job.hcl Unfortunately, after digging in more, the workaround ended up not being a viable upgrade path for us. The goal for upgrading was to implement a forwards-compatible way of setting `/etc/hosts` on current job files so we could avoid downtime and re-deploying all jobs. Our requirements around the upgrade:
We tried the following workarounds:
With that we've hit the extent of our Nomad skills to solve this and provide a non-downtime upgrade path from 1.1.1 -> 1.1.2. If you have any thoughts we could try, it would be greatly appreciated! Otherwise it seems we might be stuck on 1.1.1.
Thanks @kainoaseto, I think I have a better understanding of the problem now. What we should have done when we created the generated `/etc/hosts` file was to share it across all tasks in the allocation's network namespace. The patch in #10823 fixes the problem you've described here. On a build with that patch, I ran the job you provided and modified the `/etc/hosts` file from one task, and the change was visible in the other.
Something I wanted to check on before getting that PR reviewed and merged was this:
Can you describe what you're seeing here? I want to make sure there's not a second bug.
Oh, you're probably seeing the error message
Thank you @tgross for making that patch so quickly! That seems like it'll take care of our problems, and yes, that's the exact error message I was getting. No problem; it just left us without a recourse to work around this for a 1.1.1 -> 1.1.2 upgrade.
#10823 fixes this and will ship in the upcoming Nomad 1.1.3.
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v1.1.2 (60638a0)
Operating system and Environment details
AWS Linux 2
Docker version:
Client:
 Version:           19.03.6-ce
 API version:       1.40
 Go version:        go1.13.4
 Git commit:        369ce74
 Built:             Fri May 29 04:01:26 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6-ce
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.4
  Git commit:       369ce74
  Built:            Fri May 29 04:01:57 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.2
  GitCommit:        ff48f57fc83a8c44cf4ad5d672424a98ba37ded6
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
Issue
In the latest Nomad release v1.1.2 the changelog mentions the improvement from GH-10766:
"docker: Tasks using network.mode = "bridge" that don't set their network_mode will receive a /etc/hosts file that includes the pause container's hostname and any extra_hosts."
I found that this is actually a breaking change from previous behavior in Nomad v1.1.1 when the `network_mode` is not set.

Reproduction steps
The scenario that I found this to be an issue in is where:

- no `network_mode` is defined for two tasks in a Task Group using a network namespace
- Task A's `/etc/hosts` file is updated at runtime from the docker entrypoint
- Task B relies on the shared `/etc/hosts` file to resolve service-to-service mesh routing

In Nomad version <= v1.1.1 the `/etc/hosts` file is updated in both containers, but with this new change in v1.1.2 the hosts file is overwritten in Task B (or is not shared anymore?), which breaks our current network setup for routing between containers.
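For illustration, a minimal sketch of a job with this shape (task names and images are hypothetical; the actual job file is linked in the comments above):

```hcl
job "hosts-repro" {
  datacenters = ["dc1"]

  group "app" {
    # one shared network namespace (and one /etc/hosts) for both tasks
    network {
      mode = "bridge"
    }

    task "a" {
      driver = "docker"
      config {
        # hypothetical image whose entrypoint rewrites /etc/hosts at startup
        image = "example/service-a:latest"
        # note: no network_mode is set, so the task joins the group's
        # bridge namespace and (pre-1.1.2) shares its /etc/hosts
      }
    }

    task "b" {
      driver = "docker"
      config {
        # hypothetical image that resolves peers via the shared /etc/hosts
        image = "example/service-b:latest"
      }
    }
  }
}
```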
Workaround

To fix this, we are removing the manual updating of `/etc/hosts` from Task A and instead using the `extra_hosts` feature with the docker driver so both tasks are updated (this is the way we should have done it from the get-go, but this was a forcing function). We're currently making this change to restore networking in our environment, but it came as a surprise when we updated and all new deployments could no longer talk to other services.
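A minimal sketch of that workaround, assuming a hypothetical hostname and address (the docker driver's `extra_hosts` option takes `host:IP` strings, and per the changelog entry above these entries land in the `/etc/hosts` file that bridge-mode tasks receive):

```hcl
task "a" {
  driver = "docker"
  config {
    image = "example/service-a:latest"
    # replaces the entrypoint's manual edit of /etc/hosts; the entry
    # is written into the generated hosts file that both tasks receive
    extra_hosts = ["mesh-upstream.local:10.0.0.10"]
  }
}
```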
Expected Result

The CHANGELOG.md reflects the breaking change, or a change is implemented that adjusts this behavior to continue shared R/W access to `/etc/hosts` between the pause, Task A, and Task B containers.

Actual Result
surprised_pikachu.jpg upon updating to the latest Nomad, when the `/etc/hosts` no longer appears to be shared between network-namespaced containers.