Exits with "unable to detect hairpin mode (is the docker daemon running?)" #67
Hi @Rycieos, thanks for the extensive report! Could you try a few things for me?
I forgot to mention: I was running v0.4.1 (I think) before debugging, and updated to see if it resolved my issue. I just tested versions v0.4.1, v0.4.2, and v0.4.3, and all of them fail the same way.
Just to be sure: in that case you're running the ipv6nat container with those same options? And before the system upgrade, it was all working fine?

v0.4.1 and v0.4.2 have been around for quite a while already, and I haven't seen an issue like this before. And if the issue started only after the system upgrade, that makes it only stranger, and doesn't seem related to go-iptables. I've also upgraded to newer versions of Alpine and Go for the v0.4.3 container, but this would be unrelated as well, since you have the same issue with v0.4.1 and v0.4.2.

The iptables upgrade on your system goes from 1.8.4-15.el8.x86_64 to 1.8.4-15.el8_3.3.x86_64, but I can't find what the changes are between those. Also, you mentioned that doing the check yourself works just fine.

A few more things to try:
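A host-side sanity check along these lines would cover the basics here (a sketch; these aren't necessarily the exact commands that were suggested):

```sh
# What exactly changed in the iptables package update?
rpm -q iptables
rpm -q --changelog iptables | head -n 20

# Are the NAT rules docker-ipv6nat manages still present on the host?
ip6tables-save -t nat | grep -i docker
```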
Good idea! Sanity check from host:
Container:
Well, that isn't good. I see the same thing if I run it another way, too. Any ideas? I'll keep digging.
From within the container (docker run, again with --privileged --network host), could you run these 3 commands:
ls -l `which iptables`
xtables-legacy-multi iptables-save -t nat
xtables-nft-multi iptables-save -t nat
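For completeness, here is roughly how that would be done (a sketch; the --entrypoint override and the shell are my assumption, the image tag is taken from the compose file later in the thread):

```sh
# Start a throwaway shell in the ipv6nat image with the same debugging options
docker run --rm -it --privileged --network host --entrypoint /bin/sh robbertkl/ipv6nat:0.4.3

# Inside the container:
ls -l "$(which iptables)"                  # which xtables binary the symlink points to
xtables-legacy-multi iptables-save -t nat  # NAT rules as seen by the legacy backend
xtables-nft-multi iptables-save -t nat     # NAT rules as seen by the nft backend
```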
Sorry, forgot: before running the commands above, do this first:
That is identical to what I see on the host. Does that mean the iptables update I got switched from the legacy backend to a newer backend?
Correct.
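As a general aside (not from the thread): the quickest way to tell which backend a given iptables binary uses is its version string, or the symlink it resolves to.

```sh
iptables -V                      # prints e.g. "iptables v1.8.4 (nf_tables)" or "... (legacy)"
readlink -f "$(which iptables)"  # resolves to xtables-nft-multi or xtables-legacy-multi
```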
Oops, I missed this before:
That happens with
Looks like that fixes it to point to the right one. I'm assuming the standard entrypoint must not be doing that, or it would already be working for me.
It looks like your system has indeed switched backends, but the entrypoint should normally take care of that switch.

Could you try the previous ls -l command again, but this time after you run docker-ipv6nat-compat?

Also: do you normally run the ipv6nat container with a custom entrypoint set? Or does it use the default entrypoint?
Default. Here is my full docker-compose file:

version: '2.3'
services:
  ipv6nat:
    image: robbertkl/ipv6nat:0.4.3
    restart: always
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    network_mode: host
    cap_drop:
      - ALL
    cap_add:
      - NET_RAW
      - NET_ADMIN
      - SYS_MODULE
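For reference, an equivalent plain docker run invocation would look something like this (a sketch derived from the compose file above, not taken from the thread):

```sh
docker run -d --name ipv6nat --restart always \
  --network host \
  --cap-drop ALL --cap-add NET_RAW --cap-add NET_ADMIN --cap-add SYS_MODULE \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  robbertkl/ipv6nat:0.4.3
```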
Hmm, so it does switch over to xtables-nft-multi (docker-ipv6nat-compat is actually the standard entrypoint), but still doesn't work? That's very strange.
One thing that looks off in your output is that:
I think I got it! On host:
In container (I hacked the entrypoint to not exit after it crashes):
It shows the address as 127.0.0.0/32 instead of 127.0.0.0/8!
Wow, that should definitely explain it! Any idea why that would happen, and why only after a system upgrade? Very strange how iptables-save (both version 1.8.4) shows different things on the host and in the container for the same rule!
Actually, I should stop truncating output. It shows that for every single rule: what the host prints with the full prefix turns into a /32 inside the container.
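An illustrative reconstruction of that mismatch (a sketch; the rule shown is a typical Docker-created NAT rule containing the 127.0.0.0/8 match, not the exact output from this system):

```sh
# Host, iptables 1.8.4-15.el8_3.3 (iptables-save -t nat):
#   -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
# Inside the container, iptables 1.8.4 from Alpine:
#   -A OUTPUT ! -d 127.0.0.0/32 -m addrtype --dst-type LOCAL -j DOCKER
```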
Same thing happens with the other rules.

Agreed, very strange. I'll try digging into why it warns about legacy tables; maybe there is a RedHat bug somewhere about this that could provide some clues.
Not sure, but perhaps the legacy tables were created by the iptables commands run inside the container before the entrypoint switched the symlink over.

The counters are still a bit strange, however. Is Docker's NAT working OK for IPv4 after the upgrade? Can you reach the published ports?
Yeah, I'll try a reboot.
Yup, IPv4 works just fine. The counters look normal from the host; it's just the global chain policy counters that show 0s:
The second reboot did clear up the warnings about legacy tables existing, but it did not fix the problem: iptables in the container still shows 127.0.0.0/32.
But only until I ran the legacy iptables commands in the container again.
Yeah, that's what I thought created the legacy rules. But if you're exec'ing into the container, the container would have already executed the docker-ipv6nat-compat entrypoint, right?
Yeah, I just did that in a manually started container after rebooting, without thinking. I'm still stumped as to how an iptables patch update could cause this, especially since the version that is showing the wrong output is in the container and wasn't changed.
It could still be that the update switched the backend from legacy to nft and something is not working properly in the "translation" to iptables output, which iptables-nft does. I can't confirm on my current system, as it's using legacy; I will have to set up a new machine to test it.
Could you try installing iptables 1.8.6 from Alpine "edge" within the container:
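Something along these lines should do it (a sketch; the edge repository URL is an assumption, not taken from the thread):

```sh
# Inside the running (Alpine-based) ipv6nat container:
apk add --no-cache iptables --repository http://dl-cdn.alpinelinux.org/alpine/edge/main
iptables -V   # should now report 1.8.6
```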
No dice:
I was just able to track down the RPM files of the previous version of iptables I had installed. I'm going to try a manual rollback and see if it fixes my problem.
Great, let me know. It still blows my mind how it can affect only the result within the container. I'm also wondering how iptables on the host could have an effect in the first place: I don't think it's even used? Your system uses nftables, and I think it's only iptables-nft in the container that talks directly to nftables; I don't think it's even using the iptables installed on the host.
Well that fixed it. 🤕 Feel free to close this issue if you want, since it seems to be a problem with an external package. Though if you think it is a compatibility issue, I would be happy to help you continue to debug it (though not on my home prod system). Full fix detailed, in case anyone else has the exact same stuck package scenario:
Thanks for all your help tracking down what was causing the problem!
You are right I think. I guess somehow the new package version has some bug that interacts with the kernel incorrectly, and then saves (and then later prints) the rules incorrectly?? Yeah, doesn't make sense to me either. I wouldn't even know how to go about reporting this as a bug to the package maintainer. I guess I would need to prove that the rule actually got saved wrong somehow.
Makes me wonder if I could have just mounted the host iptables into the container instead.
Wow, that was quite the journey. Great you figured it out! And thanks for the detailed fix. Let's leave it at this. I'll keep an eye out for more reports of this issue.
Yeah, that's usually a no-go, for that exact reason.
I can confirm that this error occurs with a fresh install of CentOS.
Thanks @thedejavunl, best to keep the issue open then.
I spoke to Phil Sutter from Red Hat, who did both the upstream patch and its backport into RHEL 8.3. The commit in question is here. To quote Phil:
About the issue we're seeing in the Docker container:
So aside from the workaround (downgrading, as detailed here), I guess the only solution would be to either wait for 1.8.7 (and the corresponding Alpine packages) or build a patched version and ship that in the container image.
Wow @robbertkl, fantastic detective work! So, to be 100% clear: this "backport" in the RedHat packages happened between versions 1.8.4-15.el8 and 1.8.4-15.el8_3.3? And this change is expected to land in iptables v1.8.7? As running two different versions of iptables against the same kernel probably wasn't ever intended, I can understand why this could happen. As for my own environment, I will just freeze my iptables version at 1.8.4-15.el8 until v1.8.7 is released and the image here is updated.
Correct, see the changelog here: https://centos.pkgs.org/8/centos-baseos-x86_64/iptables-services-1.8.4-15.el8_3.3.x86_64.rpm.html
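For anyone else wanting to stay on the unaffected build in the meantime, pinning on RHEL/CentOS 8 could look roughly like this (a sketch; it assumes the dnf versionlock plugin and that the older build is still available in your repos or as local RPMs):

```sh
# Install the versionlock plugin, roll back to the pre-backport build, and lock it
dnf install -y python3-dnf-plugin-versionlock
dnf downgrade -y iptables-1.8.4-15.el8 iptables-libs-1.8.4-15.el8
dnf versionlock add iptables iptables-libs
```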
Thank you very much for the explanation @robbertkl. 🥇
Hey, does anybody know if there is a Red Hat bug tracker record for this?
If this is the issue, couldn't it be fixed by upgrading the ipv6nat Docker container to use a newer version of iptables? Maybe as an opt-in (e.g. a new tag).
The new iptables package is included in Alpine Edge, which is not intended for production usage. The iptables upgrade doesn't contain any security fixes, only bug fixes, so if the firewall is working correctly there is no need to update the RPM packages.
That seems to be the plan, once iptables is updated in Alpine. We all seem to agree that we should wait until it is "stable" before doing that.
Hi all, with Docker 20.10.6 the ipv6nat functionality is fully integrated (experimental).
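For those who want to try that instead, enabling it looks roughly like this (a sketch based on the Docker 20.10 documentation, not on this thread; an IPv6-enabled network with its own subnet is still needed):

```sh
# Overwrites any existing /etc/docker/daemon.json; merge by hand if you already have one
cat >/etc/docker/daemon.json <<'EOF'
{
  "experimental": true,
  "ip6tables": true
}
EOF
systemctl restart docker
```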
Heya, as we still run into this issue, I did some research and also spoke to Phil about it a bit to understand it. The fix/backport implemented a more optimized way of storing rules in the kernel. The issue is the following: if the host iptables stores rules in this optimized form but the container's iptables doesn't support it, the output is messed up. It looks like this:
So the rule is displayed differently, but the rule itself is "correct" (as stored in the kernel). As this is not easy to solve properly (the iptables versions outside and inside the container would have to be "the same"), may I suggest the following:

Line 71 in 4cd961e
If the check were changed to accept both versions, /8 and /32, the problem should be "gone". Anything I missed? Would it be worth a try? I'm not a Go coder, so I have no clue how to do it myself, but I expect it to be a small change.

Cheers,
A backport of "nft: Optimize class-based IP prefix matches" from newer iptables versions broke the hairpin mode detection of ipv6nat. This is caused by newer iptables versions on the host creating optimized rules, which are interpreted differently by older versions of iptables. This can be spotted by dumping the rules on the host and comparing them to the rules dumped inside the ipv6nat container: the outside rule contains the correct subnet for the detection, 127.0.0.0/8, while inside it is displayed as 127.0.0.0/32, which causes the detection (code in manager.go) to fail. As this is only a display issue (the rule itself is correct), accepting both versions should be fine to work around this issue. Big thanks to Phil Sutter, who provided me the code to implement my idea of matching both old and new versions, as it is very hard to ensure the same iptables version is used inside and outside the container. A test build is available on Docker Hub at geektoor/ipv6nat-devel.

Closes: robbertkl#67
Cc: Phil Sutter <[email protected]>
Signed-off-by: Sven Michels <[email protected]>
Just pushed out a new release, v0.4.4, which contains the fix for this issue! Docker images for all architectures are on Docker Hub as robbertkl/ipv6nat:0.4.4.
@robbertkl I still get "unable to detect hairpin mode (is the docker daemon running?)" with 0.4.4 on Synology DSM 6.2.4-25556 with Synology's current Docker version 20.10.3-0554. I use Option B from the README.md. IPv6 works in general on the system.
Version: v0.4.3 (Docker image)
Docker version: 20.10.1 and 20.10.2
OS: CentOS Linux release 8.3.2011 (Core)
After a system update, upon launching I get this error:
After which the container exits and restarts.
Thinking it might be a permissions issue, I removed all --cap-adds, leaving only the --cap-drop ALL to test, but that broke it more. I then tried to give it --cap-add ALL, but that did not fix it.
Since part of the system update was docker-ce, I thought maybe it had changed the backend rules, but clearly the right rule still exists. And checking manually:
The actual checking commands return the expected results. I am using this code section as the reference: https://github.com/robbertkl/docker-ipv6nat/blob/v0.4.3/manager.go#L79-L86
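For context, a manual approximation of that check (a sketch; the real detection in manager.go uses go-iptables rather than shelling out, and looks for the 127.0.0.0/8 match on Docker's NAT jump rule):

```sh
iptables -t nat -S | grep -- '-j DOCKER'
# With the userland proxy enabled (hairpin NAT off), the output typically includes:
#   -A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER
```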
At this point I downgraded dockerd back to 20.10.1, but I got the same error.
What is strange is that when I first did the system upgrade, dockerd restarted itself as usual, and all my containers came back online with IPv6 working. It was after an OS restart that this error started.
I tried to do a system rollback, but the old package versions couldn't be found, so I'm stuck.
Full package list that I upgraded:
Seems like coreos/go-iptables/issues/79 could be related.