Systemd breaks mirrored networking #11672
Comments
It's also worth pointing out that cloud-config and snapd were disabled for those logs (to save anyone else any trouble-shooting). Enabling/disabling them doesn't seem to have any effect. |
Hi. Can you please collect networking logs by following the instructions below? |
Here's with all of networking working correctly: and with systemd preventing networking from working: I performed the same steps as before. |
With the new networkingMode=mirrored I had similar issues in WSL 2.1.5 and 2.2.4, so I left it set to NAT, which works flawlessly. VMware Photon OS uses systemd as well, and it works in WSL when a rootless user matching the logged-in Windows user is configured. See https://github.com/dcasota/photonos-scripts/wiki/Photon-OS-on-WSL2, step 4. |
Bump. Anything? |
It appears this might be related to #11450 as running
and then opening a large number of connections causes this situation, even without systemd (I just had to run a load test from WSL and needed more than 1024 connections). Combined with the "Address already in use" bug mentioned (but not called out explicitly), this eventually leads to port starvation, and no connections can be made. So, basically, I suspect that systemd just causes port starvation, since not enough ports are allocated to WSL in mirrored mode. |
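The port-starvation guess above can be sanity-checked by looking at how many ephemeral source ports the kernel is currently allowed to hand out. A minimal Python sketch, assuming a Linux guest where /proc/sys/net/ipv4/ip_local_port_range is readable; the function names are made up for illustration and are not part of any WSL tooling:

```python
# Sketch: report the ephemeral (outgoing source) port range of the running kernel.
# Assumption: Linux, with the standard procfs entry for ip_local_port_range.

PROC_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text):
    """Parse the two whitespace-separated integers the kernel exposes."""
    low, high = (int(part) for part in text.split())
    return low, high


def port_count(low, high):
    """Number of source ports in the inclusive range [low, high]."""
    return high - low + 1


def read_port_range(path=PROC_PATH):
    """Read and parse the current range from procfs."""
    with open(path) as handle:
        return parse_port_range(handle.read())
```

Calling `read_port_range()` inside the distro once with systemd enabled and once with it disabled would show whether the range differs between the two boots. For reference, the stock Linux default is 32768 to 60999.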
Do you have any evidence of that @NyaMisty? I saw exactly the same behavior with tools that connected to the internet as I did with systemd enabled, after running that |
TL;DR: You are right, and removing I'm sorry, I made a false assertion in my reply above. I did not accept your guess at first because anything that causes network packets to be dropped unexpectedly would produce the same symptoms. In addition to @withinboredom's previous investigation, I changed systemd's startup target from multi-user.target all the way down to emergency.target (which loads no services), which makes debugging a lot easier. However, even with emergency.target the network still dies in mirrored mode. Then I opened two terminals, one running a
and another running a watchdog
From the log, my guess is that things go wrong during It turns out that Microsoft is using some black magic to implement forwarding in WSL mirrored networking. It seems to forward only connections with specific source ports and silently drop the rest, so once we overrode the port range, network requests failed immediately. Removing all |
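The comment above does not survive intact here, so exactly where the port range was overridden is not stated. As a rough sketch only, assuming the override was made through sysctl configuration files in the usual systemd-sysctl locations, the following Python looks for any setting of net.ipv4.ip_local_port_range. The search paths and function names are assumptions for illustration:

```python
# Sketch: find sysctl config files that set net.ipv4.ip_local_port_range.
# Assumption: the override lives in sysctl.conf / sysctl.d style files.
from pathlib import Path

KEY = "net.ipv4.ip_local_port_range"
SEARCH_PATHS = ["/etc/sysctl.conf", "/etc/sysctl.d", "/run/sysctl.d", "/usr/lib/sysctl.d"]


def find_overrides(text):
    """Return every (low, high) pair assigned to KEY in sysctl-style text."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        # Keys may be written with slashes and may carry a leading "-".
        name = name.strip().lstrip("-").replace("/", ".")
        parts = value.split()
        if name == KEY and len(parts) == 2 and all(p.isdigit() for p in parts):
            found.append((int(parts[0]), int(parts[1])))
    return found


def scan(paths=SEARCH_PATHS):
    """Map each file that sets KEY to the ranges it assigns."""
    hits = {}
    for entry in map(Path, paths):
        files = sorted(entry.glob("*.conf")) if entry.is_dir() else [entry]
        for conf in files:
            try:
                ranges = find_overrides(conf.read_text(errors="replace"))
            except OSError:  # missing or unreadable file
                continue
            if ranges:
                hits[str(conf)] = ranges
    return hits
```

An empty result from `scan()` only means no file in those locations sets the key; the range could still be changed at runtime by other means.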
I will give this a go asap. Thanks for looking into it; and kinda obvious source in retrospect. |
Windows Version
Microsoft Windows [Version 10.0.22631.3672]
WSL Version
WSL version: 2.1.5.0
Are you using WSL 1 or WSL 2?
Kernel Version
5.15.146.1-2
Distro Version
Ubuntu 24.04
Other Software
Repro Steps
I've tried nearly everything to get mirrored networking mode working again, but for some reason it has stopped working correctly in the last week.
At first, it was similar to other reported issues where mirrored mode would work for about 10-15 minutes and then mysteriously fail. Eventually, it just stopped working altogether. At least that is what I thought. (#11369)
I am still able to ping and I see responses. However, UDP and TCP packets leave the interface, but I never see replies arrive in WSL (though I see the responses and retransmissions in Wireshark on the Windows side).
I then went on an adventure to uninstall/reinstall network adapters, WSL, etc. None of these things seemed to resolve my issue. It wasn't until I stumbled upon #10842 that I got a crazy idea. My simple idea was to manually set the source port of curl and then use the iperf trick to see if that was a related issue.
To my surprise, this worked exactly once:
curl -v google.com --local-port 12345
producing the expected output! When I ran it again, I got:
curl: (45) bind failed with errno 98: Address already in use
which is weird because there is no longer any process listening on that port. Changing the source port does, in fact, cause it to work exactly once, yet again.
This leads me to believe that this might be a kernel issue, or some other software doing something weird. So, I go to disable systemd ... and lo and behold, things work again!
I do note that specifying the source port via curl still only works exactly once and I don't see it in ss output, which is a bit unusual.
I'm kinda stumped at the moment with what systemd might be doing, so any tips would be very much appreciated.
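The fixed-source-port experiment above can also be reproduced without curl. A minimal Python sketch that binds a TCP socket to a chosen local port before connecting and reports the errno on failure; the function name and the timeout value are arbitrary choices for illustration:

```python
# Sketch: connect out from a fixed local source port and report what happened.
import errno
import socket


def connect_from_port(host, port, local_port, timeout=3.0):
    """Return 0 on success, otherwise the errno describing the failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Binding before connect pins the source port, like curl --local-port.
        # A local_port of 0 lets the kernel pick an ephemeral port instead.
        sock.bind(("", local_port))
        sock.connect((host, port))
        return 0
    except socket.timeout:
        return errno.ETIMEDOUT
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
```

Running `connect_from_port("1.1.1.1", 80, 12345)` twice in a row would mirror the two curl invocations; on Linux a result equal to `errno.EADDRINUSE` (98) on the second call corresponds to the "Address already in use" message above.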
Note that #11143 appears to potentially be a duplicate.
Expected Behavior
Networking to work.
Actual Behavior
Networking does not work.
Diagnostic Logs
Steps followed:
wsl --shutdown
.\collect-wsl-logs.ps1
curl -v 1.1.1.1 (DNS works via tunneling, but let's remove as many variables as possible)
curl -v 1.1.1.1 --local-port 12345
curl -v 1.1.1.1 --local-port 12345
WSL startup with systemd:
WslLogs-2024-06-09_12-20-00 (2).zip
WSL startup without systemd:
WslLogs-2024-06-09_12-27-08 (2).zip