[has a simply repro] TCP ephemeral ports exhausted after lots of early-closed non-blocking connections #3951
Comments
Phillip's code below so it doesn't get lost.
Running the Sysinternals RamMap utility after the above program has terminated provides some insight - the wsl-issue-2913-repro process still exists as a zombie - no surprise considering it owns all those sockets visible in
There's actually no need to go to the extreme of port exhaustion to repro the underlying issue. In fact we don't need the above code at all, we can use anything that will call connect() and can be fed a target where there is nothing listening.
Zombie wget and curl (sticking with 127.0.0.1:1234):
Can also try for the same result (it's not a localhost thing):
If there is something listening on 127.0.0.1:1234 (I used netcat) there is no zombie process:
So it seems that WSL does not correctly handle failed connects. It looks as though a reference to the socket is maintained internally, which results in the owning process becoming a zombie, and port exhaustion will eventually (or quickly) occur as per #2913. As mentioned in #2913, terminating all WSL processes running under the distro allows the sockets to be released, and the zombies go away once WSL runs its cleanup.
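The single failed connect described in this comment can be sketched in a few lines of Python. This is only an illustration, not the commands used in the comment: `try_connect` is a hypothetical helper name, and the target 127.0.0.1:1234 is taken from the comment's example. The sketch only performs the connect; whether a zombie process is left behind still has to be checked separately (for example with RamMap, as above).

```python
import errno
import socket


def try_connect(host, port, timeout=2.0):
    """Attempt one TCP connect and return 0 on success, else an errno value.

    Hypothetical helper for illustration; it is not part of any tool
    mentioned in this thread.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex reports connect() failures as a return value
        # instead of raising an exception.
        return sock.connect_ex((host, port))


def describe(rc):
    """Map a connect_ex result to a readable name such as 'ECONNREFUSED'."""
    return "OK" if rc == 0 else errno.errorcode.get(rc, str(rc))
```

Calling `describe(try_connect("127.0.0.1", 1234))` with nothing listening on that port should normally report a refused connection, which is the failure case this comment is about.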
Great follow up. I was able to reproduce this here on 18865 with RamMap and your This is going to 'splain some other ill-defined reports, if WSL has been leaking like a sieve all this time and no one noticed. Usually you don't try to connect to something that isn't there, and those NT processes don't show up in a bog-standard Resource Monitor look see. But if, say, you did an
Fixed in Windows Insider Build 18890
I'm on 18362 and I'm experiencing something very similar to this issue, in particular all the WSL However, I can't see this port usage in Also, I can't see zombie processes, and the ports are released when I simply close VSCode (which then terminates all the The output of the above command looks like this:
Does anyone know, is this the same issue? Or a different, maybe related one?
It objectively isn't the "same issue", that's for sure. Whether it is related or not is hard to tell. Giving you the straight dope: the only way to tell is going to be spinning up Insiders and seeing if it goes away. I checked my 18932 and bound ports aren't accumulating; but that isn't conclusive evidence of anything really. Thank you for the report though. Maybe we'll get some me2s.
Sadly, even though it's fixed in 18890, Microsoft has refused to ship the newer build to customers. We only got an 18362 -> 18363 update in 1909... Unbelievable.
@NyaMisty you should move to an Insider build
Easy to say. I would rather not take on the instability of an Insider build just for such a simple fix.
This issue is a repetition of #2913, just to remind Microsoft that #2913 already has very simple code to reproduce it.
In the three months since philip-searle posted the reproduction code on 18 Jan, no Microsoft employee has paid attention to the old issue, so I can only open a new issue to get attention.
Description
After creating a large number of non-blocking connections in WSL and closing each one before it is established, all TCP ephemeral ports are exhausted, and no new TCP connections can be established from WSL or Win32. Closing the related processes in WSL does not release these ports. All new TCP connects and listens fail, and the LxssManager service must be restarted to recover.
Reproducible Demo
The demo is from philip-searle's comment of #2913 on 18 Jan.
You can reliably reproduce this issue using the attached program (~80 lines of C): wsl-issue-2913-repro.c.txt
Output from strace looks normal to me and is attached as wsl-issue-2913-repro.strace.zip
The program performs these steps in a loop:
Environment
In Ubuntu in WSL, build and run the demo with these commands:
Expected Behavior
On a Linux VM I can run the loop several hundred thousand times and see the ports being used cycle through the entire ephemeral range multiple times.
In addition, even if the program has a bug that does not properly release the occupied port, these ports should be automatically released after the program exits.
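The pattern this issue describes (a non-blocking connect that is closed before the connection is established) can be sketched roughly as follows. This is not the attached wsl-issue-2913-repro.c program and may differ from it in detail: `connect_and_close`, its parameters, and the choice of target are assumptions made for illustration. It records the local port picked in each round, which is one way to watch the ports cycle through the ephemeral range on a system that releases them correctly.

```python
import socket


def connect_and_close(host, port, rounds):
    """Start `rounds` non-blocking connects, closing each socket right away.

    Hypothetical sketch, not the attached repro program. Returns the local
    port reported by getsockname() after each connect attempt.
    """
    local_ports = []
    for _ in range(rounds):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.setblocking(False)
            # connect_ex returns an errno value instead of raising; on a
            # non-blocking socket the attempt is usually still in progress
            # (or already refused) when the call returns.
            sock.connect_ex((host, port))
            local_ports.append(sock.getsockname()[1])
        finally:
            # Close without waiting for the connection to be established.
            sock.close()
    return local_ports
```

On an affected WSL build, keep `rounds` small: according to this report, each round can leave a port stuck until LxssManager is restarted.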
Observed Behavior
On philip-searle's Windows laptop it loops about 16,000 times and then EINVAL is returned from connect(). At this point the symptoms described in previous comments appear: Win32 programs such as web browsers fail to connect and the output of "netstat -anoq" in a command prompt shows many connections stuck in the "BOUND" state. The only way to get network connections working again is to restart the LxssManager service.
On YihaoPeng's PC with Windows 1809 build 17763.379, the ports will be exhausted after 2899 rounds:
No ports are released after the program exits. If you let the program run repeatedly (so it immediately takes up any ephemeral port released by other programs), you will find that no TCP connections can be established anywhere in Windows. For example, the Edge browser will not be able to load any page.
Use the following commands to run the program repeatedly: