Simultaneous pwsh instance start Mutex.CreateMutexCore IOException race condition #2658
Comments
(Issue happened again, after upgrading macOS to 11.5. Had 4 terminal windows that started pwsh simultaneously, and two threw the same way. I tried to reproduce the failure by spawning several
but even though a few pairs of processes started within the same millisecond, I don't get any |
It is a dup of #1464 (comment) |
@iSazonov sorry I don't understand why you think it is a duplicate. Here's why I said it is not a duplicate:
|
My bad. Perhaps it is "by design" in .Net. https://github.com/dotnet/runtime/blob/d019e70d2b7c2f7cd1137fac084dbcdc3d2e05f5/src/libraries/System.Private.CoreLib/src/System/Threading/WaitSubsystem.Unix.cs#L169-L174 I believe it makes sense to open new issue in .Net Runtime repo and ask there. |
Oh, good find. Do you have a sense of whether dotnet runtime might fix that bug? Agree it seems likely "by design," but I could open the same bug there. (I was having some trouble understanding which resource the mutex is guarding. Before I was thinking the resource is i.e. the file I see the other code interacting with the mutex has retry loops. If we understand mutex creation has by-design some chance of throwing, should PSReadLine also use a retry loop when creating the mutex? |
As for PSRL @daxian-dbw can add more info and make a conclusion about using a loop in the code. |
@darthwalsh and @iSazonov Thank you both for the investigation so far. I will need to get a macOS to try reproducing the issue.
I'd love to understand the root cause before making any changes in the code. |
This issue is quite annoying for dev testing powershell builds on macOS. Workaround is to use It seems the problem is the way a mutex is implemented is different across Windows, Linux, and macOS. On macOS it's a file lock: dotnet/runtime#5211 (comment). So if access to that file can't be retrieved, you get the IOException. |
I think an alternative solution is if the mutex can't be created, just have that instance not save to history and put out a warning message. |
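The alternative proposed above could look roughly like the following C# sketch. It is only an illustration: `HistoryGuard`, `CanSaveHistory`, and the injectable `factory` parameter are hypothetical names, not PSReadLine's actual code, and the warning text is made up.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical sketch; these names are illustrative, not PSReadLine's API.
public sealed class HistoryGuard
{
    // Null when the named mutex could not be created.
    public Mutex Mutex { get; }

    // Callers would skip history writes when this is false.
    public bool CanSaveHistory => Mutex != null;

    private HistoryGuard(Mutex mutex) { Mutex = mutex; }

    // 'factory' exists only so the failure path can be simulated;
    // by default it creates a real named mutex without taking ownership.
    public static HistoryGuard Create(
        string name,
        TextWriter warnings,
        Func<string, Mutex> factory = null)
    {
        factory ??= n => new Mutex(false, n);
        try
        {
            return new HistoryGuard(factory(name));
        }
        catch (IOException e)
        {
            // Degrade instead of failing: warn once and run without history.
            warnings.WriteLine(
                $"Warning: could not create history mutex '{name}' ({e.Message}); " +
                "history will not be saved in this session.");
            return new HistoryGuard(null);
        }
    }
}
```

With this shape the session stays usable after an `IOException`, at the cost of silently diverging history between tabs, which is the trade-off debated in the following comments.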
I have a potential fix, but no longer able to repro the issue. Probably because it's a race condition and my changes have changed the timing, but until I can verify it improves it, I'll hold off on a PR |
@SteveL-MSFT Printing a warning message that this powershell instance will not save history isn't much of an improvement (in my opinion) -- if I saw that, I would always manually close and reopen that pwsh instance -- which is what I did after seeing the IOException. But yeah, being able to reproduce the problem on demand is something I wish was easier. What the steps seem to be:
(I retried my earlier "tight loop" but with a C# app just calling |
@darthwalsh PSReadLine currently already has some retry logic. So if that fails, I'm not sure what we can do other than increasing the number of retries, but that's not guaranteed to succeed. Currently if the file lock cannot be obtained, PSReadLine just repeatedly informs you and is not usable. I thought I was more easily repro'ing it with a nested process, but it no longer repros for me. I don't believe I've seen the issue with opening an extra tab in iTerm2 either. If we can figure out a consistent repro, the proper fix would need to be in .NET. |
@SteveL-MSFT I agree a proper fix in .NET seems much better, but to be sure I want to know what happens in a retry loop. Something like: if retrying after 1 ms brings the failure chance from 1% to 0.01%, then to me it makes sense to put the retry in daxian-dbw pointed out the existing PSReadLine retry logic is for abandoned mutex, not for |
@darthwalsh that is true, perhaps we could try adding to the retry logic with an |
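For discussion, a retry around mutex creation might be sketched as below. This is an assumption-laden sketch, not the existing PSReadLine retry logic: `MutexRetry`, `CreateWithRetry`, the default attempt count, and the 1 ms delay are all hypothetical, and whether retrying actually lowers the failure rate is exactly the open question in this thread.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of PSReadLine or .NET.
public static class MutexRetry
{
    // Tries to create a named mutex, retrying only on IOException.
    // 'factory' is injectable so the race can be simulated in a test;
    // by default it calls the real Mutex(bool, string) constructor.
    public static Mutex CreateWithRetry(
        string name,
        int maxAttempts = 5,
        int delayMs = 1,
        Func<string, Mutex> factory = null)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        factory ??= n => new Mutex(false, n);

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return factory(name);
            }
            catch (IOException) when (attempt < maxAttempts)
            {
                // Back off briefly; the final failed attempt is not caught
                // here, so its IOException propagates to the caller.
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

Usage would be something like `var m = MutexRetry.CreateWithRetry("some_mutex_name");`. The exception filter keeps the last failure's stack trace intact, so a persistent error still surfaces the original `IOException`.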
After I upgraded macOS to 12.6, only one out of 6 of my iTerm
I think the best way to get a consistent repro would be to snapshot an older macOS in a VM, upgrade the guest OS, with iTerm reopening dozens of pwsh tabs. But I'm not familiar with the tools to do this. |
Ok, I ran another experiment with the upgrade to macOS 12.6.1 and I was able to repro this failure without pwsh or PSReadLine; if we ask the owner of dotnet BCL what the Mutex exception policy should be, I hope this added context would help. Repro:
try {
    using (var m = new Mutex(false, "darthwalsh_PSReadLine_issues_2658")) {
        Console.Write("created but didn't take Mutex darthwalsh_PSReadLine_issues_2658 ...");
        var line = Console.ReadLine();
        Console.WriteLine("QUITTING: " + line);
    }
} catch (Exception e) {
    Console.WriteLine(e.ToString());
    var line = Console.ReadLine();
    Console.WriteLine("QUITTING: " + line);
}
Built with
I created a new iTerm profile that just runs command
(I expected about a quarter, so something about this minimal program doesn't match what pwsh is doing). I'll open some more tabs of this, and leave it running for a few more OS reboots to see if I can repro it without OS upgrade. |
@darthwalsh did you already open an issue in https://github.com/dotnet/runtime? |
@SteveL-MSFT thanks for the nudge, I had been putting off rebooting but I just updated parallels today... Created dotnet/runtime#79375 |
I was going to raise a new bug but I think it's the same issue. In which case I have an easier way to reproduce it (on Linux). Log on to system as a user
Example given below:
[fastdruid@localhost ~]$ sudo su - root
Environment
PSReadLine: 2.2.6
Last 0 Keys:
Exception
System.IO.IOException: The system cannot open the device or file specified. : 'PSReadLineHistoryFile_1202693972' |
@Fastdruid that looks like it's a different issue: #3692 |
Environment
Exception report
iTerm session 3:
iTerm session 4:
Steps to reproduce
(Steps that shouldn't matter: My macbook was asleep in clamshell mode, I unplugged the display connections, and the OS crashed. I logged in and submitted the bug report to Apple, again.)
I saw this message on two tabs when iTerm2 was trying to restore a terminal window with four tabs. The first two tabs did not have any error.
iTerm session restore caused 4 pwsh instances to launch at the same time, and got these two crashes from different pwsh instances.
Expected behavior
Launching multiple pwsh instances at the exact same time should not show an error.
Actual behavior
Tabs 3 and 4 each showed the exception message above.
Running some debug commands, there are a lot of PSReadLineHistoryFile_2886463743 files created at nearly the exact same millisecond:
My 2 cents:
System.Threading.Mutex
throw occasionally if the mutex name is contended.
IOException
can happen when some other error happens.