Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark threads for suspend and continue exitting. #59

Closed
wants to merge 2 commits into from

Conversation

ivyl
Copy link

@ivyl ivyl commented Mar 26, 2024

If one of the threads is in a native call it won't be able to cooperatively suspend itself.

This may happen e.g. for services using WinSW wrapper which end up StartServiceCtrlDispatcherW() via a call to
System.ServiceProcess.ServiceBase.Run(). The wrapper may tray to call System.Environment.Exit() if the wrapped process exits with a non-zero exit code.

@madewokherd
Copy link
Owner

I'm not sure whether it's safe to move on from this point. If it is safe, then we should just move on after 1 attempt.

The point of co-op suspend was supposed to be that we can just ignore threads that aren't in managed code. It should be possible to just set some state that tells the thread to suspend if it ever makes its way to a managed code entry point, and then move on. Similarly, they should be able to prevent any new managed threads from being created, or existing unmanaged threads from running managed code. This whole retry loop seems ill-advised to me.

@ivyl ivyl force-pushed the exit-native-fix branch 2 times, most recently from 705d139 to d6943ae Compare March 28, 2024 10:15
@ivyl ivyl changed the title Limit number of thread suspend attempts on exit. Mark threads for suspend and continue exitting. Mar 28, 2024
@ivyl
Copy link
Author

ivyl commented Mar 28, 2024

I've read more of the shutdown code, as well as docs/threading and followed the history of changes to mono_thread_suspend_all_other_threads(). Looks like it was reworked heavily but never completely cleaned up.

The biggest thing was addition of the shutting_down "barrier". This guarantees no new threads will be started so we don't need to keep on looping over threads.

I've also fixed up the version I've force-pushed yesterday. I've corrected some typos and removed duplicated early-return conditions in suspend_all_other_threads()

@madewokherd
Copy link
Owner

Would you expect this to work OK on non-Windows platforms? I'm considering committing it to the main branch on Winehq's fork, then merging it into wine-mono.

@ivyl
Copy link
Author

ivyl commented Mar 29, 2024

Yes. I see no reason why it would become problematic on other platforms.

@madewokherd
Copy link
Owner

madewokherd commented Mar 29, 2024

I tried rebasing this onto https://gitlab.winehq.org/wine-mono/mono/-/tree/main?ref_type=heads and testing a Linux build, and I'm getting some test failures when running make -C mono/tests test-jit that I think may be new.

Failed test(s):

finalizer-exit.exe
=============== finalizer-exit.exe.stdout ===============
Set ID: foo
finalizer thread ID: foo
ghashtable.c:334: assertion 'hash != NULL' failed

* Assertion at threads.c:637, condition `offset' not met


=============== EOF ===============
=============== finalizer-exit.exe.stderr ===============

=============== EOF ===============

@ivyl ivyl force-pushed the exit-native-fix branch from d6943ae to d4183d3 Compare April 2, 2024 15:30
mono_threads_unlock ();
mono_thread_info_sleep (50, NULL);
timeout -= mono_msec_ticks() - start_time;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but I think this counts the time quadratically? You're subtracting time since the start each iteration. I think it'd make more sense to store the end time and compare to that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I forgot to stage a line that was reassigning start_time = mono_msec_ticks ();. I'll push that in a moment.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit nit-picky, but I feel like this wait loop is way more complicated than it needs to be. I'm hung up on there being two calls to mono_msec_ticks in the loop, which shouldn't make a practical difference but in theory there could be time passing between them. You could just assign start_time once and check mono_msec_ticks() - start_time < 1000.

Copy link
Owner

@madewokherd madewokherd Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if I'm reading this right, we always wait 50 ms, even if still_suspending was set to FALSE on this iteration. The final iteration will wait, but it won't check whether the threads finished suspending.

I think I would add a helper function for checking for suspended threads and write the loop something like this:

start_time = mono_msec_ticks ();
while ((still_suspending = is_still_suspending ()) && mono_msec_ticks () - start_time < 1000)
{
    mono_thread_info_sleep (50, NULL);
}

if (still_suspending) emit warning and skip shutdown

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've based the original logic on some other timeouts implemented in the same file. I've changed the implementation to match your suggestion.

@madewokherd
Copy link
Owner

If it's really impossible to clean up safely, then I think we should skip cleanup in that case.

@ivyl ivyl force-pushed the exit-native-fix branch from d4183d3 to ae21ec5 Compare April 2, 2024 19:53
ivyl added 2 commits April 8, 2024 20:38
New threads cannot be created due to the `shutting_down` barrier and any
existing thread will get suspended as soon as they try to continue their
managed execution. We have to wait for them to reach suspended state as
otherwise they will race against cleanup done by
mono_runtime_quit_internal() and may reference already tore down data
structure.

Sadly, if one of the threads is in a native call it won't be able to
enter suspend at all.

This may happen e.g. for services using WinSW wrapper which end up
StartServiceCtrlDispatcherW() via a call to
System.ServiceProcess.ServiceBase.Run(). The wrapper may tray to call
System.Environment.Exit() if the wrapped process exits with a non-zero
exit code.

We should wait some sensible amount of time before giving up and trying
continuing the shutdown anyway.

A lot of the old code was there from an older implementation that was
calling `wait_for_tids()`. It's been reworked multiple times but  was
never fully cleaned up. Turns there's no need to open the handles and
have to care about the MONO_W32HANDLE_MAXIMUM_WAIT_OBJECTS limit any
more.
@ivyl ivyl force-pushed the exit-native-fix branch from ae21ec5 to a73208a Compare April 8, 2024 17:38
@madewokherd
Copy link
Owner

Thanks, I'll merge once I confirm the test results for this revision.

@madewokherd
Copy link
Owner

@madewokherd
Copy link
Owner

And the hack in Wine Mono: 3aeb8c1

@madewokherd madewokherd closed this Apr 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants