Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MacOS] Fix deadlock on maximizing window from event callback #2636

Merged

Conversation

Wumpf
Copy link
Contributor

@Wumpf Wumpf commented Jan 17, 2023

This sadly undos #1901 again - cc: @xiaopengli89

The problem is that handle_redraw has no threading/re-entrance protection whatsoever - when calling set_maximized from within an event it will deadlock (which is ofc much worse than what #1901 tried to address)!

I ran into this when testing this egui PR emilk/egui#2292


  • Tested on all platforms changed
  • Added an entry to CHANGELOG.md if knowledge of this change could be valuable to users
  • Updated documentation to reflect any user-facing changes, including notes of platform-specific behavior
  • Created or updated an example program if it would help users understand this functionality
  • Updated feature matrix, if new features were added or implemented

@Wumpf Wumpf requested a review from madsmtm as a code owner January 17, 2023 09:39
@Wumpf Wumpf mentioned this pull request Jan 17, 2023
11 tasks
@Wumpf
Copy link
Contributor Author

Wumpf commented Jan 17, 2023

potentially reopens #1837

@kchibisov
Copy link
Member

Could you elaborate where deadlock happens? Is it some winit structure preventing it or it's deeper in macOS libs?

It just if it happens from the event callback it would mean that other calls are also affected by that? Maybe we can simply copy some state around to release the locks?

@Wumpf
Copy link
Contributor Author

Wumpf commented Jan 17, 2023

What is happening is that handle_nonuser_event eventually bubbles up to user code. That user code then calls set_maximized, calling set_maximized_sync which is already on the main thread so immediately executes.
So far so good, but then the call to window.zoom(None); immediately causes a redraw which hits the code path I removed - it tries to again call handle_nonuser_event and since the lock is non-reentrant gets stuck there.

Here's a full callstack for the stuck situation from the linked egui pr, patched with latest winit:

__psynch_mutexwait (@__psynch_mutexwait:5)
_pthread_mutex_firstfit_lock_wait (@_pthread_mutex_firstfit_lock_wait:24)
_pthread_mutex_firstfit_lock_slow (@_pthread_mutex_firstfit_lock_slow:65)
std::sys::unix::locks::pthread_mutex::Mutex::lock (@winit::platform_impl::platform::app_state::Handler::handle_nonuser_event:32)
std::sys_common::mutex::MovableMutex::raw_lock (@winit::platform_impl::platform::app_state::Handler::handle_nonuser_event:31)
std::sync::mutex::Mutex<T>::lock (@winit::platform_impl::platform::app_state::Handler::handle_nonuser_event:16)
winit::platform_impl::platform::app_state::Handler::handle_nonuser_event (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:196)
winit::platform_impl::platform::app_state::AppState::handle_redraw (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:355)
winit::platform_impl::platform::view::WinitView::draw_rect (/Users/andreas/dev/winit/src/platform_impl/macos/view.rs:237)
-[_NSOpenGLViewBackingLayer display] (@-[_NSOpenGLViewBackingLayer display]:182)
CA::Layer::display_if_needed(CA::Transaction*) (@CA::Layer::display_if_needed(CA::Transaction*):200)
CA::Context::commit_transaction(CA::Transaction*, double, double*) (@CA::Context::commit_transaction(CA::Transaction*, double, double*):117)
CA::Transaction::commit() (@CA::Transaction::commit():166)
-[NSMoveHelper _fireDisplayLink:] (@-[NSMoveHelper _fireDisplayLink:]:60)
-[NSScreenDisplayLink _fire] (@-[NSScreenDisplayLink _fire]:42)
___NSRunLoopTimerCreateWithHandler_block_invoke (@___NSRunLoopTimerCreateWithHandler_block_invoke:17)
__CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ (@__CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__:11)
__CFRunLoopDoTimer (@__CFRunLoopDoTimer:238)
__CFRunLoopDoTimers (@__CFRunLoopDoTimers:92)
__CFRunLoopRun (@__CFRunLoopRun:477)
CFRunLoopRunSpecific (@CFRunLoopRunSpecific:156)
-[NSMoveHelper _doAnimation] (@-[NSMoveHelper _doAnimation]:140)
-[NSResizeMoveHelper animateResizeToFrame:] (@-[NSResizeMoveHelper animateResizeToFrame:]:77)
-[NSWindow setFrame:display:animate:] (@-[NSWindow setFrame:display:animate:]:144)
-[NSWindow _zoomToScreen:isMoveToiPad:] (@-[NSWindow _zoomToScreen:isMoveToiPad:]:73)
<(A,) as objc2::message::MessageArguments>::__invoke (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/message/mod.rs:390)
objc2::message::platform::send_unverified::{{closure}} (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/message/apple/mod.rs:36)
objc2::message::conditional_try (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/message/mod.rs:30)
objc2::message::platform::send_unverified (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/message/apple/mod.rs:36)
objc2::message::MessageReceiver::send_message (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/message/mod.rs:195)
winit::platform_impl::platform::appkit::window::NSWindow::zoom (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/macros/__rewrite_self_arg.rs:42)
winit::platform_impl::platform::util::async::set_maximized_sync::{{closure}} (/Users/andreas/dev/winit/src/platform_impl/macos/util/async.rs:151)
winit::platform_impl::platform::util::async::run_on_main (/Users/andreas/dev/winit/src/platform_impl/macos/util/async.rs:33)
winit::platform_impl::platform::util::async::set_maximized_sync (/Users/andreas/dev/winit/src/platform_impl/macos/util/async.rs:131)
winit::platform_impl::platform::window::WinitWindow::set_maximized (/Users/andreas/dev/winit/src/platform_impl/macos/window.rs:873)
winit::window::Window::set_maximized (/Users/andreas/dev/winit/src/window.rs:851)
eframe::native::epi_integration::handle_app_output (/Users/andreas/dev/egui/crates/eframe/src/native/epi_integration.rs:205)
eframe::native::epi_integration::EpiIntegration::update (/Users/andreas/dev/egui/crates/eframe/src/native/epi_integration.rs:346)
<eframe::native::run::glow_integration::GlowWinitApp as eframe::native::run::WinitApp>::paint (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:493)
eframe::native::run::run_and_return::{{closure}} (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:110)
<winit::platform_impl::platform::app_state::EventLoopHandler<T> as winit::platform_impl::platform::app_state::EventHandler>::handle_nonuser_event::{{closure}} (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:96)
winit::platform_impl::platform::app_state::EventLoopHandler<T>::with_callback (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:70)
<winit::platform_impl::platform::app_state::EventLoopHandler<T> as winit::platform_impl::platform::app_state::EventHandler>::handle_nonuser_event (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:91)
winit::platform_impl::platform::app_state::Handler::handle_nonuser_event (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:199)
winit::platform_impl::platform::app_state::AppState::cleared (/Users/andreas/dev/winit/src/platform_impl/macos/app_state.rs:389)
winit::platform_impl::platform::observer::control_flow_end_handler::{{closure}} (/Users/andreas/dev/winit/src/platform_impl/macos/observer.rs:79)
winit::platform_impl::platform::observer::control_flow_handler::{{closure}} (/Users/andreas/dev/winit/src/platform_impl/macos/observer.rs:41)
std::panicking::try::do_call (@std::panicking::try:8)
std::panicking::try (@std::panicking::try:8)
std::panic::catch_unwind (@winit::platform_impl::platform::observer::control_flow_handler:25)
winit::platform_impl::platform::event_loop::stop_app_on_panic (/Users/andreas/dev/winit/src/platform_impl/macos/event_loop.rs:245)
winit::platform_impl::platform::observer::control_flow_handler (/Users/andreas/dev/winit/src/platform_impl/macos/observer.rs:39)
winit::platform_impl::platform::observer::control_flow_end_handler (/Users/andreas/dev/winit/src/platform_impl/macos/observer.rs:74)
__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ (@__CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__:12)
__CFRunLoopDoObservers (@__CFRunLoopDoObservers:136)
__CFRunLoopRun (@__CFRunLoopRun:265)
CFRunLoopRunSpecific (@CFRunLoopRunSpecific:156)
RunCurrentEventLoopInMode (@RunCurrentEventLoopInMode:76)
ReceiveNextEventCommon (@ReceiveNextEventCommon:171)
_BlockUntilNextEventMatchingListInModeWithFilter (@_BlockUntilNextEventMatchingListInModeWithFilter:21)
_DPSNextEvent (@_DPSNextEvent:161)
-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] (@-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]:185)
-[NSApplication run] (@-[NSApplication run]:119)
winit::platform_impl::platform::event_loop::EventLoop<T>::run_return::{{closure}} (/Users/andreas/dev/winit/src/platform_impl/macos/event_loop.rs:220)
objc2::rc::autorelease::autoreleasepool (/Users/andreas/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/objc2-0.3.0-beta.3/src/rc/autorelease.rs:313)
winit::platform_impl::platform::event_loop::EventLoop<T>::run_return (/Users/andreas/dev/winit/src/platform_impl/macos/event_loop.rs:211)
<winit::event_loop::EventLoop<T> as winit::platform::run_return::EventLoopExtRunReturn>::run_return (/Users/andreas/dev/winit/src/platform/run_return.rs:51)
eframe::native::run::run_and_return (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:94)
eframe::native::run::glow_integration::run_glow::{{closure}} (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:665)
eframe::native::run::with_event_loop::{{closure}} (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:83)
std::thread::local::LocalKey<T>::try_with (@std::thread::local::LocalKey<T>::try_with:64)
std::thread::local::LocalKey<T>::with (@std::thread::local::LocalKey<T>::with:16)
eframe::native::run::with_event_loop (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:76)
eframe::native::run::glow_integration::run_glow (/Users/andreas/dev/egui/crates/eframe/src/native/run.rs:658)
eframe::run_native (/Users/andreas/dev/egui/crates/eframe/src/lib.rs:175)
custom_window_frame::main (/Users/andreas/dev/egui/examples/custom_window_frame/src/main.rs:16)
core::ops::function::FnOnce::call_once (@core::ops::function::FnOnce::call_once:8)
std::sys_common::backtrace::__rust_begin_short_backtrace (@std::sys_common::backtrace::__rust_begin_short_backtrace:9)
std::rt::lang_start::{{closure}} (@std::rt::lang_start::{{closure}}:10)
core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once (@std::rt::lang_start_internal:179)
std::panicking::try::do_call (@std::rt::lang_start_internal:176)
std::panicking::try (@std::rt::lang_start_internal:176)
std::panic::catch_unwind (@std::rt::lang_start_internal:176)
std::rt::lang_start_internal::{{closure}} (@std::rt::lang_start_internal:176)
std::panicking::try::do_call (@std::rt::lang_start_internal:176)
std::panicking::try (@std::rt::lang_start_internal:176)
std::panic::catch_unwind (@std::rt::lang_start_internal:176)
std::rt::lang_start_internal (@std::rt::lang_start_internal:176)
std::rt::lang_start (@std::rt::lang_start:20)
main (@main:11)
start (@start:639)

@Wumpf
Copy link
Contributor Author

Wumpf commented Jan 17, 2023

so maybe we could also fix this by skipping over in handle_redraw like so

   pub fn handle_redraw(window_id: WindowId) {
        if !HANDLER.get_in_callback() {
            HANDLER.handle_nonuser_event(EventWrapper::StaticEvent(Event::RedrawRequested(window_id)));
        }
   }

but it didn't quite check out for me why queue redraw isn't the more correct thing here?
What do you think?

EDIT: Just checked, this would be a sufficient fix if you prefer that :)

@Wumpf
Copy link
Contributor Author

Wumpf commented Jan 17, 2023

another venue would be pushing window.zoom(None); further down into the Queue but I didn't figure out how/if I can copy the necessary state for that

@kchibisov
Copy link
Member

The more I read into the code, the less it makes sense to me.

Winit has a clear policy when RedrawRequested should be sent, yet what I see on macOS is that we have 2 sources of that event being sent, and in one case I don't see it actually be after the MainEventsCleared.

So it means that RedrawRequested on macOS could come out of order? For some clients it might not be an issue, but some rely on direct event delivery. So what it should do here is just set a flag for the winit so it'll be dispatch like it's doing so in cleared?

@madsmtm could you elaborate why macOS has things that way? That just looks wrong compared to 1. our docs. 2. other backends.

So at first I'd like to figure out, why the code is written the way it is, so we can continue with more clear solution other that hot-fixing.

@kchibisov
Copy link
Member

To clarify. The 2 sources I was talking about is

  1. WinitView::draw_rect
  2. AppState::cleared

The 2-nd looks like a canonical and complies with the winit's docs. The 1-st is what looks out of place to me.

@madsmtm
Copy link
Member

madsmtm commented Jan 18, 2023

Some previous discussion on this:

I agree that we should just be able to set a flag, and then handle it after MainEventsCleared, but unfortunately we currently need to handle the event inside drawRect:, lest we see those flickering issues that #1901 tried to solve.

I think I prefer the approach described in #2636 (comment) for now, which fixes the immediate issue as well as keeping roughly the desired behaviour (instant resize). Though yes, the events would still appear of order, I'll try to fix this properly at some point (there are ways to do this properly, they just need more research).

@kchibisov
Copy link
Member

Maybe we need a separate event telling that sync redraw operation was requested? Otherwise it'll break assumptions about the redraw requested? Or at least add a comment that on macOS that event could come out of order.

The issue with new event that it'll basically be utilized only on macOS...

If you're fine with current approach, I'd suggest to document that the RedrawRequested could come out of order on macOS and mention justification for that.

@Wumpf Wumpf force-pushed the fix-deadlock-on-maximize-from-callback branch from 5f90efb to 67b5a25 Compare January 19, 2023 08:42
@Wumpf
Copy link
Contributor Author

Wumpf commented Jan 19, 2023

Changed this PR over to the hotfix meanwhile.
EDIT: Plus comment and setting HANDLER.set_in_callback

@Wumpf Wumpf force-pushed the fix-deadlock-on-maximize-from-callback branch from 67b5a25 to fedc4e6 Compare January 19, 2023 08:44
src/platform_impl/macos/app_state.rs Outdated Show resolved Hide resolved
@Wumpf Wumpf force-pushed the fix-deadlock-on-maximize-from-callback branch from fedc4e6 to bfea199 Compare January 20, 2023 22:44
@Wumpf Wumpf force-pushed the fix-deadlock-on-maximize-from-callback branch from bfea199 to f76fa73 Compare January 20, 2023 22:49
@Wumpf Wumpf requested review from kchibisov and removed request for madsmtm and kchibisov January 20, 2023 22:50
@madsmtm
Copy link
Member

madsmtm commented Jan 21, 2023

Thanks, I went ahead and merged this, and since I didn't have access to edit your branch, opened #2641 for the documentation change as suggested by @kchibisov.

@Wumpf Wumpf deleted the fix-deadlock-on-maximize-from-callback branch January 21, 2023 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants