wayland: Buffers with old size can get committed already after ack_configure on window resize #4563
It doesn't crash on any other compositor I checked, but it seems that's only because they don't check for this protocol violation. This log was obtained on Mutter:
After setting |
Interestingly, when unsetting fullscreen, the new buffer size is committed before the configure event is even received, which is why it also triggers a protocol violation:
(this log comes from Weston) |
Odd, do we need to resize the buffer as part of setting fullscreen? I figured setting fullscreen would do that for us, via configure or a similar mechanism... |
No, the problem is that:
|
The second bit is probably the resize in SetWindowFullscreen, which unfortunately was necessary to get compositors to not blow apart the window size when leaving fullscreen (test with Super Hexagon). For the first one... that one's lost me, how long do we have to wait for it to be ready for an ack when they told us to reconfigure in the first place? This might be a bit out of my range, but SupHex is a perfect test case for fullscreen testing as it avoids all APIs except SetWindowFullscreen and uses resize events for everything else, despite not being resizable. |
Wait? No, SDL should send an ack when SDL is ready - that is, the surface has been reconfigured with a new resolution. Currently SDL acks first, commits and only then changes the buffer size, which is obviously wrong. |
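The ordering argument above can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `MockCompositor`, `MockSurface` and every function below are hypothetical stand-ins, not SDL or libwayland APIs, and the mock compositor applies a deliberately simplified rule (any commit after an ack must match the acked size exactly), which is stricter than what real compositors check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock compositor: remembers the size from the last acked
 * configure and flags any later commit whose buffer size differs.
 * Real compositors apply more nuanced rules; this is only a model. */
typedef struct {
    int acked_w, acked_h;   /* 0x0 means no configure has been acked yet */
    bool protocol_error;
} MockCompositor;

/* Hypothetical mock surface: the size its next committed buffer will have. */
typedef struct {
    int buffer_w, buffer_h;
} MockSurface;

static void mock_ack_configure(MockCompositor *comp, int w, int h)
{
    assert(w > 0 && h > 0);
    comp->acked_w = w;
    comp->acked_h = h;
}

static void mock_commit(MockCompositor *comp, const MockSurface *surf)
{
    bool constrained = comp->acked_w > 0 && comp->acked_h > 0;
    if (constrained &&
        (surf->buffer_w != comp->acked_w || surf->buffer_h != comp->acked_h)) {
        comp->protocol_error = true;
    }
}

/* Order described as wrong in the thread: ack, commit, then resize.
 * Returns true if the mock compositor flagged a violation. */
static bool ack_commit_resize(int old_w, int old_h, int new_w, int new_h)
{
    MockCompositor comp = { 0, 0, false };
    MockSurface surf = { old_w, old_h };

    mock_ack_configure(&comp, new_w, new_h);
    mock_commit(&comp, &surf);      /* buffer still has the old size */
    surf.buffer_w = new_w;          /* resize happens too late */
    surf.buffer_h = new_h;
    return comp.protocol_error;
}

/* Order argued for in the thread: resize first, then ack, then commit. */
static bool resize_ack_commit(int old_w, int old_h, int new_w, int new_h)
{
    MockCompositor comp = { 0, 0, false };
    MockSurface surf = { old_w, old_h };

    surf.buffer_w = new_w;          /* surface is ready before the ack */
    surf.buffer_h = new_h;
    mock_ack_configure(&comp, new_w, new_h);
    mock_commit(&comp, &surf);
    return comp.protocol_error;
}
```

In this model, going from 640x480 to 1920x1080 with `ack_commit_resize` trips the mock compositor's check, while `resize_ack_commit` does not.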
I'd be up for reintroducing the first one - the second one basically only works because we put off resizing so late that it skips over a lot of activity in exchange for lag. Odds are we just need to move the ack to be later in HandlePendingResize. |
The flush shouldn't be necessary with correct resizing logic. It was simply hiding the bug under the carpet. From 7f261d3:
void
Wayland_SetWindowFullscreen(_THIS, SDL_Window * window,
                            SDL_VideoDisplay * _display, SDL_bool fullscreen)
{
    struct wl_output *output = ((SDL_WaylandOutputData*) _display->driverdata)->output;
    SetFullscreen(window, fullscreen ? output : NULL);
    /* The window may have been resized to the output size, so reset this when
     * returning to a window
     */
    if (!fullscreen) {
        SDL_WindowData *wind = (SDL_WindowData*) window->driverdata;
        wind->resize.width = window->windowed.w;
        wind->resize.height = window->windowed.h;
        wind->resize.pending = SDL_TRUE;
        Wayland_HandlePendingResize(window);
    }
}
This is wrong - |
That makes perfect sense to me - in messing around with this I think the block should be replaced with a flush, as that was what I was desperately trying to replicate here. It's very possible that, like the original version, a flush should be unconditional. To lay out why this is such a pain in the ass, this is what happens:
So the issue is two-fold: We need the first resized event to ensure we have the correct |
I still don't understand what's going on regarding the case when fullscreen is being set. However, I'm pretty sure now that the libdecor backend is broken in the exact same way - it just gets away with it because it calls This means that instead of crashing/violating the protocol, we're getting a broken frame displayed on screen until it gets replaced with the next one that's actually correctly sized. |
So there's #4575, which works around this issue but doesn't get rid of the visual glitches. I've been trying to fix it properly as well and got the code written for at least the GL case that, to the best of my knowledge, should be correct, but... it doesn't work - and judging from the discussion on #wayland, it smells like it may be another Mesa bug. Also, even if it worked, there would still be the Vulkan case to handle. I don't know much about Vulkan - does SDL even know when the buffer swap happens there? |
Vulkan handles presentation on its own via I wonder if the glitch has to do with the frame callback? We try to follow it for GL as best as we can, but technically we don't swap inside the call itself. I don't know if that would actually cause an issue, but it's about the only thing I can think of off the top of my head. |
I did Even More Super Hexagon Tests(TM) and it seems like a root cause may have been us ignoring configure events...? With this I get the right window size without having to fire all those arbitrary resizes from before, so it's very possible that this affected what should have been smooth resizes as well. |
Renamed the issue as we don't crash anymore, but the incorrect resize logic is still going to haunt us. |
@flibitijibibo I don't think the frame callback is related in any way. At least it shouldn't be. To the best of my knowledge, this should work in the OpenGL case: 785dfdf However, it does not. Even when eglQuerySurface reports the size to be already changed, the next buffer that gets committed can still have the old size. From #wayland:
So, I'm going to try to isolate the test case and file a Mesa bug afterwards. |
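The idea of holding back the ack until the surface reports the new size can be sketched as below. This is not the code from 785dfdf. `PendingConfigure`, `QuerySizeFn`, `store_configure`, `try_take_ack` and `query_from_array` are hypothetical names, and the query hook merely stands in for a size query such as eglQuerySurface. As noted above, the thread reports that a size check like this was not enough in practice, so treat it as an illustration of the intent only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of a configure event that has not been acked yet. */
typedef struct {
    bool     pending;
    uint32_t serial;
    int      width, height;
} PendingConfigure;

/* Hypothetical hook standing in for a surface size query
 * (for example, something built on eglQuerySurface). */
typedef void (*QuerySizeFn)(void *userdata, int *w, int *h);

/* Called from the configure handler: remember the request, do not ack yet. */
static void store_configure(PendingConfigure *pc, uint32_t serial, int w, int h)
{
    pc->pending = true;
    pc->serial = serial;
    pc->width = w;
    pc->height = h;
}

/* Called once per frame, before committing. Returns true and writes the
 * serial to ack only once the surface reports the configured size. */
static bool try_take_ack(PendingConfigure *pc, QuerySizeFn query,
                         void *userdata, uint32_t *serial_out)
{
    int w = 0, h = 0;

    if (!pc->pending) {
        return false;
    }
    query(userdata, &w, &h);
    if (w != pc->width || h != pc->height) {
        return false;               /* surface not resized yet: keep waiting */
    }
    *serial_out = pc->serial;
    pc->pending = false;
    return true;
}

/* Test helper: reads the "current surface size" from an int[2] array. */
static void query_from_array(void *userdata, int *w, int *h)
{
    const int *size = (const int *) userdata;
    *w = size[0];
    *h = size[1];
}
```

A caller would pass the returned serial to the real ack request and only then commit. The helper `query_from_array` exists purely so the sketch can be exercised without a display connection.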
BTW. Moving the eglQuerySurface check and xdg_surface_ack_configure after eglSwapBuffers fixes the glitches... most of the time. It still fails sometimes, but most of the time it resizes flawlessly. It also does not make any sense whatsoever, I just tried it to see what will happen :P So yeah, I think we're in the "Mesa does something else under the hood than everyone thinks it does" territory. |
Interesting, if this is related to eglSwapBuffers, it might be worth checking with the swap interval set to both 0 and 1. The weird custom vsync waiting has some weird side effects, and one of them was VVV not showing the frame during resize when the interval is set to 1. |
One source of error could be that we use
the surface has to be committed for the limits to apply. This also happens inside the configure callback with the stack trace
but has to wait for
Since Ideally, you will gather the state provided by the |
Oof, yeah that is a messy stack... I think that might have been a carryover from before we had the shared CommitMinMax function. I wonder if moving the commit out of the function and instead putting it after CommitMinMaxDimensions in SetWindowResizable/SetWindowMaximumSize/SetWindowMinimumSize would be enough to avoid committing while having everything continue to work. Functionally it'd be the same for every function except SetFullscreen, so if it works it'd be pretty safe to sneak into 2.0.16 at the last moment. |
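The refactor suggested here, where the shared helper only stages the limits and each caller decides whether to commit, might look roughly like the following model. Everything in it is hypothetical: `MockWindow` and the `mock_*` functions are illustrative stand-ins, and the limit values chosen (0 for "unconstrained") are an assumption, not a description of what SDL's CommitMinMaxDimensions actually sets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical window model that counts surface commits. */
typedef struct {
    int min_w, min_h, max_w, max_h;
    int commit_count;
} MockWindow;

static void mock_commit(MockWindow *win)
{
    win->commit_count++;
}

/* Shared helper: stages the size limits only, never commits. */
static void mock_set_limits(MockWindow *win, int min_w, int min_h,
                            int max_w, int max_h)
{
    win->min_w = min_w;
    win->min_h = min_h;
    win->max_w = max_w;
    win->max_h = max_h;
}

/* Caller that still wants an immediate commit: it commits itself. */
static void mock_set_resizable(MockWindow *win, bool resizable,
                               int cur_w, int cur_h)
{
    if (resizable) {
        mock_set_limits(win, 0, 0, 0, 0);           /* assumed: 0 = no limit */
    } else {
        mock_set_limits(win, cur_w, cur_h, cur_w, cur_h);
    }
    mock_commit(win);
}

/* Fullscreen path: stages limits but leaves the commit to the
 * configure handling, so no commit happens inside this call. */
static void mock_set_fullscreen(MockWindow *win)
{
    mock_set_limits(win, 0, 0, 0, 0);
}
```

With this split, the resizable path behaves as before (one commit per call), while the fullscreen path no longer commits from inside the shared helper.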
The same goes for I think weston's I take it that the intention with the I think the entire loop should look something like:
The interaction with the compositor via the Wayland file descriptor can be quite complicated, especially when you have to use multiple queues for multiple threads. The documentation about |
While I completely agree that it shouldn't be done this way, commenting it out was one of the first things I tried while looking at this issue and, sadly, it did not help at all. Judging from the protocol logs I really don't think it's SDL who's at fault there (although it wouldn't surprise me if reworking it to work more like a conventional Wayland client could make it not trigger the bug anymore). |
For the record, I tried it without the busy-looping SwapWindow hack just in case it was somehow related, but that didn't change anything. Also, I don't seem to be able to reproduce it with some simpler apps like testgles2; this appears to be some kind of race condition based on when the client is invoking GL calls that cause the buffer to get allocated, and my Allegro-based games, when used with the SDL backend, somehow manage to trigger it consistently. Trying to write a simple reproducer got me nowhere so far, so I'll probably have to work the other way around and simplify the existing reproducer. |
Just tried the latest This list of steps should make it easy to reproduce:
Effect:
|
I messed around with this some more and didn't get anywhere... this works great under Vulkan, GL not so much:
|
Try as I might I cannot get this to happen locally on anything other than the exact test case on the exact compositor. This is at risk of being bumped to .24 as a result. (We really should make window changes a Commit() operation in 3.0...) |
This bug may interact with #5450, probably not but it does touch a lot of the same areas with an added protocol extension. |
We're at the deadline, so bumping to 2.0.24. That said I would be very interested to know if the recent wp_viewporter additions fix this, since the viewporter would presumably scale the surface and thus prevent the protocol error. |
Going to make a potentially controversial call and make this a 3.0 task, rather than a 2.0.x task... see #3519 (comment) for more. |
Any chance of testing this against the latest version and seeing if it's fixed? There's been a lot of cleanup of excess commits and configuration in the Wayland backend, particularly regarding entering/exiting fullscreen. With the |
I played around with this a bit more and I can't replicate this at all with the current version, so I think it's fair to say this has been fixed. |
To reproduce:
weston --width=640 --height=480
SDL_SetWindowFullscreen (tested with SDL_WINDOW_FULLSCREEN_DESKTOP)
The client gets terminated with a protocol error:
Happens with xdg-shell, does not happen with libdecor. Tested on d784dd2.
/cc @flibitijibibo