Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Laggy unless XPRA_OPENGL_DRAW_REFRESH=0 #2481

Closed
totaam opened this issue Nov 16, 2019 · 28 comments
Closed

Laggy unless XPRA_OPENGL_DRAW_REFRESH=0 #2481

totaam opened this issue Nov 16, 2019 · 28 comments
Labels

Comments

@totaam
Copy link
Collaborator

totaam commented Nov 16, 2019

Issue migrated from trac ticket # 2481

component: client | priority: major | resolution: fixed | keywords: lag regression

2019-11-16 18:11:49: aerusso created the issue


I appreciate there is some difficulty with drivers on Python 3, and, for some systems, refreshing more frequently solves the problem.

However, I found the recent change (3.0.2) to set the default of XPRA_OPENGL_DRAW_REFRESH=1 makes scrolling uncomfortably laggy in many programs. This is all local.

I have not experienced any issues by setting this to 0 (yet). Should I expect any strangeness if I keep this set? Is this an issue that a whitelist or blacklist could help?

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 02:37:25: antoine changed owner from antoine to aerusso

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 02:37:25: antoine commented


Please specify the exact distro you are using.
Please also include the output of xpra opengl.

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 03:13:49: aerusso uploaded file xpra-opengl.out (0.6 KiB)

Output of xpra opengl

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 03:14:41: aerusso commented


This is Debian unstable, and I've attached the output.

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 03:32:37: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 03:32:37: antoine changed owner from aerusso to antoine

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 03:32:37: antoine commented


The change that enabled XPRA_OPENGL_DRAW_REFRESH=1 for v3 is in 24352.
More details in #2466.
According to #2466#comment:5, this also affects some Linux drivers..
So r24433 switches back to XPRA_OPENGL_DRAW_REFRESH=0 in trunk, but I am very reluctant to apply this to v3 for now, as I don't want to spend days making new release after new release just to toggle this switch back and forth when people hit issues..

I think that the slowness is caused by the multiple calls to present_fbo.
When scrolling, we're sending the window updates using many chunks. We should bundle all of them together for the refresh code path, the same way that we already do for the non-refresh paint code path.

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 17:23:54: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 17:23:54: antoine changed owner from antoine to aerusso

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 17:23:54: antoine commented


@aerusso: does r24434 fix things?
It is a better solution than XPRA_OPENGL_DRAW_REFRESH=0.
The only other option worth a try if you have repaint issues with this patch is XPRA_REPAINT_ALL=1.

There are beta builds with this change here: [https://xpra.org/beta].

@totaam
Copy link
Collaborator Author

totaam commented Nov 17, 2019

2019-11-17 20:20:33: aerusso commented


I am confused: testing r24434 with XPRA_OPENGL_DRAW_REFRESH=0 (which is what r24433 sets, if I'm reading things right), is excellent (I didn't actually apply that patch, I only applied r24434). Should I be testing with development tip?

With XPRA_OPENGL_DRAW_REFRESH=1, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0.

This is a relatively high end machine---Ryzen 7 3700X.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 03:25:25: antoine commented


Should I be testing with development tip?
Applying r24434 should be enough.

which is what r24433 sets, if I'm reading things right
Damn, I meant to revert r24433, that's now done in r24435.

With XPRA_OPENGL_DRAW_REFRESH=1, it seems better, but not as good as XPRA_OPENGL_DRAW_REFRESH=0.
Better than what?

I think r24434 is what I am going to backport to v3, it's the safest solution: we always request a repaint, but doing it efficiently.
I just need to re-test with wayland first, because GTK does weird things with coordinates because of CSD.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 04:15:44: aerusso commented


Sorry for the confusion. From best to worse, in terms of lag:

  1. XPRA_OPENGL_DRAW_REFRESH=0, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)
  2. XPRA_OPENGL_DRAW_REFRESH=1, with r24434
  3. XPRA_OPENGL_DRAW_REFRESH=1, without r24434

Is this a bug in the Python 3 implementation of GTK3? It seems like, at least most of the time on X, a refresh is not needed.

By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):

    def after_draw_refresh(self, success, message=""):
        plog("after_draw_refresh(%s, %s) pending_refresh=%s",
             success, message, self.pending_refresh)
        backing = self._backing
        if not backing:   
            return                  
        if REPAINT_ALL or self._client.xscale!=1 or self._client.yscale!=1 or is_Wayland():
            #easy: just repaint the whole window:
            rw, rh = self.get_size()
            self.idle_add(self.queue_draw_area, 0, 0, rw, rh)
            return           
        pr = self.pending_refresh
        self.pending_refresh = []
        bx, by, bw, bh = None, None, None, None
        for x, y, w, h in pr:
            if bx is None:
                bx, by, bw, bh = x, y, w, h
                continue
            if x < bx:
                bw += bx - x
                bx = x 
            if y < by:
                bh += by - y
                by = y
            if w > bw:
                bw = w
            if h > bh:                                             
                bh = h
 
        rx, ry, rw, rh = self._client.srect(bx, by, bw, bh)
        #if self.window_offset:
        #    rx -= self.window_offset[0]
        #    ry -= self.window_offset[1]
        self.idle_add(self.queue_draw_area, rx, ry, rw, rh)

I don't do very much graphics, so I don't know if this will break something somewhere else.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 07:37:23: antoine commented


XPRA_OPENGL_DRAW_REFRESH=0, with or without r24434 (I can't tell if there is any difference, though I don't really see how there could be)
With r24434, when draw refresh is enabled =1 and with opengl, we don't paint the fbo on-screen until all the screen updates in the same group have been received. This should be faster for when many screen updates arrive in the same group (ie: when scrolling).

By the way, coalescing all the refreshes into a single one helps the lag enough that I wouldn't have noticed (at least on my machine):
I assume you mean without r24434?
Or is it better than r24434 with draw refresh enabled? (which is the new default)

I don't do very much graphics, so I don't know if this will break something somewhere else.
I don't think the code is correct, we already have a cython accelerated rectangle merging function in xpra: [/browser/xpra/trunk/src/xpra/rectangle.pyx] (with unit tests).
In some cases, it is best not to coalesce: imagine a window updating a single pixel in its 4 corners, we would ask the graphics system to repaint all of it. Seems like a contrived example but similar behaviour happens more than you would think because of the way applications decorate their windows.
It's something I have looked into in the past, and it can be costly, especially when opengl is disabled (high CPU cost).
We could still merge something like this, as long as it isn't the default.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 09:14:41: antoine commented


The backport to v3 is in: 24441.
Will test some more.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 11:08:39: aerusso commented


Or is it better than r24434 with draw refresh enabled? (which is the new default)

Yes, for the workload I tested, draw refresh enabled and merging all of the rectangles is best. It winds up essentially being equivalent to XPRA_REPAINT_ALL=1.

The workload is Google Chrome, when scrolling. For example, here is a typical pending_requests list

[(0, 0, 1500, 846), (0, 119, 1500, 4), (0, 564, 1500, 16), (0, 601, 1500, 28), (0, 648, 1500, 4), (0, 697, 1500, 16), (0, 740, 1500, 2), (0, 780, 1500, 10), (0, 798, 1500, 13), (0, 817, 1500, 10)]

It's really bizarre to me that updating 10 regions is actually something that I can "feel". Now that I'm digging into queue_draw_area, I suspect that there's some slow code path there, maybe getting and initializing the gl context? I didn't look very hard, but gl_init looks like it could be slow.

I think the ideal solution probably renames queue_draw_area to queue_draw_areas and makes it take a list of rectangular regions that all get updated using the same gl context.

I would guess that it only makes sense to consider merging rectangles after collecting the refreshes within a single gl context creation.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 11:18:16: antoine changed status from new to assigned

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 11:18:16: antoine changed owner from aerusso to antoine

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 11:18:16: antoine commented


I didn't look very hard, but gl_init looks like it could be slow.
Of course, you're right.

@totaam
Copy link
Collaborator Author

totaam commented Nov 18, 2019

2019-11-18 11:51:11: antoine commented


The slowdown comes from calling gl_show which ends up calling swap_buffers().
The other thing I had forgotten about is that we already do a full window redraw when double-buffered (which is everywhere):

        if self.is_double_buffered() or bw!=ww or bh!=wh:
            #refresh the whole window:
            rectangles = ((0, 0, bw, bh), )
        else:
            #paint just the rectangles we have accumulated:
            rectangles = self.pending_fbo_paint

Now, we don't want to penalize the non-opengl case, so maybe the code just needs to be delegated to the backend class so opengl can take the shortcut.

@totaam
Copy link
Collaborator Author

totaam commented Nov 19, 2019

2019-11-19 15:25:49: antoine changed status from assigned to new

@totaam
Copy link
Collaborator Author

totaam commented Nov 19, 2019

2019-11-19 15:25:49: antoine changed owner from antoine to aerusso

@totaam
Copy link
Collaborator Author

totaam commented Nov 19, 2019

2019-11-19 15:25:49: antoine commented


@aerusso: how about r24460? (this solution may help with #2373)

This does a single queue_draw_area for the full window when using the opengl backend whilst preserving the multiple-regions code for the non-opengl case. (tested briefly)

@totaam
Copy link
Collaborator Author

totaam commented Nov 22, 2019

2019-11-22 09:38:21: antoine commented


This may have caused a regression on macos: #2491.

@totaam
Copy link
Collaborator Author

totaam commented Nov 24, 2019

2019-11-24 19:27:54: aerusso commented


I tested with Debian unstable's version and added r24434, r24435, and r24460. I cannot "feel" any lag with these patches.

Will double buffering always be enabled? If not, I think there may still be value in collecting the multiple calls to queue_draw_area into a single gl context.

Also: thanks for xpra! It's really a fantastic piece of software!

@totaam
Copy link
Collaborator Author

totaam commented Nov 25, 2019

2019-11-25 14:49:49: antoine changed status from new to closed

@totaam
Copy link
Collaborator Author

totaam commented Nov 25, 2019

2019-11-25 14:49:49: antoine set resolution to fixed

@totaam
Copy link
Collaborator Author

totaam commented Nov 25, 2019

2019-11-25 14:49:49: antoine commented


r24460 has been backported to the v3 branch in 24483.

Since this fixes the symptoms, I am going to close this ticket. We can deal with regressions and bugs in other tickets and link back here if needed.

@totaam totaam closed this as completed Nov 25, 2019
@totaam totaam added the v3.0.x label Jan 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant