-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Throttle SetEvent call in render thread #3511
Conversation
Maybe the sluggishness comes from us ignoring all frame triggers during the paint. Previously, if something were to trigger a new frame during a paint, then we'd make sure to start another frame right away on the next timer tick. Now, with just the right timing, if there's not any action between one frame and the next timer tick, then that frame is now skipped until something fully after the first frame triggers a new frame. Maybe if we kept another flag, which is set when EDIT: okay so that kinda sounds like nonsense re-reading it. Lemme try and clear it up. |
@zadjii-msft Thanks for the idea. The sluggishness is gone indeed. I think it's good enough to out of draft now. |
According to cppreference https://en.cppreference.com/w/cpp/atomic/memory_order, quote
Seems volatile is also good for this. |
Voicing disagreement with using |
WaitForSingleObject(_hEvent, INFINITE); | ||
|
||
// Skip waiting if next frame is requested. | ||
if (_fNextFrameRequested.test_and_set(std::memory_order_relaxed)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be relaxed since it happens mostly during when the render thread is sleeping.
@DHowett-MSFT Thanks for the clarification. I came from a Java background and they also have |
6fc66ce
to
cc1dbc0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems good to me, though I wouldn't want this merged without @miniksa's signoff.
Do you happen to have a perf trace of what this preforms like after this change?
@zadjii-msft There you go. Pretty good huh? |
@skyline75489 This does put a smile on my face |
@skyline75489, sorry this took me so so so long to get around to. I just started digging back into rendering and performance things this week and am coming back around to these PRs. This looks wonderful to me and I approve. Thanks for your effort and sorry for the delay. |
## Summary of the Pull Request Fix a bug where the `Renderer::PaintFrame` method: 1. is not called until the next `RenderThread::NotifyThread` call but needs to be called because there the terminal was updated (theoretical bug) 2. is called twice but needs to be called only once (verified bug) ## References The bug was introduced by #3511. ## PR Checklist * [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA ## Detailed Description of the Pull Request / Additional comments ### Before #### First bug In the original code, `_fNextFrameRequested` is set to `true` in render thread because `std::atomic_flag::test_and_set` is called. This is wrong because it means that the render thread will render the terminal again even if there is no change after the last render. I think the the goal was to load the boolean value for `_fNextFrameRequested` to check whether the thread should sleep or not. The problem is that there is no method on `std::atomic_flag` to load its boolean value. I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value. I guess that this was believed to be equivalent to just a simple load, without doing any change to the value because it restores it at the end. But it's not: this is dangerous because if the value is changed to `true` between the call to `std::atomic_flag::test_and_set` and the call to `std::atomic_flag::clear`, then the value ends up being `false` at the end which is wrong because we don't want to change it! And if that value ends up being `false`, it means that we miss a render because we will wait on `_hEvent` during the next iteration on the render thread. Well actually, here, this not even a problem because when that code is ran, `_fPainting` is `false` which means that the other thread that modifies the `_fNextFrameRequested` value through `RenderThread::NotifyPaint` will not actually modify `_fNextFrameRequested` but rather call `SetEvent` (see the method's body). But wait! There is a problem there too! `std::atomic_flag::test_and_set` is called for `_fPainting` which sets its value to `true`. It was probably unintended. So actually, the next call to `RenderThread::NotifyPaint` _will_ end up modifying `_fNextFrameRequested` which means that the data race I was talking about _might_ happen! #### Second bug Let's go back a little bit in my explanation. I was talking about the fact that: > I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value. The problem is that the reverse was done in the implementation: `std::atomic_flag::clear` is called if the value was _`true`_ originally! So at this point, if the value of `_fNextFrameRequested` was `false`, then `std::atomic_flag::test_and_set` sets its is set to `true` and returns `false`. So for the next iteration, `_fNextFrameRequested` is `true` and the render thread will re-render but that was not needed. ### After I used `std::atomic<bool>` instead of `std::atomic_flag` for `_fNextFrameRequested` and the other atomic field because it has a `load` and a `store` method so we can actually load the value without changing it. I also replaced `_fPainting` by `_fWaiting`, which is basically the opposite of `_fPainting` but stays `true` for a little shorter than `_fPainting` would stay `false`. Indeed, I think that it makes more sense to directly wrap/scope _just_ the call to `WaitForSingleObject` by setting my atomic variable to `true` _just_ before and to `false` _just_ after because: * It makes more sense while you're reading the code: it's easier IMO to understand what the purpose of `_fWaiting` is (that is, to call `SetEvent` from `RenderThread::NotifyPaint` if it's `true`). * It's probably a tiny bit better for performance because it will become `true` for a little shorter which means less calls to `SetEvent`. #### Warning I don't really understand [std::memory_order](https://en.cppreference.com/w/cpp/atomic/memory_order)s. So I used the default one (`std::memory_order_seq_cst`) which is the safest. I believe that if no read or write are reordered in the two threads (`RenderThread::NotifyPaint` and `RenderThread::_ThreadProc`), then the code I wrote will behave correctly. I think that `std::memory_order_seq_cst` enforces that so it should be fine, but I'm not sure. ## Validation Steps Performed **I tried to reproduce the second bug that I described in the first section of this PR.** I put a breakpoint on `RenderThread::NotifyPaint` and on `Renderer::PaintFrame`. Initially they are disabled. Then I ran the terminal in Release mode, waited a bit for the prompt to display and the cursor to start blinking. Then I enabled the breakpoints. ### Before Each `RenderThread::NotifyPaint` is followed by 2 `Renderer::PaintFrame` calls. ❌ ### After Each `RenderThread::NotifyPaint` is followed by 1 `Renderer::PaintFrame` call. ✔️
🎉 Once again, thanks for the contribution! This pull request was included in a set of conhost changes that was just |
Summary of the Pull Request
This is an attempt to reduce unnecessary
SetEvent
CPU overhead.References
PR Checklist
Detailed Description of the Pull Request / Additional comments
Redrawing is triggered intensively when terminal is under load. The code here indicates that the redraw call will be throttled:
terminal/src/renderer/base/renderer.cpp
Line 154 in 94fc40e
Still, the
SetEvent
call itself stands out a 7% usage of CPU in Release build. ThoseSetEvent
calls are mostly useless because most of them are invoked while the rendering thread is sleeping (fors_FrameLimitMilliseconds
long).I've tried to guard all the useless calls with
_fPainting
flag, as in this PR. It works great for the CPU. But the rendering became noticeably sluggish.I've figured that the rendering thread design is flawed in the perspective of this certain issue, probably also future unknown issues. Instead of being a thread that can be notified, it should behave more like a timer that invokes rendering opereation every
s_FrameLimitMilliseconds
. In this way, the rendering thread will be more focused on its own work without being distracted from others.Maybe @egmontkob can provide us some insight about this.
Validation Steps Performed