Throttle SetEvent call in render thread #3511

skyline75489 · 2019-11-10T13:21:57Z

Summary of the Pull Request

This is an attempt to reduce unnecessary SetEvent CPU overhead.

References

PR Checklist

Closes #xxx
CLA signed. If not, go over here and sign the CLA
Tests added/passed
Requires documentation to be updated
I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

Redrawing is triggered intensively when the terminal is under load. The code here indicates that the redraw call will be throttled:

terminal/src/renderer/base/renderer.cpp

Line 154 in 94fc40e

// The thread will provide throttling for us.

Still, the SetEvent call itself accounts for 7% of CPU usage in a Release build. Most of those SetEvent calls are useless, because they are invoked while the rendering thread is sleeping (for s_FrameLimitMilliseconds).

I've tried to guard all the useless calls with the _fPainting flag, as in this PR. It works great for CPU usage, but the rendering became noticeably sluggish.

I've come to think that the rendering thread design is flawed with respect to this particular issue, and probably future unknown issues as well. Instead of being a thread that can be notified, it should behave more like a timer that invokes the rendering operation every s_FrameLimitMilliseconds. That way, the rendering thread can focus on its own work without being distracted by other threads.
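
A minimal sketch of the timer-style loop proposed above, written in portable C++ rather than the Win32 calls the renderer uses. `TimerRenderLoop`, `MarkDirty`, `Tick`, `Run`, and the 16 ms interval in the usage note are hypothetical names and values chosen for illustration; none of them come from the Terminal codebase.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical timer-style render loop: producers only mark the screen dirty,
// and the loop decides once per interval whether a paint is needed.
class TimerRenderLoop
{
public:
    TimerRenderLoop(std::chrono::milliseconds frameInterval, std::function<void()> paint) :
        _frameInterval(frameInterval),
        _paint(std::move(paint))
    {
    }

    // Cheap for callers: a single atomic store, no kernel call.
    void MarkDirty()
    {
        _dirty.store(true, std::memory_order_release);
    }

    // One timer tick. Paints only if something changed since the last tick.
    // Returns true if a paint happened.
    bool Tick()
    {
        if (_dirty.exchange(false, std::memory_order_acq_rel))
        {
            _paint();
            return true;
        }
        return false;
    }

    // Runs ticks at a fixed interval until `stop` becomes true.
    void Run(const std::atomic<bool>& stop)
    {
        while (!stop.load(std::memory_order_acquire))
        {
            Tick();
            std::this_thread::sleep_for(_frameInterval);
        }
    }

private:
    std::chrono::milliseconds _frameInterval;
    std::function<void()> _paint;
    std::atomic<bool> _dirty{ false };
};
```

With an assumed interval such as `std::chrono::milliseconds(16)`, any number of `MarkDirty` calls between two ticks collapses into a single paint. The trade-off of this sketch is that the loop wakes up every interval even when nothing changed.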

Maybe @egmontkob can provide us some insight about this.

Validation Steps Performed

src/renderer/base/thread.hpp

zadjii-msft · 2019-11-11T15:00:27Z

Maybe the sluggishness comes from us ignoring all frame triggers during the paint. Previously, if something were to trigger a new frame during a paint, then we'd make sure to start another frame right away on the next timer tick. Now, with just the right timing, if there's not any action between one frame and the next timer tick, then that frame is now skipped until something fully after the first frame triggers a new frame.

Maybe if we kept another flag, which is set when TriggerRedraw is called during a frame, we could immediately call SetEvent again at the end of a frame, if something else requested a new frame during that frame.

EDIT: okay so that kinda sounds like nonsense re-reading it. Lemme try and clear it up.
Let's add another flag, _nextFrameRequested. When TriggerRedraw is called, and we're about to early-return because _fPainting is true, let's also set _nextFrameRequested=true. Then at the end of _ThreadProc, if _nextFrameRequested is set, call SetEvent to trigger the next frame right away (or just skip the WaitForSingleObject at the start of the thread proc). Maybe all that will keep the perf improvement, but without the sluggishness.
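
The two-flag idea above can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: `FrameRequestTracker` and its method names are hypothetical, a counter stands in for the `SetEvent` call, and a `std::mutex` is used in place of atomics purely to keep the sketch obviously race-free.

```cpp
#include <mutex>

// Hypothetical model of the suggestion: while a frame is being painted,
// redraw requests are remembered instead of signalling the event.
class FrameRequestTracker
{
public:
    // Called by producers whenever something on screen changes.
    void TriggerRedraw()
    {
        std::lock_guard<std::mutex> lock(_lock);
        if (_painting)
        {
            // Early return path: just remember that another frame is wanted.
            _nextFrameRequested = true;
            return;
        }
        ++_signalCount; // stand-in for the SetEvent call
    }

    // Called by the render thread right before painting.
    void BeginFrame()
    {
        std::lock_guard<std::mutex> lock(_lock);
        _painting = true;
    }

    // Called by the render thread right after painting.
    // Returns true if the thread should skip waiting and paint again.
    bool EndFrame()
    {
        std::lock_guard<std::mutex> lock(_lock);
        _painting = false;
        const bool again = _nextFrameRequested;
        _nextFrameRequested = false;
        return again;
    }

    int SignalCount()
    {
        std::lock_guard<std::mutex> lock(_lock);
        return _signalCount;
    }

private:
    std::mutex _lock;
    bool _painting = false;
    bool _nextFrameRequested = false;
    int _signalCount = 0;
};
```

Requests that arrive mid-paint cost no signal, yet the frame they asked for is not lost: `EndFrame` reports it so the render loop can go straight into the next frame.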

skyline75489 · 2019-11-11T23:51:40Z

@zadjii-msft Thanks for the idea. The sluggishness is indeed gone. I think it's good enough to come out of draft now.

skyline75489 · 2019-11-12T00:04:59Z

According to cppreference https://en.cppreference.com/w/cpp/atomic/memory_order, quote

One notable exception is Visual Studio, where, with default settings, every volatile write has release semantics and every volatile read has acquire semantics

Seems volatile is also good for this.

DHowett-MSFT · 2019-11-12T00:47:36Z

Voicing disagreement with using volatile. If we want specific memory orders, we should request them. volatile is the subject of an active proposal for deprecation, and I don’t think net new code should be taking a dependency on it.
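
To illustrate what requesting specific memory orders looks like, here is a small, portable sketch using `std::atomic<bool>` with an explicit release store and acquire load. `PublishAndConsume` and its local names are hypothetical and unrelated to the renderer code.

```cpp
#include <atomic>
#include <thread>

// Publishes a plain int through an atomic flag and returns what the
// consumer thread observed.
int PublishAndConsume()
{
    int payload = 0;                    // ordinary, non-atomic data
    std::atomic<bool> ready{ false };   // publication flag
    int observed = -1;

    std::thread consumer([&] {
        // Acquire: once this load sees `true`, the write to `payload`
        // that preceded the release store is visible here.
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        observed = payload;
    });

    payload = 42;
    // Release: the write to `payload` above cannot be reordered after this store.
    ready.store(true, std::memory_order_release);

    consumer.join();
    return observed;
}
```

The ordering guarantee here comes from the C++ memory model itself, so it does not depend on a compiler-specific interpretation of `volatile`.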

skyline75489 · 2019-11-12T01:57:48Z

src/renderer/base/thread.cpp

-        WaitForSingleObject(_hEvent, INFINITE);
+
+        // Skip waiting if next frame is requested.
+        if (_fNextFrameRequested.test_and_set(std::memory_order_relaxed))


I think this can be relaxed, since it mostly happens while the render thread is sleeping.
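
One detail of `std::atomic_flag::test_and_set` is relevant to the snippet above: it is a read-modify-write operation, so it returns the previous value and leaves the flag set. The sketch below contrasts that with `std::atomic<bool>::load`, which only reads. The struct and function names are hypothetical.

```cpp
#include <atomic>

struct Probe
{
    bool first;  // value returned by the first read
    bool second; // value returned by the second read
};

// Reads an initially clear std::atomic_flag twice via test_and_set.
Probe ProbeAtomicFlagTwice()
{
    std::atomic_flag flag = ATOMIC_FLAG_INIT; // starts clear
    const bool first = flag.test_and_set(std::memory_order_relaxed);
    // The first call set the flag, so the second call sees `true`.
    const bool second = flag.test_and_set(std::memory_order_relaxed);
    return { first, second };
}

// Reads an initially false std::atomic<bool> twice via load.
Probe ProbeAtomicBoolTwice()
{
    std::atomic<bool> flag{ false };
    const bool first = flag.load(std::memory_order_relaxed);
    // load has no side effect, so the second read sees the same value.
    const bool second = flag.load(std::memory_order_relaxed);
    return { first, second };
}
```

So when `test_and_set` is used as a check, the caller has to decide explicitly whether and when to `clear` the flag afterwards.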

skyline75489 · 2019-11-12T02:00:43Z

@DHowett-MSFT Thanks for the clarification. I come from a Java background, and Java also has volatile. I just learned that it is essentially different from volatile in C/C++.

zadjii-msft

This seems good to me, though I wouldn't want this merged without @miniksa's signoff.

Do you happen to have a perf trace of how this performs after this change?

skyline75489 · 2019-11-12T14:23:41Z

@zadjii-msft There you go. Pretty good huh?

zadjii-msft · 2019-11-12T14:27:16Z

@skyline75489 This does put a smile on my face ☺️ Thanks!

miniksa · 2020-02-14T21:57:34Z

@skyline75489, sorry this took me so so so long to get around to. I just started digging back into rendering and performance things this week and am coming back around to these PRs. This looks wonderful to me and I approve. Thanks for your effort and sorry for the delay.

## Summary of the Pull Request

Fix a bug where the `Renderer::PaintFrame` method:

1. is not called until the next `RenderThread::NotifyThread` call, even though it needs to be called because the terminal was updated (theoretical bug)
2. is called twice but needs to be called only once (verified bug)

## References

The bug was introduced by #3511.

## PR Checklist

* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA

## Detailed Description of the Pull Request / Additional comments

### Before

#### First bug

In the original code, `_fNextFrameRequested` is set to `true` in the render thread because `std::atomic_flag::test_and_set` is called. This is wrong because it means that the render thread will render the terminal again even if there is no change after the last render.

I think the goal was to load the boolean value of `_fNextFrameRequested` to check whether the thread should sleep or not. The problem is that there is no method on `std::atomic_flag` to load its boolean value. I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value. I guess that this was believed to be equivalent to a simple load that leaves the value unchanged, because it restores it at the end. But it's not: this is dangerous because if the value is changed to `true` between the call to `std::atomic_flag::test_and_set` and the call to `std::atomic_flag::clear`, then the value ends up being `false` at the end, which is wrong because we don't want to change it! And if that value ends up being `false`, it means that we miss a render, because we will wait on `_hEvent` during the next iteration on the render thread.

Well actually, here, this is not even a problem, because when that code is run, `_fPainting` is `false`, which means that the other thread that modifies the `_fNextFrameRequested` value through `RenderThread::NotifyPaint` will not actually modify `_fNextFrameRequested` but rather call `SetEvent` (see the method's body). But wait! There is a problem there too! `std::atomic_flag::test_and_set` is called for `_fPainting`, which sets its value to `true`. It was probably unintended. So actually, the next call to `RenderThread::NotifyPaint` _will_ end up modifying `_fNextFrameRequested`, which means that the data race I was talking about _might_ happen!

#### Second bug

Let's go back a little bit in my explanation. I was talking about the fact that:

> I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value.

The problem is that the reverse was done in the implementation: `std::atomic_flag::clear` is called if the value was _`true`_ originally! So at this point, if the value of `_fNextFrameRequested` was `false`, then `std::atomic_flag::test_and_set` sets it to `true` and returns `false`. So for the next iteration, `_fNextFrameRequested` is `true` and the render thread will re-render, even though that was not needed.

### After

I used `std::atomic<bool>` instead of `std::atomic_flag` for `_fNextFrameRequested` and the other atomic field because it has a `load` and a `store` method, so we can actually load the value without changing it.

I also replaced `_fPainting` by `_fWaiting`, which is basically the opposite of `_fPainting` but stays `true` for a little shorter than `_fPainting` would stay `false`. Indeed, I think that it makes more sense to directly wrap/scope _just_ the call to `WaitForSingleObject` by setting my atomic variable to `true` _just_ before and to `false` _just_ after because:

* It makes more sense while you're reading the code: it's easier IMO to understand what the purpose of `_fWaiting` is (that is, to call `SetEvent` from `RenderThread::NotifyPaint` if it's `true`).
* It's probably a tiny bit better for performance because it will be `true` for a little shorter, which means fewer calls to `SetEvent`.

#### Warning

I don't really understand [std::memory_order](https://en.cppreference.com/w/cpp/atomic/memory_order)s, so I used the default one (`std::memory_order_seq_cst`), which is the safest. I believe that if no reads or writes are reordered in the two threads (`RenderThread::NotifyPaint` and `RenderThread::_ThreadProc`), then the code I wrote will behave correctly. I think that `std::memory_order_seq_cst` enforces that, so it should be fine, but I'm not sure.

## Validation Steps Performed

**I tried to reproduce the second bug that I described in the first section of this PR.**

I put a breakpoint on `RenderThread::NotifyPaint` and on `Renderer::PaintFrame`. Initially they are disabled. Then I ran the terminal in Release mode and waited a bit for the prompt to display and the cursor to start blinking. Then I enabled the breakpoints.

### Before

Each `RenderThread::NotifyPaint` is followed by 2 `Renderer::PaintFrame` calls. ❌

### After

Each `RenderThread::NotifyPaint` is followed by 1 `Renderer::PaintFrame` call. ✔️
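
A rough sketch of the shape described in the "After" section above, under stated assumptions. `WaitingScope`, `NotifyState`, and `WaitIfIdle` are hypothetical names, a counter stands in for `SetEvent`, and the blocking wait is passed in as a callable so the decision logic can be exercised on a single thread. Using `exchange` to consume the request is a choice made for this sketch; the description above only mentions `load` and `store`. The sketch models the decision logic only and makes no claim about closing the window between checking the flag and starting to wait.

```cpp
#include <atomic>

// Sets the flag to true for exactly the lifetime of the scope, mirroring
// "true just before the wait, false just after".
class WaitingScope
{
public:
    explicit WaitingScope(std::atomic<bool>& waiting) :
        _waiting(waiting)
    {
        _waiting.store(true); // default order: std::memory_order_seq_cst
    }
    ~WaitingScope()
    {
        _waiting.store(false);
    }
    WaitingScope(const WaitingScope&) = delete;
    WaitingScope& operator=(const WaitingScope&) = delete;

private:
    std::atomic<bool>& _waiting;
};

struct NotifyState
{
    std::atomic<bool> waiting{ false };
    std::atomic<bool> nextFrameRequested{ false };
    std::atomic<int> signals{ 0 }; // hypothetical stand-in for SetEvent calls
};

// Producer side, modelled on the description of RenderThread::NotifyPaint.
void NotifyPaint(NotifyState& state)
{
    if (state.waiting.load())
    {
        ++state.signals; // render thread is blocked: wake it up
    }
    else
    {
        state.nextFrameRequested.store(true); // it is busy: leave a note
    }
}

// Render side: blocks via `wait` only if no frame was requested meanwhile.
// Returns true if it waited.
template<typename WaitFn>
bool WaitIfIdle(NotifyState& state, WaitFn&& wait)
{
    if (state.nextFrameRequested.exchange(false))
    {
        return false; // a frame is already pending: skip the wait
    }
    WaitingScope scope(state.waiting);
    wait();
    return true;
}
```

Reading `nextFrameRequested` with `load` leaves it untouched, so a check no longer schedules an extra frame as a side effect.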

DHowett-MSFT · 2020-04-08T19:59:34Z

🎉 Once again, thanks for the contribution!

This pull request was included in a set of conhost changes that was just
released with Windows Insider Build 19603.

Throttle SetEvent call in render thread

8d1843f

zadjii-msft reviewed Nov 11, 2019

zadjii-msft requested a review from miniksa November 11, 2019 14:55

zadjii-msft added Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues labels Nov 11, 2019

Change for review

722ba64

skyline75489 marked this pull request as ready for review November 11, 2019 23:52

skyline75489 commented Nov 12, 2019

Atomic flag

cc1dbc0

skyline75489 force-pushed the fix/setEventOverhead branch from 6fc66ce to cc1dbc0 Compare November 12, 2019 06:07

zadjii-msft approved these changes Nov 12, 2019

skyline75489 mentioned this pull request Nov 13, 2019

Throttle scroll position update #3531

Merged

5 tasks

miniksa approved these changes Feb 14, 2020

miniksa merged commit 081493e into microsoft:master Feb 14, 2020

beviu mentioned this pull request Feb 22, 2020

Fix RenderThread's notify mechanism #4698

Merged

1 task

skyline75489 deleted the fix/setEventOverhead branch February 9, 2021 06:10
