Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throttle SetEvent call in render thread #3511

Merged
merged 3 commits into from
Feb 14, 2020

Conversation

skyline75489
Copy link
Collaborator

@skyline75489 skyline75489 commented Nov 10, 2019

Summary of the Pull Request

This is an attempt to reduce unnecessary SetEvent CPU overhead.

References

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

Redrawing is triggered intensively when terminal is under load. The code here indicates that the redraw call will be throttled:

// The thread will provide throttling for us.

Still, the SetEvent call itself stands out a 7% usage of CPU in Release build. Those SetEvent calls are mostly useless because most of them are invoked while the rendering thread is sleeping (for s_FrameLimitMilliseconds long).

图片

I've tried to guard all the useless calls with _fPainting flag, as in this PR. It works great for the CPU. But the rendering became noticeably sluggish.

I've figured that the rendering thread design is flawed in the perspective of this certain issue, probably also future unknown issues. Instead of being a thread that can be notified, it should behave more like a timer that invokes rendering opereation every s_FrameLimitMilliseconds. In this way, the rendering thread will be more focused on its own work without being distracted from others.

Maybe @egmontkob can provide us some insight about this.

Validation Steps Performed

@zadjii-msft zadjii-msft requested a review from miniksa November 11, 2019 14:55
@zadjii-msft zadjii-msft added Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues labels Nov 11, 2019
@zadjii-msft
Copy link
Member

zadjii-msft commented Nov 11, 2019

Maybe the sluggishness comes from us ignoring all frame triggers during the paint. Previously, if something were to trigger a new frame during a paint, then we'd make sure to start another frame right away on the next timer tick. Now, with just the right timing, if there's not any action between one frame and the next timer tick, then that frame is now skipped until something fully after the first frame triggers a new frame.

Maybe if we kept another flag, which is set when TriggerRedraw is called during a frame, we could immediately call SetEvent again at the end of a frame, if something else requested a new frame during that frame.

EDIT: okay so that kinda sounds like nonsense re-reading it. Lemme try and clear it up.
Let's add another flag, _nextFrameRequested. When TriggerRedraw is called, and we're about to early-return because _fPainting is true, let's also set _nextFrameRequested=true. Then at the end of _ThreadProc, if _nextFrameRequested is set, call SetEvent to trigger the next frame right away (or just skip the WaitForSingleObject at the start of the thread proc). Maybe all that will keep the perf improvement, but without the sluggishness.

@skyline75489
Copy link
Collaborator Author

@zadjii-msft Thanks for the idea. The sluggishness is gone indeed. I think it's good enough to out of draft now.

@skyline75489 skyline75489 marked this pull request as ready for review November 11, 2019 23:52
@skyline75489
Copy link
Collaborator Author

According to cppreference https://en.cppreference.com/w/cpp/atomic/memory_order, quote

One notable exception is Visual Studio, where, with default settings, every volatile write has release semantics and every volatile read has acquire semantics

Seems volatile is also good for this.

@DHowett-MSFT
Copy link
Contributor

Voicing disagreement with using volatile. If we want specific memory orders, we should request them. volatile is the subject of an active proposal for deprecation, and I don’t think net new code should be taking a dependency on it.

WaitForSingleObject(_hEvent, INFINITE);

// Skip waiting if next frame is requested.
if (_fNextFrameRequested.test_and_set(std::memory_order_relaxed))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be relaxed since it happens mostly during when the render thread is sleeping.

@skyline75489
Copy link
Collaborator Author

@DHowett-MSFT Thanks for the clarification. I came from a Java background and they also have volatile in Java. Just learned that essentially that is different from volatile in C/C++.

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me, though I wouldn't want this merged without @miniksa's signoff.

Do you happen to have a perf trace of what this preforms like after this change?

@skyline75489
Copy link
Collaborator Author

图片

@zadjii-msft There you go. Pretty good huh?

@zadjii-msft
Copy link
Member

@skyline75489 This does put a smile on my face ☺️ Thanks!

@miniksa
Copy link
Member

miniksa commented Feb 14, 2020

@skyline75489, sorry this took me so so so long to get around to. I just started digging back into rendering and performance things this week and am coming back around to these PRs. This looks wonderful to me and I approve. Thanks for your effort and sorry for the delay.

@miniksa miniksa merged commit 081493e into microsoft:master Feb 14, 2020
@beviu beviu mentioned this pull request Feb 22, 2020
1 task
ghost pushed a commit that referenced this pull request Feb 27, 2020
## Summary of the Pull Request

Fix a bug where the `Renderer::PaintFrame` method:
1. is not called until the next `RenderThread::NotifyThread` call but needs to be called because there the terminal was updated (theoretical bug)
2. is called twice but needs to be called only once (verified bug)

## References

The bug was introduced by #3511.

## PR Checklist
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA

## Detailed Description of the Pull Request / Additional comments

### Before

#### First bug

In the original code, `_fNextFrameRequested` is set to `true` in render thread because `std::atomic_flag::test_and_set` is called.

This is wrong because it means that the render thread will render the terminal again even if there is no change after the last render.

I think the the goal was to load the boolean value for `_fNextFrameRequested` to check whether the thread should sleep or not.

The problem is that there is no method on `std::atomic_flag` to load its boolean value. I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value. I guess that this was believed to be equivalent to just a simple load, without doing any change to the value because it restores it at the end.

But it's not: this is dangerous because if the value is changed to `true` between the call to `std::atomic_flag::test_and_set` and the call to `std::atomic_flag::clear`, then the value ends up being `false` at the end which is wrong because we don't want to change it! And if that value ends up being `false`, it means that we miss a render because we will wait on `_hEvent` during the next iteration on the render thread.

Well actually, here, this not even a problem because when that code is ran, `_fPainting` is `false` which means that the other thread that modifies the `_fNextFrameRequested` value through `RenderThread::NotifyPaint` will not actually modify `_fNextFrameRequested` but rather call `SetEvent` (see the method's body).

But wait! There is a problem there too! `std::atomic_flag::test_and_set` is called for `_fPainting` which sets its value to `true`. It was probably unintended. So actually, the next call to `RenderThread::NotifyPaint` _will_ end up modifying `_fNextFrameRequested` which means that the data race I was talking about _might_ happen!

#### Second bug

Let's go back a little bit in my explanation. I was talking about the fact that:

> I guess what happened was that the "solution" that was found was to use `std::atomic_flag::test_and_set`, followed by `std::atomic_flag::clear` if the value was `false` originally (if `std::atomic_flag::test_and_set` returned `false`) to restore the original value.

The problem is that the reverse was done in the implementation: `std::atomic_flag::clear` is called if the value was _`true`_ originally!

So at this point, if the value of `_fNextFrameRequested` was `false`, then `std::atomic_flag::test_and_set` sets its is set to `true` and returns `false`. So for the next iteration, `_fNextFrameRequested` is `true` and the render thread will re-render but that was not needed.

### After

I used `std::atomic<bool>` instead of `std::atomic_flag` for `_fNextFrameRequested` and the other atomic field because it has a `load` and a `store` method so we can actually load the value without changing it.

I also replaced `_fPainting` by `_fWaiting`, which is basically the opposite of `_fPainting` but stays `true` for a little shorter than `_fPainting` would stay `false`. Indeed, I think that it makes more sense to directly wrap/scope _just_ the call to `WaitForSingleObject` by setting my atomic variable to `true` _just_ before and to `false` _just_ after because:
* It makes more sense while you're reading the code: it's easier IMO to understand what the purpose of `_fWaiting` is (that is, to call `SetEvent` from `RenderThread::NotifyPaint` if it's `true`).
* It's probably a tiny bit better for performance because it will become `true` for a little shorter which means less calls to `SetEvent`.

#### Warning

I don't really understand [std::memory_order](https://en.cppreference.com/w/cpp/atomic/memory_order)s.

So I used the default one (`std::memory_order_seq_cst`) which is the safest.

I believe that if no read or write are reordered in the two threads (`RenderThread::NotifyPaint` and `RenderThread::_ThreadProc`), then the code I wrote will behave correctly.

I think that `std::memory_order_seq_cst` enforces that so it should be fine, but I'm not sure.

## Validation Steps Performed

**I tried to reproduce the second bug that I described in the first section of this PR.**

I put a breakpoint on `RenderThread::NotifyPaint` and on `Renderer::PaintFrame`. Initially they are disabled. Then I ran the terminal in Release mode, waited a bit for the prompt to display and the cursor to start blinking. Then I enabled the breakpoints.

### Before

Each `RenderThread::NotifyPaint` is followed by 2 `Renderer::PaintFrame` calls. ❌ 

### After

Each `RenderThread::NotifyPaint` is followed by 1 `Renderer::PaintFrame` call. ✔️
@DHowett-MSFT
Copy link
Contributor

🎉 Once again, thanks for the contribution!

This pull request was included in a set of conhost changes that was just
released with Windows Insider Build 19603.

@skyline75489 skyline75489 deleted the fix/setEventOverhead branch February 9, 2021 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants