Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RenderThread's notify mechanism #4698

Merged
3 commits merged into from
Feb 27, 2020
Merged

Fix RenderThread's notify mechanism #4698

3 commits merged into from
Feb 27, 2020

Conversation

beviu
Copy link
Contributor

@beviu beviu commented Feb 22, 2020

Summary of the Pull Request

Fix a bug where the Renderer::PaintFrame method:

  1. is not called until the next RenderThread::NotifyThread call but needs to be called because there the terminal was updated (theoretical bug)
  2. is called twice but needs to be called only once (verified bug)

References

The bug was introduced by #3511.

PR Checklist

  • CLA signed. If not, go over here and sign the CLA

Detailed Description of the Pull Request / Additional comments

Before

First bug

In the original code, _fNextFrameRequested is set to true in render thread because std::atomic_flag::test_and_set is called.

This is wrong because it means that the render thread will render the terminal again even if there is no change after the last render.

I think the the goal was to load the boolean value for _fNextFrameRequested to check whether the thread should sleep or not.

The problem is that there is no method on std::atomic_flag to load its boolean value. I guess what happened was that the "solution" that was found was to use std::atomic_flag::test_and_set, followed by std::atomic_flag::clear if the value was false originally (if std::atomic_flag::test_and_set returned false) to restore the original value. I guess that this was believed to be equivalent to just a simple load, without doing any change to the value because it restores it at the end.

But it's not: this is dangerous because if the value is changed to true between the call to std::atomic_flag::test_and_set and the call to std::atomic_flag::clear, then the value ends up being false at the end which is wrong because we don't want to change it! And if that value ends up being false, it means that we miss a render because we will wait on _hEvent during the next iteration on the render thread.

Well actually, here, this not even a problem because when that code is ran, _fPainting is false which means that the other thread that modifies the _fNextFrameRequested value through RenderThread::NotifyPaint will not actually modify _fNextFrameRequested but rather call SetEvent (see the method's body).

But wait! There is a problem there too! std::atomic_flag::test_and_set is called for _fPainting which sets its value to true. It was probably unintended. So actually, the next call to RenderThread::NotifyPaint will end up modifying _fNextFrameRequested which means that the data race I was talking about might happen!

Second bug

Let's go back a little bit in my explanation. I was talking about the fact that:

I guess what happened was that the "solution" that was found was to use std::atomic_flag::test_and_set, followed by std::atomic_flag::clear if the value was false originally (if std::atomic_flag::test_and_set returned false) to restore the original value.

The problem is that the reverse was done in the implementation: std::atomic_flag::clear is called if the value was true originally!

So at this point, if the value of _fNextFrameRequested was false, then std::atomic_flag::test_and_set sets its is set to true and returns false. So for the next iteration, _fNextFrameRequested is true and the render thread will re-render but that was not needed.

After

I used std::atomic<bool> instead of std::atomic_flag for _fNextFrameRequested and the other atomic field because it has a load and a store method so we can actually load the value without changing it.

I also replaced _fPainting by _fWaiting, which is basically the opposite of _fPainting but stays true for a little shorter than _fPainting would stay false. Indeed, I think that it makes more sense to directly wrap/scope just the call to WaitForSingleObject by setting my atomic variable to true just before and to false just after because:

  • It makes more sense while you're reading the code: it's easier IMO to understand what the purpose of _fWaiting is (that is, to call SetEvent from RenderThread::NotifyPaint if it's true).
  • It's probably a tiny bit better for performance because it will become true for a little shorter which means less calls to SetEvent.

Warning

I don't really understand std::memory_orders.

So I used the default one (std::memory_order_seq_cst) which is the safest.

I believe that if no read or write are reordered in the two threads (RenderThread::NotifyPaint and RenderThread::_ThreadProc), then the code I wrote will behave correctly.

I think that std::memory_order_seq_cst enforces that so it should be fine, but I'm not sure.

Validation Steps Performed

I tried to reproduce the second bug that I described in the first section of this PR.

I put a breakpoint on RenderThread::NotifyPaint and on Renderer::PaintFrame. Initially they are disabled. Then I ran the terminal in Release mode, waited a bit for the prompt to display and the cursor to start blinking. Then I enabled the breakpoints.

Before

Each RenderThread::NotifyPaint is followed by 2 Renderer::PaintFrame calls. ❌

After

Each RenderThread::NotifyPaint is followed by 1 Renderer::PaintFrame call. ✔️

@beviu
Copy link
Contributor Author

beviu commented Feb 23, 2020

I wonder if the CI failed because it's bugged or if it's because my code is bad. It failed on a test that uses the RenderThread that I modified. Could somebody force it to run again, please?

@miniksa
Copy link
Member

miniksa commented Feb 23, 2020

Reran failed x86 leg. That test has been a pain in our side. It may or may not be your fault.

@zadjii-msft zadjii-msft added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Product-Meta The product is the management of the products. labels Feb 24, 2020
@zadjii-msft zadjii-msft requested a review from miniksa February 24, 2020 14:18
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I've read this code about 8 times, and I think it makes sense to me now. I think we'll have enough runway before the next release to make sure that this doesn't break anything mysteriously. Thanks for debugging this!

@zadjii-msft
Copy link
Member

@msftbot make sure @miniksa signs off on this

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label Feb 24, 2020
@ghost
Copy link

ghost commented Feb 24, 2020

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @miniksa

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

@ghost ghost removed the AutoMerge Marked for automatic merge by the bot when requirements are met label Feb 24, 2020
@beviu
Copy link
Contributor Author

beviu commented Feb 24, 2020

Last commit is a little perf improvement:
If NotifyPaint is called twice very quickly while the render thread is waiting, it will end up calling SetEvent twice which will make the render thread not wait for the next time when it is supposed to wait which results in two renders while only one was needed. So I added a ResetEvent to make sure that this doesn't happen.

(Sorry for removing AutoMerge, only after re-reading the code another time I figured out that this could be a problem, so I added it now before the PR is merged. Now I'm done.)

@beviu beviu requested a review from zadjii-msft February 24, 2020 15:22
@miniksa miniksa requested a review from DHowett-MSFT February 27, 2020 17:03
@miniksa
Copy link
Member

miniksa commented Feb 27, 2020

@msftbot make sure @DHowett-MSFT signs off

}

// <--
// If `NotifyPaint` is called at this point, then it _will_ set
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent commenting. Thank you.

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I like having @DHowett-MSFT as a double check on certain things and this is one of them.

@DHowett-MSFT DHowett-MSFT added the AutoMerge Marked for automatic merge by the bot when requirements are met label Feb 27, 2020
@ghost
Copy link

ghost commented Feb 27, 2020

Hello @DHowett-MSFT!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 2f60cf0 into microsoft:master Feb 27, 2020
@DHowett-MSFT
Copy link
Contributor

(It looked good to me; GH didn't show that @zadjii-msft had signed off, but I would have been the third.)

@beviu beviu deleted the fix-render-notify branch March 1, 2020 19:23
@DHowett-MSFT
Copy link
Contributor

🎉 Once again, thanks for the contribution!

This pull request was included in a set of conhost changes that was just
released with Windows Insider Build 19603.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues AutoMerge Marked for automatic merge by the bot when requirements are met Product-Meta The product is the management of the products.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants