Rendering performance of chafa is very slow #410
Comments
WPR analysis:
For conhost.exe...
On the I/O thread, hot areas include:
On the render thread, hot areas include:
I didn't do a wait chain analysis yet to see if the locking/threading was slowing things down because at this point, we have a few areas with obvious routes to improvement that might alleviate the whole deal: Therefore, my conclusion is:
And I have now filed MSFT:21167256 to do these things at some point and hopefully we'll have fixed the performance issue. |
Thanks Michael, that's a nice analysis. So, this is probably a moot question, but I'm assuming PolyTextOutW is called once per render frame and only emits text for invalidated regions? |
Correct. |
Okay, excuse the debugging by proxy ;) I'll get back to bundling up a PR for you... :) |
No problem. It was a fair question to ask. We make dumb mistakes all the time. |
@oising, somehow this morning when I'm looking at this, it's not as slow on my machine as it was in one of your videos. I implemented the 3rd thing above (GDI size measurement caching) quickly as I thought it was the best cost-benefit ratio and it improved things by 20-30%, but I want to make sure that I'm actually fixing your problem. Can you possibly send me a WPR trace of your specific repro? (First Level Triage + CPU Usage Profiles? Let me know if you need help on how to do this.) |
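For readers wondering what "GDI size measurement caching" amounts to, here is a minimal sketch of the general idea: memoize the result of an expensive per-glyph measurement so the slow call runs only once per code point. This is not the conhost implementation. `GlyphWidthCache`, `MeasureFn`, and the callback shape are hypothetical names made up for illustration, and the real fix presumably keys on more than the code point.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for an expensive text-measurement call. In a real
// console host this would be a GDI query; here it is any callable that maps
// a code point to a width.
using MeasureFn = std::function<int(char32_t)>;

class GlyphWidthCache {
public:
    explicit GlyphWidthCache(MeasureFn measure) : measure_(std::move(measure)) {
        assert(measure_ && "a measurement callback is required");
    }

    // Returns the cached width; the slow path runs only on the first request
    // for a given code point.
    int width(char32_t cp) {
        auto it = cache_.find(cp);
        if (it != cache_.end()) {
            return it->second;
        }
        const int w = measure_(cp);
        cache_.emplace(cp, w);
        return w;
    }

    // Assumption: cached widths are only valid for one font configuration, so
    // the owner clears the cache whenever the font changes.
    void invalidate() { cache_.clear(); }

    std::size_t size() const { return cache_.size(); }

private:
    MeasureFn measure_;
    std::unordered_map<char32_t, int> cache_;
};
```

The trade-off is the usual one for memoization: repeated glyphs (which dominate chafa-style output) become a hash lookup, at the cost of having to remember to invalidate when the font changes.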
@miniksa Sure, I've got the WPR trace now. Where shall I send it? |
Email me the attachment or a link to a share at Microsoft.com. My GitHub alias is unoriginal and is my work address too. Just don't sign me up for spam please. |
I'm sorry, I don't understand -- your github alias is unoriginal? I don't know what you mean. |
My e-mail is my Github alias @microsoft.com. Sorry for being obtuse, I'm trying to avoid spam bots picking it up if I write the real mailto: |
The major time spent on the WinEvent turns out to be only if Node.js is running on your system. If Node.js is running, it registers for the WinEvent notifications for EVENT_CONSOLE_LAYOUT to know when the window size has changed. Given WinEvents require kernel work to broadcast and tend to be registered globally, this causes a system-wide slowdown of all of your consoles when it is listening here. If you kill all node.js runtimes (including the one that Visual Studio 2017 launches), that performance drag goes away. I need to:
|
I checked Given Node.js in the tty file is already reading through the queue with Of course, this also screams of #281 needing to implement a better way overall of receiving these sorts of events, but it's going to be a bit before we get to that. |
That rabbit hole is getting deep :D Very interesting to read. |
Yes... It is... |
The answer to this is "no". We'd have to make categories for each event that we wanted to register separately and I'm not sure there are enough category flags left. Also, no one wants to touch this given it's legacy tech. We will need to drive improvement of this through the other options. |
So, the upshot of this is that any time NodeJS is running (e.g. Visual Studio) then all console windows suffer an approximate 20% slowdown (or worse) due to accessibility eventing broadcasts. Urgh. |
The first quick fix for this (GDI measurement caching) just went out with insider build 18932! |
Continuing improvement of SIGWINCH from PR libuv#2308. Running SetWinEventHook without filtering for the specific PIDs has a significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize(), which checks whether the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure not to call uv__tty_console_signal_resize() more than 30 times per second. The SetWinEventHook will not be installed if the PID of the conhost.exe process that owns the console window cannot be determined. This can happen when a 32-bit libuv app is running on 64-bit Windows. For such cases PR libuv#1408 is partially reverted: when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was also updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: libuv#2308
Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has a significant impact on the performance of the entire system. This PR changes the way SIGWINCH is handled. The SetWinEventHook callback now signals a separate thread, uv__tty_console_resize_watcher_thread. This thread calls uv__tty_console_signal_resize(), which checks whether the console was actually resized. The uv__tty_console_resize_watcher_thread makes sure not to call uv__tty_console_signal_resize() more than 30 times per second. The SetWinEventHook will not be installed if the PID of the conhost.exe process that owns the console window cannot be determined. This can happen when a 32-bit libuv app is running on 64-bit Windows. For such cases PR #1408 is partially reverted: when tty reads WINDOW_BUFFER_SIZE_EVENT, it will also trigger a call to uv__tty_console_signal_resize(). This will also help when the app is running under console emulators. Documentation was also updated to reflect that. Refs: microsoft/terminal#1811 Refs: microsoft/terminal#410 Refs: #2308 PR-URL: #2381 Reviewed-By: Ben Noordhuis <[email protected]>
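The "check whether the size actually changed, and no more than 30 times per second" logic described in that commit message can be sketched in portable C++ as below. This is an illustration only, not the libuv code: `ResizeWatcher`, `ConsoleSize`, and `on_event` are hypothetical names, timestamps are passed in so the behavior is deterministic, and this version simply drops notifications that arrive inside the throttle window, whereas a real watcher thread would more likely defer the check so a final resize is not missed.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical value type for the console dimensions being watched.
struct ConsoleSize {
    int width;
    int height;
    bool operator==(const ConsoleSize& o) const {
        return width == o.width && height == o.height;
    }
};

class ResizeWatcher {
public:
    ResizeWatcher(ConsoleSize initial, int max_checks_per_second)
        : last_size_(initial), min_interval_(interval_for(max_checks_per_second)) {}

    // Called for every incoming notification. Returns true only when a resize
    // signal should be raised: the throttle window has elapsed AND the size
    // differs from the last one observed.
    bool on_event(Clock::time_point now, ConsoleSize current) {
        if (last_check_ && now - *last_check_ < min_interval_) {
            return false;  // too soon since the last check; skip this one
        }
        last_check_ = now;
        if (current == last_size_) {
            return false;  // notification fired, but nothing actually changed
        }
        last_size_ = current;
        return true;
    }

private:
    // Converts "N checks per second" into the minimum gap between checks.
    static Clock::duration interval_for(int per_second) {
        assert(per_second > 0);
        return std::chrono::duration_cast<Clock::duration>(std::chrono::seconds(1)) / per_second;
    }

    ConsoleSize last_size_;
    Clock::duration min_interval_;
    std::optional<Clock::time_point> last_check_;
};
```

With a limit of 30, the minimum gap works out to roughly 33 ms, so a burst of layout events collapses into at most one size comparison per window.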
I wonder how this issue looks now? |
Not too bad -- the slowdown at the start was when I enabled the win+g gamebar recording... oisin@BEASTIEBOOK3_.2021-03-26.23-43-47.mp4 |
Also @oising let's leave this here: https://github.com/hzeller/timg/ |
* #8000 - Supports buffer rewrite work. A re-use of `til::rle` will be useful as a column counter as we pursue NxM storage and presentation.
* #3075 - The new iterators allow skipping forward by multiple units, which wasn't possible under `TextBuffer-/OutputCellIterator`. Additionally, they allow bulk insertions.
* #8787 and #410 - High probability this should be `pmr`-ified like `bitmap` for things like `chafa` and `cacafire` which are changing the run length frequently.

* [x] Closes #8741
* [x] I work here.
* [x] Tests added.
* [x] Tests passed.

- [x] Ran `cacafire` in `OpenConsole.exe` and it looked beautiful
- [x] Ran new suite of `RunLengthEncodingTests.cpp`

Co-authored-by: Michael Niksa <[email protected]>
## Summary of the Pull Request
Introduces `til::rle`, a vector-like container which stores elements of type T in a run length encoded format. This allows efficient compaction of repeated elements within the vector.

## References
* #8000 - Supports buffer rewrite work. A re-use of `til::rle` will be useful as a column counter as we pursue NxM storage and presentation.
* #3075 - The new iterators allow skipping forward by multiple units, which wasn't possible under `TextBuffer-/OutputCellIterator`. Additionally, they allow bulk insertions.
* #8787 and #410 - High probability this should be `pmr`-ified like `bitmap` for things like `chafa` and `cacafire` which are changing the run length frequently.

## PR Checklist
* [x] Closes #8741
* [x] I work here.
* [x] Tests added.
* [x] Tests passed.

## Validation Steps Performed
* [x] Ran `cacafire` in `OpenConsole.exe` and it looked beautiful
* [x] Ran new suite of `RunLengthEncodingTests.cpp`

Co-authored-by: Michael Niksa <[email protected]>
commit 4b0eeef
Author: Leonard Hecker <[email protected]>
Date: Fri May 14 23:56:08 2021 +0200

Introduce til::rle - a run length encoded vector

## Summary of the Pull Request
Introduces `til::rle`, a vector-like container which stores elements of type T in a run length encoded format. This allows efficient compaction of repeated elements within the vector.

## References
* #8000 - Supports buffer rewrite work. A re-use of `til::rle` will be useful as a column counter as we pursue NxM storage and presentation.
* #3075 - The new iterators allow skipping forward by multiple units, which wasn't possible under `TextBuffer-/OutputCellIterator`. Additionally, they allow bulk insertions.
* #8787 and #410 - High probability this should be `pmr`-ified like `bitmap` for things like `chafa` and `cacafire` which are changing the run length frequently.

## PR Checklist
* [x] Closes #8741
* [x] I work here.
* [x] Tests added.
* [x] Tests passed.

## Validation Steps Performed
* [x] Ran `cacafire` in `OpenConsole.exe` and it looked beautiful
* [x] Ran new suite of `RunLengthEncodingTests.cpp`

Co-authored-by: Michael Niksa <[email protected]>
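To make the "vector-like container which stores elements in a run length encoded format" idea concrete, here is a minimal sketch of run length encoded storage. It is not `til::rle` and does not mirror its API: `RleVector`, `append`, `at`, and `run_count` are hypothetical names, and lookups here are a simple linear walk over the runs rather than whatever iterator scheme the real container uses.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical illustration of run length encoded storage: consecutive equal
// elements are stored once, together with how many times they repeat.
template <typename T>
class RleVector {
public:
    // Appends `count` copies of `value`, extending the last run when the
    // value matches so that repeated elements stay compacted.
    void append(const T& value, std::size_t count = 1) {
        if (count == 0) {
            return;
        }
        if (!runs_.empty() && runs_.back().first == value) {
            runs_.back().second += count;
        } else {
            runs_.emplace_back(value, count);
        }
        size_ += count;
    }

    // Returns the element at logical index `i` by walking the runs, which is
    // linear in the number of runs rather than the number of elements.
    const T& at(std::size_t i) const {
        assert(i < size_ && "index out of range");
        for (const auto& [value, length] : runs_) {
            if (i < length) {
                return value;
            }
            i -= length;
        }
        return runs_.back().first;  // not reached when the assert above holds
    }

    std::size_t size() const { return size_; }           // logical element count
    std::size_t run_count() const { return runs_.size(); }  // stored runs

private:
    std::vector<std::pair<T, std::size_t>> runs_;
    std::size_t size_ = 0;
};
```

Output like `chafa` or `cacafire` changes attributes almost every cell, so the number of runs approaches the number of elements. That is the case the PR notes flag as a likely candidate for `pmr`-style allocation.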
@zadjii-msft Should we close this issue in favor of #10462? The rendering performance was fixed with AtlasEngine, now only the VT performance is subpar and #10462 seems more "specific". |
@lhecker, if you profiled this scenario with chafa and all that's left is the VT engine being the super hot path, I'm OK with that. Chafa's just a good test and we shouldn't forget about using it in 10462 then. |
Yea I dig it. Let's focus our efforts there for remaining improvements to make in the space. |
sudo apt install chafa
curl https://media.giphy.com/media/12UwsVgQCYL3H2/giphy.gif --output winanim.gif
chafa winanim.gif --font-ratio 1/3
Edit: On Ubuntu 18.04, follow directions here to get chafa sources and build 'em:
https://hpjansson.org/chafa/
Edit2: after doing a `./configure` and `sudo apt install` loop as you realize stuff is missing, you'll get all the way through and it will whine about not being able to find the lib. Do `ldconfig` and it will shut up.
Edit3: