-
Notifications
You must be signed in to change notification settings - Fork 8.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce AtlasEngine - A new text rendering prototype #11623
Conversation
bc106c5
to
8c4c708
Compare
8c4c708
to
602714c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well I'm at 29/56 but obviously haven't reviewed the scarygood stuff yet
@@ -74,5 +74,6 @@ HRESULT RenderEngineBase::PrepareLineTransform(const LineRendition /*lineRenditi | |||
// - Blocks until the engine is able to render without blocking. | |||
void RenderEngineBase::WaitUntilCanRender() noexcept | |||
{ | |||
// do nothing by default | |||
// Throttle the render loop a bit by default. | |||
Sleep(8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
‽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Sleep(8)
was removed from RenderThread
and moved here, allowing AtlasEngine
to run at more than 60 FPS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
‽
Look at that beautiful use of an interrobang.
src/renderer/dx/DxRenderer.hpp
Outdated
@@ -40,98 +40,61 @@ namespace Microsoft::Console::Render | |||
{ | |||
public: | |||
DxEngine(); | |||
~DxEngine(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a lot of churn for a renderer that isn't changing in this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the changes to this file include...
- rearrange the functions
- remove the
<ctor> = default
stuff - add
const noexcept
toToggleShaderEffects
- clean up the
#includes
at the top - remove
_EnableDisplayAccess
- remove the
const
after pointers (i.e.const XXX* const
-->const XXX*
)
Honestly, this seems mostly fine to me, just a bit jarring. Mainly concerned about the last two removals.
@@ -14,12 +14,16 @@ Author(s): | |||
|
|||
#pragma once | |||
|
|||
#include <d2d1.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw this is some unusual layering -- giving the general interface knowledge of D2D primitives seems like the wrong way to go given that 4/6 of the engines aren't D2D-based
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good! Because I also don't like having D2D stuff in my API. The solution here would be to have our own enum instead. I think that would be a good fit for a followup PR, since it's a minor change.
Splitting this interface into one for 4/6 engines and one for 2/6 engines, would not just unnecessarily increase the binary size further, but also unnecessarily Javarize the code base if I'm honest. I think we are better served by both reducing the number of interfaces as well as the number of functions on those interfaces.
There's technically no reason for the IRenderEngine to have that many functions if we:
- use inversion of control and write "config providers" instead of having setters for everything
- use fat data transfer objects instead of finely granular setters
- we only render at ~240 FPS at most. 240 COM calls per second are nothing to the CPU, so if we use both of the previous points, we can just ask the IRenderData provider 240 per second for the complete set of configuration anew and performance would be better than it is now (due to lack of CFG checks on each virtual call)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember discussing with @lhecker about something like this, which eventually narrowed down to a single question: should we pretend that we don't know what exact engines are
there in the codebase. Code-wise, we all know what the "perfect" solution might look like ("Javazire" perhaps?), but if that's causing too much performance penalty than it should, I'm OK with a more practical solution, even though it may seem a bit unorthodox.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been sorely tempted to do this in the past as well (add the d2d headers... its sort of where the initial til::point/til::coord/til::rectangle came from was avoiding that.)
We've proven @lhecker is super correct about us needing to reduce the number of calls and make big data transfers in a future work item.
// be abstracted away and integrated into the above or simply get removed. | ||
|
||
// DxRenderer - getter | ||
virtual HRESULT Enable() noexcept { return S_OK; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this is our first foray into having a base implementation in this renderer; we should observe the cost it has on conhost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha! I knew this question would come up. The answer is: none. At least not one we will care about for now.
The reason for this is that abstract classes aren't actually compiled into binaries. Only vtables are and those only exist on concrete classes. Since the number of concrete classes hasn't changed the size increase is limited to virtual_functions * concrete_classes * sizeof(void*)
which is 168 byte for inbox conhost on x64. If I had put these functions into a new, special purpose interface just for AtlasEngine and DxEngine, we'd only save 84 byte. If we merge IRenderEngine and BaseEngine together however we already save like 2kB alone.
src/inc/til/pair.h
Outdated
// pair is a simple clone of std::pair, with one difference: | ||
// copy and move constructors and operators are explicitly defaulted. | ||
// This allows pair to be std::is_trivially_copyable, if both T and S are. | ||
// --> pair can be used with memcpy(), unlike std::pair. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a followup to deduplicate til::rle_pair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I think it's fine if their live their independent lives...
til::pair
has first/second members and til::rle_pair
calls them length/value, which is a lot more descriptive.
src/renderer/atlas/AtlasEngine.cpp
Outdated
{ | ||
_recreateSizeDependentResources(); | ||
} | ||
if (WI_IsFlagSet(_invalidations, InvalidationFlags::font)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like James' enumset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After merging my enumset PR, I'll gladly use it.
Until then I'd like to refrain from depending on std::bitset
.
src/renderer/atlas/AtlasEngine.cpp
Outdated
static constexpr | ||
#endif | ||
// Why D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS: | ||
// This flag prevents the driver from creating a large thread pool for things like shader computations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this flag help DxEngine as well?
src/renderer/atlas/AtlasEngine.cpp
Outdated
{ | ||
deviceFlags |= D3D11_CREATE_DEVICE_DEBUG; | ||
|
||
const auto DXGIGetDebugInterface = reinterpret_cast<HRESULT(WINAPI*)(REFIID, void**)>(GetProcAddress(module.get(), "DXGIGetDebugInterface")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is all of this stuff publicly documented? Is there a header you could use to get the signature for DXGIGetDebugxxx
, so we can simplify this and use the wil
module helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use D3D11 and not D3D12?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me turn this question around: Why use D3D12 and not D3D11? Newer software isn't faster than old software.
But hardware is and D3D12 allows you to better access these newer capabilities.
However this engine is so simple that the core rendering loop only needs 3 graphics commands per frame. There's not much that the newer API could help with performance here.
But most importantly, this engine is supposed to work on Windows 7 which only supports D3D10 (accessible through D3D11), because it is used in Visual Studio 2019 (and that one supports Windows 7 until 2029 - not kidding). As such D3D12 could only ever be added as an alternative engine and not be the only implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I just found out that there's a "D3D12onWin7" document!
But it's only for x64 and doesn't support WARP, which we need because we have to support DirectX 9 hardware. 😔
One day I'll probably check out D3D12 anyways, because it's interesting. 😄
I suspect that it won't improve performance, because - as far as I understand it - D3D12 helps parallelize things, whereas this engine doesn't have enough complexity to benefit from this. But maybe I'm wrong and in that case it'd be fun having a cool D3D12 engine for modern hardware.
src/renderer/atlas/AtlasEngine.cpp
Outdated
if (SUCCEEDED(DXGIGetDebugInterface(IID_PPV_ARGS(infoQueue.addressof())))) | ||
{ | ||
// I didn't want to link with dxguid.lib just for getting DXGI_DEBUG_ALL. This GUID is publicly documented. | ||
static constexpr GUID dxgiDebguAll = { 0xe48ae283, 0xda80, 0x490b, { 0x87, 0xe6, 0x43, 0xe9, 0xa9, 0xcf, 0xda, 0x8 } }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debgu.
i'd encourage not excluding the entire atlas renderer from the spell checker ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The atlas engine introduces about 20 new words to the spell checker.
Personally I expect that I'm going to rewrite large chunks of the renderer in the future, potentially multiple times, until I find a good solution that fits all requirements.
That's why I've decided to put spell checking on hold for now. I'd like to enable spell checking when I feel that the implementation is good enough. Until then this saves me quite a headache since I can't run the spell check locally.
Bad news! It doesn't work on ARM64. |
I'm curious what those proposals are, if you've got a good idea for what they will be. Is the fact that it doesn't really improve WT throughput mostly because of conpty limiting frames to 60FPS / painting a frame every time the screen scrolls? Or is there some other reason? this kind of chart: #1064 (comment) might be helpful for illustrative purposes |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Regarding to @zadjii-msft's question about throughput in WT: besides the reason you mentioned, the "Text Buffer Rewrite of 2021" epic (deliberately no link here to reduce the noise) is a huge deal when it comes to pure throughput. I'm pretty sure @lhecker has a lot to say here. If this PR is to be merged later, I think it definitely deserves its own follow-up issue to track future works. We don't want to touch those old issues any more, right? Because of...obvious reasons. |
@@ -923,6 +923,10 @@ | |||
<value>Controls what happens when the application emits a BEL character.</value> | |||
<comment>A description for what the "bell style" setting does. Presented near "Profile_BellStyle".{Locked="BEL"}</comment> | |||
</data> | |||
<data name="Profile_UseAtlasEngine.Header" xml:space="preserve"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was expecting this to be an option in "Rendering" section, where you can choose to use "Software rendering". Like a dropdown menu to choose between DX, Atlas and DX-software. But this is just my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meh, I kinda like this being per-profile. For example, WSL profiles with this + passthrough (#11264) would probably have the highest possible throughput, but powershell has a super hard time with passthrough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way as it simplifies comparing the DxEngine and AtlasEngine with each other.
You can have two profiles for the same shell using different engines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also expecting it to be in the other location... but I agree with the utility of setting it per profile.
} | ||
|
||
// Due to the current IRenderEngine interface (that wasn't refactored yet) we need to assemble | ||
// the current buffer line first as the remaining function operates on whole lines of text. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zadjii-msft An example of how text buffer implementation bottlenecks the throughput.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this really needs a better way of bulk transferring this information.
src/renderer/atlas/AtlasEngine.cpp
Outdated
// | ||
// # What do we want? | ||
// | ||
// Segment a line of text (_api.bufferLine) into unicode "clusters". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a small library https://github.com/contour-terminal/libunicode here, which is used in https://github.com/contour-terminal/contour to do similar thing. I think there might be other OSS libraries that can also be helpful.
By know the clusters segmentation before generating glyphs, you can safely avoid "call IDWriteFontFallback::MapCharacters first". Same idea as in #10858
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last I recall, @christianparpart hadn't yet finished up on that lib though I do agree that WT could very much benefit from it, once he's ready to declare it suitable for others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i can help on that. I am in commute but can carefully read and reply tonight. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Hello!
So libunicode I specifically developed to serve the Unicode needs in my terminal emulator. I could have used ICU, but I really did not feel comfortable with it nor did the API convince me to be suitable for my usecase.
So all in all, I'd love to get libunicode evolve into a general purpose unicode library, in fact it is, but I'd like it to be more complete (no BiDi algo implemented yet for example).
When i was learning about how to properly render all kinds of emoji and other complex unicode sequences including having support for programming ligatures, I did read up on "Blink's text stack" as well as Blink's source code. You may see some API similarity when you look at libunicode
, such as emoji segmentation and script segmentation is inspired by Blink's API.
Since the whole subject was so god damn complicated (at least if you start from scratch with like-zero prior knowledge) I decided to write it all up such that I can read it myself whenever I need to get into the subject again.
I can highly recommend you having a look at it: A Look into a terminal emulator's text stack. (so I do not need to spam this thread with details or unnecessarily duplicate text in here).
If there are any questions (or corrections!), I'm all open and in. :-)
p.s.: wrt libunicode, I'd be happy if it serves a purpose outside contour. Licensing should not be an issue. Maybe this lib will also help others :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsubscribing this issue due to my own personal reason. Huge thanks to @christianparpart for jumping in and offering valuable information on this specific topic.
src/renderer/atlas/AtlasEngine.cpp
Outdated
// ## The unfortunate preface | ||
// | ||
// DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that. | ||
// What it offers are a large number of low level APIs that can sort of be used in combination to do this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is exactly what I have implemented in libunicode. As I use harfbuzz for text shaping, harfbuzz also does not provide this functionality.
src/renderer/atlas/AtlasEngine.cpp
Outdated
// | ||
// DirectWrite seems "reluctant" to segment text into clusters and I found no API which offers simply that. | ||
// What it offers are a large number of low level APIs that can sort of be used in combination to do this. | ||
// The resulting text parsing is very slow unfortunately, consuming up to 95% of rendering time in extreme cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text segmentation isn't too expensive. Text shaping indeed the hell is. But since you can also segment by word boundaries, it should be easy to use a "word" as cache key for the text shaping result and the resulting glyph indices in that result again can be used as cache key into the texture atlas. I know there are some terminal emulators out there trying hard not to implement complex text shaping but in my personal experience (and with the caching explained also in the document above), I do not see any reason not to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently I'm using "clusters" the way harfbuzz defines them:
In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit. A single letter or symbol can be a cluster of its own. Other clusters correspond to longer subsequences of the input code points — such as a ligature or conjunct form — and require the shaper to ensure that the cluster is not broken during the shaping process.
I wrote this DirectWrite code to break the text into such clusters and use the characters in each cluster as the key for a hash map that holds texture atlas indices. I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from (since each font has it's own list of glyph indices right?). But how do you hash a IDWriteFontFace
? I didn't find a good answer to this question so I'm just hashing the characters of clusters for now instead of the glyphs they result in.
The reason I wrote it this way (segmentation into clusters) is because it seemed to me the most straightforward choice of building a Unicode glyph atlas.
Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?
I'm asking as the biggest CPU consumption occurs during MapCharacters
in this renderer, or in other words during font fallback. This contributes about 85% CPU usage when drawing a viewport full of Hindi characters. Both IDWriteTextAnalyzer1::GetTextComplexity
as well as IDWriteTextAnalyzer::GetGlyphs
which do the cluster segmentation for me are rather fast fortunately (~10% CPU usage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In text shaping, a cluster is a sequence of characters that needs to be treated as a single, indivisible unit
...
I wrote this DirectWrite code to break the text into such clusters
Exactly. And especially with regards to what harfbuzz says, that is a grapheme cluster (user perceived character composed of one or more codepoints). This segmentation step should NOT be done by the renderer but instead on the VT parser side already. At least that is how I do it, and I highly recommend doing so because this information is also required for correct cursor placement. The link I posted above goes in more detail on that matter. And apart from that, I started drafting a Unicode Core spec for terminals that you can read up here. I'd love you (all) to have a look at it. It clearly defines the semantics on that matter and I wanted to ping the WT team on that spec anyways in order to find a common concensus in the TE land (at least a little) such that toolkits/apps can rely on this new extended behaviour. Feedback on this we do not need to spam here, just use that repo's discussion area or there where also jerch and j4james helped me already on getting this started.
I tried using glyph indices as the hash key first, but if you do that you need to pair the indices with the font face they come from
Yup. I'm doing that here and there - mind, I'm not using the most efficient way for cache keys here either. I'll very soon move to strong cache keys so that I can avoid that double-mapping and to also have simply integer values as keys.
Does caching the shaping result of words bring a worthwhile performance advantage, despite the overhead of the cache itself?
I know that my code was definitely slow before I started to implement caching. So I started naive, too. Then moved to unordered_map for caching, which is naive and dangerous, and now it's an LRU-cache, much better in terms of resource usage. the key should (as mentioned) by en integer to further improve lookup performance and avoid an extra strcmp inside the hash map impl.
Text shaping is definitely expensive! I chose caching words (delimited by spaces but also by common SGR) because the text shaper cannot deal with mixed font faces in one shape call as I am passing in one font face per shape call (either regular or italic/bold/...), and then walk down the font-fallback chain until a shape call could fully shape the whole word. This result ends up in the cache, so that I do not need to rerun the whole font-fallback queue again and again ... .
I'm asking as the biggest CPU consumption occurs during MapCharacters in this renderer, or in other words during font fallback. This contributes about 85% CPU usage when drawing a viewport full of Hindi characters.
That is exactly the part that is hidden behind the cache. I shape only if I don't have it in cache, and the shape() function does all the fallback logic.
Ok, sorry, all those hidden links point to master instead of explicit commit hash. They may not work for years. ;-( (I may fix that later?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This segmentation step should NOT be done by the renderer but instead on the VT parser side already. At least that is how I do it, and I highly recommend doing so because this information is also required for correct cursor placement.
This worries me. (It's possible I'm misunderstanding -- I am jumping between threads!) I know that ideally we'd be doing segmentation earlier, but I don't think we can do it that early². Cursor positioning with regards to the measurable size of a set of codepoints specifically cannot depend on the font in the terminal, right? Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.
¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.
² Under the assumption that the font would be required for segmentation. If it's not, carry on, we should almost certainly have a better text buffer.
³ (Thanks for jumping in to discuss this with us!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. I believe my misapprehension was that we would need to use the segmented stream as the backing store. "Doing segmentation" while data streams in doesn't mean that we need to impose any requirement on the font that a client application would be able to observe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey; so segmentation is done in multiple stages.
- grapheme cluster (GC) segmentation can be done on the VT backend side. (font independant). this is the required here because with that you can reliably associate characters (GC) to grid cells.
- glyph run segmentation is done on the renderer side. so after the characters (grapheme clusters) have been aligned into the grid already. this does therefore not even impact VT backend performance and is a step that can be easily cached (tm). (also font independant). This stage of segmentation is required to properly feed the text shaping engine (harfbuzz, DirectWrite, CoreText), see my text-stack-link I posted above :-)
Apart from any changes resulting from future Unicode spec docs¹, there's no easy way for a client over e.g. ssh to be able to do cursor positioning (or glyph positioning, even) accounting for the font.
Luckily this should never depend on font and currently also does not. All the algorithms (I used) are font file independent.
¹ This is really cool, by the way! @reli-msft over here is working on one as well. Renzhi, please make sure you're looped in on Christian's spec proposal.
Yeah would be nice if we can converge here on something. In my TE I do not yet implement the VT sequence proposed in that spec, but the semantics are on by default in my case anyways. I'll be adding this mode soon though - unless you have strong objections against such a thing, then we need to rethink, but I'm pretty certain that is the right way to go. Mind: BiDi is explicitly not addressed. That's another big undefined thing in the TE-land that could be specced at a later point. :)
This comment has been minimized.
This comment has been minimized.
const auto& features = _fontRenderData->DefaultFontFeatures(); | ||
#pragma warning(suppress : 26492) // Don't use const_cast to cast away const or volatile (type.3). | ||
DWRITE_TYPOGRAPHIC_FEATURES typographicFeatures = { const_cast<DWRITE_FONT_FEATURE*>(features.data()), gsl::narrow<uint32_t>(features.size()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this code is passing right now, but this fails the AuditMode for me locally so I was forced to change this code to get this to pass. &featureList[0]
isn't valid code for null pointers so that got fixed as well (dereferencing a null pointer to a struct is UB IIRC).
Bonus effect: The std::vector
for features
isn't copied on every call anymore.
Negative effect: Const-correctness for the DWrite headers really isn't good. DWRITE_TYPOGRAPHIC_FEATURES
wants its argument to be a mutable pointer. Had to use const_cast
to circumvent that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh well. if it works it works. I'm not that sad about casting away const for APIs that didn't define it.
FontInfoBase and it's descendents are missing noexcept annotations, which virally forces other code to not be noexcept as well during AuditMode checks. Apart from adding noexcept, this commit also * Passes std::wstring_view by reference. * Pass the FillLegacyNameBuffer argument as a simple pointer-to-array, allowing us to fill the buffer with a single memcpy. (gsl::span's iterators inhibit any internal STL optimizations.) * Move operator== declarations inside the class to reduce code size. All other changes are an effect of the virality of noexcept. This is an offshoot from #11623. ## Validation Steps Performed * It still compiles ✔️
// expected read is either the size of the buffer or the number of characters remaining, whichever is smaller. | ||
DWORD const dwReadExpected = (DWORD)std::min(cbBuffer, cchExpectedText - cchRead); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case you wonder: I can't compile these tests locally otherwise, as the v143 toolchain introduced a few new warnings at /W4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So do you have to #pragma suppress
these? Or... do we just not need dwReadExpected
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The warnings were due to these variables never being used.
// Additionally IDWriteTextAnalyzer::GetGlyphs requires an instance of DWRITE_SCRIPT_ANALYSIS, | ||
// which can only be attained by running IDWriteTextAnalyzer::AnalyzeScript first. | ||
// | ||
// Font fallback with IDWriteFontFallback::MapCharacters is very slow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nnnn ok. Now I understand why you want that other segmenting/shaping.
// > IDXGISwapChain::Present: Partial Presentation (using a dirty rects or scroll) is not supported | ||
// > for SwapChains created with DXGI_SWAP_EFFECT_DISCARD or DXGI_SWAP_EFFECT_FLIP_DISCARD. | ||
// ---> No need to call IDXGISwapChain1::Present1. | ||
// TODO: Would IDXGISwapChain1::Present1 and its dirty rects have benefits for remote desktop? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid maybe. Do note that the most recent versions of Remote Desktop have an automatic identification of areas of high motion on the screen and can effectively MPEG-video-streamify what's happening there to the remote client as a subsection of the display. I've successfully played Guild Wars 2 through an RDP session and watched YouTube videos through one at reasonable resolution (720p+). I bet we don't have to worry about this as much as we think we do (or did in the past).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RDP maintainers confirmed poggers.
|
||
AtlasEngine::f32x4 AtlasEngine::_getGammaRatios(float gamma) noexcept | ||
{ | ||
static constexpr f32x4 gammaIncorrectTargetRatios[13]{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah there's also gamma-correct correction, uhm, somewhere iykwim.
Now you don't start assuming that I know what I'm doing here, alright? I think - and that's a huge, the way Donald calls stuff huge - stretch of my saying "I think"... I think gamma incorrect correction applies to display-native gamma surfaces and gamma correct correction (lol) applies to sRGB surfaces. gammaIncorrectTargetRatios
was """definitely""" the table to be used for D3D non-sRGB surfaces. I think. Maybe, but probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I thought you were making a joke on the opposite of "correct" as you explained to me the inverse correction required over chat the other day.
// (avoiding either a batch flush or the system maintaining multiple copies of the resource behind the scenes). | ||
// | ||
// Since our shader only draws whatever is in the atlas, and since we don't replace glyph tiles that are in use, | ||
// we can safely (?) tell the GPU that we don't overwrite parts of our atlas that are in use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you know how people need to print "don't eat the plastic" on food because people would choke on it?
Well the D3D folks didn't do that. I'm choking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to hold this back. Overall, this is fantastic and I'm so excited. It's exactly what I hoped for and more. There's still plenty to do, but it's definitely more with-it than DxEngine was when it launched, so I think it's fair to roll with it and improve from here.
Thank you so much for all of this, Leonard!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still have AtlasEngine.*.cpp to review, but I don't think I'll get through them today. So figured I'd post a few comments for now.
@@ -0,0 +1,759 @@ | |||
// Copyright (c) Microsoft Corporation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header file is huge. Would there be any value/detriment to breaking it up into a few separate header files in src/renderer/atlas
? I get that they're all a part of the atlas engine, but it's a lot.
That might also help provide descriptions for each one and make this a bit more clear (as abstracts)? It's just pretty overwhelming and for somebody like me who doesn't know much about the renderer, it's pretty easy to get lost and lose context by the time I get to line 200/759 of this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely be helpful to break it up. I intend to do so in the future when I'm more certain about what's needed and what isn't. The other AtlasEngine is just as messy as the header file and needs a similar treatment.
It would be difficult for me to clean it up at this point already, because it's highly likely that I'll rewrite at least half of the code in the header file (anything related to the glyph cache basically).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
52/56, obviously deferred the hard ones for ROUND II
@@ -1873,36 +1865,47 @@ namespace winrt::Microsoft::Terminal::Control::implementation | |||
// The family is only used to determine if the font is truetype or | |||
// not, but DX doesn't use that info at all. | |||
// The Codepage is additionally not actually used by the DX engine at all. | |||
FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontHeight) }, CP_UTF8, false }; | |||
FontInfo actualFont = { fontFace, 0, fontWeight.Weight, { 0, gsl::narrow_cast<short>(fontSize) }, CP_UTF8, false }; | |||
FontInfoDesired desiredFont = { actualFont }; | |||
|
|||
// Create a DX engine and initialize it with our font and DPI. We'll | |||
// then use it to measure how much space the requested rows and columns | |||
// will take up. | |||
// TODO: MSFT:21254947 - use a static function to do this instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm moving this MSFT backlog item out to GitHub -- we really should do this.
<HeaderFileOutput>$(OutDir)$(ProjectName)\%(Filename).h</HeaderFileOutput> | ||
<TreatWarningAsError>true</TreatWarningAsError> | ||
<AdditionalOptions>/Zpc %(AdditionalOptions)</AdditionalOptions> | ||
<AdditionalOptions Condition="'$(Configuration)'=='Release'">/O3 /Qstrip_debug /Qstrip_reflect %(AdditionalOptions)</AdditionalOptions> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
magic compiler flags -- cause for concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're not that magic at all! For some weird reason neither of these 3 parameters are available as msbuild options.
But they're what we at Microsoft use ourselves in our own sample code. If you search for these inside this GitHub org you'll find dozens if not hundreds of results. 🙂
/O3
is for future compat and enabled all available optimizations and /Qstrip_debug/reflect
remove debug/reflection metadata from the bytecode, reducing it's size a lot.
<ItemDefinitionGroup> | ||
<ClCompile> | ||
<PrecompiledHeaderFile>pch.h</PrecompiledHeaderFile> | ||
<AdditionalIncludeDirectories>$(OutDir)$(ProjectName)\;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this is only necessary for projects that produce header files as part of the build. does Atlas have that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the shaders are compiled into header files at $(OutDir)$(ProjectName)\%(Filename).h
.
[[nodiscard]] HRESULT GetFontSize(_Out_ COORD* const pFontSize) noexcept override; | ||
[[nodiscard]] HRESULT IsGlyphWideByFont(const std::wstring_view glyph, _Out_ bool* const pResult) noexcept override; | ||
[[nodiscard]] HRESULT GetFontSize(_Out_ COORD* pFontSize) noexcept override; | ||
[[nodiscard]] HRESULT IsGlyphWideByFont(std::wstring_view glyph, _Out_ bool* pResult) noexcept override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these signature changes either (1) do not matter for Bgfx/Wddmcon, or (2) will fail the windows build because we did not update Bgfx/Wddm 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A while ago I hacked our project files to build the WDDM one and it worked.
If you're just asking about const-ness of these declarations then those don't matter: #11623 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Header: reviewed.
|
||
bool is_inline() const noexcept | ||
{ | ||
return (__builtin_bit_cast(uintptr_t, allocated) & 1) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't totally understand. Since this is a union, how do we ensure that the inline value (which shares storage with allocated
) doesn't have this bit set in this position?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I had to think about how this works for a while. This will always hold a T, either internal or allocated, but sometimes that T will have been allocated with extra space at the end.
The T inside this union needs to be complicit - it has to have a member at the end that is effectively flexible but also has the appropriate amount of padding (if it wants).
I still don't understand how we ensure that the inlined value doesn't have this bit set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SmallObjectOptimizer
is pretty much an implementation detail of AtlasKey
and AtlasValue
and the code doesn't make any sense whatsoever outside of that context.
The reason this works is because both AtlasKeyData
as well as AtlasValueData
coincidentially don't use the first bit for anything except for flagging whether inlined storage is used (for instance CellFlags::Inlined
).
This code doesn't have to be super duper good, because I'm planning to make it a lot more robust and simpler with my custom hashmap implementation, which is required in order to get the LRU behavior we need. It will simultaneously allow us write simpler code here, as we can choose ourselves how to manage our linking behavior. For instance we don't need to allocate AtlasValue
on the heap, if we simply use the linked list of the LRU itself to chain multiple atlas positions together.
In either case I expect this code to be rewritten and only the idea to remain, but in an improved form.
} | ||
|
||
T* initialize(size_t byteSize) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's UB to not placement-new a T
into the returned space, even if T is_trivially_copyable
and has_unique_object_representation
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it totally is! I can't even use malloc()
here, because malloc()
is UB in C++ in general. P0593R6 should deal with this. Until then we should just assume that the compiler will be reasonable and treat neither malloc()
nor operator new()
as UB....
// * Minimum alignment is 4 bytes (like `#pragma pack 4`) | ||
// * Members cannot straddle 16 byte boundaries | ||
// This means a structure like {u32; u32; u32; u32x2} would require | ||
// padding so that it is {u32; u32; u32; <4 byte padding>; u32x2}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can the job here be done with alignas
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is! The way this code is written, as long as others adhere to this style, it'll produce compilation errors if you violate this warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LET'S DO THIS.
Hello @DHowett! Because this pull request has the p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
## Summary of the Pull Request Currently, the TermControl and ControlCore recieve a settings object that implements `IControlSettings`. They use for this for both reading the settings they should use, and also storing some runtime overrides to those settings (namely, `Opacity`). The object they recieve currently is a `T.S.M.TerminalSettings` object, as well as another `TerminalSettings` object if the user wants to have an `unfocusedAppearance`. All these are all hosted in the same process, so everything is fine and dandy. With the upcoming move to having the Terminal split into multiple processes, this will no longer work. If the `ControlCore` in the Content Process is given a pointer to a `TerminalSettings` in a certain Window Process, and that control is subsequently moved to another window, then there's no guarantee that the original `TerminalSettings` object continues to exist. In this scenario, when window 1 is closed, now the Core is unable to read any settings, because the process that owned that object no longer exists. The solution to this issue is to have the `ControlCore`'s own their own copy of the settings they were created with. that way, they can be confident those settings will always exist. Enter `ControlSettings`, a dumb struct for just storing all the contents of the Settings. I used x-macros for this, so that we don't need to copy-paste into this file every time we add a setting. Changing this has all sorts of other fallout effects: * Previewing a scheme/anything is a tad bit more annoying. Before, we could just sneak the previewed scheme into a `TerminalSettings` that lived between the settings we created the control with, and the settings they were actually using, and it would _just work_. Even explaining that here, it sounds like magic, because it was. However, now, the TermControl can't use a layered `TerminalSettings` for the settings anymore. Now we need to actually read out the current color table, and set the whole scheme when we change it. So now there's also a `Microsoft.Terminal.Core.Scheme` _struct_ for holding that data. - Why a `struct`? Because that will go across the process boundary as a blob, rather than as a pointer to an object in the other process. That way we can transit the whole struct from window to core safely. * A TermControl doesn't have a `IControlSettings` at all anymore - it initalizes itself via the settings in the `Core`. This will be useful for tear-out, when we need to have the `TermControl` initialize itself from just a `ControlCore`, without being able to rebuild the settings from scratch. * The `TabTests` that were written under the assumption that the Control had a layered `TerminalSettings` obviously broke, as they were designed to. They've been modified to reflect the new reality. * When we initialize the Control, we give it the settings and the `UnfocusedAppearance` all at once. If we don't give it an `unfocusedAppearance`, it will just use the focused appearance as the unfocused appearance. * The Control no longer can _write_ settings to the `ControlSettings`. We don't want to be storing things in there. Pretty much everything we set in the control, we store somewhere other than in the settings object itself. However, `opacity` and `useAcrylic`, we need to store in a handy new `RUNTIME_SETTING` property. We can write those runtime overrides to those properties. * We no longer store the color scheme for a pane in the persisted state. I'm tracking that in #9800. I don't think it's too hard to add back, but I wanted this in front of eyes sooner than later. ## References * #1256 * #5000 * #9794 has the scheme previewing in it. * #9818 is WAY more possible now. ## PR Checklist * [x] Surprisingly there wasn't ever a card or issue for this one. This was only ever a bullet point in #5000. * A bunch of these issues were fixed along the way, though I never intended to fix them: * [x] Closes #11571 * [x] Closes #11586 * [x] Closes #7219 * [x] Closes #11067 * [x] I think #11623 actually ended up resolving this one, but I'm double tapping on it here: Closes #5703 * [x] I work here * [x] Tests added/passed * [n/a] Requires documentation to be updated ## Detailed Description of the Pull Request / Additional comments Along the way I tried to clean up code where possible, but not too agressively. I didn't end up converting the various `MockTerminalSettings` classes used in tests to the x macros quite yet. I wanted to merge this with #11416 in `main` before I went too crazy. ## Validation Steps Performed * [x] Scheme previewing works * [x] Adjusting the font size works * [x] focused/unfocused appearances still work * [x] mouse-wheeling opacity still works * [x] acrylic & cleartype still does the right thing * [x] saving the settings still works * [x] going wild on sliding the opacity slider in the settings doesn't crash the terminal * [x] toggling retro effects with a keybinding still works * [x] toggling retro effects with the command palette works * [x] The matrix of (`useAcrylic(true,false)`)x(`opacity(50,100)`)x(`antialiasingMode(cleartype, grayscale)`) works as expected. Slightly changed, falls back to grayscale more often, but looks more right.
This commit introduces "AtlasEngine", a new text renderer based on DxEngine.
But unlike it, DirectWrite and Direct2D are only used to rasterize glyphs.
Blending and placing these glyphs into the target view is being done using
Direct3D and a simple HLSL shader. Since this new renderer more aggressively
assumes that the text is monospace, it simplifies the implementation:
The viewport is divided into cells, and its data is stored as a simple matrix.
Modifications to this matrix involve only simple pointer arithmetic and is easy
to understand. But just like with DxEngine however, DirectWrite
related code remains extremely complex and hard to understand.
Supported features:
Unsupported features:
The backing texture atlas for glyphs is grow-only and will not shrink.
After 256MB of memory is used up (~20k glyphs) text output
will be broken until the renderer is restarted.
Performance:
Unfortunately the frame rate often drops below refresh rate, due us
fighting over the buffer lock with other parts of the application.
AtlasEngine is still highly unoptimized. Glyph hashing
consumes up to a third of the current CPU time.
VT parsing and related buffer management takes up most of the CPU time (~85%),
due to which the AtlasEngine can't show any further improvements.
compared to DxEngine running at 144 FPS
compared to DxEngine running at 144 FPS