Spurious spaces appear when printing some character from Unicode Private Use Area #15086
Comments
This is a well-known issue that is very, very difficult to resolve, because it requires undoing like 2 decades of code built on UCS-2 assumptions. In other words, this happens because your code points are encoded as surrogate pairs, and this code base assumes that each UTF-16 character is at least 1 column wide. A surrogate pair can thus not be narrower than 2 columns. I'm actively working on this issue, however. It's a duplicate of #3546.
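To make the surrogate-pair point concrete, here is a minimal Python sketch (an illustration only, not code from Windows Terminal; the helper name `utf16_code_units` is made up for this example). It counts how many UTF-16 code units each of the two code points from the report needs.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" adds no byte-order mark.
    return len(text.encode("utf-16-le")) // 2


for cp in (0xF0737, 0xE617):
    units = utf16_code_units(chr(cp))
    kind = "surrogate pair" if units == 2 else "single code unit"
    print(f"U+{cp:04X}: {units} UTF-16 code unit(s) ({kind})")
# U+F0737: 2 UTF-16 code unit(s) (surrogate pair)
# U+E617: 1 UTF-16 code unit(s) (single code unit)
```

U+F0737 lies outside the Basic Multilingual Plane and therefore needs two code units, while U+E617 fits in one. That is consistent with only the former showing the extra column.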
Thanks for the link. This explains the output of |
To be honest, I'm not 100% sure where the different behavior is coming from, and I don't think it's easy to determine. Your Windows 10 version uses a much, much older version of the text processing code than Windows Terminal 1.16, so there's a huge number of places that might be responsible for this.
By the way, I've just tested your repro on Windows Terminal Preview (1.17) and it appears that it doesn't reproduce anymore. It doesn't matter whether I have AtlasEngine enabled or not. I'm pretty sure it was fixed by PR #14640, because it closes a suspiciously similar issue: #6162. Since #6162 is so similar, I'll close this issue as a duplicate.
/dup #6162
Hi! We've identified this issue as a duplicate of another one that already exists on this Issue Tracker. This specific instance is being closed in favor of tracking the concern over on the referenced thread. Thanks for your report! |
BTW I should add that you'll find many more similar issues around our Unicode support, because what I said previously unfortunately still applies. It's one of my top priorities to address this. If you find any other Unicode issues, please do feel free to file more issues on us however! |
As I mentioned above, this does not depend on font. I didn't mention Atlas Engine but the answer is the same: it does not depend on it.
That's great to hear, and it makes a lot more sense than "this code base assumes that each UTF-16 character is at least 1 column wide", which contradicted my observations. |
You know, this is pretty close to the truth today. Up until Windows Terminal 1.17, the text buffer assumed that each UTF-16 code unit¹ was at least one column wide. Beyond 1.17, the text buffer assumes that each UTF-16 code point is at least one column wide. That is, we don't support zero-width characters or grapheme clusters composed of multiple code points.
¹ This is, of course, where "surrogate pairs require at least two columns" comes from. 🙂
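As a rough sketch of the two assumptions described above (illustrative Python, not the actual text buffer code; both function names are hypothetical), the minimum column count for the strings from the report can be compared like this. It only models the "at least one column" lower bound and ignores wide characters entirely.

```python
def min_columns_per_code_unit(text: str) -> int:
    """Assumed model of the buffer up to 1.17: one column per UTF-16 code unit."""
    return len(text.encode("utf-16-le")) // 2


def min_columns_per_code_point(text: str) -> int:
    """Assumed model of the buffer after 1.17: one column per code point."""
    return len(text)  # a Python str is a sequence of code points


samples = {
    "U+F0737 followed by x": "\U000F0737x",
    "U+E617 followed by x": "\uE617x",
}
for label, text in samples.items():
    print(label,
          "| per code unit:", min_columns_per_code_unit(text),
          "| per code point:", min_columns_per_code_point(text))
```

Under the per-code-unit model the first string needs 3 columns and the second needs 2, which matches what the report observed on 1.16. Under the per-code-point model both need 2.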
Good observation! Now, for the real secret. The rendering engine in 1.16 hasn't been informed about which columns to put which characters in, so it renders everything of the same color in a single run that gets compressed down to the advance width of every glyph included in that run. If you add another color, it suddenly snaps that new run of text to the correct position. This results in a couple of fun things: an emoji composed of a number of joiners takes up 5, 7, or 9 columns; a line that contains mis-measured characters wraps at the wrong width (this has another bug in it, from some 100-codeunit buffer we have also gotten rid of recently; plus, I realize that I broke the |
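The run behavior described above can be approximated with a toy model in Python. This is only a sketch under stated assumptions, not the real renderer: `buffer_width` and `render_row` are hypothetical names, every glyph is assumed to have an advance width of exactly one column, the buffer is assumed to reserve one column per UTF-16 code unit, and the caller is assumed to pass runs already split by color.

```python
from typing import List, Tuple


def buffer_width(text: str) -> int:
    """Columns the buffer reserves (assumed: one per UTF-16 code unit)."""
    return len(text.encode("utf-16-le")) // 2


def render_row(runs: List[Tuple[str, str]]) -> List[str]:
    """Lay out (text, color) runs into screen cells.

    Each run starts at its buffer column, but its glyphs are packed
    tightly at the assumed advance width of one column per code point.
    """
    row = [" "] * sum(buffer_width(text) for text, _ in runs)
    start = 0
    for text, _color in runs:
        for offset, ch in enumerate(text):
            row[start + offset] = ch
        start += buffer_width(text)  # the next run snaps to its buffer column
    return row


pua = "\U000F0737"
print(render_row([(pua, "default"), ("x", "red")]))   # gap before x
print(render_row([(pua + "x", "default"), ("y", "red")]))  # gap between x and y
```

In this model the empty cell always sits right before the color change, not right after the Private Use Area character. That agrees with the second repro command in the report.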
Windows Terminal version
1.16.10261.0
Windows build number
10.0.19045.0
Other Software
WSL
Steps to reproduce
printf '\UF0737\033[41mx\033[0m\n'
Expected Behavior
The output of the command should occupy two columns. The content of the first column is unspecified (it depends on your font). The second column should contain x.
Actual Behavior
The output occupies 3 columns: there is an extra space in the middle.
It may appear that the space is a part of the first character. This, however, is not the case, as can be demonstrated by running:
printf '\UF0737x\033[41my\033[0m\n'
Not all characters from the Unicode Private Use Area exhibit this issue. For example, the following works as intended:
printf '\UE617\033[41mx\033[0m\n'