Font / Emoji rendering (spacing issue) #16852
Comments
Hi, I'm an AI-powered bot that finds similar issues based on the issue title. Please view the issues below to see if they solve your problem, and if one of them describes your problem, please consider closing this one and giving the other issue a thumbs up to help us prioritize it. Thank you! Closed similar issues:
I checked the mentioned issues and I believe this new issue is not a duplicate of any of them.
In this case, and for all pre-emoji-standardization iconographic codepoints, a one-cell overlap is correct. This is the same treatment given by iTerm2 and Terminal.app on macOS (one of which, I believe, was the first terminal emulator to support emoji). Those characters are expected to occupy a single column in the backing buffer, have a standard emoji representation, and be displayed as though they occupy two columns. This is one of the weird quirks of being correct.
@DHowett We'll be facing the opposite issue with MouseText in Unicode 16.0.
The original characters are the following: I have a single-column version of the glyph ready, but it is useless as long as it's handled as double-width.
Sorry, I was writing my comment up on my phone and didn't put in the right amount of explanation! 😄 The critical difference here is that a terminal emulator (of any type) is expected to act in a consistent way for another application to interface with. A text editor does not have the same requirement, as the only correctness loop exists between the user and the editor. So, terminals receive text from other applications and display that text on a screen. Those other applications can be microseconds away (on the same machine) or thousands of miles away (running on a remote server, connected over SSH). That immediately imposes a couple constraints on how that text gets displayed:
Now, an application can instruct a terminal to position the cursor somewhere with absolute or relative coordinates. The same application can display the "right" amount of text to fill up the screen. Consider this example of an application called "Midnight Commander". It has to be able to predict where every bit of text will be placed on the screen, or it will put the pseudographic characters (for the borders and stuff) in the wrong places. Because of (2), that prediction has to be the same prediction the terminal emulator would have made. And because of (1), it will not be able to use the font to predict those things. There's this handful of bad C APIs that every application these days seems to use:
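For illustration, here is a minimal sketch of how a text-mode application typically makes that prediction with such an API. POSIX `wcwidth()` is used here purely as a representative example (an assumption on my part, not necessarily the exact API meant above), and the printed widths depend on the libc's Unicode tables:

```c
/* Sketch only: POSIX-specific, assumes a 32-bit wchar_t and a UTF-8 locale. */
#define _XOPEN_SOURCE 700
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* Predict how many terminal columns a string should occupy, without ever
 * consulting a font; this is the prediction the application and the
 * terminal have to agree on. */
static int display_columns(const wchar_t *s)
{
    int cols = 0;
    for (; *s != L'\0'; ++s) {
        int w = wcwidth(*s);
        if (w > 0)
            cols += w;  /* combining marks report 0, unprintables report -1 */
    }
    return cols;
}

int main(void)
{
    setlocale(LC_ALL, "");  /* pick up the user's (UTF-8) locale tables */
    printf("\"abc\"   -> %d columns\n", display_columns(L"abc"));
    printf("U+2620  -> %d columns\n", display_columns(L"\u2620"));     /* skull: usually 1 */
    printf("U+1F480 -> %d columns\n", display_columns(L"\U0001F480")); /* emoji skull: usually 2 */
    return 0;
}
```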
That leaves us with three options:
Of all the options, 1 is the most incorrect. This is what happens in Midnight Commander if we do that: wezterm chooses treatment 2, as did we before the new rendering engine. iTerm2 and Terminal.app choose treatment 3. We elected to go with treatment 3 because it also improves the display of text from other languages, where there may be ascenders and descenders that poke out of the top or bottom of the cell. It's not more right, but it's definitely less wrong. Hope that helps!
@PhMajerus - I've got no idea how Unicode expects this to work. Maybe by some manner of variation selector?
I'm sure DHowett has a much better grasp on all of this than I do, and I don't know how the decision process went internally, but I can provide some context to justify their decision as I see it from the outside, so he can spend more time on getting things done and just correct me if he sees something I got wrong. I agree it's counter-intuitive and seems wrong, but I don't think the Terminal is wrong in this case.

History

Terminals and consoles, on the other hand, come from a day when text screens were a grid of character cells. OEMs could design their own character sets, but each character would take exactly one cell. The app didn't need to ask a renderer to calculate the width of a string of text to know how to align UI elements; it could calculate everything in terms of columns and rows, and a 10-character string would always take 10 columns, regardless of the "font" (character set) and the characters it contained.

Then Japan joined the party, and needed both more than 256 characters and wider characters, so we got MBCS (multibyte character sets). Fortunately, since characters on PCs were basically half-square rectangles, they could fit their characters in two cells, making square ideograms, and encode those as two characters, extending the number of characters available by basically having a mix of 8-bit and 16-bit character values. Chinese and Korean used the same principle.

Then Unicode came along and decided to unify all character sets. One of the original rules of Unicode was that it doesn't care about fonts or looks, only about the intent of characters. That was the beginning of the mess of building a terminal or console emulator using Unicode, because a text-based app doesn't use a font renderer: it cannot query how wide some text string is, and it used to know from the text content how many columns it would use on screen… except now it doesn't anymore, because that depends on whether the terminal is in Western or CJK (Asian) mode.

At that point, fonts included some semigraphic characters from legacy code pages, such as line drawing, box elements, card suits, etc., and these were normal-width characters because they came from the original computers' character sets. Except that, again, text-mode apps do not use a font renderer; they send code points to display to a terminal emulator and have no idea of fonts and renderer options. Unicode should probably have encoded them as two different characters, but here we are, living with past decisions. It's easy to be wiser in retrospect once you know which issues each choice will bring over 30 years later. In the future we'll probably see both some option for a text-based app to tell the terminal to use semigraphics or emojis, and a way to query the terminal for the cell width of a specific text string (I believe DHowett is pushing for this already).

Current solution isn't that bad, probably the best with what we have today

In the meantime, handling these characters as using a single cell is actually a pretty good solution. It means text-mode apps that expect them to be emojis can just add a space after the emoji to make the terminal use two cells for it (sketched below). The proper font will have a double-width glyph for that character that will extend from the first cell into the following space, and it will display as the app intended. It's far from perfect, I agree, but decades of history with an evolving technology is never perfect; at most it is working in a predictable way.
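To make the padding-space workaround concrete, here is a minimal sketch (plain C, assuming a UTF-8 terminal; the exact rendering still depends on the terminal and the font):

```c
#include <stdio.h>

/* Sketch of the workaround described above (assumes a UTF-8 terminal).
 * U+2620 occupies one cell in the buffer but may be drawn as a double-width
 * emoji glyph, so a following character can be overdrawn unless a padding
 * space is emitted for the glyph to spill into. */
int main(void)
{
    puts("\xE2\x98\xA0x");  /* U+2620 then 'x': the x may be overdrawn    */
    puts("\xE2\x98\xA0 x"); /* U+2620, padding space, then 'x': x visible */
    return 0;
}
```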
Wow, these comments were literally seconds apart! High five!
@DHowett 🫸🫷
Wow, thank you very much for your deep and awesome insights into the history and the current situation! 😎 And a big thank you to you as well for the deep explanations! 👍 In the following I respond to the comments from both of you. I understand it from the history point of view. From the user point of view, I personally would expect this:
As a summary of both of your explanations, I would say: To achieve this for the long term, there are 2 options. This leads me to 3 possible scenarios:
Do you agree with the available options and scenarios from your side? Personally, I would prefer scenario 3 and pick the option of increasing the used cells to 2. So, I think there is at least the chance to choose the best-fitting solution for the future. In the meantime, I would personally prefer the old way, rendering it in one cell. I don't think that inserting extra spaces is the way to go. How would I know whether I need to insert a space, and where? (Sorry, I am running out of time now. Maybe I will add something more or clarify at a later time.)
I have some additions to my last comment: After thinking again about your comment, I have come to the personal conclusion that, of the variants you mentioned, item 2 is the best-fitting solution:
This will not break any grid and does not produce any overlapping. So, nothing is broken, and it neither produces unreadable text nor requires extra work. For the long term it would be great if the size could be increased to 2 cells (I mean for rendering and for the stored information in

Are you aware of a Windows variant of

Is there a way to programmatically detect if my app is running in Windows Terminal, and which version of it? (For C and C++)

I may have discovered a similar issue. If I use the Thai sentence for nicely saying hello: สวัสดี ครับ

This is, I believe, because the Thai letters are sometimes composed of 2 Unicode code points that together form only 1 letter.
Surprisingly, this isn't true. Over the past decade terminal applications have come to expect Emojis to not be scaled down if they don't fit. We did downscale overly large glyphs before and it was heavily disliked by a lot of people.
That actually already exists: You simply need to use ☠️ (U+2620 U+FE0F) instead of ☠ (U+2620). That's the "variation selector" that Dustin mentioned. For "ambiguous" emojis like this (technically called "unqualified emoji"), its existence is the difference between whether the glyph is drawn colored (U+FE0F), black/white (U+FE0E), or whether the presentation is unspecified and left open to the text renderer. You can find the full list of such emojis here: https://unicode.org/emoji/charts/emoji-variants.html

However, Windows Terminal, up until the current version 1.20, is not really Unicode-aware at all, as it relies on something like

It's not quite there yet (in particular, the couple of marks that are too far to the left at the start of the line), but it's significantly better than what we have now, which allocates at least 1 cell per character, making a complete mess of the Zalgo text. In my opinion, kitty does an excellent job when it comes to Unicode in terminals, and I'm trying to replicate its behavior for Windows Terminal. If I succeed, your issue will be consistently gone as long as you make sure to use either minimally or fully qualified emojis. Unqualified emojis will always have these overlap issues, because that's what both terminal applications and their users have come to expect.
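For anyone curious, a small hedged illustration of what those sequences look like as raw UTF-8 bytes (the presentation of each line is up to the terminal and its fonts):

```c
#include <stdio.h>

/* Illustration only: the raw UTF-8 byte sequences for the three forms of
 * the skull. Whether each one comes out colored, black/white, narrow or
 * wide is decided by the terminal and its fonts. */
int main(void)
{
    puts("\xE2\x98\xA0        <- U+2620 alone (presentation unspecified)");
    puts("\xE2\x98\xA0\xEF\xB8\x8F <- U+2620 U+FE0F (emoji presentation requested)");
    puts("\xE2\x98\xA0\xEF\xB8\x8E <- U+2620 U+FE0E (text presentation requested)");
    return 0;
}
```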
@lhecker Oh, you just solved my problem with the hourglass: ⌛
@PhMajerus Note that the VS15 selector can change the rendering from an emoji presentation to a text presentation, but it can't change the width. You can make a narrow character wider with VS16, but you can't make a wide character narrower with VS15. And as far as I understand,
Yes, sure, I dislike it also.
Ah, I didn't know that this exists. Thank you for providing these details. The Unicode specification is more complex than I thought. With the current approach you always have to check whether a selector is present after the last character of some defined range, so as not to lose it by accident. OK, so for this issue that means it is working as designed/specified and can be closed? Do you have some input on item 4 of my last comment?
The emoji-variants.html I linked shows all Emojis that are affected by the selector.
Wikipedia has a nice list what the other VS are used for: https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
Yes, if we ignore bugs in our implementation, I'd say it's working as intended. I'll close the issue then. 🙂
Ah, I apologize! Your text contains 3 non-spacing marks (Unicode category "Mn"): ั, ี, and ั. As "non-spacing" indicates, they're generally not supposed to take up any space. But Windows Terminal doesn't yet support Unicode beyond basic surrogate pairs. That's why it allocates 1 cell for each non-spacing mark. I'm hoping to fix this issue in version 1.21, which will release in a few months. It already works in my debug build:
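A quick way to check that these marks are classified as zero-width, as a sketch that assumes a POSIX libc whose wcwidth() tables cover these code points:

```c
#define _XOPEN_SOURCE 700
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* The Thai combining marks from the example; a Unicode-aware wcwidth()
 * should report 0 columns for them, i.e. they are not meant to get a
 * cell of their own. */
int main(void)
{
    setlocale(LC_ALL, "");
    const wchar_t marks[] = { L'\u0E31', L'\u0E35', L'\0' }; /* MAI HAN-AKAT and SARA II */
    for (size_t i = 0; marks[i] != L'\0'; ++i)
        printf("U+%04X -> wcwidth() = %d\n", (unsigned)marks[i], wcwidth(marks[i]));
    return 0;
}
```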
@lhecker |
@lhecker, what might be related: Windows Terminal renders 🟥 and 🟩 correctly (as 2 cells wide), but ⬜ is rendered as a traditional character (see screenshot), which causes misalignments, also as shown in the screenshot. Is it possible to change how ⬜ is displayed? I think it should be consistent with the other square "emojis"? Thanks!
If you paste ⬜ into a cmd.exe prompt, does it still treat it as narrow? On my end this works. |
Oh, right, you were probably not asking just about the misalignment...
Thanks @lhecker. I just opened pwsh in Windows Terminal and pasted

As you can see in the screenshot below:
I'm using oh-my-posh (v19.19.0), Windows Terminal (v1.19.10821.0) and the font is: |
This is because the black and white large squares predate emoji; they are part of the Miscellaneous Symbols and Arrows block:
Thanks, that makes sense, but to my understanding it's up to the application how to display it; e.g., the browser and VS Code use the "modern" version of ⬜. Maybe Windows Terminal can be changed so that it uses the modern version as well. Side effects might occur for apps that expect the "old-style" version in order to display some kind of terminal user interface; however, the rendering as-is is definitely a bug, see the cursor.
I believe that's a bug in PowerShell; I can reproduce it with PowerShell, but not with cmd.exe or my ActiveScript Shell in the same Terminal.
Actually, that's what I meant by the following:
PowerShell's support for modern Unicode is not great. They assume that ⬜ is 1 cell wide, but the Unicode spec clearly says it's 2 cells wide (

BTW, this is also why I can't quite agree with you, @PhMajerus, either: Cascadia has designed both glyphs (⬛ and ⬜) to be 1 cell wide, but that doesn't match the East Asian Width property. I'm not sure why they're designated as wide by Unicode, but I suppose that's beside the point. And so, while the 1-cell-wide variant works just fine outside of terminals, it looks a little weird inside them. Additionally, as far as I can tell, it doesn't supply a colored version of the emojis either (i.e. with U+FE0F). Segoe UI, on the other hand, ships both a colored and a grayscale version of these glyphs.
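For what it's worth, a small sketch of what a POSIX libc reports for the two squares; treat the result as illustrative, since it depends on the libc's Unicode tables:

```c
#define _XOPEN_SOURCE 700
#include <locale.h>
#include <stdio.h>
#include <wchar.h>

/* U+2B1B/U+2B1C carry East_Asian_Width=Wide, so a libc with current
 * Unicode tables should report 2 columns for them, matching what the
 * terminal side of the disagreement expects. */
int main(void)
{
    setlocale(LC_ALL, "");
    printf("U+2B1B (black large square) -> %d columns\n", wcwidth(L'\u2B1B'));
    printf("U+2B1C (white large square) -> %d columns\n", wcwidth(L'\u2B1C'));
    return 0;
}
```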
@lhecker I didn't mean to imply that Cascadia was right. I just think they included it because it's part of the more "core" symbols, so that explains the discrepancy between the black and white squares and the color emoji-only squares. I think support for emoji will improve over time, and these code points that existed as symbols and now have emoji versions as well will require some work on Cascadia's side, especially if it tries to support both symbol and emoji versions.
Windows Terminal version
1.19.10573.0
Windows build number
10.0.19045.4046
Other Software
not required
Steps to reproduce
Open Windows Terminal (I tried cmd.exe, PC, and Ubuntu 22.04 LTS, but I believe it does not matter which is used).
The font is the default Cascadia Mono; I believe the size does not matter (it occurs with the default 12 as well as with 13).
Paste the Skull emoji ☠ into the command prompt. (Note that the cursor is now in the middle of the skull)
Type a letter, e.g., x
The x is not visible because it is behind the skull.
I don't know if other emojis are affected. I discovered this issue by accident since I use the string "🚀 🍀 ☠ 🔥" for some of my internal UTF-8 testing. Of these 4 emojis, only the skull is affected.
I believe this issue appeared after the auto-update to 1.19.10573.0. I don't know exactly which version was installed before, but since it is maintained by the system, it should be the previously released version.
If I remember correctly, the skull ☠ was rendered too small in the old version. This is fixed now.
I also noticed that pasting some emojis now produces a correct echo instead of question marks. Thank you for these fixes!
Maybe the new issue is caused by the bigger rendering? Is the stored size/space information, which is used to calculate the start position of the next character, still too small?
See the screenshot to see how it looks.
Expected Behavior
I can see
☠x
Actual Behavior
I only see
☠
because the x is behind the skull and not right of it.
(See screenshot in the 'steps to reproduce' section)