-
Notifications
You must be signed in to change notification settings - Fork 8.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make most emojis full-width #5795
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm fine with this, but let's get the expert in on this.
Also: Add some screenshots 😄
@msftbot make sure @miniksa signs off on this |
Hello @zadjii-msft! Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:
If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you". |
src/types/CodepointWidthDetector.cpp
Outdated
UnicodeRange{ 0x23f0, 0x23f0, CodepointWidth::Wide }, | ||
UnicodeRange{ 0x23f3, 0x23f3, CodepointWidth::Wide }, | ||
UnicodeRange{ 0x2460, 0x24e9, CodepointWidth::Ambiguous }, | ||
UnicodeRange{ 0x2328, 0x232a, CodepointWidth::Wide }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible -- and I know this is a lot of work -- i'd like a comment on each line that refers to a footnote about us overriding these. Just so we can come find it later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Changes requested in comment)
I'm not an expert on this, but I've followed a few other discussions on this subject, and my understanding was that it was vital that we exactly follow whatever standard everyone else uses for determining character width. Applications like editors need to be able to predicate exactly how much of a line of text they can fit on the screen before it wraps. They can't do that if the standard says a character should be narrow, but we're rendering it as wide. I don't know exactly what the standards are that we should be following, so maybe everything in this PR is good, but the mention of "Microsoft specific emojis" makes it sound like we're making up own rules for some characters, and I suspect that's going to cause problems. |
I agree with James here. If Microsoft are the only one emojifying a particular Unicode codepoint, we should just center that glyph inside the 1 cell that it's supposed to be taking up. EDIT: Or we should put just those ones behind a setting called |
That's fair, I was unsure about putting the Microsoft specific emojis in, because if I didn't, the |
Yeah that's why I said maybe we make it an experimental toggle that is off by default. So purists can have what Unicode dictates and Microsoft fanboys can have their full width pencils. |
I am deeply concerned with us plugging a configuration option into this global singleton function (since right now we don't have a good place to store state fo codepoint width detection, and we need conpty to agree on the widths with terminal, and there's no good way to communicate that state down to conpty) |
I'll leave the MS codepoints I've found in the comments so we can come back to implement them as we wish in the future then? |
Oof. OK. Yeah you're right.
Yeah that's probably best. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉
James is right, but right now the best thing we have is "vague consensus of various Unfortunately, emoji are just one place where column-counting breaks down for applications. |
The table that we refer to in `CodepointWidthDetector.cpp` to determine whether or not a codepoint should be rendered as Wide vs Narrow was based off EastAsianWidth[1]. If a codepoint wasn't included in this table, they're considered Narrow. Many emojis aren't specified in the EAW list, so this PR supplements our table with emoji codepoints from emoji-data[2] in order to render most, if not all, emojis as full-width. There are certain codepoints I've added to the comments (in case we want to add them officially to the table in the future) that Microsoft decided to give an emoji presentation even if it's specified as Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode emoji list. These include all of the Mahjong Tiles block, different direction pencils (✎✐), different pointing index fingers (☜, ☞) among others. I have no idea if I've captured all of them, as I don't know of an easy way to detect which are Microsoft specific emojis. ## Validation Steps Performed I have looked at so many emojis that I dream emoji. These screenshots aren't encompassing _all_ emoji but I've tried to grab a couple from all across the codepoint ranges: Before: ![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png) After: ![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png) [1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt [2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt Closes #900 (cherry picked from commit 7ae3433)
A couple of codepoints, namely the card suites, male and female signs, and white and black smiling faces were changed to have a two-column width as part of #5795 since they were specified as emoji in Unicode's emoji list v13.0[1]. These particular glyphs also show up in some of the most fundamental code pages, such as CP437[2] and WGL4[3]. We should not be touching the width of the glyphs in these codepages, as suddenly changing a long-time-running narrow glyph to use two-columns all of a sudden will surely break (and has already broken) things. [1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt [2] https://en.wikipedia.org/wiki/Code_page_437 [3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4 Closes #5822
A couple of codepoints, namely the card suites, male and female signs, and white and black smiling faces were changed to have a two-column width as part of #5795 since they were specified as emoji in Unicode's emoji list v13.0[1]. These particular glyphs also show up in some of the most fundamental code pages, such as CP437[2] and WGL4[3]. We should not be touching the width of the glyphs in these codepages, as suddenly changing a long-time-running narrow glyph to use two-columns all of a sudden will surely break (and has already broken) things. [1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt [2] https://en.wikipedia.org/wiki/Code_page_437 [3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4 Closes #5822 (cherry picked from commit cf62922)
🎉 Handy links: |
This reverts commit 7ae3433.
When I type a ☢ emoji and backspace it in Ubuntu 18.04 I need to Ctrl-L to make it disappear. This does not happen on cmd. Should I open a separate issue for this? |
Nope. We fixed this with #5934 😄 we got some of the widths wrong |
The table that we refer to in `CodepointWidthDetector.cpp` to determine whether or not a codepoint should be rendered as Wide vs Narrow was based off EastAsianWidth[1]. If a codepoint wasn't included in this table, they're considered Narrow. Many emojis aren't specified in the EAW list, so this PR supplements our table with emoji codepoints from emoji-data[2] in order to render most, if not all, emojis as full-width. There are certain codepoints I've added to the comments (in case we want to add them officially to the table in the future) that Microsoft decided to give an emoji presentation even if it's specified as Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode emoji list. These include all of the Mahjong Tiles block, different direction pencils (✎✐), different pointing index fingers (☜, ☞) among others. I have no idea if I've captured all of them, as I don't know of an easy way to detect which are Microsoft specific emojis. ## Validation Steps Performed I have looked at so many emojis that I dream emoji. These screenshots aren't encompassing _all_ emoji but I've tried to grab a couple from all across the codepoint ranges: Before: ![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png) After: ![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png) [1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt [2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt Closes microsoft#900
A couple of codepoints, namely the card suites, male and female signs, and white and black smiling faces were changed to have a two-column width as part of microsoft#5795 since they were specified as emoji in Unicode's emoji list v13.0[1]. These particular glyphs also show up in some of the most fundamental code pages, such as CP437[2] and WGL4[3]. We should not be touching the width of the glyphs in these codepages, as suddenly changing a long-time-running narrow glyph to use two-columns all of a sudden will surely break (and has already broken) things. [1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt [2] https://en.wikipedia.org/wiki/Code_page_437 [3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4 Closes microsoft#5822
The table that we refer to in
CodepointWidthDetector.cpp
to determinewhether or not a codepoint should be rendered as Wide vs Narrow was
based off EastAsianWidth[1]. If a codepoint wasn't included in this
table, they're considered Narrow. Many emojis aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emojis as full-width.
There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even if it's specified as
Narrow/Ambiguous in the EAW list and are not specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, different
direction pencils (✎✐), different pointing index fingers (☜, ☞) among
others. I have no idea if I've captured all of them, as I don't know of
an easy way to detect which are Microsoft specific emojis.
Validation Steps Performed
I have looked at so many emojis that I dream emoji.
These screenshots aren't encompassing all emoji but I've tried to grab
a couple from all across the codepoint ranges:
Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)
After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)
[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
Closes #900