Request for additional named entities for invisible/ambiguous characters #1841

Closed
r12a opened this issue Mar 28, 2024 · 1 comment
Labels
duplicate s:html https://html.spec.whatwg.org/multipage/ t:char_ref Referring to Unicode characters

Comments

@r12a
Contributor

r12a commented Mar 28, 2024

This is a request from the W3C i18n WG that the HTML Standard define character entities to cover key invisible/ambiguous Unicode characters.

The following is a list of candidates that we are proposing, including (for convenience) a list of already existing named entities. The latter are marked in bold. Entity names are proposed after the character name, and are based on abbreviations used by Unicode, where they exist. Lower priority items are italicised.
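One way to see which names are already defined and which are only proposed is to look them up in a copy of the HTML named character reference table. The sketch below uses the `html5` table from Python's standard `html.entities` module as a stand-in for the HTML Standard's list; the helper `is_existing_entity` and the sample of names are illustrative assumptions, not part of the proposal.

```python
from html.entities import html5


def is_existing_entity(name: str) -> bool:
    """Return True if `name` (given without '&' and ';') is in Python's
    copy of the HTML named character reference table."""
    return f"{name};" in html5


# A few long-standing HTML names, followed by names proposed in this issue.
sample = ["nbsp", "shy", "zwnj", "zwj", "lrm", "rlm", "cgj", "alm", "lri", "zwsp"]

for name in sample:
    status = "existing" if is_existing_entity(name) else "not defined today"
    print(f"&{name}; -> {status}")
```

Python's table may lag behind the HTML Standard, so a result of "not defined today" is only as current as the installed Python version.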

Latin 1 Supplement — Latin-1 punctuation and symbols

  • U+00A0 NO-BREAK SPACE &nbsp;
  • U+00AD SOFT HYPHEN &shy;

Combining Diacritical Marks — Grapheme joiner

  • U+034F COMBINING GRAPHEME JOINER &cgj;

Arabic — Format character

  • U+061C ARABIC LETTER MARK &alm;

Hangul Jamo — Old initial consonants

  • U+115F HANGUL CHOSEONG FILLER &hcf;

Hangul Jamo — Medial vowels

  • U+1160 HANGUL JUNGSEONG FILLER &hjf;

Ogham — Space

  • U+1680 OGHAM SPACE MARK &osm;

Mongolian — Format controls

  • U+180B MONGOLIAN FREE VARIATION SELECTOR ONE &fvs1;
  • U+180C MONGOLIAN FREE VARIATION SELECTOR TWO &fvs2;
  • U+180D MONGOLIAN FREE VARIATION SELECTOR THREE &fvs3;
  • U+180E MONGOLIAN VOWEL SEPARATOR &mvs;
  • U+180F MONGOLIAN FREE VARIATION SELECTOR FOUR &fvs4;

General Punctuation — Spaces

  • U+2000 EN QUAD &nqsp;
  • U+2001 EM QUAD &mqsp;
  • U+2002 EN SPACE &ensp;
  • U+2003 EM SPACE &emsp;
  • U+2004 THREE-PER-EM SPACE &emsp13;
  • U+2005 FOUR-PER-EM SPACE &emsp14;
  • U+2006 SIX-PER-EM SPACE &6msp;
  • U+2007 FIGURE SPACE &numsp;
  • U+2008 PUNCTUATION SPACE &puncsp;
  • U+2009 THIN SPACE &thinsp; AND &ThinSpace;
  • U+200A HAIR SPACE &hairsp; AND &VeryThinSpace; AND part of &ThickSpace; (U+205F U+200A)

General Punctuation — Format character

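  • U+200B ZERO WIDTH SPACE &ZeroWidthSpace; AND &NegativeVeryThinSpace; AND &NegativeThinSpace; AND &NegativeMediumSpace; AND &NegativeThickSpace;
  • U+200C ZERO WIDTH NON-JOINER &zwnj;
  • U+200D ZERO WIDTH JOINER &zwj;
  • U+200E LEFT-TO-RIGHT MARK &lrm;
  • U+200F RIGHT-TO-LEFT MARK &rlm;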
  • U+2066 LEFT-TO-RIGHT ISOLATE &lri;
  • U+2067 RIGHT-TO-LEFT ISOLATE &rli;
  • U+2068 FIRST STRONG ISOLATE &fsi;
  • U+2069 POP DIRECTIONAL ISOLATE &pdi;
  • U+202D LEFT-TO-RIGHT OVERRIDE &lro;
  • U+202E RIGHT-TO-LEFT OVERRIDE &rlo;
  • U+2060 WORD JOINER &NoBreak;
  • U+202A LEFT-TO-RIGHT EMBEDDING &lre;
  • U+202B RIGHT-TO-LEFT EMBEDDING &rle;
  • U+202C POP DIRECTIONAL FORMATTING &pdf;

General Punctuation — Separators

  • U+2028 LINE SEPARATOR &lsep;
  • U+2029 PARAGRAPH SEPARATOR &psep;

General Punctuation — Space

  • U+202F NARROW NO-BREAK SPACE &nnbsp;
  • U+205F MEDIUM MATHEMATICAL SPACE &MediumSpace; AND part of &ThickSpace; (U+205F U+200A)

General Punctuation — Invisible operators

  • U+2061 FUNCTION APPLICATION &af; AND &ApplyFunction;
  • U+2062 INVISIBLE TIMES &it; AND &InvisibleTimes;
  • U+2063 INVISIBLE SEPARATOR &ic; AND &InvisibleComma;
  • U+2064 INVISIBLE PLUS &InvisiblePlus;
  • U+206D ACTIVATE ARABIC FORM SHAPING &aafs;

CJK Symbols And Punctuation — CJK symbols and punctuation

  • U+3000 IDEOGRAPHIC SPACE &idsp;

Hangul Compatibility Jamo — Special character

  • U+3164 HANGUL FILLER &hf;

Halfwidth And Fullwidth Forms — Halfwidth Hangul variants

  • U+FFA0 HALFWIDTH HANGUL FILLER &hwhf;

Shorthand Format Controls — Shorthand format controls

  • U+1BCA0 SHORTHAND FORMAT LETTER OVERLAP
  • U+1BCA1 SHORTHAND FORMAT CONTINUING OVERLAP
  • U+1BCA2 SHORTHAND FORMAT DOWN STEP
  • U+1BCA3 SHORTHAND FORMAT UP STEP

Musical Symbols — Beams and slurs

  • U+1D173 MUSICAL SYMBOL BEGIN BEAM
  • U+1D174 MUSICAL SYMBOL END BEAM
  • U+1D175 MUSICAL SYMBOL BEGIN TIE
  • U+1D176 MUSICAL SYMBOL END TIE
  • U+1D177 MUSICAL SYMBOL BEGIN SLUR
  • U+1D178 MUSICAL SYMBOL END SLUR
  • U+1D179 MUSICAL SYMBOL BEGIN PHRASE
  • U+1D17A MUSICAL SYMBOL END PHRASE

Emoji Variation Selectors - turns on and off colour

  • U+FE0E VARIATION SELECTOR-15 &vs15;
  • U+FE0F VARIATION SELECTOR-16 &vs16;

We would also like to have a &zwsp; alias in addition to &ZeroWidthSpace; for U+200B.
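Until such names are defined, authors can only reach these characters through numeric character references. As a rough illustration of what the proposed names would stand for, the sketch below rewrites a handful of them into numeric references that today's parsers already understand. The `PROPOSED` table is a small subset copied from the list above, and `expand_proposed` is a hypothetical helper written for this example; neither is an existing tool or part of any specification.

```python
import html
import re

# Subset of the names proposed in this issue, mapped to their code points.
PROPOSED = {
    "cgj": 0x034F,    # COMBINING GRAPHEME JOINER
    "alm": 0x061C,    # ARABIC LETTER MARK
    "lri": 0x2066,    # LEFT-TO-RIGHT ISOLATE
    "rli": 0x2067,    # RIGHT-TO-LEFT ISOLATE
    "fsi": 0x2068,    # FIRST STRONG ISOLATE
    "pdi": 0x2069,    # POP DIRECTIONAL ISOLATE
    "nnbsp": 0x202F,  # NARROW NO-BREAK SPACE
    "zwsp": 0x200B,   # ZERO WIDTH SPACE (proposed alias)
}

# Matches a semicolon-terminated reference with an alphanumeric name.
_REF = re.compile(r"&([A-Za-z0-9]+);")


def expand_proposed(text: str) -> str:
    """Replace proposed names with hexadecimal numeric character references.

    References whose names are not in PROPOSED are left untouched."""
    def repl(match: re.Match) -> str:
        code_point = PROPOSED.get(match.group(1))
        if code_point is None:
            return match.group(0)
        return f"&#x{code_point:04X};"

    return _REF.sub(repl, text)


source = "A&zwsp;B &fsi;name&pdi; &zwnj;"
expanded = expand_proposed(source)
print(expanded)

# The expanded form decodes with the standard library; the raw proposed
# names do not, because they are not in the current entity table.
print([f"U+{ord(ch):04X}" for ch in html.unescape(expanded) if ord(ch) > 0x7F])
print(html.unescape("&cgj;"))
```

The existing `&zwnj;` passes through unchanged and is decoded as usual, while `&cgj;` on its own stays as literal text, which is the gap this request is about.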

Instructions:

This follows the process at https://w3c.github.io/i18n-activity/guidelines/review-instructions.html

  1. Create the review comment you want to propose by replacing the prompts above these instructions, but LEAVE ALL THE INSTRUCTIONS INTACT

  2. Add one or more t:... labels. These should use ids from specdev to establish a link to that doc.

  3. Set a label to identify the spec: this starts with s: followed by the spec's short name. If you are unable to do that, ask a W3C staff contact to help.

  4. Ask the i18n WG to review your comment.

  5. After discussion with the i18n WG, raise an issue in the repository of the WG that owns the spec. Use the text above these instructions as the starting point for that comment, but add any suggestions that arose from the i18n WG. In the other WG's repo, add an 'i18n-needs-resolution' label to the new issue. If you think any of the participants in layout requirements task force groups would be interested in following the discussion, add also the appropriate i18n-*lreq label(s).

  6. Delete the text below that says 'url_for_the_issue_raised', then add in its place the URL for the issue you raised in the other WG's repository. Do NOT remove the initial '§ '. Do NOT use [...](...) notation – you need to delete the placeholder, then paste the URL.

  7. Remove the 'pending' label, and add a 'needs-resolution' tag to this tracker issue.

  8. If you added an *lreq label, add the label 'spec-type-issue', add the corresponding language label, and a label to indicate the relevant typographic feature(s), eg. 'i:line_breaking'. The latter represent categories related to the Language Enablement Index, and all start with i:.

  9. Edit this issue to REMOVE ALL THE INSTRUCTIONS & THE PROPOSED COMMENT, ie. the line below that is '---' and all the text before it to the very start of the issue.


This is a tracker issue. Only discuss things here if they are i18n WG internal meta-discussions about the issue. Contribute to the actual discussion at the following link:

§ url_for_the_issue_raised

r12a added the pending, s:html, and t:char_ref labels Mar 28, 2024
@xfq
Member

xfq commented Apr 28, 2024

Since we have #1847 now, can we close this issue?

aphillips added the duplicate label and removed the pending label May 2, 2024