RGA.diff: Investigate what bad can happen when splitting multi-codepoint "characters" into codepoints #146

Open
cblp opened this issue Dec 2, 2019 · 0 comments
Labels
component_RDT level_Research type_Question Further information is requested

Comments

@cblp
Member

cblp commented Dec 2, 2019

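What can go wrong if we split é (in its decomposed form) into e + ´, that is, the base letter U+0065 followed by the combining acute accent U+0301?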

See also country flags, each of which is encoded as a pair of regional indicator codepoints.
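A minimal Python sketch of the failure modes in question, assuming edits are addressed by codepoint index. The helpers `insert_at` and `flag` are hypothetical names introduced only for illustration; they are not part of RGA.diff.

```python
import unicodedata

# Decomposed "é": base letter U+0065 followed by combining acute accent U+0301.
e_acute = unicodedata.normalize("NFD", "\u00e9")

def insert_at(text, index, new):
    """Hypothetical stand-in for a merge that inserts `new` at a codepoint index."""
    return text[:index] + new + text[index:]

# A concurrent insert lands between the base letter and its accent,
# so the accent now follows "x" and combines with it instead of "e".
merged = insert_at(e_acute, 1, "x")

def flag(country):
    """Build a flag from a two-letter code using regional indicator codepoints."""
    return "".join(chr(0x1F1E6 + ord(c) - ord("A")) for c in country)

# Deleting the first codepoint of FR + US re-pairs the remaining indicators:
# the text now starts with the pair for RU, followed by a lone indicator.
damaged = (flag("FR") + flag("US"))[1:]
```

In both cases every individual codepoint is still valid, but the rendered text changes in a way neither editor intended.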

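Unicode covers this with the concept of a “grapheme cluster”: a sequence of codepoints that approximates what a user perceives as a single character. There is also the “extended grapheme cluster” (EGC), the updated version of the concept that UAX #29 recommends for general use.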

http://unicode.org/glossary/#grapheme_cluster

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
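To make the idea concrete, here is a rough Python sketch that groups codepoints into cluster-like units before any diffing would take place. It is a deliberate simplification and not the UAX #29 algorithm: it only attaches combining marks to the preceding codepoint and pairs regional indicators, and it ignores ZWJ sequences, Hangul syllables, and the other boundary rules. The name `rough_clusters` is hypothetical.

```python
import unicodedata

def is_regional_indicator(ch):
    # Regional indicator symbols occupy U+1F1E6..U+1F1FF.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def rough_clusters(text):
    """Approximate grapheme clusters (simplified; NOT full UAX #29)."""
    clusters = []
    ri_run = 0  # regional indicators at the end of the current cluster
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            # Combining mark: keep it with the preceding codepoint.
            clusters[-1] += ch
            ri_run = 0
        elif clusters and is_regional_indicator(ch) and ri_run == 1:
            # Second indicator of a pair: completes a flag.
            clusters[-1] += ch
            ri_run = 2
        else:
            clusters.append(ch)
            ri_run = 1 if is_regional_indicator(ch) else 0
    return clusters
```

Under this sketch, `rough_clusters("e\u0301x")` yields two units, the accented letter and `x`, so an index computed over units can never fall between a base letter and its accent. A production implementation would more likely rely on a library that implements the full UAX #29 rules.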

cblp added the type_Question (Further information is requested), level_Research, and component_RDT labels on Dec 2, 2019
cblp changed the title from "RGA: Investigate what bad can happen when splitting multi-codepoint "characters" into codepoints" to "RGA.diff: Investigate what bad can happen when splitting multi-codepoint "characters" into codepoints" on Dec 2, 2019