filters not unicode safe #16

Open
oberhamsi opened this issue Jul 13, 2015 · 1 comment

Comments

@oberhamsi

  • strings.capfirst, strings.lower, strings.upper - These do not take care of locale-sensitive character capitalization rules for Turkish, Serbian, Croatian, etc.
  • strings.truncatechars - This is not aware of Unicode surrogate pairs or combining characters, and may therefore truncate in the middle of a user-perceived character. This is particularly troublesome in writing systems that rely heavily on combining characters, such as Thai or most of the Indic writing systems (Devanagari, Kannada, Tamil, etc.).
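
To illustrate the truncation problem, here is a minimal sketch that cuts on grapheme cluster boundaries instead of UTF-16 code units. `truncateGraphemes` is a hypothetical helper, not the package's `truncatechars`, and it assumes an engine that provides `Intl.Segmenter` (for example Node.js 16 or later), which may not be the case on Ringo. The `...` suffix and the choice not to count it toward the limit are also assumptions.

```javascript
// Hypothetical helper: keep at most `max` user-perceived characters.
// Assumes Intl.Segmenter is available in the runtime.
function truncateGraphemes(text, max, suffix = "...") {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    // A further cluster exists beyond the limit, so the text is truncated.
    if (count === max) return out + suffix;
    out += segment;
    count += 1;
  }
  // The text had `max` clusters or fewer: return it unchanged.
  return text;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a".
const decomposed = "e\u0301a";
console.log(decomposed.slice(0, 1));               // "e", the accent is lost
console.log(truncateGraphemes(decomposed, 1, "")); // "e\u0301", the accent is kept
```

A naive `slice` also splits the two code units of a surrogate pair, while the segmenter-based version keeps them together.
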
@ehoogerbeets

Missed a line in the first point:

  • strings.capfirst, strings.lower, strings.upper - These do not take care of locale-sensitive character capitalization rules for Turkish, Serbian, Croatian, etc.
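
As a rough illustration of the casing issue, the sketch below uses the standard `toLocaleUpperCase` and `toLocaleLowerCase` methods with an explicit locale. `capfirstLocale` is a hypothetical name, not an existing filter, and the example assumes an engine that honours the locale argument (for example a recent Node.js); that may not hold on Ringo.

```javascript
// Hypothetical helper: capitalize the first character using the
// casing rules of the given locale (a BCP 47 tag such as "tr").
function capfirstLocale(text, locale) {
  if (text.length === 0) return text;
  // Take the first code point rather than the first UTF-16 code unit,
  // so a leading surrogate pair is not split.
  const first = String.fromCodePoint(text.codePointAt(0));
  return first.toLocaleUpperCase(locale) + text.slice(first.length);
}

console.log(capfirstLocale("istanbul", "en"));  // "Istanbul"
console.log(capfirstLocale("istanbul", "tr"));  // "İstanbul" (dotted capital I, U+0130)
console.log("ISTANBUL".toLocaleLowerCase("tr")); // "ıstanbul" (dotless i, U+0131)
```

The locale-independent `toUpperCase` and `toLowerCase` always produce the English-style result, which is wrong for Turkish text.
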

All 4 of these can be handled properly by ilib. Eventually, I will create an ilib-filters ringo package that will have plugin filters for these things based on ilib.
