Fixes #74.
Internally, the .NET String class uses UTF-16, which encodes each character of the Basic Multilingual Plane as a single 16-bit code unit.
Characters outside this plane are encoded as a sequence of two UTF-16 code units called a surrogate pair.
In that case, a single Unicode character, identified by its code point, occupies two UTF-16 code units.
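
To illustrate the difference between code units and code points, here is a minimal sketch in Python rather than .NET. Python's `len` counts code points, so the UTF-16 length is computed explicitly; the helper name `utf16_code_units` is hypothetical and not part of this PR.

```python
def utf16_code_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode `text`."""
    # Each UTF-16 code unit is 2 bytes; "utf-16-le" does not emit a byte order mark.
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u00e9"         # é, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # 😀, outside the Basic Multilingual Plane

print(utf16_code_units(bmp_char))     # 1
print(utf16_code_units(astral_char))  # 2 (a surrogate pair)
print(len(astral_char))               # 1 (a single code point)
```

The second value is what a length based on UTF-16 code units reports for a single character outside the Basic Multilingual Plane.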
For such a character, the .NET String `Length` property returns `2` even though there is a single character.

Another special case is composite characters. For instance, the character `é` can be encoded as two different Unicode sequences: the single precomposed code point U+00E9, or the letter `e` (U+0065) followed by the combining acute accent U+0301.

As a result, the builtin functions `length()`, `reverse()`, `sort()` and `sort_by()` did not handle those cases correctly.

This PR fixes #74 by introducing proper support for strings as a sequence of code points.
However, composite characters remain an issue: a character encoded as several code points is still treated as several elements.
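
The remaining limitation with composite characters can be sketched as follows, again in Python for illustration only. It shows that treating a string as a sequence of code points still splits a decomposed `é`. The NFC normalization step at the end is an assumption about one possible mitigation, not something this PR is stated to do.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point (U+00E9)
decomposed = "e\u0301"  # e (U+0065) followed by combining acute accent (U+0301)

print(len(precomposed))           # 1
print(len(decomposed))            # 2: same visible character, two code points
print(precomposed == decomposed)  # False: the code point sequences differ

# Reversing by code point moves the combining accent in front of its base letter.
print(decomposed[::-1] == "\u0301e")  # True

# Assumption: normalizing to NFC composes the pair into the single code point.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```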