Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Unicode conversion support #68

Closed
wants to merge 2 commits into from
Closed

feat: add Unicode conversion support #68

wants to merge 2 commits into from

Conversation

qazxcdswe123
Copy link
Contributor

@qazxcdswe123 qazxcdswe123 commented Oct 23, 2024

  • String to utf8 bytes
  • utf8 bytes to String

TODO: use buffer?

convert utf16 string to utf8
@qazxcdswe123 qazxcdswe123 changed the title feat: add Unicode support feat: add Unicode conversion support Oct 23, 2024
Copy link

peter-jerry-ye-code-review bot commented Oct 23, 2024

Observations and Suggestions

  1. Potential Bug in UTF-8 Encoding Function (to_utf8):

    • The function to_utf8 assumes that the input string is in UTF-16 LE format but does not explicitly check or convert the input string from other encodings. This could lead to incorrect encoding if the input string is not in UTF-16 LE format.
    • Suggestion: Add a validation or conversion step to ensure the input string is in UTF-16 LE format before processing.
  2. Potential Bug in UTF-8 Decoding Function (from_utf8):

    • The function from_utf8 assumes that the input byte sequence is a valid UTF-8 encoded string. If the input contains invalid UTF-8 sequences, the function will fail with an error.
    • Suggestion: Consider adding more robust error handling or recovery mechanisms to handle invalid UTF-8 sequences gracefully, possibly by replacing or skipping invalid bytes.
  3. Typo in Test Case Description:

    • In the test case "from utf8", the description mentions "from utf8" but the actual function call uses the correct naming convention (@unicode.from_utf8!).
    • Suggestion: Correct the description in the test case to match the function name for clarity.

These observations highlight potential areas for improvement to ensure the functions handle various input scenarios more robustly and adhere to coding standards for clarity and consistency.

@peter-jerry-ye
Copy link
Collaborator

Goal or non-goal:

  • streaming conversion using Iter
    fn to_utf8(string: Iter[Char]) -> Iter[Byte]!Failure
    
  • streaming conversion of segmented data
    let decoder = Decoder::new()
    inspect!(decoder.decode([0xf0]), content="")
    inspect!(decoder.decode([0x9f, 0x98, 0x90]), content="😐")
    

@peter-jerry-ye
Copy link
Collaborator

Closed as in favor of moonbitlang/core#1236

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants