This repository has been archived by the owner on Jan 14, 2022. It is now read-only.

Commit

TerminalEmulator: conform to standard on handling of invalid UTF-8 sequences

The Unicode Standard (version 6.2, page 96) requires the following when
dealing with ill-formed UTF-8:

    If the converter encounters an ill-formed UTF-8 code unit sequence
    which starts with a valid first byte, but which does not continue
    with valid successor bytes [...], it must not consume the successor
    bytes as part of the ill-formed subsequence whenever those successor
    bytes themselves constitute part of a well-formed UTF-8 code unit
    subsequence.

This implies that when we hit a byte in the input stream that cannot
continue the sequence currently being decoded, we must emit a replacement
character for the incomplete sequence, reset our decoder state, and then
attempt to decode that byte again.
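The decoding rule above can be sketched as a small byte-at-a-time decoder. This is a minimal sketch, not the TerminalEmulator implementation: the class `Utf8StreamDecoder` and its `feed`, `finish` and `decode` methods are hypothetical names, and output goes to a `StringBuilder` rather than to a terminal. Like the commit's `return handleUTF8Sequence(b);`, it emits U+FFFD for the incomplete prefix, resets, and reprocesses the offending byte. As an assumption beyond what the diff shows, it also narrows the allowed second byte after the lead bytes E0, ED, F0 and F4 (per the standard's table of well-formed UTF-8 byte sequences), so overlong forms, surrogates and values above U+10FFFF are rejected at the first byte that rules them out.

```java
// Hypothetical byte-at-a-time UTF-8 decoder; names are illustrative only
// and are not the TerminalEmulator API.
final class Utf8StreamDecoder {
    private static final char REPLACEMENT = '\uFFFD';

    private final StringBuilder out = new StringBuilder();
    private int codePoint;      // bits accumulated for the current sequence
    private int toFollow;       // continuation bytes still expected
    private int lower = 0x80;   // smallest byte allowed next
    private int upper = 0xBF;   // largest byte allowed next

    void feed(byte signed) {
        int b = signed & 0xFF;
        if (toFollow > 0) {
            if (b >= lower && b <= upper) {
                codePoint = (codePoint << 6) | (b & 0x3F);
                lower = 0x80;
                upper = 0xBF;
                if (--toFollow == 0) {
                    out.appendCodePoint(codePoint);
                }
                return;
            }
            // b cannot continue the sequence: replace the incomplete prefix,
            // reset, and reprocess b as a possible new lead byte. This
            // recurses at most once, because toFollow is now 0.
            out.append(REPLACEMENT);
            reset();
            feed(signed);
            return;
        }
        if (b < 0x80) {
            out.append((char) b);
        } else if (b >= 0xC2 && b <= 0xDF) {
            start(b & 0x1F, 1);
        } else if (b >= 0xE0 && b <= 0xEF) {
            start(b & 0x0F, 2);
            if (b == 0xE0) lower = 0xA0; // exclude overlong forms
            if (b == 0xED) upper = 0x9F; // exclude surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            start(b & 0x07, 3);
            if (b == 0xF0) lower = 0x90; // exclude overlong forms
            if (b == 0xF4) upper = 0x8F; // exclude values above U+10FFFF
        } else {
            // 0x80..0xC1 and 0xF5..0xFF never begin a well-formed sequence
            out.append(REPLACEMENT);
        }
    }

    /** Ends the input; a truncated final sequence becomes one U+FFFD. */
    String finish() {
        if (toFollow > 0) {
            out.append(REPLACEMENT);
            reset();
        }
        return out.toString();
    }

    /** Convenience wrapper: decodes a complete byte array. */
    static String decode(byte... bytes) {
        Utf8StreamDecoder d = new Utf8StreamDecoder();
        for (byte b : bytes) {
            d.feed(b);
        }
        return d.finish();
    }

    private void start(int bits, int follow) {
        codePoint = bits;
        toFollow = follow;
        lower = 0x80;
        upper = 0xBF;
    }

    private void reset() {
        codePoint = 0;
        toFollow = 0;
        lower = 0x80;
        upper = 0xBF;
    }

    public static void main(String[] args) {
        // 0xE2 starts a 3-byte sequence, but 'A' (0x41) cannot continue it,
        // so 'A' must survive instead of being swallowed.
        System.out.println(decode((byte) 0xE2, (byte) 0x41).equals("\uFFFDA")); // prints true
    }
}
```

With the pre-commit behavior (`return true;` right after emitting the replacement character), the same input E2 41 would decode to U+FFFD alone, silently dropping the `A`. The input ED A0 80 (an encoded surrogate) yields three U+FFFD characters, because ED cannot be followed by A0 and neither A0 nor 80 can start a new sequence.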
steven676 committed Jun 8, 2014
1 parent 0da354d commit 7335c64
Showing 1 changed file with 5 additions and 1 deletion.
@@ -823,7 +823,11 @@ private boolean handleUTF8Sequence(byte b) {
 mUTF8ToFollow = 0;
 mUTF8ByteBuffer.clear();
 emit(UNICODE_REPLACEMENT_CHAR);
-return true;
+
+/* The Unicode standard (section 3.9, definition D93) requires
+ * that we now attempt to process this byte as though it were
+ * the beginning of another possibly-valid sequence */
+return handleUTF8Sequence(b);
 }
 
 mUTF8ByteBuffer.put(b);
