Handle invalid encoded strings
Calculating the width of a string that is invalid in its encoding previously
crashed with either `Encoding::InvalidByteSequenceError` or
`Encoding::UndefinedConversionError`. Invalid or unconvertible characters are now
replaced with a replacement character when converting to UTF-8.

Binary-encoded strings (i.e. no encoding) don't make much sense for width calculation,
but at least this no longer crashes and tries to return a sensible default: it assumes the
string is actually valid UTF-8 and falls back to replacement characters if it is not.
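
To illustrate the failure mode described above, here is a minimal sketch of how a plain `String#encode` raises on such input while the `invalid: :replace, undef: :replace` options do not. The helper name `safe_utf8` is hypothetical and exists only for this example; it is not part of the gem.

```ruby
# Hypothetical helper (not part of the gem): convert any string to UTF-8,
# substituting a replacement character instead of raising.
def safe_utf8(string)
  return string if string.encoding == Encoding::UTF_8

  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Bytes that do not form a valid Shift_JIS sequence (same input as the new spec).
invalid_sjis = "\x81\x39".dup.force_encoding(Encoding::SHIFT_JIS)

begin
  invalid_sjis.encode(Encoding::UTF_8)
rescue Encoding::InvalidByteSequenceError => e
  puts "plain encode raised #{e.class}"
end

# A binary byte that has no defined mapping to UTF-8.
begin
  "\xFF".b.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "plain encode raised #{e.class}"
end

# With the replace options, both conversions yield valid UTF-8.
puts safe_utf8(invalid_sjis).valid_encoding?
puts safe_utf8("\xFF".b).valid_encoding?
```

The two `rescue` branches correspond to the two exception classes named in the commit message: one for bytes that are malformed in the source encoding, one for bytes that have no counterpart in the target encoding.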
Earlopain committed Dec 25, 2024
1 parent b00c5bf commit bc47d28
Showing 2 changed files with 19 additions and 2 deletions.
10 changes: 8 additions & 2 deletions lib/unicode/display_width.rb
@@ -47,7 +47,14 @@ class DisplayWidth

     # Returns monospace display width of string
     def self.of(string, ambiguous = nil, overwrite = nil, old_options = {}, **options)
-      string = string.encode(Encoding::UTF_8) unless string.encoding == Encoding::UTF_8
+      # Binary strings don't make much sense when calculating display width.
+      # Assume it's valid UTF-8
+      if string.encoding == Encoding::BINARY && !string.force_encoding(Encoding::UTF_8).valid_encoding?
+        # Didn't work out, go back to binary
+        string.force_encoding(Encoding::BINARY)
+      end
+
+      string = string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless string.encoding == Encoding::UTF_8
       options = normalize_options(string, ambiguous, overwrite, old_options, **options)

       width = 0
@@ -236,4 +243,3 @@ def of(string, **kwargs)
end
end
end
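
The binary fallback in this hunk calls `force_encoding` on the argument itself, which changes the caller's string in place and would raise `FrozenError` for a frozen binary string. As a sketch only, assuming one wanted the same decision logic without touching the input, the step could be written against a copy. The method name `to_utf8_for_width` is hypothetical and is not the gem's API.

```ruby
# Hypothetical, non-mutating variant of the fallback logic in the diff above.
def to_utf8_for_width(string)
  if string.encoding == Encoding::BINARY
    # Work on a copy so the caller's string keeps its binary encoding.
    candidate = string.dup.force_encoding(Encoding::UTF_8)
    # The bytes happen to be valid UTF-8: treat them as such.
    return candidate if candidate.valid_encoding?
  end

  # Already UTF-8: returned unchanged, as in the original code.
  return string if string.encoding == Encoding::UTF_8

  # Anything else (including binary that is not valid UTF-8) is transcoded,
  # with bad or unmappable bytes turned into replacement characters.
  string.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

frozen_binary = "€".b.freeze
puts to_utf8_for_width(frozen_binary).encoding
puts frozen_binary.encoding
```

The trade-off is one extra string allocation per binary input in exchange for leaving the argument untouched.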

11 changes: 11 additions & 0 deletions spec/display_width_spec.rb
@@ -183,6 +183,17 @@
     it 'works with non-utf8 Unicode encodings' do
       expect( 'À'.encode("UTF-16LE").display_width ).to eq 1
     end
+
+    it 'works with a string that is invalid in its encoding' do
+      s = "\x81\x39".dup.force_encoding(Encoding::SHIFT_JIS)
+
+      # Would print as �9 on the terminal
+      expect( s.display_width ).to eq 2
+    end
+
+    it 'works with a binary encoded string that is valid in UTF-8' do
+      expect( '€'.b.display_width ).to eq 1
+    end
   end

   describe '[emoji]' do