What's the point of appending strings, encoding them, and having `to_s` decode them? Just use `String.build` without an encoding in this case. `String` isn't always utf-8 so the current behavior is fine.
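A minimal sketch of the suggestion above (the strings are arbitrary examples): `String.build` yields a `String::Builder` with no encoding set, so appended strings stay UTF-8 and no decoding step is needed afterwards.

```crystal
# No encoding is set on the builder, so it accumulates UTF-8 bytes
# and the result is a valid String without any decoding step.
str = String.build do |io|
  io << "héllo"
  io << ", wörld"
end

puts str                 # => héllo, wörld
puts str.valid_encoding? # => true
```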
> `String` isn't always utf-8 so the current behavior is fine.
But it really is. At least in theory. There may be non-UTF-8 bytes for practical reasons. But the general idea is still that String is UTF-8. It can contain some invalid bytes, but it is most certainly not meant to contain data with an entirely different encoding.
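To illustrate the distinction between a UTF-8 `String` with a few invalid bytes and data in another encoding, a short sketch (the byte values are arbitrary): a `String` built from bytes containing `0xFF`, which never occurs in well-formed UTF-8, reports `valid_encoding?` as false, and `String#scrub` replaces the offending byte with U+FFFD.

```crystal
# 0xFF can never appear in well-formed UTF-8.
s = String.new(Slice[0x61_u8, 0xFF_u8, 0x62_u8])

puts s.bytesize        # => 3
puts s.valid_encoding? # => false

# scrub replaces each invalid byte sequence with U+FFFD.
puts s.scrub == "a\uFFFDb" # => true
```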
Original: #11011 (comment)
`IO::Memory#to_s` assumes the caller stores UTF-8 byte sequences, so if some other encoding is explicitly set and the `IO::Memory`'s string methods are used, the result will be incorrect:

Likewise, `#to_s(IO)` writes the underlying bytes unmodified:

The first overload effectively calls `String.new(to_slice)`. Whenever the `IO::Memory` has a non-default encoding, it should instead use this `String` constructor, which performs the decoding on construction. (If the `IO::Memory` already uses UTF-8, the returned `String` will automatically expose invalid characters as U+FFFD.)

The second overload is exactly `io.write(to_slice)`. This one should similarly use the undocumented `String.encode`:

Such a rewrite would in fact provide the only way to convert between arbitrary encodings without going through UTF-8, unless #11018 is resolved.
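The two proposals can be sketched with public Crystal API only. This is not the issue's original example; it assumes an `IO::Memory` set to UTF-16LE and uses `IO#encoding` with the `String.new(bytes, encoding)` constructor for the decoding that `#to_s` would perform. `#to_s(IO)` is approximated by decoding and then writing into a target IO that has its own encoding. Variable names are illustrative, and since `io.to_s` itself may behave differently across Crystal versions, the sketch does not rely on its output.

```crystal
# An IO::Memory whose encoding is explicitly set to UTF-16LE.
io = IO::Memory.new
io.set_encoding("UTF-16LE")
io << "héllo" # encoded on write: 5 UTF-16 code units, 10 bytes

# The buffer holds UTF-16LE bytes, not UTF-8.
bytes = io.to_slice
puts bytes.size # => 10

# First overload: decode with the IO's own encoding instead of
# assuming UTF-8 the way String.new(to_slice) does.
decoded = String.new(bytes, io.encoding)
puts decoded # => héllo

# Second overload, approximated: decode, then let the target IO
# re-encode. Unlike the String.encode-based proposal, this detours
# through a UTF-8 String.
target = IO::Memory.new
target.set_encoding("UTF-16BE")
target << decoded
puts target.to_slice.size                                # => 10
puts String.new(target.to_slice, "UTF-16BE") == decoded  # => true
```

The detour through `decoded` is the intermediate UTF-8 step that the proposed `String.encode`-based `#to_s(IO)` would avoid.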