Inconsistent JS/JVM behavior with byte order mark (BOM) #112

jpd236 · 2020-05-23T00:02:58Z

I'm not sure if this problem is still relevant as it looks like the function in question has been commented out of the current tree:

https://github.com/Kotlin/kotlinx-io/blame/master/core/commonMain/src/kotlinx/io/text/CharsetEncoder.kt

But in case it is still an issue under the covers: with version 0.1.16 of the library, I'm seeing inconsistent behavior when calling String(<bytes>, charset = Charsets.UTF_8) on bytes that begin with a byte order mark, depending on whether I'm targeting the JVM or JS.

On the JVM, the BOM (0xEF, 0xBB, 0xBF) is converted to U+FEFF and kept as the first character of the resulting string.

In JS, the BOM appears to be stripped out.
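
A possible workaround while the two targets disagree is to normalize the result in common code. The sketch below is only an illustration: it uses the Kotlin standard library's decodeToString() in place of the kotlinx-io String(...) call from the report, and decodeUtf8StrippingBom is a hypothetical helper name, not part of any library. It drops a leading U+FEFF if one survived decoding, so the result is the same whether or not the platform decoder already stripped the BOM.

```kotlin
// Hypothetical helper, not part of kotlinx-io or the Kotlin standard library.
// Assumes the input is UTF-8 and that a leading BOM should never reach the caller.
private const val BOM_CHAR: Char = '\uFEFF'

fun decodeUtf8StrippingBom(bytes: ByteArray): String {
    // decodeToString() decodes the bytes as UTF-8 (standard library call).
    val decoded = bytes.decodeToString()
    // Remove a leading U+FEFF if the platform decoder kept it;
    // this is a no-op if the decoder already stripped it.
    return decoded.removePrefix(BOM_CHAR.toString())
}

fun main() {
    // UTF-8 encoding of the byte order mark, followed by the text "hi".
    val bom = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())
    val bytes = bom + "hi".encodeToByteArray()

    val text = decodeUtf8StrippingBom(bytes)
    println(text.length) // 2 on every target, since any leading U+FEFF is removed
}
```

Stripping after decoding, rather than skipping the first three bytes up front, keeps the helper independent of which behavior a given target's decoder has.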

fzhinkin · 2023-06-12T11:13:16Z

We're rebooting the kotlinx-io development (see #131), all issues related to the previous versions will be closed. Consider reopening it if the issue remains (or the feature is still missing) in a new version.

fzhinkin closed this as completed Jun 12, 2023

AzimMuradov mentioned this issue Oct 2, 2023

Support encodings other than UTF-8 and support BOM handling #226

Closed
