Replace the yaml.js library with another YAML parser #165
Comments
So basically, both nodeca/js-yaml and eemeli/yaml look great. However, there are a few practical differences after all.
YAML 1.1 vs. YAML 1.2
eemeli/yaml
More focused on compliance with the YAML specs. It claims to pass all of the tests in the YAML test suite (https://github.com/eemeli/yaml/blob/b7696fc001837a2e9d66ad78d7c04f47943daeca/README.md?plain=1#L7):
It allows choosing between YAML 1.1 and YAML 1.2 mode. The differences are listed at https://yaml.org/spec/1.2.2/ext/changes/#changes-in-version-12-revision-120-2009-07-21. These are relevant to our purposes (I'll add numbering for easier reference):
Basically, we want number 1 quite badly, because otherwise (BTW, SnakeYAML used in KSC builds for JVM targets YAML 1.1, so it parses However, at the moment, KSC code implicitly relies on the YAML parser to behave according to YAML 1.1 in points 2 and 3 (i.e. recognize underscores
This is quite easy to fix on the KSC side, though. Even if the YAML library parsed all integers as strings (this would be the case if we ever configured the YAML parser to use the failsafe schema, BTW), it wouldn't be a huge problem, but it would require some additional code in KSC.
nodeca/js-yaml
Although it claims to follow YAML 1.2 (https://github.com/nodeca/js-yaml/blob/2cef47bebf60da141b78b085f3dea3b5733dcc12/README.md?plain=1#L1):
... in practice it still supports some features of YAML 1.1 removed in YAML 1.2. In particular underscores
Error messages
I personally find the error messages of nodeca/js-yaml nicer and more understandable, even though eemeli/yaml's errors are somewhat more descriptive instead of just "bad indentation" (but that's probably enough in many cases).
Details
--- 1/results/yaml-1.2.txt
+++ 2/results/js-yaml.txt
@@ ... @@
-> YAMLParseError: Nested mappings are not allowed in compact mappings at line 3, column 12:
->
-> value: condition ? 4 : 8
-> ^
+> YAMLException: bad indentation of a mapping entry in "input.ksy" (3:26)
>
+> 1 | instances:
+> 2 | foo:
+> 3 | value: condition ? 4 : 8
+> ------------------------------^
-> YAMLParseError: Nested mappings are not allowed in compact mappings at line 2, column 7:
->
-> id: yaml_1
-> ^
+> YAMLException: bad indentation of a mapping entry in "input.ksy" (3:8)
>
+> 1 | meta:
+> 2 | id: yaml_1
+> 3 | seq:
+> ------------^
-> YAMLParseError: All mapping items must start at the same column at line 8, column 1:
->
-> doc: An animal species
-> seq: # line 8, but yaml.js says line 4!
-> ^
+> YAMLException: bad indentation of a mapping entry in "input.ksy" (8:5)
>
+> 5 | types:
+> 6 | animal:
+> 7 | doc: An animal species
+> 8 | seq: # line 8, but yaml.js says ...
+> ---------^
+> 9 | - id: species
+> 10 | type: s4
-> YAMLParseError: Map keys must be unique at line 6, column 1:
->
-> type: u1
-> seq:
-> ^
+> YAMLException: duplicated mapping key in "input.ksy" (6:1)
>
+> 3 | seq:
+> 4 | - id: foo
+> 5 | type: u1
+> 6 | seq:
+> -----^
+> 7 | - id: bar
+> 8 | type: u1
nodeca/js-yaml also seems to have better detection of duplicate keys: it can handle even this case (admittedly an edge case, but still):
2: "bar"
"2": "foo"
--- 1/results/yaml-1.2.txt
+++ 2/results/js-yaml.txt
@@ ... @@
-{ '2': 'foo' }
+ERROR:
+> YAMLException: duplicated mapping key in "input.ksy" (2:1)
+>
+> 1 | 2: "bar"
+> 2 | "2": "foo"
+> -----^
Ease of integration
nodeca/js-yaml is easier to integrate into the Web IDE, because there's already a pre-built minified JS file: https://github.com/nodeca/js-yaml/blob/4.1.0/dist/js-yaml.min.js
eemeli/yaml doesn't provide a single packaged minified file, see eemeli/yaml#480 (comment). When you install it from npm, there's just the
pp@DESKTOP-89OPGF3 MINGW64 /c/temp/js-yaml-parsers-test/node_modules/yaml/browser (master)
$ find . -type f -exec wc -c {} +
...
274427 total
In comparison, nodeca/js-yaml has a
pp@DESKTOP-89OPGF3 MINGW64 /c/temp/js-yaml-parsers-test/node_modules/js-yaml/dist (master)
$ wc -c js-yaml.min.js
39430 js-yaml.min.js
That's almost 7 times smaller, and it's only 1 file instead of 75 individual ones, meaning the browser has to send only 1 request instead of 75, so it should all be faster. (Yes, technically it's not really fair to compare unminified files to a minified one, but unfortunately no proper minification/packaging is part of our existing Web IDE infrastructure, and I don't want to spend time on it at this point.)
Conclusion
In the end, I chose nodeca/js-yaml for the reasons stated above. It wouldn't be a big problem to switch to eemeli/yaml if we ever wanted to, but it would require some changes to KSC and the Web IDE infrastructure (adding a minification step etc.). For now, nodeca/js-yaml is easier to switch to, and I suppose it will work a bit better for us than eemeli/yaml would. Obviously, it would be best to use a YAML parser in Scala that works in all environments (kaitai-io/kaitai_struct#229), but until that becomes a reality, I believe switching to nodeca/js-yaml is much better than staying with jeremyfa/yaml.js.
Resolves #165 * Fixes #63 * Fixes kaitai-io/kaitai_struct#456 * Fixes #150 * Fixes #27 * Fixes #62 * Fixes kaitai-io/kaitai_struct#693
Hey, @generalmimon, as always, this is very very thorough and I can only thank you for doing all this work. This indeed looks like a magnificent improvement and fixes quite a lot of bugs that have pestered us and our users for years! I agree with all your assessments. For YAML 1.1 vs. YAML 1.2: I believe we'll ultimately have to settle on a certain pattern for treating these values holistically, e.g.:
So far, it looks like most of the places where we had it lean towards the latter (which makes it very useful for authoring YAML by hand).
@GreyCat Thanks for your comment!
Yeah, the latter option sounds more reasonable of the two. I don't see a reason for option one, that sounds too radical and unnecessary (but at least using two pairs of quotes to express a string literal in a KS expression as in https://doc.kaitai.io/user_guide.html#_switching_over_strings wouldn't feel weird anymore 🙂). Instead, as you've already mentioned in kaitai-io/kaitai_struct#229 (comment):
... we can eventually configure our YAML parsers to use the failsafe schema, which basically disables the scalar interpretation done by the YAML parser. In this mode, there are only 3 data types the YAML parser recognizes: mapping, sequence and string. This means that it no longer knows anything about
For KSC purposes, I suppose this would be pretty much perfect and all we really need. In our case, having the YAML parser produce all these different data types is more annoying than helpful. For example, right now if you want to have a
But even on the JVM with SnakeYAML that reads big integers as
case _: java.math.BigInteger =>
src.toString
But at least there's no loss of precision.
Looking at https://doc.kaitai.io/ksy_diagram.html, most of the YAML keys that we recognize in .ksy specs are either pure strings (e.g.
There are a few pure boolean keys (see also https://github.com/search?q=repo%3Akaitai-io%2Fkaitai_struct_compiler%20getOptValueBool&type=code):
and pure integer keys (see https://github.com/search?q=repo%3Akaitai-io%2Fkaitai_struct_compiler+getOptValueInt&type=code):
And as already mentioned, the integer keys of enum entries would have to be parsed too. But it's not a big problem to adapt these in KSC to accept strings instead and do the appropriate parsing.
Perhaps the only potential problem is that this change will create some slight incompatibilities with other standalone tools that consume .ksy files and don't use the failsafe schema (most likely some situations that were allowed with a YAML parser using implicit typing won't be allowed anymore and vice versa). This is because with the failsafe schema, we simply always get strings, but we have no way of knowing whether a scalar was unquoted (meaning that it may be subject to implicit typing in YAML parsers that don't use the failsafe schema) or not. For example,
Both nodeca/js-yaml and eemeli/yaml that I've talked about in this issue support the failsafe schema. The SnakeYAML we're currently using doesn't seem to, but SnakeYAML Engine, which claims to be a YAML 1.2 processor (as opposed to SnakeYAML, which targets YAML 1.1), apparently supports it (see https://bitbucket.org/snakeyaml/snakeyaml-engine/src/f4b8cdb1e846d5354fb4791916ba3c5932496770/src/test/java/org/snakeyaml/engine/schema/FailsafeTest.java).
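To make the "accept strings and do the appropriate parsing" idea more concrete, here is a minimal JavaScript sketch of what such parsing could look like once a failsafe-schema parser hands back every scalar as a string. The helper names parseIntScalar and parseBoolScalar are hypothetical (they are not part of KSC or of any YAML library), and the accepted syntax (underscores as digit separators, 0x/0b/0o prefixes, lowercase-only booleans) is an assumption chosen for illustration, not a decision made in this thread.

```javascript
// Hypothetical helpers, for illustration only: not KSC code and not a YAML library API.

// Parses an integer scalar that arrived as a string (as with the failsafe schema).
// Returns a BigInt, so large values do not lose precision.
function parseIntScalar(text) {
  // Optional sign, then a hex / binary / octal / decimal body.
  // Assumption: underscores are accepted as digit separators, as in YAML 1.1.
  // Assumption: a leading "0" without a prefix is read as decimal, not octal.
  const pattern =
    /^([+-]?)(0x[0-9a-f][0-9a-f_]*|0b[01][01_]*|0o[0-7][0-7_]*|[0-9][0-9_]*)$/i;
  const match = pattern.exec(text);
  if (match === null) {
    throw new SyntaxError(`not an integer: ${JSON.stringify(text)}`);
  }
  // BigInt() understands the 0x / 0b / 0o prefixes but not a sign before them,
  // so the sign is applied separately.
  const magnitude = BigInt(match[2].replace(/_/g, ""));
  return match[1] === "-" ? -magnitude : magnitude;
}

// Parses a boolean scalar that arrived as a string.
// Assumption: only lowercase "true" and "false" are accepted.
function parseBoolScalar(text) {
  if (text === "true") return true;
  if (text === "false") return false;
  throw new SyntaxError(`not a boolean: ${JSON.stringify(text)}`);
}

console.log(parseIntScalar("0x1F"));   // 31n
console.log(parseIntScalar("1_000"));  // 1000n
console.log(parseBoolScalar("true"));  // true
```

Because the parser no longer decides the type, the consumer chooses per key whether a string such as "0x1F" should stay a string or be parsed, which is the main attraction of the failsafe approach discussed above.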
So far we've been using jeremyfa/yaml.js to parse YAML in the Web IDE. However, it is more or less abandoned. Last release 0.3.0 was published on 2017-06-24 and there's a note at the top of its README that it is unlikely to receive new features or bugfixes (https://github.com/jeremyfa/yaml.js/tree/efe8ce18704ae43383e4177aa7de1a2619bd4e67#readme):
But more importantly, we've encountered a number of real problems with jeremyfa/yaml.js in the Web IDE over time (more on that in following comments). So it has long been clear that it's time to switch to another library.
As described at https://philna.sh/blog/2023/02/02/yaml-document-from-hell-javascript-edition/#yaml-in-javascript, there are 3 JavaScript libraries available for YAML parsing:
I wasn't sure whether option 2 or 3 would be a better choice, so I wanted to do some tests. Also, @GreyCat mentioned in https://github.com/kaitai-io/kaitai_struct_webide/pull/84/files#r276498978 that he'd want to make sure that the new YAML parsing library addresses the issues of the old one:
So I wrote a few scripts for testing and comparing the parsing results and put them in a repo: https://github.com/generalmimon/js-yaml-parsers-test
Both libraries seem to be pretty solid, definitely better than jeremyfa/yaml.js that we've been using. They both solve many pain points of jeremyfa/yaml.js - I'll list below the issues solved by switching to either library.
Legend to all diffs below:
No more broken hex literal parsing
Fixes YAML parser does not parse hex expressions correctly #63
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L6-L9)
Fixes Inconsistent hex literals in pos: kaitai_struct#456
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L11-L14)
No more parsing YAML 1.1 binary literals (0b...) or YAML 1.2 octal literals (0o...) as 0
Fixes "unable to find enum member" when using binary notation #150
Details
This example also shows another problematic behavior of jeremyfa/yaml.js: duplicate keys are silently allowed (leading to data loss) instead of being rejected with an error.
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L16-L22)
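As a side illustration of how a literal like 0b... can silently become 0 in JavaScript: the sketch below compares the built-in parseInt and Number on prefixed literals. Whether jeremyfa/yaml.js actually relies on parseInt internally is an assumption not verified here; the snippet only shows that this class of result is easy to produce with standard JavaScript functions.

```javascript
// Illustration only: this does not claim to reproduce yaml.js internals.
const literals = ["0x1F", "0b101", "0o17"];

const results = literals.map((text) => ({
  text,
  // parseInt recognizes the 0x prefix, but for "0b..." and "0o..." it reads
  // the leading "0" and stops at the first character it cannot use.
  viaParseInt: parseInt(text),
  // Number() understands the 0x, 0b and 0o prefixes.
  viaNumber: Number(text),
}));

console.log(results);
// viaParseInt: 31, 0, 0
// viaNumber:   31, 5, 15
```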
https://yaml.org/spec/1.2.2/#example-integers
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L114-L120)
Flow-style multi-line strings don't trigger a parse error and are parsed correctly as per the spec (i.e. single newlines should not translate to \n in the output string, a double newline is needed for this, see https://yaml-multiline.info/#flow-scalars-plain)
Fixes YAML parser doesn't recognize flow-style multi-line strings #27
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L29-L33)
Also fixes the parse error with https://github.com/kaitai-io/kaitai_struct_formats/blob/acdf0733633568c68869af15846abaf1c0eaa59a/image/tga.ksy#L17-L21
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L35-L41)
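The folding rule for flow-style multi-line strings mentioned above (a single newline becomes a space, an empty line becomes \n) can be sketched in a few lines of JavaScript. The function name foldFlowScalar is hypothetical, and this is a simplified model of the rule for plain scalars, not the implementation used by either library.

```javascript
// Hypothetical sketch of YAML line folding for a plain multi-line scalar.
// Simplification: surrounding whitespace on every line is trimmed, and
// leading/trailing empty lines are dropped.
function foldFlowScalar(raw) {
  const lines = raw.split("\n").map((line) => line.trim());
  let out = "";
  let emptyLines = 0; // empty lines seen since the last line with text

  for (const line of lines) {
    if (line === "") {
      emptyLines++;
      continue;
    }
    if (out !== "") {
      // No empty line in between: the line break folds into one space.
      // Otherwise each empty line contributes one "\n".
      out += emptyLines > 0 ? "\n".repeat(emptyLines) : " ";
    }
    out += line;
    emptyLines = 0;
  }
  return out;
}

console.log(JSON.stringify(foldFlowScalar("foo\n  bar")));  // "foo bar"
console.log(JSON.stringify(foldFlowScalar("foo\n\n  bar"))); // "foo\nbar"
```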
A mapping key-value pair of a mapping indented more than the previous one is considered an error, not included in the value of the previous key-value pair
Fixes incorrect parsing of https://github.com/kaitai-io/kaitai_struct_tests/blob/7d7ecf076cc02c5032ec399655b7c3d50bde96ad/formats_err/yaml_1.ksy
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L138-L149)
The notorious colon : as part of the ternary operator in an unquoted string is no longer allowed, which is consistent with the SnakeYAML library used in the JVM compiler (and it is in fact correct behavior according to the YAML spec, see https://matrix.yaml.info/details/ZCZ6.html and https://matrix.yaml.info/details/ZL4Z.html)
https://doc.kaitai.io/user_guide.html#_ternary_if_then_else_operator
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L43-L54)
Duplicate mapping keys are rejected with an error (this is the default behavior in both nodeca/js-yaml and eemeli/yaml), not silently allowed as in jeremyfa/yaml.js, where only the first entry is kept (i.e. resulting in data loss, which tends to surprise many people who don't know this property of YAML)
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L151-L170)
Note that this behavior has been suggested several times before:
In 0.9, the feature of treating duplicate keys as errors was enabled in SnakeYAML used in JVM compiler builds: Multiple type declarations behaviour kaitai_struct#641 (comment)
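For illustration, rejecting duplicates while building a mapping can be sketched as follows. The function buildMapping is hypothetical and is not how either library is implemented; it only shows the idea of failing loudly instead of overwriting, and why keys such as 2 and "2" collide once they become property names of a plain JavaScript object.

```javascript
// Hypothetical sketch: builds an object from [key, value] pairs and rejects
// duplicates instead of silently keeping only one of the entries.
function buildMapping(pairs) {
  const result = Object.create(null); // no inherited properties to collide with
  for (const [key, value] of pairs) {
    // Object property names are strings, so 2 and "2" name the same property.
    const name = String(key);
    if (name in result) {
      throw new Error(`duplicated mapping key: ${name}`);
    }
    result[name] = value;
  }
  return result;
}

const ok = buildMapping([["id", "foo"], ["type", "u1"]]);
console.log(ok.id, ok.type); // foo u1

try {
  buildMapping([[2, "bar"], ["2", "foo"]]);
} catch (err) {
  console.log(err.message); // duplicated mapping key: 2
}
```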
Comment lines do not suspend counting of line numbers displayed in error messages like they do in jeremyfa/yaml.js
Fixes Invalid line number while parsing YAML if comments are used #62
Details
JSON is accepted
Fixes KSY (YAML) parser support for JSON kaitai_struct#693
Details
(see https://github.com/generalmimon/js-yaml-parsers-test/blob/662870c1c092b8092f458f8ad94c23b3ad62f93c/results-diffs/yamljs_vs_js-yaml.diff#L206-L242)