switch VRL parser from Pest to Nom #6139
Labels: domain: vrl, type: tech debt
This issue is intended to track discussion around switching from Pest to Nom for the VRL parser.
I'm unsure why Pest was originally chosen (neither the RFC nor the initial implementation mentions the rationale), but one reason appears to be the improved (pretty) error messages it produces.
We're no longer leveraging those messages, though: while the output in the terminal looked nice, the actual message itself could be confusing, for example:
The fact that this mentions "path_index" is confusing to users. It happens because, for a syntax error like the above to be understandable, you have to add negative expectations to each rule and then convert those back into an error message. Adding them is tricky, as you have to manipulate the syntax stack while parsing, which isn't easy to do in Pest.
Nom has been brought up a few times in the past, and people seem to like it. Its strength over Pest is that the parser is written in pure Rust, so we get to leverage the type system. Pest, by contrast, generates one big enum of "rules", requiring each part of the parser to accept one or two of those rules and hard-error if any other, unexpected rule matches.
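To make the contrast concrete, here is a minimal, std-only sketch. It does not use Pest's or Nom's actual APIs, and every name in it (`Rule`, `Pair`, `Index`, `PResult`, `index_from_pair`, `integer`, `literal`, `index`) is hypothetical and not taken from the VRL code base. The first half mimics consuming nodes tagged with one big rule enum; the second half mimics the function-based style, where each parser's return type states what it produces.

```rust
// Hypothetical illustration only: not VRL code, and not the Pest or Nom API.

/// Rule-enum style: every parsed node is tagged with a variant of one enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Path,
    PathIndex,
    Integer,
}

#[derive(Debug)]
struct Pair {
    rule: Rule,
    text: String,
}

/// The consumer only understands one rule, but the type system cannot say so:
/// any other rule has to be rejected at run time.
fn index_from_pair(pair: &Pair) -> Result<i64, String> {
    match pair.rule {
        Rule::PathIndex => pair.text.parse::<i64>().map_err(|e| e.to_string()),
        other => Err(format!("unexpected rule: {:?}", other)),
    }
}

/// Function-based style: a parser takes input and returns the remaining
/// input plus a typed value, or an error message.
type PResult<'a, T> = Result<(&'a str, T), String>;

#[derive(Debug, PartialEq)]
struct Index(i64);

/// Parses an optionally negative decimal integer from the start of `input`.
fn integer(input: &str) -> PResult<'_, i64> {
    let end = input
        .char_indices()
        .find(|&(i, c)| !(c.is_ascii_digit() || (i == 0 && c == '-')))
        .map(|(i, _)| i)
        .unwrap_or(input.len());
    let value = input[..end]
        .parse::<i64>()
        .map_err(|_| format!("expected integer at {:?}", input))?;
    Ok((&input[end..], value))
}

/// Consumes the exact string `expected` from the start of `input`.
fn literal<'a>(input: &'a str, expected: &str) -> PResult<'a, ()> {
    input
        .strip_prefix(expected)
        .map(|rest| (rest, ()))
        .ok_or_else(|| format!("expected {:?} at {:?}", expected, input))
}

/// Parses `[<integer>]`. The signature guarantees callers get an `Index`,
/// so there is no "unexpected rule" branch to write.
fn index(input: &str) -> PResult<'_, Index> {
    let (rest, ()) = literal(input, "[")?;
    let (rest, value) = integer(rest)?;
    let (rest, ()) = literal(rest, "]")?;
    Ok((rest, Index(value)))
}
```

In the second half, composing `literal` and `integer` into `index` is checked by the compiler, and each failure site produces its own message, which is roughly the property the function-based design is expected to give us.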
The reason I'm creating this issue now is that I'm starting to see how Pest is negatively affecting VRL development. It's slowing us down when adding new features, and as more people start to iterate on the grammar, because we lack types to guide us, we're introducing minor hacks to get to the desired state without considering the impact on other areas of the parser. This in turn leads to a growing grammar file with duplicate or contradictory rules.
I have a strong sense that the function-based parser design promoted by Nom would help us with that situation. I suspect we'll get three positive outcomes from this change:
grammar.pest file.
Obviously, the biggest downside is the investment we made into the current parser, and the fact that we'd need to make another investment into a Nom-based parser.
I would suspect the following, though:
We're not taking on this task right now, but this issue exists to solicit discussion, and track the potential work for after we ship 0.12.