switch VRL parser from Pest to Nom #6139

JeanMertz · 2021-01-19T16:44:40Z

This issue is intended to track discussion around switching from Pest to Nom for the VRL parser.

I'm unsure of the reason why Pest was originally chosen (neither the RFC nor the initial implementation mention the rationale), but one reason appears to be the improved (pretty) error messages it produces.

We no longer use those messages, though. The output looked nice in the terminal, but the message itself could be confusing. For example:

 --> 1:8
  |
1 | 1.4true
  |        ^---
  |
  = expected path_index

The mention of "path_index" is confusing to users. To make a syntax error like the one above understandable, you have to add negative expectations to each rule and then convert them back into an error message. Adding those is tricky, because you have to manipulate the syntax stack while you are parsing, which isn't easy to do in Pest.

Nom has been brought up a few times in the past, and people seem to like it. Its strength over Pest is that the parser is written in pure Rust, so we get to leverage the type system. Pest, by contrast, generates one big enum of "rules", which requires each part of the parser to accept one or two of those rules and to hard-error if any other, unexpected rule matches.
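As a rough illustration of that difference, here is a hand-rolled sketch that uses only the standard library. It does not use the actual Pest or Nom APIs, and `Rule`, `Pair`, `integer_from_pair`, `integer`, and `boolean` are hypothetical names chosen for this example.

```rust
// Style 1: every parsed node is tagged with one variant of a single enum.
// (Hypothetical stand-in for a generated "rules" enum.)
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    Integer,
    Boolean,
}

struct Pair<'a> {
    rule: Rule,
    text: &'a str,
}

// The signature accepts any rule, so passing the wrong one is only
// caught at runtime, as a hard error.
fn integer_from_pair(pair: &Pair<'_>) -> i64 {
    match pair.rule {
        Rule::Integer => pair.text.parse().expect("grammar only matches digits here"),
        other => panic!("unexpected rule: {:?}", other),
    }
}

// Style 2: one function per grammar rule, each with its own return type.
// On success, a function returns the parsed value and the remaining input.
fn integer(input: &str) -> Option<(i64, &str)> {
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    let value = input[..len].parse().ok()?; // fails when there are no digits
    Some((value, &input[len..]))
}

fn boolean(input: &str) -> Option<(bool, &str)> {
    if let Some(rest) = input.strip_prefix("true") {
        Some((true, rest))
    } else if let Some(rest) = input.strip_prefix("false") {
        Some((false, rest))
    } else {
        None
    }
}
```

In the second style, a caller that needs an `i64` can only get one from `integer`, so mixing up rules becomes a compile-time type error rather than a runtime panic.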

I'm creating this issue now because I'm starting to see how Pest is negatively affecting VRL development. It slows us down when adding new features. As more people iterate on the grammar without types to guide them, we introduce minor hacks to reach the desired state without considering the impact on other areas of the parser. This in turn leads to a growing grammar file with duplicate or contradictory rules.

I have a strong sense that the function-based parser design promoted by Nom would help us with that situation. I suspect we'll get four positive outcomes from this change:

- It makes it easier for others to iterate on the parser (as it's all Rust-based and all type-checked).
- It removes ambiguity about what's going on in the parser, as we no longer depend on an external macro that parses the external grammar.pest file.
- It would allow us to significantly improve our syntax error messages.
- It would make it easier to test individual parts of our parser instead of testing the entire parser as a whole.
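The last two points can be sketched with the `1.4true` input from the error shown earlier. This is a minimal, std-only sketch in the function-based style, not Nom's API; `SyntaxError`, `float`, and `float_expression` are hypothetical names, and the message wording is an assumption about what a friendlier error could look like.

```rust
#[derive(Debug, PartialEq)]
struct SyntaxError {
    offset: usize,   // zero-based byte offset into the input
    message: String, // user-facing text, free of internal rule names
}

// Counts the leading ASCII digits of `input`.
fn digits(input: &str) -> usize {
    input.bytes().take_while(|b| b.is_ascii_digit()).count()
}

// Parses a float literal such as `1.4` from the start of `input`,
// returning the value and the remaining input.
fn float(input: &str) -> Result<(f64, &str), SyntaxError> {
    let not_a_float = || SyntaxError {
        offset: 0,
        message: "expected a float literal such as `1.4`".to_string(),
    };
    let int_len = digits(input);
    let frac_len = match input[int_len..].strip_prefix('.') {
        Some(frac) => digits(frac),
        None => 0,
    };
    if int_len == 0 || frac_len == 0 {
        return Err(not_a_float());
    }
    let end = int_len + 1 + frac_len;
    let value = input[..end].parse::<f64>().map_err(|_| not_a_float())?;
    Ok((value, &input[end..]))
}

// Parses an input that must consist of exactly one float literal.
fn float_expression(input: &str) -> Result<f64, SyntaxError> {
    let (value, rest) = float(input)?;
    if rest.is_empty() {
        Ok(value)
    } else {
        Err(SyntaxError {
            offset: input.len() - rest.len(),
            message: format!("unexpected `{}` after float literal", rest),
        })
    }
}
```

Because `float` is an ordinary function, it can be unit-tested on its own (for example, `float("1.4true")` yields `1.4` with `"true"` left over), while `float_expression` decides how to report the leftover input to the user.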

Obviously, the biggest downside is that we have already invested in the current parser, and we'd need to make another investment in a Nom-based parser.

I would suspect the following, though:

- Large chunks of our parser can remain, even if they have to be updated in minor ways.
- All logic outside the parser (individual expression types, etc.) will remain untouched.
- Now that we have UI tests, we should have a clean upgrade path, allowing us to validate backward compatibility.
- The general "shape" of the parser will remain the same; each rule within our grammar has its own parsing function, even if the function implementation itself will be changed to use Nom instead of Pest.

We're not taking on this task right now, but this issue exists to solicit discussion, and track the potential work for after we ship 0.12.

JeanMertz added the type: tech debt, needs: approval, and domain: vrl labels Jan 19, 2021

binarylogic mentioned this issue Jan 20, 2021

VRL AST for program reflection and visualization #6152

Closed

JeanMertz removed the needs: approval label Jan 27, 2021

JeanMertz self-assigned this Jan 27, 2021

JeanMertz mentioned this issue Feb 4, 2021

chore(remap): re-implement VRL parser/compiler #6353

Merged

7 tasks

JeanMertz closed this as completed in #6353 Feb 15, 2021
