Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch VRL parser from Pest to Nom #6139

Closed
JeanMertz opened this issue Jan 19, 2021 · 0 comments · Fixed by #6353
Closed

switch VRL parser from Pest to Nom #6139

JeanMertz opened this issue Jan 19, 2021 · 0 comments · Fixed by #6353
Assignees
Labels
domain: vrl Anything related to the Vector Remap Language type: tech debt A code change that does not add user value.

Comments

@JeanMertz
Copy link
Contributor

JeanMertz commented Jan 19, 2021

This issue is intended to track discussion around switching from Pest to Nom for the VRL parser.

I'm unsure of the reason why Pest was originally chosen (neither the RFC nor the initial implementation mention the rationale), but one reason appears to be the improved (pretty) error messages it produces.

We're no longer leveraging those messages though, since, while the output in the terminal looked nice, the actual message itself could be confusing, for example:

 --> 1:8
  |
1 | 1.4true
  |        ^---
  |
  = expected path_index

The fact that this mentions "path_index" is confusing to users. The reason why this is, is that you have to add negative expectations to each rule, and then convert that back into an error message for a syntax error like the above to be understandable, but it's tricky to add those, as you have to manipulate the syntax stack as you are parsing, which isn't easy to do in Pest.

Nom has been brought up a few times in the past, people seem to like it, and its strength over Pest is that the parser is written in pure Rust, and thus we get to leverage the type system, whereas Pest generates one big enum of "rules", requiring each part of the parser to accept one or two of those rules, and hard-error if any other unexpected rules match.


The reason why I'm creating this issue now, is that I'm starting to see how Pest is affecting the VRL development negatively. It's slowing us down when adding new features, and as more people start to iterate on the grammar — because we lack types to guide us — we're introducing minor hacks to get us to the desired state, without considering the impact on other areas of the parser. This in turns leads to a growth in the grammar file with duplicate or contradictory rules.

I have a strong sense that the function-based parser design promoted by Nom would help us with that situation. I suspect we'll get three positives outcome of this change:

  1. It makes it easier for others to iterate on the parser (as it's all Rust based, and all type checked).
  2. It removes ambiguity what's going on in the parser, as we no longer depend on an external macro that parses the external grammar.pest file.
  3. It would allow us to significantly improve our syntax error messages.
  4. It would make it easier to test individual parts of our parser instead of testing the entire parser as a whole.

Obviously, the biggest downside is the investment we made into the current parser, and the fact that we'd need to make another investment into a Nom-based parser.

I would suspect the following, though:

  1. Large chunks of our parser can remain, even if they have to be updated in minor ways.
  2. All logic outside the parser (individual expression types, etc) will remain untouched.
  3. Now that we have UI tests, we should have a clean upgrade path, allowing us to validate backward compatibility.
  4. The general "shape" of the parser will remain the same; each rule within our grammar has its own parsing function, even if the function implementation itself will be changed to use Nom instead of Pest.

We're not taking on this task right now, but this issue exists to solicit discussion, and track the potential work for after we ship 0.12.

@JeanMertz JeanMertz added type: tech debt A code change that does not add user value. needs: approval Needs review & approval before work can begin. domain: vrl Anything related to the Vector Remap Language labels Jan 19, 2021
@JeanMertz JeanMertz removed the needs: approval Needs review & approval before work can begin. label Jan 27, 2021
@JeanMertz JeanMertz self-assigned this Jan 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language type: tech debt A code change that does not add user value.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant