Diagnostics as pattern matching #93
Comments
One concern I had (not sure how real it is though) is that there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals.
My thought is that the canonical representation of invalid syntax is the text itself, as a string of broken source code. Then we "just" need the parser recovery to be predictable enough that we can map these broken syntax prototypes into a pattern automatically. (To be clear, I feel this is a very big "just". But it seems like the right approach.)
IMO, the key is to set things up so that we're building a database of broken syntax examples and the errors they should map to, in "structured enough" form. Then maintaining and adding to this database becomes the primary work of "having good syntax errors", and adding new examples should be easy. This data-driven approach is similar to the databases of linter rules that projects such as semgrep seem to be building with great success. Bonus points for structuring the database so that it can be input to a more machine-learning style of pattern matching if necessary in the future.
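To make the "database of examples" idea concrete, here is a minimal sketch in Python (chosen only for brevity; nothing here is JuliaSyntax.jl API). The entry format, the `failing_examples` helper, and the second example's message are all hypothetical. Each entry pairs a broken snippet with the error it should produce, and the same data doubles as a regression suite for whatever matcher is eventually built.

```python
from typing import Callable, Dict, List

# Hypothetical database: each entry pairs a broken snippet with the
# diagnostic it should map to. Contributors would only ever edit this data.
EXAMPLES: List[Dict[str, str]] = [
    {"broken": "else if x", "error": 'use "elseif" instead of "else if"'},
    {"broken": "x = = 1", "error": "unexpected `=`"},  # illustrative message
]


def failing_examples(
    diagnose: Callable[[str], str], examples: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return the entries whose expected error the matcher does not produce."""
    return [ex for ex in examples if diagnose(ex["broken"]) != ex["error"]]
```

Any function from source text to message can be passed as `diagnose`, so the database stays independent of how the matching is implemented.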
There are an infinite number of ways to write a syntactically invalid program. The complement of a context-free language isn't necessarily context-free. Based on that, if we do have a system for reporting specific errors for specific patterns of input, the parser should have a fallback mechanism for reporting a syntax error when no pattern matches.
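The fallback could look roughly like the following sketch (Python for brevity; the rule list, the regex-based matching, and the `diagnose` name are assumptions for illustration, not anything in the parser today). Specific rules are tried in order, and the parser's own generic message is used when none of them applies.

```python
import re

# Hypothetical rule list: (regex over the broken source, specific message).
RULES = [
    (re.compile(r"\belse\s+if\b"), 'use "elseif" instead of "else if"'),
]

GENERIC_MESSAGE = "syntax error"


def diagnose(src: str, parser_message: str = GENERIC_MESSAGE) -> str:
    """Return the first matching specific message, else the parser's own."""
    for pattern, message in RULES:
        if pattern.search(src):
            return message
    # No pattern matched: fall back to whatever the parser reported.
    return parser_message
```

With this shape, an empty rule list still yields a usable (if unfriendly) error, so pattern coverage can grow incrementally.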
I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:
Accessibility: end users should be able to easily contribute new helpful and friendly diagnostics without understanding the code of the compiler frontend. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write these. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.
Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.
With these in mind, I want to claim that:
Therefore, we should be inspired by linters like semgrep in using pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code with placeholders which capture parts of that code, plus an error message template.
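As a rough illustration of what "malformed code with placeholders plus a message template" might mean, here is a toy sketch in Python. It matches on a flat token list rather than on the partially broken AST, which is a deliberate simplification; the `$name` placeholder syntax, the `Rule` type, and the `match` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy tokenizer: placeholders ($name), words, or single punctuation characters.
TOKEN_RE = re.compile(r"\$\w+|\w+|[^\s\w]")


def tokenize(src: str) -> List[str]:
    return TOKEN_RE.findall(src)


@dataclass
class Rule:
    pattern: str  # malformed code, with $placeholders capturing one token each
    message: str  # template that refers to captures as {placeholder}


def match(rule: Rule, src: str) -> Optional[str]:
    """Return the formatted message if src matches the pattern, else None."""
    pat, toks = tokenize(rule.pattern), tokenize(src)
    if len(pat) != len(toks):
        return None
    captures = {}
    for p, t in zip(pat, toks):
        if p.startswith("$"):
            # A placeholder used twice must capture the same token both times.
            if captures.setdefault(p[1:], t) != t:
                return None
        elif p != t:
            return None
    return rule.message.format(**captures)


ELSE_IF = Rule(
    pattern="else if $cond",
    message="use `elseif {cond}` instead of `else if {cond}`",
)
```

Here `match(ELSE_IF, "else if x")` yields the message with `x` substituted, while `match(ELSE_IF, "elseif x")` yields `None`. The point is only that a rule is pure data: a contributor writes the broken snippet and the message, not matching code.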
Discuss :-)