Diagnostics as pattern matching #93

Open
c42f opened this issue Sep 5, 2022 · 4 comments
Labels
design error messages Better, more actionable diagnostics

Comments

@c42f
Member

c42f commented Sep 5, 2022

I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:

Accessibility: end users should easily be able to contribute new helpful and friendly diagnostics without understanding the code of the compiler frontend. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write these. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.

Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.

With these in mind, I want to claim that:

For a parser system where a syntax tree is always produced, compiler diagnostics (warnings, errors) are not really different from linter messages based on symbolic pattern matching

Therefore, we should take inspiration from linters like semgrep and use pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code with placeholders which capture parts of that code, plus an error message template.
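To make the idea concrete, here is a minimal sketch of placeholder-based matching over a partially broken tree. It is written in Python purely for illustration, and everything in it is an assumption rather than anything JuliaSyntax.jl provides: the nested-tuple tree shape, the `error` node kind, the `$name` placeholder convention, and the `match` and `diagnose` helpers are all hypothetical.

```python
# Hypothetical tree shape: (kind, child, child, ...); leaves are plain strings.
# In a pattern, a string starting with "$" is a placeholder that captures a subtree.

def match(pattern, tree, captures):
    """Fill `captures` and return it if `tree` matches `pattern`, else return None."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        name = pattern[1:]
        # A placeholder used twice must capture the same subtree both times.
        if name in captures and captures[name] != tree:
            return None
        captures[name] = tree
        return captures
    if isinstance(pattern, str) or isinstance(tree, str):
        return captures if pattern == tree else None
    if len(pattern) != len(tree):
        return None
    for p, t in zip(pattern, tree):
        if match(p, t, captures) is None:
            return None
    return captures

def subtrees(tree):
    """Yield every non-leaf node of the tree, root first."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def diagnose(tree, rules):
    """Return the formatted message of every rule that matches some subtree."""
    messages = []
    for node in subtrees(tree):
        for pattern, template in rules:
            captures = match(pattern, node, {})
            if captures is not None:
                messages.append(template.format(**captures))
    return messages

# One declarative rule: a pattern with placeholders plus a message template.
RULES = [
    (("if", "$c1", "$b1", ("else", ("error", ("if", "$c2", "$b2")))),
     "did you mean `elseif {c2}`?"),
]

# A made-up recovery tree for something like `if a; x; else if b; y; end`.
broken = ("if", "a", "x", ("else", ("error", ("if", "b", "y"))))
print(diagnose(broken, RULES))
```

The point of the sketch is only that a rule is data (a pattern and a template), so adding a diagnostic would not require touching the matcher or the parser.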

Discuss :-)

@pfitzseb
Member

One concern I had (not sure how real it is though) is that there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals.

@c42f
Member Author

c42f commented Sep 14, 2022

there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals

My thought is that the canonical representation of invalid syntax is the text itself, as a string of broken source code. Then we "just" need the parser recovery to be predictable enough that we can map these broken syntax prototypes into a pattern automatically. (To be clear, I feel this is a very big "just". But it seems like the right approach.)

@c42f
Member Author

c42f commented Sep 14, 2022

IMO, the key is to set things up so that we're building a database of broken syntax examples and the errors they should map to in "structured enough" form.

Then maintaining and adding to this database becomes the primary work of "having good syntax errors", and adding new examples should be easy.

This data-driven approach is similar to the way linters such as semgrep build up databases of rules, apparently with great success.

Bonus points for structuring the database so that it can be input to a more machine-learning style of pattern matching if necessary in the future.
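As a rough illustration of what such a database could look like, the sketch below stores each entry as plain data: a broken-code prototype written as source text with `$name` placeholders, and a message template. The JSON layout, the field names, and the two example entries are assumptions made up for this sketch. A toy tokenizer stands in for the real parser: prototypes and input go through the same tokenizer, so patterns are derived from the prototype text automatically. Each placeholder captures only a single token here, which a real system matching subtrees would not be limited to.

```python
import json
import re

# Hypothetical database format: plain data, so entries can be added
# without touching the matcher.
DATABASE = json.loads("""
[
  {"prototype": "else if $cond",
   "message": "did you mean `elseif {cond}`?"},
  {"prototype": "$a =+ $b",
   "message": "did you mean `{a} += {b}`?"}
]
""")

def tokenize(text):
    """Toy tokenizer: words (optionally prefixed with $) and runs of punctuation."""
    return re.findall(r"\$?\w+|[^\w\s]+", text)

def match_at(pattern, tokens, start):
    """Match `pattern` against tokens[start:]; return captures or None."""
    captures = {}
    for p, t in zip(pattern, tokens[start:start + len(pattern)]):
        if p.startswith("$"):
            captures[p[1:]] = t      # placeholder: capture exactly one token
        elif p != t:
            return None              # literal token mismatch
    return captures

def diagnose(source):
    """Return the message of every database entry whose prototype occurs in `source`."""
    tokens = tokenize(source)
    messages = []
    for entry in DATABASE:
        # The pattern is derived from the prototype by the same tokenizer.
        pattern = tokenize(entry["prototype"])
        for start in range(len(tokens) - len(pattern) + 1):
            captures = match_at(pattern, tokens, start)
            if captures is not None:
                messages.append(entry["message"].format(**captures))
    return messages

print(diagnose("x =+ 1"))
print(diagnose("else if y > 0"))
```

Because each entry carries its own broken example, the database could also double as a test suite: every prototype should trigger its own message.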

c42f added the "error messages" label (Better, more actionable diagnostics) on Sep 21, 2022
@gafter

gafter commented Aug 12, 2023

There are an infinite number of ways to write a syntactically invalid program. The complement of a context-free language isn't necessarily context-free. Based on that, if we do have a system for reporting specific errors for specific patterns of input, the parser should have a fallback mechanism for reporting a syntax error when no pattern matches.
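A fallback of this kind might look like the following sketch, again in Python for illustration only. The error-node shape (a dict with `offset` and `text`), the rule signature, and the wording of the generic message are all hypothetical.

```python
def rule_else_if(node):
    """Hypothetical specific rule: returns a message, or None if it does not apply."""
    if node["text"].startswith("else if"):
        return "did you mean `elseif`?"
    return None

RULES = [rule_else_if]

def report(error_node, rules):
    """Return the first specific message, or a generic one if no rule matches."""
    for rule in rules:
        message = rule(error_node)
        if message is not None:
            return message
    # Fallback: the pattern database can never cover every invalid input,
    # so every error node still gets a plain syntax error.
    return (f"syntax error at offset {error_node['offset']}: "
            f"unexpected `{error_node['text']}`")

print(report({"offset": 12, "text": "else if x"}, RULES))
print(report({"offset": 3, "text": "=> )"}, RULES))
```

With this structure the specific rules only ever improve on the generic message; an input that matches nothing is still reported.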
