Diagnostics as pattern matching #93

c42f · 2022-09-05T11:04:58Z

I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:

Accessibility: end users should be able to easily contribute new helpful and friendly diagnostics without understanding the code of the compiler frontend. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write them. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.

Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.

With these in mind, I want to claim that:

For a parser system where a syntax tree is always produced, compiler diagnostics (warnings, errors) are not really different from linter messages based on symbolic pattern matching

Therefore, we should take inspiration from linters like semgrep and use pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code, with placeholders which capture parts of that code, plus an error message template.
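To make the "pattern with placeholders plus message template" idea concrete, here is a minimal sketch. It is written in Python purely for brevity, and everything in it is an assumption for illustration: the tuple-based tree shape, the `("error", ...)` node, and the `match`/`diagnose` names are hypothetical and are not how JuliaSyntax.jl represents trees.

```python
# Toy syntax trees (hypothetical shape): a node is a tuple (head, child, ...);
# a leaf is a string. Pattern leaves starting with "$" are placeholders
# that capture whatever subtree sits in that position.

def match(pattern, tree, captures=None):
    """Return a dict of captures if `tree` matches `pattern`, else None."""
    captures = {} if captures is None else captures
    if isinstance(pattern, str) and pattern.startswith("$"):
        name = pattern[1:]
        # A placeholder used twice must capture the same subtree both times.
        if name in captures and captures[name] != tree:
            return None
        captures[name] = tree
        return captures
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        if len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            if match(p, t, captures) is None:
                return None
        return captures
    # Plain leaves (including node heads) must be equal.
    return captures if pattern == tree else None

# One declarative rule: a broken-tree pattern and a message template.
RULE = {
    "pattern": ("if", "$cond", "$body", ("error", "elif")),
    "message": "`elif` is not a Julia keyword; use `elseif` (after condition `{cond}`)",
}

def diagnose(tree, rule=RULE):
    """Format the rule's message from the captures, or return None."""
    captures = match(rule["pattern"], tree)
    if captures is None:
        return None
    return rule["message"].format(**captures)

broken = ("if", "x > 0", ("block", "a"), ("error", "elif"))
print(diagnose(broken))
```

The point of the sketch is that the rule itself is plain data: adding a new diagnostic means adding a pattern and a message, not touching the matcher or the parser.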

Discuss :-)

pfitzseb · 2022-09-13T09:53:22Z

One concern I had (not sure how real it is though) is that there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals.

c42f · 2022-09-14T04:27:49Z

there's no canonical representation of invalid syntax, so pattern matching will end up being tied to the parser internals

My thought is that the canonical representation of invalid syntax is the text itself, as a string of broken source code. Then we "just" need the parser recovery to be predictable enough that we can map these broken syntax prototypes into a pattern automatically. (To be clear, I feel this is a very big "just". But it seems like the right approach.)
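As a rough illustration of "broken text in, pattern out", the sketch below runs a prototype string through the same front end as the user's code. A regex tokenizer stands in for the parser, so this is token-level rather than tree-level matching, and the names `tokenize`, `compile_prototype` and `match_tokens` are made up for the example. The doubled comma is just a convenient broken prototype.

```python
import re

# Stand-in for the parser: split source into words and single punctuation
# characters. A `$name` placeholder is kept together as one token.
TOKEN = re.compile(r"\$\w+|\w+|\S")

def tokenize(source):
    return TOKEN.findall(source)

def compile_prototype(broken_source):
    """Turn a broken-code prototype into a pattern using the same front end."""
    return tokenize(broken_source)

def match_tokens(pattern, tokens):
    """Match the pattern against the whole token list; `$x` captures one token."""
    if len(pattern) != len(tokens):
        return None
    captures = {}
    for p, t in zip(pattern, tokens):
        if p.startswith("$"):
            captures[p[1:]] = t
        elif p != t:
            return None
    return captures

# The prototype is written as broken source text, not as parser internals.
pattern = compile_prototype("$f($x,, $y)")
print(match_tokens(pattern, tokenize("max(a,, b)")))
```

Because the prototype and the user's code go through the same function, a change in how the front end splits or recovers input changes both sides together. That only helps if recovery is predictable, which is the big "just" mentioned above.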

c42f · 2022-09-14T04:33:13Z

IMO, the key is to set things up so that we're building a database of broken syntax examples and the errors they should map to in "structured enough" form.

Then maintaining and adding to this database becomes the primary work of "having good syntax errors", and adding new examples should be easy.

This data-driven approach is similar to the way linters such as semgrep build up databases of rules, seemingly with great success.

Bonus points for structuring the database so that it can be input to a more machine-learning style of pattern matching if necessary in the future.

gafter · 2023-08-12T21:51:06Z

There are an infinite number of ways to write a syntactically invalid program. The complement of a context-free language isn't necessarily context-free. Based on that, if we do have a system for reporting specific errors for specific patterns of input, the parser should have a fallback mechanism for reporting a syntax error when no pattern matches.
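A minimal sketch of that fallback, assuming rules are tried in order and each rule is a (predicate, message) pair. The `report` function, the rule format and the token lists are all hypothetical, and Python is used only to keep the example short.

```python
def report(tokens, rules):
    """Return the first specific message whose rule matches, else a generic one."""
    for matches, message in rules:
        if matches(tokens):
            return message
    # No pattern matched: the input space is unbounded, so this branch
    # must always exist.
    return "syntax error: no specific diagnostic matched this input"

# A one-entry rule table; real tables would hold many such entries.
RULES = [
    (lambda toks: "elif" in toks, "`elif` is not a Julia keyword; use `elseif`"),
]

print(report(["if", "x", "elif", "y", "end"], RULES))  # specific message
print(report([")", "(", "]"], RULES))                  # generic fallback
```

However large the rule table grows, the final `return` keeps every invalid program covered by at least a generic message.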

c42f added the design label Sep 5, 2022

c42f mentioned this issue Sep 13, 2022

Error recovery for unexpected continuation keywords #87

Open

c42f added the error messages (Better, more actionable diagnostics) label Sep 21, 2022

c42f mentioned this issue Jan 25, 2023

Display of diagnostic messages #150

Open

PallHaraldsson mentioned this issue Feb 12, 2024

Stack overflow when parsing 20k consecutive + signs #415

Closed
