Errors #15
Comments
Here is an example of how tree-sitter-json produces its CST. Given {a* the resulting CST would be:
As we can see, tree-sitter-json parses the entire source string and creates a CST of Errors and Literals that covers the entire source string.
The issue here is that I did not use the internal lexer, and my lexer only produces valid tokens, so no token is produced when it encounters an invalid state. To increase parsing speed, I intentionally wrote the lexer to avoid forking the tree, which in practice combines multiple tokens into one; that is why error recovery does not work here. I guess this issue can be fixed by producing tokens more granularly and loosely, though I'm not sure what the performance impact will be. I'll see what I can do.
@ikatyang thanks for describing the problem. That explains a lot. Error recovery and parsing the entire source string are the two most important things we were looking for in tree-sitter. I do have a couple of additional questions, if you don't mind. 1. Is there any chance this grammar will switch to the internal tree-sitter lexer in the future?
I think people use tree-sitter for two reasons - unmatched error recovery ability and performance. Unfortunately this grammar is killing error recovery in favor of performance.
Thank you very much. We already have a very ugly workaround for this: we create a surrogate ERROR node that holds the unparsed part of the source string. If in the first iteration we can make this grammar parse the entire string, it would be a grand start.
The external lexer must be involved since there are some tokens that are not possible to be described by the
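For illustration, a minimal sketch of what such a surrogate-node workaround could look like. It does not use the tree-sitter API; the `Node` class, the `add_surrogate_error` helper, and the sample byte offsets are all hypothetical and only stand in for whatever tree representation the consumer builds from the parser output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified CST node (not the tree-sitter Node type)."""
    type: str
    start_byte: int
    end_byte: int
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def add_surrogate_error(root: Node, source: bytes) -> Node:
    """Return a tree that covers all of `source`.

    If the parser stopped early, append a surrogate ERROR node that
    holds the unparsed tail; otherwise return the root unchanged.
    """
    if root.end_byte >= len(source):
        return root  # the tree already covers the whole source
    tail = source[root.end_byte:]
    surrogate = Node(
        "ERROR",
        root.end_byte,
        len(source),
        tail.decode("utf-8", errors="replace"),
    )
    # Build a new root so the original tree is left untouched.
    return Node(
        root.type,
        root.start_byte,
        len(source),
        children=[*root.children, surrogate],
    )


# Assumed scenario: the parser only consumed the first line (18 bytes).
source = b"title: Sample API\n  bad: [\n"
parsed = Node("stream", 0, 18, children=[Node("document", 0, 18)])
patched = add_surrogate_error(parsed, source)
print(patched.end_byte, patched.children[-1].type)
```

The surrogate node keeps the raw text so an editor can still highlight or report the region, even though it has no child nodes describing its structure.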
It'd be basically a new project with only reusable test cases IMO.
Sorry, I forgot to confirm whether it works well in an invalid state. I only checked that it reports errors in an error state, since I originally wanted to use it as an AST parser 😅.
A quick workaround would be to treat those invalid indent/dedent tokens as the valid ones that make more sense in that position, though this means invalid tokens get treated as valid ones. Is that OK for you? If so, I could create a PR for it, and you could build from that branch, since I won't merge it: it's just a quick workaround with some downsides.
No worries, we already have a quick & dirty workaround (creating surrogate errors). No need for another one. We'll wait for a systematic solution on the master branch.
Hi @ikatyang, just pinging to ask whether there is any plan for this issue in the foreseeable future.
It's still on my TODO list, but I have not had time to work on it recently, so I cannot give you an ETA for it.
Closing in favor of tree-sitter-grammars#14
Hi @ikatyang,
I've been playing with how Errors and the tree-sitter recovery mechanism work in this grammar and found some interesting cases that I'm not sure how to handle.
produces
What is happening here is that the parser doesn't parse anything after title: Sample API. This makes it pretty unusable in, for example, editor use cases. Only part of the source string is parsed and the rest is either ignored or consumed by the parser. I would expect the CST to contain at least another Error object with additional child nodes. Did you ever bump into this issue?
Thanks for any answer