-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bump tree-sitter-r 4 - Syntax vs Semantic Diagnostics #523
Conversation
else | ||
if (condition2) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like, this is not a valid top level if statement
{ | ||
if (cond1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I carefully retained all the existing indentation here, I just bumped it out by 1 ident with the addition of the braces to make it valid R code
// TODO: This should be a syntax check, as the grammar should not allow | ||
// two `Argument` nodes side by side | ||
fn check_call_like_next_sibling( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// When we hit an `ERROR` node, i.e. a syntax error, it often has its own children | ||
// which can also be `ERROR`s. The goal is to target the deepest (most precise) `ERROR` | ||
// nodes and only report syntax errors for those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Special handling of ERROR
nodes that may have ERROR
children is here
d1ba3a4
to
fbf2300
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! It's nice to see we can get this far with the TS parser.
My main comment is about potentially excessive tree traversals in the syntactic pass, I think we could do better and avoid some unnecessary passes.
crates/ark/src/treesitter.rs
Outdated
pub(crate) fn node_find_string<'a>(node: &'a Node) -> Option<Node<'a>> { | ||
if node.is_string() { | ||
// Already on a string | ||
return Some(*node); | ||
} | ||
// If we are on one of the following, we return the string parent: | ||
// - Anonymous node inside a string, like `"'"` | ||
// - `NodeType::StringContent` | ||
// - `NodeType::EscapeSequence` | ||
node.find_parent(|parent| parent.is_string()) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could just use .ancestors()
since the starting node is inclusive.
Do we want .strict_ancestors()
for the non-inclusive case? I would expect .ancestors()
to be exclusive but we probably should follow rust-analyzer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hmm .ancestors()
not being exclusive is surprising to me, I was not expecting node
to be an "ancestor" of itself.
The unwrapped call makes more sense since it explicitly takes first
, making it slightly less surprising when you also get first
back on the first iteration
std::iter::successors(Some(*self), |p| p.parent())
fbf2300
to
8d55790
Compare
To get a combination of fixes that work nicely together
Can often occur when you have unmatched string delimiters, which there isn't much else we can do about
Otherwise we get a diagnostic about dplyr not being installed
90138d5
to
a1d9544
Compare
Otherwise example 34 now indents out one level
Bump tree-sitter-r 5 - Leading newline skipping, tree-sitter 0.23.0 update
Addresses posit-dev/positron#2943
This PR also contains #529 which was merged into it
Pulls in a whopping 34 more commits from tree-sitter-r r-lib/tree-sitter-r@63ee9b1...99bf614. It's really only 3 main changes. We pull in all 3 at once because they are a bit intermingled with each other.
unmatched_delimiter
r-lib/tree-sitter-r#92)
,}
, and]
tokensopen
andclose
fields everywhere appropriate r-lib/tree-sitter-r#119optional()
fields where possible r-lib/tree-sitter-r#132With these 3 changes, I was able to greatly improve our diagnostics engine.
It has been split into two parts - a syntax path, and a semantic path:
diagnostics.rs
holds the entrypoint and the semantic pathdiagnostics_syntactic.rs
holds the syntax pathIn a future PR I'll do a mostly pure "rearrangement" PR to clean this structure up a bit more. I haven't done that yet to make it clear what has been moved out of
diagnostics.rs
.Syntax diagnostics
Diagnostics based purely on
ERROR
andMISSING
nodes in the tree-sitter AST.MISSING
nodes give us a nice way of detecting missing closing delimiters like}
or)
and seems to work nicelyERROR
nodes allow us to report syntax errors in general. We don't get that much info from tree-sitter about exactly what went wrong, so I just reportSyntax error
most of the time.A few more improvements in this realm:
If a syntax error spans > 20 lines, it is now truncated to only show a squiggle on the
start_point
of the range, and it states that this is aSyntax error. Starts here and ends on row {row}
. This helps us avoid overwhelming the user in the (very few!) cases where this can still happen. In my experience, mostly when they have an unmatched string delimiter like a stray"
that causes everything after it to look like a weird unclosed string.ERROR
nodes are no longer "recursively" reported. i.e. it is very common for anERROR
node in tree-sitter to have children that are alsoERROR
nodes. Like what you see below. Previously we actually reported bothERROR
s! This was particularly problematic because those outerERROR
nodes often span a huge number of lines, while the innerERROR
nodes can be very precise and target the exact problem quite nicely. We now only report what I call "terminal"ERROR
nodes that don't have anyERROR
children. This greatly improved that "aggressive diagnostic" issue where the whole file would light up in squiggles.Semantic diagnostics
Semantic diagnostics are now run in a separate path from syntax diagnostics, this has the following really nice benefit - we only run semantic diagnostics on top level expressions (i.e. children of
root
) thatnode.has_error()
returnsfalse
for. In other words, we only consider running semantic diagnostics down a section of the tree if we know that section of the tree does not contain any syntax errors.This actually works quite nicely in practice.
ERROR
nodes to be wary ofImprovement examples
This shows the "truncation" idea once a syntax error spans >20 rows
Screen.Recording.2024-09-13.at.2.37.24.PM.mov
This shows improvements in the example in posit-dev/positron#2943, which used to light up the whole file
Screen.Recording.2024-09-13.at.2.39.33.PM.mov
This is a tree-sitter-r test file, with many syntax errors. Note that 1) it doesn't light up the whole file and 2) it still shows some semantic issues too (symbol not found errors)
Screen.Recording.2024-09-13.at.2.48.23.PM.mov
Improvements on the example from posit-dev/positron#4177. We often target the missing opening/closing node now. At the very end it shows that
}
doesn't have a matching opening{
, and I do think that is still a technically correct syntax error message, even if we'd like to show the unmatched(
(Positron does at least highlight that unmatched(
in red here, which is nice)Screen.Recording.2024-09-13.at.2.51.09.PM.mov