Better error recovery in comma separated lists. #14509
@@ -553,19 +553,27 @@ object Parsers {
    def inDefScopeBraces[T](body: => T, rewriteWithColon: Boolean = false): T =
      inBracesOrIndented(body, rewriteWithColon)

    /** part { `separator` part }
     */
    def tokenSeparated[T](separator: Int, part: () => T): List[T] = {
I inlined this since it was only ever used for a comma separated list.
That rang a bell. At the beginning of the pandemic, I moved Scala 2 trailing comma handling into the parser for better handling, which is why this method still pulls its weight there.
I said on the PR that I would forward-port it; apparently I looked and it seemed similar. But obviously I haven't done that yet. It must have slipped my mind during pandemic brain fog.
I went ahead and wrote up something to get rid of the weird commas. I'll put it up after this PR is in.
3097f54
to
ae845df
Compare
        in.nextToken()
        ts += part()
      }
      if (expectedEnd != EMPTY && in.token != expectedEnd) {
The main change.
So in case we have more than one unexpected token, for example `foo(1 2 3, 4, 5)`, this will not try to parse the rest of the param clause?
Maybe it makes sense to try to find the comma or the end of the line?
`syntaxErrorOrIncomplete` skips to the next safe point as a side effect, so we will see that `2` is not a `)`, issue an error, then jump to `,` and keep parsing. I'll add a third digit to one of the test cases to be sure.
@@ -1,5 +1,5 @@
 class A[T]
 object o {
   // Testing compiler crash, this test should be modified when named type argument are completely implemented
-  val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found // error
+  val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found // error: ']' expected, but '=' found
Previously, this error complained about `val x` not having a body, which doesn't make any sense.
3cc0a28
to
d28980b
Compare
That looks great! I just have a few questions.
@@ -2532,7 +2540,7 @@ object Parsers {
           if (leading == LBRACE || in.token == CASE)
             enumerators()
           else {
-            val pats = patternsOpt()
+            val pats = patternsOpt(EMPTY)
Why `EMPTY` in this case? Shouldn't it be something like `OUTDENT`?
I thought this was to handle `for x, y in (1,2) do` or `for x, y in (1,2) yield`, so I think the expected end would have to be `do` or `yield`? Passing `EMPTY` preserves the old behavior, so I figured it was safe. I could pass a predicate down instead, I suppose. What do you think?
I guess this is fine. The other possibility would be an `Option`, but I don't see it used anywhere in the class, so there is probably a reason for that.
I was maybe being overly cautious, but I didn't want to have a perf hit from creating a lot of unnecessary `Some` wrappers.
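To make the trade-off in this exchange concrete, here is a minimal sketch of the two styles side by side. The token codes and the names `shouldCheckEnd` and `shouldCheckEndOpt` are hypothetical and chosen only for illustration; the sentinel check mirrors the `expectedEnd != EMPTY && in.token != expectedEnd` condition from the diff.

```scala
object ExpectedEndSketch {
  // Hypothetical token codes for illustration; the compiler defines its own.
  final val EMPTY = 0
  final val RPAREN = 1
  final val COMMA = 2

  // Sentinel style: callers without a known end token pass EMPTY.
  def shouldCheckEnd(current: Int, expectedEnd: Int): Boolean =
    expectedEnd != EMPTY && current != expectedEnd

  // Option style: the same check, with "no expected end" made explicit in the type.
  def shouldCheckEndOpt(current: Int, expectedEnd: Option[Int]): Boolean =
    expectedEnd.exists(_ != current)
}
```

Both functions answer the same question; the sentinel version simply avoids wrapping the token code at each call site, which is the concern raised above.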
5e9d681
to
415e6d5
Compare
415e6d5
to
547ed43
Compare
Looks good to me, but I would probably ask for a second opinion here 😅
It does feel like a lot of effort for not that much of a gain. Error recovery is currently very streamlined and centralized. I think it can be improved, but my fear is that local improvements like this will make it harder to come up with a better overall design.
In that sense, even though it improves precision, I fear that #14463 was already a step in the wrong direction, and this is more of the same. To pass context, I would instead try to rely on the `currentRegion` abstraction. It tells us what we expect to come, and we should make more use of it. For instance, in retrospect, instead of the solution in #14463 it would be more elegant to have a region that is a specialized version of `InParens` that knows that elements can also be separated with commas. That avoids the ad-hoc passing of an additional argument to `skip`. And likewise we should be able to avoid passing additional arguments to `commaSeparated`.
So, my proposal: let's revert #14463 and base everything on regions. I have the impression that `skip` could be a lot smarter than it is now if it made better use of this info.
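One possible reading of this proposal, as a toy sketch rather than the compiler's actual `Region` classes: the region itself records whether commas are expected, so a skip routine can decide where to resume without extra arguments. The names `RegionSketch`, `commasExpected`, and `skipTo` are assumptions made for this illustration, and nested parentheses are deliberately ignored.

```scala
object RegionSketch {
  // Hypothetical, simplified regions; the real scanner regions carry more state.
  sealed trait Region
  case object TopLevel extends Region
  final case class InParens(closing: String, commasExpected: Boolean, parent: Region) extends Region

  /** Index of the first token at or after `from` where parsing could resume in `region`,
   *  or `tokens.length` if there is none.
   */
  def skipTo(tokens: Vector[String], from: Int, region: Region): Int = {
    def isSafe(tok: String): Boolean = region match {
      case InParens(closing, commasExpected, _) =>
        tok == closing || (commasExpected && tok == ",")
      case TopLevel => false
    }
    val i = tokens.indexWhere(isSafe, from)
    if (i < 0) tokens.length else i
  }
}
```

With the tokens of `1 2 3, 4)` and skipping from the stray `2`, a region that expects commas resumes at the comma, while a plain parenthesized region only resumes at the closing `)`.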
I spent a little time trying to figure this out, but I'm not sure what you're looking for. If we want to fix the trailing comma bug fixed by #14517, then the parser has to pass down information about whether it is in a delimited comma-separated list (e.g.
Perhaps I did this PR a disservice by separating it from #14517, because it gives the appearance of only being relevant to error recovery. In fact, I think we need to pull the current comma-handling out of the scanner to get correct parsing behavior. Trying to cram more information into the region, which I think currently exists largely to handle significant indentation, is the wrong way to go, I think. I'll also note that the parser already passes down
You could potentially make
I spent a couple of hours trying to figure out a better way, but I'm not seeing it! Further tips appreciated.
You can see my best attempt here if you're interested.
This is another (much smaller) try that avoids passing down the
You can see this change and #14463 rebased against main here.
What is an undelimited comma separated list?
I think it would be good to merge the three PRs on comma-separated lists into a single one that makes use
I believe this would be superseded by #14695.
#14463 improves error recovery in comma-separated lists when an error occurs before a comma is encountered. The way `commaSeparated` currently works, there is still bad behavior when the individual element of a comma-separated list is successfully parsed, but not followed by a comma. For example, in `foo(5 6, 7)`, the parser will successfully parse `5` into an expression, terminate the comma-separated list (because it encounters `6` instead of a comma), then pop up (producing `foo(5)`). The parser will then be in a state where it is looking for a terminal `)`, but instead it finds a `6` and issues an error. This is okay behavior because a reasonable error is issued in the right place, but it means that everything after the `6` is parsed as if it occurred outside the comma-separated list, leading to potentially confusing errors.

In this PR, we pass down the expected end token for a comma-separated list (where that is defined) and issue an error while still parsing the list if a part of the list is parsed without being followed by a comma or the expected terminal token. More code than I would like for such a small win, but I think the resulting behavior is worth it.
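
The behavior described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the code in `Parsers.scala`: tokens are plain strings, list elements are integer literals, `NoEnd` stands in for the `EMPTY` sentinel, and `ToyParser` and `errorAndSkip` are hypothetical names (the latter only imitates the "report, then skip to a safe point" effect attributed to `syntaxErrorOrIncomplete` in the review thread). It uses `toIntOption`, so it assumes Scala 2.13 or later.

```scala
import scala.collection.mutable.ListBuffer

// Toy model only: tokens are strings and list elements are integer literals.
object CommaSeparatedSketch {
  val NoEnd = "" // stands in for the EMPTY sentinel: no known closing token

  final class ToyParser(tokens: Vector[String]) {
    private var pos = 0
    val errors = ListBuffer.empty[String]

    def token: String = if (pos < tokens.length) tokens(pos) else "<eof>"
    private def nextToken(): Unit = if (pos < tokens.length) pos += 1

    // Record an error, then skip ahead to a comma, the expected end, or end of input.
    private def errorAndSkip(msg: String, expectedEnd: String): Unit = {
      errors += msg
      while (token != "," && token != expectedEnd && token != "<eof>") nextToken()
    }

    private def intLit(): Int = {
      val t = token
      nextToken()
      t.toIntOption.getOrElse { errors += s"integer expected, but '$t' found"; 0 }
    }

    /** part { ',' part }; with a known `expectedEnd`, a stray token is reported inside the list. */
    def commaSeparated(expectedEnd: String): List[Int] = {
      val ts = ListBuffer(intLit())
      var more = true
      while (more) {
        if (token == ",") { nextToken(); ts += intLit() }
        else if (expectedEnd != NoEnd && token != expectedEnd && token != "<eof>")
          errorAndSkip(s"'$expectedEnd' or ',' expected, but '$token' found", expectedEnd)
        else more = false
      }
      ts.toList
    }
  }

  def main(args: Array[String]): Unit = {
    val input = Vector("5", "6", ",", "7", ")") // the argument tokens of foo(5 6, 7)
    val oldStyle = new ToyParser(input)
    println(oldStyle.commaSeparated(NoEnd) -> oldStyle.token)
    val newStyle = new ToyParser(input)
    println(newStyle.commaSeparated(")") -> newStyle.errors.toList)
  }
}
```

Without an expected end, the toy parser returns only the first element and leaves the stray `6` for its caller to trip over. With `)` as the expected end, it records one error at the `6`, resumes at the comma, and still returns both `5` and `7` as list elements.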