Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error recovery in comma separated lists. #14509

Closed
wants to merge 1 commit into from

Conversation

adampauls
Copy link
Contributor

#14463 improves error recovery in comma-separated lists when an error occurs before a comma is encountered. The way commaSeparated currently works, there is still bad behavior when the individual element of a comma-separated list is successfully parsed, but not followed by a comma. For example, in foo(5 6, 7), the parser will successfully parse 5 into an expression, terminate the comma-separated list (because it encounters 6 instead of a comma, then pop up (producing foo(5)). The parser will then be in a state where it is looking for a terminal ), but instead it finds a 6and issues an error. This is okay behavior because a reasonable error is issued in the right place, but it means that everything after the6` is parsed as if it occurred outside the comma-separated list, leading to potentially confusing errors.

In this PR, we pass down the expected end token for a comma-separated list (where that is defined) and issue an error while still parsing the list if a part of the list is parsed without being followed by a comma or the expected terminal token. More code than I would like for such a small win, but I think the resulting behavior is worth it.

@@ -553,19 +553,27 @@ object Parsers {
def inDefScopeBraces[T](body: => T, rewriteWithColon: Boolean = false): T =
inBracesOrIndented(body, rewriteWithColon)

/** part { `separator` part }
*/
def tokenSeparated[T](separator: Int, part: () => T): List[T] = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I inlined this since it was only ever used for a comma separated list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That rang a bell. At the beginning of pandemic, I moved Scala 2 trailing comma handling into parser for better handling, so that is why this method still pulls its weight there.

scala/scala#8780

I said on the PR that would forward-port it; apparently I looked and it seemed similar. But obviously I haven't done that yet. It must have slipped my mind during pandemic brain fog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and wrote up something to get rid of the weird commas. I'll put it up after this PR is in.

@adampauls adampauls force-pushed the comma_separated_lookahead branch from 3097f54 to ae845df Compare February 17, 2022 20:30
in.nextToken()
ts += part()
}
if (expectedEnd != EMPTY && in.token != expectedEnd) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in case we have more that one unexpected token for example foo(1 2 3, 4, 5) this will not try to parse the rest of the param clause?

Maybe it makes sense to try to find the comma or end of line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntaxErrorOrIncomplete skips to the next safe pint as a side effect, so we will see 2 is not a ), issue an error, then jump to , and keep parsing. I'll add a third digit to one of the test cases to be sure.

@@ -1,5 +1,5 @@
class A[T]
object o {
// Testing compiler crash, this test should be modified when named type argument are completely implemented
val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found // error
val x: A[T=Int, T=Int] = ??? // error: ']' expected, but '=' found // error: ']' expected, but '=' found
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this error complained about val x not having a body, which doesn't make any sense.

@adampauls
Copy link
Contributor Author

cc @tgodzik , a better solution to the bad error message from #14463.

@adampauls adampauls force-pushed the comma_separated_lookahead branch 2 times, most recently from 3cc0a28 to d28980b Compare February 17, 2022 20:57
Copy link
Contributor

@tgodzik tgodzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great! I just have a few questions.

in.nextToken()
ts += part()
}
if (expectedEnd != EMPTY && in.token != expectedEnd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in case we have more that one unexpected token for example foo(1 2 3, 4, 5) this will not try to parse the rest of the param clause?

Maybe it makes sense to try to find the comma or end of line?

@@ -2532,7 +2540,7 @@ object Parsers {
if (leading == LBRACE || in.token == CASE)
enumerators()
else {
val pats = patternsOpt()
val pats = patternsOpt(EMPTY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why EMPTY in this case? Shouldn't it be something like OUTDENT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was to handle for x, y in (1,2) do or for x, y in (1,2) yield, so I think the expected end would have to be do or yield? Passing EMPTY preserves the old behavior, so I figured it was safe. I could pass predicate down instead I suppose. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this if fine, the other possibility would be an Option but I don't see it used anywhere in the class, so probably there is a reason for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was maybe being overly cautious but I didn't want to have a perf hit from creating a lot of unnecessary Some wrappers.

@adampauls adampauls force-pushed the comma_separated_lookahead branch 2 times, most recently from 5e9d681 to 415e6d5 Compare February 18, 2022 17:29
Copy link
Contributor

@tgodzik tgodzik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for me, but I would probably ask for a second opinion here 😅

@tgodzik tgodzik requested a review from odersky February 21, 2022 11:34
Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does feel like a lot of effort for not that much of a gain. Error recovery is currently very streamlined and centralized. I think it can be improved, but my fear is that local improvements like this will make it harder to come up with a better overall design.

In that sense, even though it improves precision, I fear that #14463 was the already a step in the wrong direction, and this is more of the same. To pass context, I would instead try to rely on the currentRegion abstraction. It tells us about what we expect to come, and we should make more use of it. For instance, in retrospect, instead of the solution in #14463 it would be more elegant to have a region that is a specialized version of InParens that knows that elements can also be separated with commas. That avoids the ad-hoc passing of an additional argument to skip. And likewise we should be able to avoid passing additional arguments of commaSeparated.

So, my proposal: Let's revert #14463 and base everything on regions. I have the impression that skip could be a lot smarter than it is now, if it makes better use of this info.

@adampauls
Copy link
Contributor Author

adampauls commented Feb 21, 2022

I spent a little time trying to figure this out, but I'm not sure what you're looking for. If we want to fix the trailing comma bug fixed by #14517, then the parser has to pass down information to about whether it is a in delimited comma-separated list (e.g. (a, b)) or an open comma-separated list (e.g. a, b as in for a, b <- ) that doesn't know what follows. So no matter what, you have to pass down information in all the places I've declared to commaSeparated whether it should expect a delimiter or not.

Perhaps I did this PR a disservice by separating it from #14517, because it gives the appearance of only being relevant to error recovery. In fact, I think we need to pull the current comma-handling out of the scanner to get correct parsing behavior. Try to cram more information into the region, which I think currently exists largely to handle significant indentation, is the wrong way to go I think. I'll also note that the parser already passes down Locations, which are like regions but handle orthogonal concerns.

You could potentially make skip a little smarter by inspecting the region, but when I tried to either add a new CommaSeparated region or add a flag to the existing InBraces and InParens regions, things started to break because the code that inspects regions (of which there is a lot, and it's already fairly brittle from what I can see) now need to think about whether to ignore these flags or not. I was not able to track down all the places that need to change to fix tests, but I'm fairly confident that more code ends up needing to think about comma-separated regions that shouldn't have to, just to avoid passing single flag to skip.

I spent a couple of hours trying to figure out a better way, but I'm not seeing it! Further tips appreciated.

@adampauls
Copy link
Contributor Author

adampauls commented Feb 21, 2022

You can see my best attempt here if you're interested.

@adampauls
Copy link
Contributor Author

adampauls commented Feb 21, 2022

This is another (much smaller) try that avoid passing down the skipAtComma bit. This one passes all current tests, but note that this version will not handle errors well in undelimited comma-separated lists because skip is no longer smart enough to skip to the next comma.

You can see this change and #14463 rebased against main here.

@odersky
Copy link
Contributor

odersky commented Mar 13, 2022

What is an undelimited comma separated list?

@odersky
Copy link
Contributor

odersky commented Mar 13, 2022

I think it would be good to merge the three PRs on comma-separated lists with a single one that makes use
of Scanner.Region.

@adampauls
Copy link
Contributor Author

adampauls commented Mar 15, 2022

Thanks, I made a single PR that uses Scanner.Region at #14534. An undelimited comma-separated list is one of these

val x, y, z = (1, 2, 3)

Tested here.

They also occur in derives clauses.

Copy link
Contributor

@odersky odersky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this would be superseded by #14695.

@adampauls adampauls closed this Mar 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants