Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid panic in invalid pattern recovery #10966

Merged
merged 1 commit into from
Apr 16, 2024

Conversation

dhruvmanila
Copy link
Member

Summary

This PR avoids the panic when trying to recover from an invalid pattern.

The panic occurs because there's no way to represent x as y syntax as a Python expression. There were few cases which were not accounted for when working on this.

The cases where it would panic are the ones where only a subset of pattern is valid in a position but all patterns are valid in a nested position of the mentioned top-level pattern.

For example, consider a mapping pattern and the following points:

  • Key position can only be a MatchValue and MatchSingleton
  • MatchClass can contain a MatchAs as an argument
  • MatchAs pattern with both pattern and name (a as b) cannot be converted to a valid expression

Considering this, if a mapping pattern has a class pattern with an as pattern as one of it's argument, the parser would panic.

match subject:
	case {Foo(a as b): 1}: ...
#        ^^^^^^^^^^^^^^^^ mapping pattern
#         ^^^^^^^^^^^     mapping pattern key, class pattern
#             ^^^^^^      as pattern with both pattern and name

The same case is for complex literal patterns as well.

I'm not sure what the recovery should look like but for now it's better to avoid the panic and create an invalid expression node in it's place.

Test Plan

Add new test cases and update the snapshots.

@dhruvmanila dhruvmanila added bug Something isn't working parser Related to the parser fuzzer Surfaced via fuzzing. labels Apr 16, 2024
Comment on lines -145 to -176
TokenKind::Lpar | TokenKind::Lsqb => {
let pattern = self.parse_parenthesized_or_sequence_pattern();

if matches!(
pattern,
Pattern::MatchAs(ast::PatternMatchAs {
pattern: Some(_),
name: Some(_),
..
})
) {
// If the pattern is an `as` pattern with both a pattern and a name,
// then it's a parenthesized `as` pattern. For example:
//
// ```python
// match subject:
// case (x as y):
// pass
// ```
//
// In this case, we should return early as it's not a valid value
// for the class and complex literal pattern. This would lead to
// panic when trying to convert it to an expression because there's
// no way to represent it in the AST.
//
// We exit early in this case, while in others we continue parsing,
// even if it's invalid because we can recover from it.
return pattern;
}

pattern
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't required now that we don't panic.

Comment on lines -247 to 216
// SAFETY: The `parse_match_pattern_lhs` function can only return
// an `as` pattern if it's parenthesized and the recovery context
// makes sure that this closure is never called in that case.
// This is because `(` is not in the `MAPPING_PATTERN_START_SET`.
recovery::pattern_to_expr(pattern)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we don't panic so no need for the comment.

Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+5 -52 violations, +0 -0 fixes in 3 projects; 41 projects unchanged)

bokeh/bokeh (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- src/bokeh/events.py:182:9: PLW0642 Invalid assignment to `cls` argument in class method

pandas-dev/pandas (+0 -51 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

- pandas/core/arrays/categorical.py:568:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1083:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1098:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1099:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1127:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1136:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1147:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1149:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1154:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1203:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1205:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1221:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1278:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1292:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1498:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:1717:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:2144:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:2166:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:455:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:806:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/arrays/datetimelike.py:997:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/frame.py:7715:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/frame.py:7728:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/frame.py:8071:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/frame.py:8083:21: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/frame.py:8123:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:3407:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:3568:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:7902:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:7905:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:7908:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:9165:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:9172:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/generic.py:9175:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/groupby/grouper.py:249:13: PLW0642 Invalid assignment to `cls` argument in class method
- pandas/core/indexes/base.py:1329:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:2130:9: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:2913:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:3061:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:3293:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:4384:13: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:5958:17: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/indexes/base.py:5962:20: PLW0642 Invalid assignment to `self` argument in instance method
- pandas/core/internals/blocks.py:1122:13: PLW0642 Invalid assignment to `self` argument in instance method
... 7 additional changes omitted for project

zulip/zulip (+5 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ scripts/lib/zulip_tools.py:601:1: E302 [*] Expected 2 blank lines, found 0
+ zerver/decorator.py:556:1: E302 [*] Expected 2 blank lines, found 0
+ zerver/lib/validator.py:268:1: E302 [*] Expected 2 blank lines, found 0
+ zproject/config.py:33:1: E302 [*] Expected 2 blank lines, found 0
+ zproject/config.py:56:1: E302 [*] Expected 2 blank lines, found 0

Changes by rule (2 rules affected)

code total + violation - violation + fix - fix
PLW0642 52 0 52 0 0
E302 5 5 0 0 0

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind opening an issue for reworking the pattern recovery? I'm okay removing the panic branch for the release but I think we need to rething the pattern recovery holistically and either recover less (e.g. return the left hand side if positioned at a - or + and the left isn't a number), or change the AST structure to allow for better recovery.

Comment on lines +403 to +415
arguments: Arguments {
range: 492..500,
args: [
Name(
ExprName {
range: 493..499,
id: "",
ctx: Invalid,
},
),
],
keywords: [],
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised that the parsing both drops the a and as b parts. Is this intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we're replacing the as pattern with an invalid expression node when the conversion takes place in pattern error recovery. If not, the parser would panic.

@dhruvmanila dhruvmanila merged commit 9dc4307 into dhruv/parser Apr 16, 2024
17 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/avoid-pattern-recovery-panic branch April 16, 2024 10:40
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
## Summary

This PR avoids the panic when trying to recover from an invalid pattern.

The panic occurs because there's no way to represent `x as y` syntax as
a Python expression. There were few cases which were not accounted for
when working on this.

The cases where it would panic are the ones where only a subset of
pattern is valid in a position but all patterns are valid in a nested
position of the mentioned top-level pattern.

For example, consider a mapping pattern and the following points:
* Key position can only be a `MatchValue` and `MatchSingleton`
* `MatchClass` can contain a `MatchAs` as an argument
* `MatchAs` pattern with both pattern and name (`a as b`) cannot be
converted to a valid expression

Considering this, if a mapping pattern has a class pattern with an `as`
pattern as one of it's argument, the parser would panic.

```py
match subject:
	case {Foo(a as b): 1}: ...
#        ^^^^^^^^^^^^^^^^ mapping pattern
#         ^^^^^^^^^^^     mapping pattern key, class pattern
#             ^^^^^^      as pattern with both pattern and name
```

The same case is for complex literal patterns as well.

I'm not sure what the recovery should look like but for now it's better
to avoid the panic and create an invalid expression node in it's place.

## Test Plan

Add new test cases and update the snapshots.
dhruvmanila added a commit that referenced this pull request Apr 16, 2024
## Summary

This PR avoids the panic when trying to recover from an invalid pattern.

The panic occurs because there's no way to represent `x as y` syntax as
a Python expression. There were few cases which were not accounted for
when working on this.

The cases where it would panic are the ones where only a subset of
pattern is valid in a position but all patterns are valid in a nested
position of the mentioned top-level pattern.

For example, consider a mapping pattern and the following points:
* Key position can only be a `MatchValue` and `MatchSingleton`
* `MatchClass` can contain a `MatchAs` as an argument
* `MatchAs` pattern with both pattern and name (`a as b`) cannot be
converted to a valid expression

Considering this, if a mapping pattern has a class pattern with an `as`
pattern as one of it's argument, the parser would panic.

```py
match subject:
	case {Foo(a as b): 1}: ...
#        ^^^^^^^^^^^^^^^^ mapping pattern
#         ^^^^^^^^^^^     mapping pattern key, class pattern
#             ^^^^^^      as pattern with both pattern and name
```

The same case is for complex literal patterns as well.

I'm not sure what the recovery should look like but for now it's better
to avoid the panic and create an invalid expression node in it's place.

## Test Plan

Add new test cases and update the snapshots.
dhruvmanila added a commit that referenced this pull request Apr 17, 2024
## Summary

This PR avoids the panic when trying to recover from an invalid pattern.

The panic occurs because there's no way to represent `x as y` syntax as
a Python expression. There were few cases which were not accounted for
when working on this.

The cases where it would panic are the ones where only a subset of
pattern is valid in a position but all patterns are valid in a nested
position of the mentioned top-level pattern.

For example, consider a mapping pattern and the following points:
* Key position can only be a `MatchValue` and `MatchSingleton`
* `MatchClass` can contain a `MatchAs` as an argument
* `MatchAs` pattern with both pattern and name (`a as b`) cannot be
converted to a valid expression

Considering this, if a mapping pattern has a class pattern with an `as`
pattern as one of it's argument, the parser would panic.

```py
match subject:
	case {Foo(a as b): 1}: ...
#        ^^^^^^^^^^^^^^^^ mapping pattern
#         ^^^^^^^^^^^     mapping pattern key, class pattern
#             ^^^^^^      as pattern with both pattern and name
```

The same case is for complex literal patterns as well.

I'm not sure what the recovery should look like but for now it's better
to avoid the panic and create an invalid expression node in it's place.

## Test Plan

Add new test cases and update the snapshots.
dhruvmanila added a commit that referenced this pull request Apr 18, 2024
## Summary

This PR avoids the panic when trying to recover from an invalid pattern.

The panic occurs because there's no way to represent `x as y` syntax as
a Python expression. There were few cases which were not accounted for
when working on this.

The cases where it would panic are the ones where only a subset of
pattern is valid in a position but all patterns are valid in a nested
position of the mentioned top-level pattern.

For example, consider a mapping pattern and the following points:
* Key position can only be a `MatchValue` and `MatchSingleton`
* `MatchClass` can contain a `MatchAs` as an argument
* `MatchAs` pattern with both pattern and name (`a as b`) cannot be
converted to a valid expression

Considering this, if a mapping pattern has a class pattern with an `as`
pattern as one of it's argument, the parser would panic.

```py
match subject:
	case {Foo(a as b): 1}: ...
#        ^^^^^^^^^^^^^^^^ mapping pattern
#         ^^^^^^^^^^^     mapping pattern key, class pattern
#             ^^^^^^      as pattern with both pattern and name
```

The same case is for complex literal patterns as well.

I'm not sure what the recovery should look like but for now it's better
to avoid the panic and create an invalid expression node in it's place.

## Test Plan

Add new test cases and update the snapshots.
dhruvmanila added a commit that referenced this pull request Apr 18, 2024
(Supersedes #9152, authored by @LaBatata101)

## Summary

This PR replaces the current parser generated from LALRPOP to a
hand-written recursive descent parser.

It also updates the grammar for [PEP
646](https://peps.python.org/pep-0646/) so that the parser outputs the
correct AST. For example, in `data[*x]`, the index expression is now a
tuple with a single starred expression instead of just a starred
expression.

Beyond the performance improvements, the parser is also error resilient
and can provide better error messages. The behavior as seen by any
downstream tools isn't changed. That is, the linter and formatter can
still assume that the parser will _stop_ at the first syntax error. This
will be updated in the following months.

For more details about the change here, refer to the PR corresponding to
the individual commits and the release blog post.

## Test Plan

Write _lots_ and _lots_ of tests for both valid and invalid syntax and
verify the output.

## Acknowledgements

- @MichaReiser for reviewing 100+ parser PRs and continuously providing
guidance throughout the project
- @LaBatata101 for initiating the transition to a hand-written parser in
#9152
- @addisoncrump for implementing the fuzzer which helped
[catch](#10903)
[a](#10910)
[lot](#10966)
[of](#10896)
[bugs](#10877)

---------

Co-authored-by: Victor Hugo Gomes <[email protected]>
Co-authored-by: Micha Reiser <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working fuzzer Surfaced via fuzzing. parser Related to the parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants