Consider line continuation character for re-lexing #12008

dhruvmanila · 2024-06-24T07:24:07Z

Summary

This PR fixes a bug where the re-lexing logic didn't consider a line continuation character being present before the newline character. This meant that the lexer was moved back to a newline character that is actually escaped by the preceding \.

Consider the following code:

f'middle {'string':\
        'format spec'}

The old token stream is:

...
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
Newline 20..21
Indent 21..29
String 29..42
Rbrace 42..43
...

Notice how the ranges are overlapping between the FStringMiddle token and the tokens emitted after moving the lexer backwards.
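The overlap can also be checked mechanically. The following is a minimal sketch, not code from this PR: `Tok`, `overlapping_pairs`, `old_stream` and `new_stream` are hypothetical names, and the ranges are copied from the token streams quoted in this description.

```rust
/// A token kind name with its half-open byte range `start..end`.
/// Hypothetical type for illustration; not the parser's real token type.
#[derive(Debug, Clone, Copy)]
struct Tok {
    kind: &'static str,
    start: u32,
    end: u32,
}

/// Returns the kinds of every adjacent pair of tokens where the second
/// token starts before the first one ends.
fn overlapping_pairs(tokens: &[Tok]) -> Vec<(&'static str, &'static str)> {
    tokens
        .windows(2)
        .filter(|pair| pair[1].start < pair[0].end)
        .map(|pair| (pair[0].kind, pair[1].kind))
        .collect()
}

/// The token stream from before the fix, as listed in the description.
fn old_stream() -> Vec<Tok> {
    vec![
        Tok { kind: "Colon", start: 18, end: 19 },
        Tok { kind: "FStringMiddle", start: 19, end: 29 },
        Tok { kind: "Newline", start: 20, end: 21 },
        Tok { kind: "Indent", start: 21, end: 29 },
        Tok { kind: "String", start: 29, end: 42 },
        Tok { kind: "Rbrace", start: 42, end: 43 },
    ]
}

/// The token stream from after the fix, as listed in the description.
fn new_stream() -> Vec<Tok> {
    vec![
        Tok { kind: "FStringStart", start: 0, end: 2 },
        Tok { kind: "FStringMiddle", start: 2, end: 9 },
        Tok { kind: "Lbrace", start: 9, end: 10 },
        Tok { kind: "String", start: 10, end: 18 },
        Tok { kind: "Colon", start: 18, end: 19 },
        Tok { kind: "FStringMiddle", start: 19, end: 29 },
        Tok { kind: "FStringEnd", start: 29, end: 30 },
        Tok { kind: "Name", start: 30, end: 36 },
        Tok { kind: "Name", start: 37, end: 41 },
        Tok { kind: "Unknown", start: 41, end: 44 },
        Tok { kind: "Newline", start: 44, end: 45 },
    ]
}
```

Calling `overlapping_pairs(&old_stream())` reports the `FStringMiddle`/`Newline` pair, while the stream produced after the fix yields no overlapping pairs.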

After this fix, the lexer is not moved backwards in this scenario, and the new token stream is:

FStringStart 0..2 (flags = F_STRING)
FStringMiddle 2..9 (flags = F_STRING)
Lbrace 9..10
String 10..18
Colon 18..19
FStringMiddle 19..29 (flags = F_STRING)
FStringEnd 29..30 (flags = F_STRING)
Name 30..36
Name 37..41
Unknown 41..44
Newline 44..45

fixes: #12004

Test Plan

Add a test case and update the snapshots.

github-actions · 2024-06-24T07:43:50Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2024-06-24T11:55:35Z

crates/ruff_python_parser/src/lexer.rs

+                    if let Some(second_slash) = reverse_chars.next_if_eq(&'\\') {
+                        // Line continuation character has been escaped: `\\\n`
+                        newline_position = Some(current_position);
+                        // Set the newline position before updating the current position.
+                        current_position -= first_slash.text_len() - second_slash.text_len();
+                    } else {


I think it's still more complicated than this. What about \\\? Here, we have an escaped backslash followed by a continuation :(

I see. I guess we'd need to count the number of backslashes and make a decision based on whether it's odd or even.
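As a rough illustration of the odd/even idea, here is a minimal sketch. It is not the code from this PR: `is_newline_escaped` is a hypothetical helper that works on a plain `&str` and a byte offset instead of the lexer's cursor, and it assumes `newline_offset` is the byte offset of a newline character in `source`.

```rust
/// Returns `true` if the newline at byte offset `newline_offset` is preceded
/// by an odd number of backslashes. An even count means the backslashes pair
/// up as escaped backslashes, so none of them is left over to act as a line
/// continuation character.
///
/// Assumption: `newline_offset` points at a newline character in `source`,
/// so it lies on a character boundary and the slice below cannot panic.
fn is_newline_escaped(source: &str, newline_offset: usize) -> bool {
    let backslash_count = source[..newline_offset]
        .chars()
        .rev()
        .take_while(|c| *c == '\\')
        .count();
    backslash_count % 2 == 1
}
```

With one or three backslashes before the newline the function returns `true`; with zero or two it returns `false`.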

MichaReiser · 2024-06-24T16:06:26Z

crates/ruff_python_parser/src/lexer.rs

+                let mut backslash_count = 0;
+                while reverse_chars.next_if_eq(&'\\').is_some() {
+                    backslash_count += 1;
+                }


Last comment :) Do we need to restrict the escape handling to cases where we know we're inside a string? Or wouldn't that work in case of an unterminated string literal?

I'm not sure if it matters because the parser is already in an error recovery state when encountering an escaped \ outside of a string, but it might be worth adding a test for it.

test + a \\\ more

I'm not sure I follow here. Do you mean to ask whether this logic needs to be restricted to only recovering within a string or not? I don't think that is necessary; I'll add a test case for a line continuation character encountered while re-lexing outside of a string.

## Summary

This PR fixes a bug introduced in #12008 which didn't consider the two character newline after the line continuation character.

For example, consider the following code highlighted with whitespaces:

```py
call(foo # comment \\r\n
\r\n
def bar():\r\n
....pass\r\n
```

The lexer is at `def` when it's running the re-lexing logic and trying to move back to a newline character. It encounters `\n` and it's being escaped (incorrect) but `\r` is being escaped, so it moves the lexer to `\n` character. This creates an overlap in token ranges which causes the panic.

```
Name 0..4
Lpar 4..5
Name 5..8
Comment 9..20
NonLogicalNewline 20..22 <-- overlap between
Newline 21..22           <-- these two tokens
NonLogicalNewline 22..23
Def 23..26
...
```

fixes: #12028

## Test Plan

Add a test case with line continuation and windows style newline character.
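One way to account for a two-character newline is to step over the `\r` before looking for backslashes. The sketch below is an assumption about the general approach, not the actual change made in #12035: `is_line_ending_escaped` is a hypothetical helper that takes the byte offset of the `\n` in a plain `&str`.

```rust
/// Returns `true` if the line ending that finishes at the `\n` located at
/// byte offset `newline_offset` is preceded by an odd number of backslashes.
///
/// If the `\n` is the second half of a `\r\n` pair, the `\r` is skipped
/// first so that the backslashes before the whole two-character line ending
/// are the ones being counted.
///
/// Assumption: `newline_offset` points at a `\n` character in `source`.
fn is_line_ending_escaped(source: &str, newline_offset: usize) -> bool {
    let before = &source[..newline_offset];
    // Treat `\r\n` as a single line ending by dropping the trailing `\r`.
    let before = before.strip_suffix('\r').unwrap_or(before);
    let backslash_count = before
        .chars()
        .rev()
        .take_while(|c| *c == '\\')
        .count();
    backslash_count % 2 == 1
}
```

For a source of `foo \` followed by `\r\n`, the helper skips the `\r`, finds one backslash, and reports the line ending as escaped, which is the case the description above says was missed.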

Consider line continuation char for re-lexing

56e4962

dhruvmanila added the bug and parser labels Jun 24, 2024

dhruvmanila requested a review from MichaReiser as a code owner June 24, 2024 07:24

dhruvmanila changed the title ~~Consider line continuation char for re-lexing~~ Consider line continuation character for re-lexing Jun 24, 2024

MichaReiser reviewed Jun 24, 2024

Consider the escaped line continuation character

f808f88

dhruvmanila requested a review from MichaReiser June 24, 2024 11:54

MichaReiser reviewed Jun 24, 2024

Take 2 on backslashes

48f8e7b

dhruvmanila requested a review from MichaReiser June 24, 2024 12:43

MichaReiser approved these changes Jun 24, 2024

Add test case for line continuation outside string

09a35c9

dhruvmanila enabled auto-merge (squash) June 25, 2024 02:10

dhruvmanila merged commit 68a8978 into main Jun 25, 2024
17 checks passed

dhruvmanila deleted the dhruv/re-lexing-newline-escape branch June 25, 2024 02:13

dhruvmanila mentioned this pull request Jun 26, 2024

Consider 2-character EOL before line continuation #12035

Merged
