Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider line continuation character for re-lexing #12008

Merged
merged 4 commits into from
Jun 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# The newline character is being escaped which means that the lexer shouldn't be moved
# back to that position.
# https://github.com/astral-sh/ruff/issues/12004

f'middle {'string':\
'format spec'}

f'middle {'string':\\
'format spec'}

f'middle {'string':\\\
'format spec'}
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
call(a, b, \\\

def bar():
pass
24 changes: 21 additions & 3 deletions crates/ruff_python_parser/src/lexer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1373,15 +1373,33 @@ impl<'src> Lexer<'src> {
}

let mut current_position = self.current_range().start();
let reverse_chars = self.source[..current_position.to_usize()].chars().rev();
let mut reverse_chars = self.source[..current_position.to_usize()]
.chars()
.rev()
.peekable();
let mut newline_position = None;

for ch in reverse_chars {
while let Some(ch) = reverse_chars.next() {
if is_python_whitespace(ch) {
current_position -= ch.text_len();
} else if matches!(ch, '\n' | '\r') {
current_position -= ch.text_len();
newline_position = Some(current_position);
// Count the number of backslashes before the newline character.
let mut backslash_count = 0;
while reverse_chars.next_if_eq(&'\\').is_some() {
backslash_count += 1;
}
Comment on lines +1388 to +1391
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comment :) Do we need to restrict the escape handling to cases where we know we're inside a string? Or wouldn't that work in case of an unterminated string literal?

I'm not sure if it matters because the parser is already in an error recovery state when encountering an escaped \ outside of a string, but it might be worth to add a test for it

test + a \\\
more

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow here. Do you mean to ask whether this logic needs to be restricted to only recovering within a string or not? I don't think so that is necessary, I'll add a test case for line continuation character encountered while re-lexing outside of a string.

if backslash_count == 0 {
// No escapes: `\n`
newline_position = Some(current_position);
} else {
if backslash_count % 2 == 0 {
// Even number of backslashes i.e., all backslashes cancel each other out
// which means the newline character is not being escaped.
newline_position = Some(current_position);
}
current_position -= TextSize::new('\\'.text_len().to_u32() * backslash_count);
}
} else {
break;
}
Expand Down
Loading
Loading