ISC001 fix can modify octal escape sequences #12936

dscorbett · 2024-08-16T15:44:47Z

The fix for ISC001 changes behavior when the first string ends with a one- or two-digit octal escape sequence and the second begins with an octal digit.

$ ruff --version
ruff 0.6.0
$ cat isc001.py
print("\12""0")
$ python isc001.py

0
$ ruff check --isolated --select ISC001 isc001.py --fix
Found 1 error (1 fixed, 0 remaining).
$ python isc001.py
P

The least disruptive fix would be to detect this specific situation and pad the escape sequence to three digits. In the above example, that would result in "\0120".
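A small sketch of why padding works, using only plain Python string literals. The variable names are illustrative. Python reads at most three octal digits per escape, so `\012` ends before the trailing `0`.

```python
# The original implicit concatenation: "\12" is a newline, followed by "0".
original = "\12" "0"

# Naively merging the source text of both literals: "\120" is one escape, "P".
naive = "\120"

# Padding the escape to three digits before merging keeps both characters.
padded = "\0120"

print(repr(original))  # '\n0'
print(repr(naive))     # 'P'
print(repr(padded))    # '\n0'
```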


AlexWaygood · 2024-08-17T15:01:59Z

The reason seems to be that CPython processes the octal escape before it concatenates the two strings when it parses the source to create the AST:

>>> import ast
>>> print(ast.dump(ast.parse(r'"\12" "0"')))
Module(body=[Expr(value=Constant(value='\n0'))], type_ignores=[])
>>> print(ast.dump(ast.parse(r'"\120"')))
Module(body=[Expr(value=Constant(value='P'))], type_ignores=[])

Our AST also understands this, but it seems like the fix for this rule doesn't.

dhruvmanila · 2024-08-20T15:11:19Z

Related #12753

dylwil3 · 2024-08-22T13:27:04Z

> Related #12753

As you suggest there, a quick fix would be to move this rule and UP012 to an ast-based check instead of a token-based one. However, since the parser evaluates some escape sequences, this would result in replacing

"\12" "0"

with

"\n0"

for example.
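To illustrate with CPython's own `ast` module (standing in for Ruff's parser here, which is an assumption of this sketch): the parsed value no longer records how the escape was spelled, so a fix regenerated from it cannot reproduce `\12`.

```python
import ast

# Source text of the implicit concatenation, kept raw so the escape is literal.
source = r'"\12" "0"'

# In "eval" mode the tree is Expression(body=Constant(value=...)).
value = ast.parse(source, mode="eval").body.value

# The octal escape has already been evaluated to a newline character.
print(repr(value))  # '\n0'
```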

If we instead want to preserve the escape sequence and suggest the fix "\0120", then we may have to re-implement 'half' of the parsing logic inside this rule (which is fine).

Is there a preference?

MichaReiser · 2024-08-22T13:30:47Z

I'm leaning slightly towards omitting the fix if there's an octal escape (or other escapes). Or is detecting the octal escape just as hard as emitting the right fix?

dylwil3 · 2024-08-23T02:46:44Z

My guess is that all three options

1. detecting octals at the end of the first string and skipping the fix
2. having the fix be ast-based (and therefore evaluating the octals prior to concatenating)
3. detecting octals at the end of the first string and normalizing the octal before concatenating

are about the same level of difficulty - none seem too difficult - with option (3) being a tinge fussier than (1) and (2).

dhruvmanila · 2024-08-23T04:45:57Z

At least for #12753, I think we should avoid omitting the fix if there's an escape sequence.

I remember talking with @MichaReiser about adding a token flag for strings that indicates whether the string / f-string token contains an escape sequence. The lexer would set this flag, as it's already looking at each character. But I think it might be a bit difficult, because the lexer skips directly to the closing quote character using memchr:

ruff/crates/ruff_python_parser/src/lexer.rs

Lines 945 to 946 in 028cb68

    let Some(index) =
        memchr::memchr3(quote_byte, b'\r', b'\n', self.cursor.rest().as_bytes())
Another option would be to set the flag only on the AST node which should be easier to do because of the StringParser:

ruff/crates/ruff_python_ast/src/nodes.rs

Lines 1832 to 1834 in 028cb68

    bitflags! {
        #[derive(Debug, Default, Copy, Clone, PartialEq, Eq, Hash)]
        struct StringLiteralFlagsInner: u8 {

But, I would prefer if the flag could be added to the TokenFlags instead.

Another solution might be to move the rule to use the AST where we can access the concatenated value which would make it easier to detect the escape sequence. Once we detect the escape sequence, we can avoid emitting a fix for that string expression.

ruff/crates/ruff_python_ast/src/nodes.rs

Lines 1766 to 1769 in 028cb68

    /// Returns an iterator over the [`char`]s of each string literal part.
    pub fn chars(&self) -> impl Iterator<Item = char> + Clone + '_ {
        self.iter().flat_map(|part| part.value.chars())
    }

Curious to hear @MichaReiser's opinion.

MichaReiser · 2024-08-24T12:38:25Z

I'm leaning towards a local fix in ISC001. It seems simple enough to look at the individual literal elements and search backward. Flagging whether the string contains any escape seems overly aggressive and would remove the fix even when the escape isn't at the end of the string. Migrating to AST rules seems fine to me, but only if we are okay with the fix replacing other escapes (which I think it should not).

We only need to be careful that \\01 is not an octal escape but \\\01 is.
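A minimal Python sketch of that backward check, not Ruff's actual implementation. The helper name `pad_trailing_octal` is hypothetical, and the sketch assumes `body` is the raw source text between the quotes of a non-raw, non-f-string literal. It counts the run of backslashes before the trailing digits: an odd run means the last backslash starts an octal escape, an even run means every backslash is itself escaped.

```python
import re

# A run of backslashes followed by one or two octal digits at the very end.
_TRAILING_SHORT_OCTAL = re.compile(r"(\\+)([0-7]{1,2})$")


def pad_trailing_octal(body: str) -> str:
    """Pad a trailing one- or two-digit octal escape to three digits.

    Hypothetical helper: `body` is the literal's source text without quotes.
    """
    match = _TRAILING_SHORT_OCTAL.search(body)
    if match is None:
        return body
    backslashes, digits = match.groups()
    # An even run consists only of escaped backslashes, so the digits
    # that follow are ordinary characters rather than an octal escape.
    if len(backslashes) % 2 == 0:
        return body
    # Keep everything before the digits and left-pad the digits with zeros.
    return body[: match.start(2)] + digits.zfill(3)


print(pad_trailing_octal(r"\12"))    # \012
print(pad_trailing_octal(r"\\01"))   # \\01
print(pad_trailing_octal(r"\\\01"))  # \\\001
```

A caller would only need this when the second literal begins with an octal digit. Escapes that already have three digits are left alone because the pattern requires the short escape to end the string.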

dhruvmanila · 2024-08-26T04:57:19Z

Yeah, I think implementing a local fix seems reasonable to me. What do you think? @dylwil3

dylwil3 · 2024-08-26T05:08:35Z

Sounds good, I'll give it a shot!

AlexWaygood added the bug, fixes, and help wanted labels Aug 17, 2024

dylwil3 mentioned this issue Aug 26, 2024

[flake8-implicit-str-concat] Normalize octals before merging concatenated strings in single-line-implicit-string-concatenation (ISC001) #13118

Merged

AlexWaygood closed this as completed in #13118 Aug 27, 2024

dscorbett mentioned this issue Feb 21, 2025

PLE2514 fix should be marked unsafe and can modify octal escape sequences #16309

Open
