Range formatting: Fix invalid syntax after parenthesizing expression #9751

MichaReiser · 2024-02-01T12:35:50Z

Summary

This PR fixes an issue where range formatting omitted the closing parentheses when formatting a range that end's at an expression that gets parenthesized.

The issue is there's no unambiguous mapping for the position after "automatic" in the following example:

def needs_parentheses( ) -> bool:
    return item.sizing_mode is None and item.width_policy == "auto" and item.height_policy == "automatic"

This gets reformatted to

def needs_parentheses() -> bool:
    return (
        item.sizing_mode is None
        and item.width_policy == "auto"
        and item.height_policy == "automatic"
    )

The generated IR ending after "automatic" roughly is (where 139 is the position after the closing quote of "automatic")

source_position(139), // end position from the expression statement
soft_line_break(),
if_group_breaks(&text(")"))
hard_line_break(), 
source_position(139)

The printer will generate multiple source map entries for this

139 => 164 // Right after "automatic"
139 => 169 // The closing parentheses
139 => 170 // After the closing parentheses
139 => 171 // After the hard line break

Generating multiple mappings for the same position is correct because, depending on the use case, you either want to know where position 139 starts (you pick the first) or where it ends (you pick the last) in the formatted code. That's why the proper fix to this problem would be to include whether we look for the end or start of a node for both offsets passed to Printed::slice_range. This way, the source map lookup logic could pick the correct entry if multiple entries point to the same source location... However, this doesn't work for us because the suite ranges in the AST are off.

Looking at the example from above again: Both the expression and the expression statement end at position 139. If the statement is the last in the body (which is the case in the example above), then the body, and with it, potentially the enclosing node, also ends at position 139 in our AST because the body's end position is equal to the last statement's end position. However, this is incorrect. The end offset of the body should be the position of the Dedent token (after the hard line break). We might fix this with the new parser, but it is as it is now.

The problem with the body ending at offset 139 is that we incorrectly include the hard_line_break that ends the body even when formatting the expression statement alone because it's impossible to disambiguate the end of the expression statement from the end of the body. Including the hard_line_break results in inserting a new line every time we format the expression statement.

I'm not too fond of the solution taken by this PR but it works 🤷

First, we reduce the granularity of the source map to only generate positions we care about. We no longer generate source map entries for:

Text elements
or anything below logical line level (expression, with items, etc)

This solves the problem that we can't disambiguate between the end of the expression and the expression statement.
This also solves the problem where we can't disambiguate between the expression statement and the body because we'll now always take the first source mapping position. This can produce incorrect* results if we start inserting tokens around the statement, similar to the problem with parenthesizing expressions. However, I'm unaware of a case where we do this today or why this should be needed in the future. If it's needed in the future, than let's hope that we have our new parser by then and can fix the ranges of compound statements.

The first commit removes the automatic source map generation for text elements from the Printer. It now always requires explicit source_position element.

The second commit changes the python formatter to only emit source positions for logical-line like nodes. It also fixes an issue where we didn't fix a comment's indent.

Test Plan

Added snapshot tests.

github-actions · 2024-02-01T12:53:59Z

`ruff-ecosystem` results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

sphinx-doc/sphinx (error)

ruff format --no-preview --exclude tests/roots/test-pycode/cp_1251_coded.py

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Require explicit `FormatElement::SourcePosition` elements over implicitly adding a source mapping for every token. This has the downside that it requires more `FormatElement`s to achieve a source mapping at the same detail level as today. However, we don't have a use case for token by token source maps today. We only want to map node positions where this is more than sufficient.

charliermarsh · 2024-02-02T14:50:01Z

This can produce incorrect* results if we start inserting tokens around the statement, similar to the problem with parenthesizing expressions. However, I'm unaware of a case where we do this today or why this should be needed in the future. If it's needed in the future, than let's hope that we have our new parser by then and can fix the ranges of compound statements.

Are there any other downsides to removing these source map entries? Would it mean that we lose granularity in some cases (that we don't have a use for anyway right now)?

BurntSushi

LGTM!

BurntSushi · 2024-02-02T16:16:17Z

crates/ruff_formatter/src/builders.rs

@@ -340,29 +337,25 @@ impl<Context> Format<Context> for SourcePosition {
    }
 }

-/// Creates a text from a dynamic string with its optional start-position in the source document.
+/// Creates a text from a dynamic string.
+///
 /// This is done by allocating a new string internally.


It looks like this function doesn't allocate any more?

This function doesn't, but it does when calling Format. I can improve the documentation.

MichaReiser · 2024-02-02T16:27:53Z

@charliermarsh Yeah, it's mainly if we want to have a full fidelity source maps where we want to map exact positions from source to dest and the other way round. However, that would still have the problem I'm facing today. So that needs fixing before it can become useful anyway.

MichaReiser force-pushed the range-formatting-parentheses branch from f6ac41a to 8502cad Compare February 2, 2024 08:05

Range formatting: Fix invalid syntax after parenthesizing expression

22ba608

MichaReiser force-pushed the range-formatting-parentheses branch from 8502cad to 22ba608 Compare February 2, 2024 08:09

MichaReiser added internal An internal refactor or improvement formatter Related to the formatter labels Feb 2, 2024

MichaReiser requested review from charliermarsh and BurntSushi and removed request for charliermarsh February 2, 2024 08:19

MichaReiser marked this pull request as ready for review February 2, 2024 08:31

MichaReiser requested a review from charliermarsh February 2, 2024 08:41

fixup! Remove implicit source position from string tokens

59a8f29

MichaReiser force-pushed the range-formatting-parentheses branch from 765c6a7 to 59a8f29 Compare February 2, 2024 08:44

charliermarsh approved these changes Feb 2, 2024

View reviewed changes

BurntSushi approved these changes Feb 2, 2024

View reviewed changes

MichaReiser merged commit 4f7fb56 into main Feb 2, 2024
17 checks passed

MichaReiser deleted the range-formatting-parentheses branch February 2, 2024 16:56

MichaReiser mentioned this pull request Mar 4, 2024

Add support for caret repositioning with the formatter #10179

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Range formatting: Fix invalid syntax after parenthesizing expression #9751

Range formatting: Fix invalid syntax after parenthesizing expression #9751

MichaReiser commented Feb 1, 2024 •

edited

Loading

github-actions bot commented Feb 1, 2024 •

edited

Loading

charliermarsh commented Feb 2, 2024

BurntSushi left a comment

BurntSushi Feb 2, 2024

MichaReiser Feb 2, 2024

MichaReiser commented Feb 2, 2024

Range formatting: Fix invalid syntax after parenthesizing expression #9751

Range formatting: Fix invalid syntax after parenthesizing expression #9751

Conversation

MichaReiser commented Feb 1, 2024 • edited Loading

Summary

Test Plan

github-actions bot commented Feb 1, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

charliermarsh commented Feb 2, 2024

BurntSushi left a comment

Choose a reason for hiding this comment

BurntSushi Feb 2, 2024

Choose a reason for hiding this comment

MichaReiser Feb 2, 2024

Choose a reason for hiding this comment

MichaReiser commented Feb 2, 2024

MichaReiser commented Feb 1, 2024 •

edited

Loading

github-actions bot commented Feb 1, 2024 •

edited

Loading

`ruff-ecosystem` results