-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Range formatting: Fix invalid syntax after parenthesizing expression #9751
Conversation
|
Require explicit `FormatElement::SourcePosition` elements over implicitly adding a source mapping for every token. This has the downside that it requires more `FormatElement`s to achieve a source mapping at the same detail level as today. However, we don't have a use case for token by token source maps today. We only want to map node positions where this is more than sufficient.
f6ac41a
to
8502cad
Compare
8502cad
to
22ba608
Compare
765c6a7
to
59a8f29
Compare
Are there any other downsides to removing these source map entries? Would it mean that we lose granularity in some cases (that we don't have a use for anyway right now)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -340,29 +337,25 @@ impl<Context> Format<Context> for SourcePosition { | |||
} | |||
} | |||
|
|||
/// Creates a text from a dynamic string with its optional start-position in the source document. | |||
/// Creates a text from a dynamic string. | |||
/// | |||
/// This is done by allocating a new string internally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this function doesn't allocate any more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function doesn't, but it does when calling Format
. I can improve the documentation.
@charliermarsh Yeah, it's mainly if we want to have a full fidelity source maps where we want to map exact positions from source to dest and the other way round. However, that would still have the problem I'm facing today. So that needs fixing before it can become useful anyway. |
Summary
This PR fixes an issue where range formatting omitted the closing parentheses when formatting a range that end's at an expression that gets parenthesized.
The issue is there's no unambiguous mapping for the position after
"automatic"
in the following example:This gets reformatted to
The generated IR ending after "automatic" roughly is (where 139 is the position after the closing quote of
"automatic"
)The printer will generate multiple source map entries for this
Generating multiple mappings for the same position is correct because, depending on the use case, you either want to know where position 139 starts (you pick the first) or where it ends (you pick the last) in the formatted code. That's why the proper fix to this problem would be to include whether we look for the end or start of a node for both offsets passed to
Printed::slice_range
. This way, the source map lookup logic could pick the correct entry if multiple entries point to the same source location... However, this doesn't work for us because the suite ranges in the AST are off.Looking at the example from above again: Both the expression and the expression statement end at position 139. If the statement is the last in the body (which is the case in the example above), then the body, and with it, potentially the enclosing node, also ends at position 139 in our AST because the body's end position is equal to the last statement's end position. However, this is incorrect. The end offset of the body should be the position of the
Dedent
token (after the hard line break). We might fix this with the new parser, but it is as it is now.The problem with the body ending at offset 139 is that we incorrectly include the
hard_line_break
that ends the body even when formatting the expression statement alone because it's impossible to disambiguate the end of the expression statement from the end of the body. Including thehard_line_break
results in inserting a new line every time we format the expression statement.I'm not too fond of the solution taken by this PR but it works 🤷
First, we reduce the granularity of the source map to only generate positions we care about. We no longer generate source map entries for:
This solves the problem that we can't disambiguate between the end of the expression and the expression statement.
This also solves the problem where we can't disambiguate between the expression statement and the body because we'll now always take the first source mapping position. This can produce incorrect* results if we start inserting tokens around the statement, similar to the problem with parenthesizing expressions. However, I'm unaware of a case where we do this today or why this should be needed in the future. If it's needed in the future, than let's hope that we have our new parser by then and can fix the ranges of compound statements.
The first commit removes the automatic source map generation for text elements from the Printer. It now always requires explicit
source_position
element.The second commit changes the python formatter to only emit source positions for logical-line like nodes. It also fixes an issue where we didn't fix a comment's indent.
Test Plan
Added snapshot tests.