-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First string token is always considered as docstring #7595
Comments
Yeah, we need something more reliable here. I wish this could use our AST-based docstring detection -- that is, I wish we could just move these rules to the AST checker. We could move them, but we'd then have to re-lex the strings in the AST for any implicit concatenations, which seems expensive. |
Why is this necessary, considering that python detects the whole string as docsttring? Let's see if @dhruvmanila ends up refactoring our string representation anyway to represent string parts in the AST. |
@MichaReiser -- Because the string could contain comments in-between the segments: x = (
"foo" # comment
"bar"
) This isn't a problem in the token-based model because we iterate over string segments and comment tokens. But in the AST, we'd need to delineate the string contents from the comments. |
I see, but we would only need to do this for strings where implicit concatenated is true. So it shouldn't be as many? But I guess it is now more complicated with the |
I haven't explored it fully but I don't think it'll be an issue once the new AST node for implicit string concatenation is complete. |
I think we can now solve this. |
Yeah, I can look at it tonight, looks interesting :) |
I think we would need to move the existing rules which uses the state machine to be based on AST instead of tokens. There aren't many -- The implementation should probably be in the semantic model. |
Yeah ideally we would get rid of that state machine entirely. |
## Summary This PR introduces a new semantic model flag `DOCSTRING` which suggests that the model is currently in a module / class / function docstring. This is the first step in eliminating the docstring detection state machine which is prone to bugs as stated in #7595. ## Test Plan ~TODO: Is there a way to add a test case for this?~ I tested this using the following code snippet and adding a print statement in the `string_like` analyzer to print if we're currently in a docstring or not. <details><summary>Test code snippet:</summary> <p> ```python "Docstring" ", still a docstring" "Not a docstring" def foo(): "Docstring" "Not a docstring" if foo: "Not a docstring" pass class Foo: "Docstring" "Not a docstring" foo: int "Unofficial variable docstring" def method(): "Docstring" "Not a docstring" pass def bar(): "Not a docstring".strip() def baz(): _something_else = 1 """Not a docstring""" ``` </p> </details>
## Summary This PR introduces a new semantic model flag `DOCSTRING` which suggests that the model is currently in a module / class / function docstring. This is the first step in eliminating the docstring detection state machine which is prone to bugs as stated in astral-sh#7595. ## Test Plan ~TODO: Is there a way to add a test case for this?~ I tested this using the following code snippet and adding a print statement in the `string_like` analyzer to print if we're currently in a docstring or not. <details><summary>Test code snippet:</summary> <p> ```python "Docstring" ", still a docstring" "Not a docstring" def foo(): "Docstring" "Not a docstring" if foo: "Not a docstring" pass class Foo: "Docstring" "Not a docstring" foo: int "Unofficial variable docstring" def method(): "Docstring" "Not a docstring" pass def bar(): "Not a docstring".strip() def baz(): _something_else = 1 """Not a docstring""" ``` </p> </details>
## Summary Part of astral-sh#7595 This PR moves the `RUF001` and `RUF002` rules to the AST checker. This removes the use of docstring detection from these rules. ## Test Plan As this is just a refactor, make sure existing test cases pass.
@dhruvmanila - I can't remember, can we close this? |
No, the last rule ( |
In the docstring detection state machine, the first string token is always considered as docstring without considering the context. For example, the first strings in the following example are considered as docstrings:
On the opposite end, if multiple strings are implicitly concatenated then only the first string is considered as docstring:
Python considers the entire string (
"firststring"
) as the module docstring but the state machine will only say that"first"
is docstring while"second"
is a normal string.The text was updated successfully, but these errors were encountered: