First string token is always considered as docstring #7595

dhruvmanila · 2023-09-22T10:05:36Z

In the docstring detection state machine, the first string token is always considered as docstring without considering the context. For example, the first strings in the following example are considered as docstrings:

"abcba".strip("ab")

On the opposite end, if multiple strings are implicitly concatenated then only the first string is considered as docstring:

"first" "second"

Python considers the entire string ("firststring") as the module docstring but the state machine will only say that "first" is docstring while "second" is a normal string.

The text was updated successfully, but these errors were encountered:

charliermarsh · 2023-09-29T02:03:41Z

Yeah, we need something more reliable here. I wish this could use our AST-based docstring detection -- that is, I wish we could just move these rules to the AST checker.

We could move them, but we'd then have to re-lex the strings in the AST for any implicit concatenations, which seems expensive.

MichaReiser · 2023-09-29T08:28:02Z

We could move them, but we'd then have to re-lex the strings in the AST for any implicit concatenations, which seems expensive.

Why is this necessary, considering that python detects the whole string as docsttring?

Let's see if @dhruvmanila ends up refactoring our string representation anyway to represent string parts in the AST.

charliermarsh · 2023-09-29T17:22:07Z

@MichaReiser -- Because the string could contain comments in-between the segments:

x = (
    "foo" # comment
    "bar"
)

This isn't a problem in the token-based model because we iterate over string segments and comment tokens. But in the AST, we'd need to delineate the string contents from the comments.

MichaReiser · 2023-11-03T01:46:05Z

This isn't a problem in the token-based model because we iterate over string segments and comment tokens. But in the AST, we'd need to delineate the string contents from the comments.

I see, but we would only need to do this for strings where implicit concatenated is true. So it shouldn't be as many? But I guess it is now more complicated with the fstrings support.

dhruvmanila · 2023-11-06T04:17:06Z

I haven't explored it fully but I don't think it'll be an issue once the new AST node for implicit string concatenation is complete.

charliermarsh · 2023-12-07T02:18:04Z

I think we can now solve this.

dhruvmanila · 2023-12-07T02:40:49Z

Yeah, I can look at it tonight, looks interesting :)

dhruvmanila · 2023-12-07T04:32:21Z

I think we would need to move the existing rules which uses the state machine to be based on AST instead of tokens. There aren't many -- RUF001-003 and flake8-quotes rules. I think the only reason they're token based is because of implicit string concatenation.

The implementation should probably be in the semantic model.

charliermarsh · 2023-12-07T04:39:32Z

Yeah ideally we would get rid of that state machine entirely.

## Summary This PR introduces a new semantic model flag `DOCSTRING` which suggests that the model is currently in a module / class / function docstring. This is the first step in eliminating the docstring detection state machine which is prone to bugs as stated in #7595. ## Test Plan ~TODO: Is there a way to add a test case for this?~ I tested this using the following code snippet and adding a print statement in the `string_like` analyzer to print if we're currently in a docstring or not. <details><summary>Test code snippet:</summary> <p> ```python "Docstring" ", still a docstring" "Not a docstring" def foo(): "Docstring" "Not a docstring" if foo: "Not a docstring" pass class Foo: "Docstring" "Not a docstring" foo: int "Unofficial variable docstring" def method(): "Docstring" "Not a docstring" pass def bar(): "Not a docstring".strip() def baz(): _something_else = 1 """Not a docstring""" ``` </p> </details>

## Summary Part of #7595 This PR moves the `RUF001` and `RUF002` rules to the AST checker. This removes the use of docstring detection from these rules. ## Test Plan As this is just a refactor, make sure existing test cases pass.

## Summary This PR introduces a new semantic model flag `DOCSTRING` which suggests that the model is currently in a module / class / function docstring. This is the first step in eliminating the docstring detection state machine which is prone to bugs as stated in astral-sh#7595. ## Test Plan ~TODO: Is there a way to add a test case for this?~ I tested this using the following code snippet and adding a print statement in the `string_like` analyzer to print if we're currently in a docstring or not. <details><summary>Test code snippet:</summary> <p> ```python "Docstring" ", still a docstring" "Not a docstring" def foo(): "Docstring" "Not a docstring" if foo: "Not a docstring" pass class Foo: "Docstring" "Not a docstring" foo: int "Unofficial variable docstring" def method(): "Docstring" "Not a docstring" pass def bar(): "Not a docstring".strip() def baz(): _something_else = 1 """Not a docstring""" ``` </p> </details>

## Summary Part of astral-sh#7595 This PR moves the `RUF001` and `RUF002` rules to the AST checker. This removes the use of docstring detection from these rules. ## Test Plan As this is just a refactor, make sure existing test cases pass.

## Summary Continuing with #7595, this PR moves the `Q001`, `Q002`, `Q003` rules to the AST based checker. ## Test Plan Make sure all of the existing test cases pass and verify there are no ecosystem changes.

## Summary Continuing with #7595, this PR moves the `Q004` rule to the AST checker. ## Test Plan - [x] Existing test cases should pass - [x] No ecosystem updates

charliermarsh · 2024-04-12T01:43:40Z

@dhruvmanila - I can't remember, can we close this?

dhruvmanila · 2024-04-12T03:04:42Z

No, the last rule (Q003) is remaining, maybe I'll pick that up this weekend. I remember that was a bit difficult because it supports PEP 701 syntax.

## Summary This PR moves the `Q003` rule to AST checker. This is the final rule that used the docstring detection state machine and thus this PR removes it as well. resolves: #7595 resolves: #7808 ## Test Plan - [x] `cargo test` - [x] Make sure there are no changes in the ecosystem

dhruvmanila added bug Something isn't working docstring Related to docstring linting or formatting labels Sep 22, 2023

dhruvmanila self-assigned this Dec 7, 2023

dhruvmanila mentioned this issue Feb 12, 2024

Make semantic model aware of docstring #9960

Merged

dhruvmanila mentioned this issue Feb 15, 2024

Move RUF001, RUF002 to AST checker #9993

Merged

dhruvmanila mentioned this issue Mar 9, 2024

Move Q001-3 to AST based checker #10312

Merged

dhruvmanila mentioned this issue Mar 24, 2024

Move Q004 to AST based checker #10548

Merged

2 tasks

dhruvmanila added a commit that referenced this issue Mar 25, 2024

Move Q004 to AST based checker (#10548)

e9115b8

## Summary Continuing with #7595, this PR moves the `Q004` rule to the AST checker. ## Test Plan - [x] Existing test cases should pass - [x] No ecosystem updates

dhruvmanila mentioned this issue Apr 14, 2024

Move Q003 to AST checker #10923

Merged

2 tasks

dhruvmanila closed this as completed in #10923 Apr 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

First string token is always considered as docstring #7595

First string token is always considered as docstring #7595

dhruvmanila commented Sep 22, 2023 •

edited

Loading

charliermarsh commented Sep 29, 2023

MichaReiser commented Sep 29, 2023

charliermarsh commented Sep 29, 2023

MichaReiser commented Nov 3, 2023

dhruvmanila commented Nov 6, 2023 •

edited

Loading

charliermarsh commented Dec 7, 2023

dhruvmanila commented Dec 7, 2023

dhruvmanila commented Dec 7, 2023

charliermarsh commented Dec 7, 2023

charliermarsh commented Apr 12, 2024

dhruvmanila commented Apr 12, 2024

First string token is always considered as docstring #7595

First string token is always considered as docstring #7595

Comments

dhruvmanila commented Sep 22, 2023 • edited Loading

charliermarsh commented Sep 29, 2023

MichaReiser commented Sep 29, 2023

charliermarsh commented Sep 29, 2023

MichaReiser commented Nov 3, 2023

dhruvmanila commented Nov 6, 2023 • edited Loading

charliermarsh commented Dec 7, 2023

dhruvmanila commented Dec 7, 2023

dhruvmanila commented Dec 7, 2023

charliermarsh commented Dec 7, 2023

charliermarsh commented Apr 12, 2024

dhruvmanila commented Apr 12, 2024

dhruvmanila commented Sep 22, 2023 •

edited

Loading

dhruvmanila commented Nov 6, 2023 •

edited

Loading