Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pydocstyle] Escaped docstring in docstring (D301 ) #12192

Merged
merged 3 commits into from
Jul 18, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions crates/ruff_linter/resources/test/fixtures/pydocstyle/D301.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,3 +35,33 @@ def make_unique_pod_id(pod_id: str) -> str | None:

def shouldnt_add_raw_here2():
u"Sum\\mary."


def shouldnt_add_raw_for_docstring_contains_docstring():
"""
This docstring contains another docstring.

def foo():
\"\"\"Foo.\"\"\"
"""


def shouldnt_add_raw_for_docstring_contains_docstring2():
"""
This docstring contains another docstring.

def bar():
\"""Bar.\"""
"""


def shouldnt_add_raw_for_docstring_contains_escaped_triple_quotes():
"""
Escaped triple quote \""" or \"\"\".
"""


def should_add_raw_for_single_quote_escape():
"""
This is single quote escape \".
"""
27 changes: 27 additions & 0 deletions crates/ruff_linter/src/rules/pydocstyle/rules/backslashes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,35 @@ pub(crate) fn backslashes(checker: &mut Checker, docstring: &Docstring) {
// Docstring contains at least one backslash.
let body = docstring.body();
let bytes = body.as_bytes();
let mut backslash_index = 0;
let escaped_docstring_backslashes_pattern = b"\"\\\"\\\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But D301 is based on double quotes, do we need to cover single-quote docstring here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't verified this myself but the way I read the code is that docstrings are extracted from any string literal

/// Extract a docstring from a function or class body.
pub(crate) fn docstring_from(suite: &[Stmt]) -> Option<&ast::ExprStringLiteral> {
let stmt = suite.first()?;
// Require the docstring to be a standalone expression.
let Stmt::Expr(ast::StmtExpr { value, range: _ }) = stmt else {
return None;
};
// Only match strings.
value.as_string_literal_expr()
}

if memchr_iter(b'\\', bytes).any(|position| {
let escaped_char = bytes.get(position.saturating_add(1));
// Allow escaped docstring.
if matches!(escaped_char, Some(b'"')) {
// If the next chars is equal to `"""`, it is a escaped docstring pattern.
let escaped_triple_quotes =
&bytes[position.saturating_add(1)..position.saturating_add(4)];
if escaped_triple_quotes == b"\"\"\"" {
return false;
}
// For the `"\"\"` pattern, each iteration advances by 2 characters.
// For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this assumption is correct and this might actually a bug in the existing implementation. For example, the function passed to any will be called twice for \\, once for each backslash position but the offsets aren't to indices apart.

What I understand is that you want to track if you're at the beginning of an escape sequence.

This is not fully fledged out, but I think we may have to rewrite the entire loop

    while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
        let after_escape = &body[position + 1..];
        let next_char_len = after_escape.chars().next().unwrap_or_default();

        let Some(escaped_char) = &after_escape.chars().next() else {
            break;
        };

        if matches!(escaped_char, '"' | '\'') {
            let is_escaped_triple =
                after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");

            if is_escaped_triple {
                // don't add a diagnostic
            }
            
            if position != 0 && offset == position {
                // An escape sequence, e.g. `\a\b`
            }
        }
        
        

        offset = position + escaped_char.len_utf8();
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This helps a lot!

// Therefore, we utilize an index to keep track of the remaining characters.
let escaped_quotes_backslashes = &bytes
[position.saturating_add(1)..position.saturating_add(6 - backslash_index * 2)];
if escaped_quotes_backslashes
== &escaped_docstring_backslashes_pattern[backslash_index * 2..]
{
backslash_index += 1;
// Reset to avoid overflow.
if backslash_index > 2 {
backslash_index = 0;
}
return false;
}
return true;
}
// Allow continuations (backslashes followed by newlines) and Unicode escapes.
!matches!(escaped_char, Some(b'\r' | b'\n' | b'u' | b'U' | b'N'))
}) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,22 @@ D301.py:37:5: D301 Use `r"""` if any backslashes in a docstring
|
= help: Add `r` prefix

D301.py:65:5: D301 [*] Use `r"""` if any backslashes in a docstring
|
64 | def should_add_raw_for_single_quote_escape():
65 | """
| _____^
66 | | This is single quote escape \".
67 | | """
| |_______^ D301
|
= help: Add `r` prefix

ℹ Unsafe fix
62 62 |
63 63 |
64 64 | def should_add_raw_for_single_quote_escape():
65 |- """
65 |+ r"""
66 66 | This is single quote escape \".
67 67 | """
Loading