[pydocstyle] Escaped docstring in docstring (D301) #12192
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,8 +69,35 @@ pub(crate) fn backslashes(checker: &mut Checker, docstring: &Docstring) { | |
// Docstring contains at least one backslash. | ||
let body = docstring.body(); | ||
let bytes = body.as_bytes(); | ||
let mut backslash_index = 0; | ||
let escaped_docstring_backslashes_pattern = b"\"\\\"\\\""; | ||
if memchr_iter(b'\\', bytes).any(|position| { | ||
let escaped_char = bytes.get(position.saturating_add(1)); | ||
// Allow escaped docstring. | ||
if matches!(escaped_char, Some(b'"')) { | ||
// If the next characters are `"""`, this is an escaped docstring pattern. | ||
let escaped_triple_quotes = | ||
&bytes[position.saturating_add(1)..position.saturating_add(4)]; | ||
if escaped_triple_quotes == b"\"\"\"" { | ||
return false; | ||
} | ||
// For the `"\"\"` pattern, each iteration advances by 2 characters. | ||
// For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`. | ||
I don't think this assumption is correct, and this might actually be a bug in the existing implementation. For example, the function passed to
What I understand is that you want to track whether you're at the beginning of an escape sequence. This is not fully fleshed out, but I think we may have to rewrite the entire loop:
while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
let after_escape = &body[position + 1..];
let next_char_len = after_escape.chars().next().unwrap_or_default();
let Some(escaped_char) = &after_escape.chars().next() else {
break;
};
if matches!(escaped_char, '"' | '\'') {
let is_escaped_triple =
after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");
if is_escaped_triple {
// don't add a diagnostic
}
if position != 0 && offset == position {
// An escape sequence, e.g. `\a\b`
}
}
offset = position + escaped_char.len_utf8();
}
Thank you. This helps a lot!
// Therefore, we utilize an index to keep track of the remaining characters. | ||
let escaped_quotes_backslashes = &bytes | ||
[position.saturating_add(1)..position.saturating_add(6 - backslash_index * 2)]; | ||
if escaped_quotes_backslashes | ||
== &escaped_docstring_backslashes_pattern[backslash_index * 2..] | ||
{ | ||
backslash_index += 1; | ||
// Reset to avoid overflow. | ||
if backslash_index > 2 { | ||
backslash_index = 0; | ||
} | ||
return false; | ||
} | ||
return true; | ||
} | ||
// Allow continuations (backslashes followed by newlines) and Unicode escapes. | ||
!matches!(escaped_char, Some(b'\r' | b'\n' | b'u' | b'U' | b'N')) | ||
}) { | ||
|
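The review suggestion above is to rewrite the loop so that each backslash and the character it escapes are consumed as one unit. A minimal sketch of that idea follows, under stated assumptions: `has_flaggable_backslash` is a hypothetical name, the search uses only the standard library instead of `memchr`, and the allow-list (continuations, Unicode escapes, and the two escaped triple-quote forms) is taken from the diff and the comment, not from Ruff's final implementation.

```rust
/// Hypothetical helper, NOT Ruff's actual implementation: returns `true` if
/// `body` contains a backslash escape outside the allow-list.
fn has_flaggable_backslash(body: &str) -> bool {
    let bytes = body.as_bytes();
    let mut offset = 0;
    // `position` on a sub-slice is relative to that slice, so `offset` is
    // added back to get an index into `body`.
    while let Some(relative) = bytes[offset..].iter().position(|&b| b == b'\\') {
        let position = offset + relative;
        // A backslash is ASCII, so `position + 1` is always a char boundary.
        let after_escape = &body[position + 1..];
        let Some(escaped_char) = after_escape.chars().next() else {
            // Trailing backslash: nothing on the allow-list follows it.
            return true;
        };
        match escaped_char {
            // Continuations and Unicode escapes are allowed, as in the diff.
            '\r' | '\n' | 'u' | 'U' | 'N' => {}
            quote @ ('"' | '\'') => {
                // Two escaped-docstring forms: `\"""` and `\"\"\"`.
                let plain: String = [quote, quote, quote].iter().collect();
                let escaped: String = [quote, '\\', quote, '\\', quote].iter().collect();
                if after_escape.starts_with(plain.as_str()) {
                    offset = position + 1 + plain.len();
                    continue;
                }
                if after_escape.starts_with(escaped.as_str()) {
                    offset = position + 1 + escaped.len();
                    continue;
                }
                // A lone escaped quote is not an escaped docstring.
                return true;
            }
            _ => return true,
        }
        // Skip the backslash and the escaped character together, so the
        // escaped character is never re-read as the start of a new escape.
        offset = position + 1 + escaped_char.len_utf8();
    }
    false
}
```

Because the whole escaped triple quote is skipped in one step, no counter such as `backslash_index` needs to survive between iterations.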
We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).
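As an illustration of what handling both quote styles could look like, here is a small sketch. It assumes the caller already knows which quote character delimits the docstring; `is_escaped_triple_quote` and its `quote` parameter are hypothetical and do not come from the Ruff codebase.

```rust
/// Hypothetical helper: `after_escape` is the text immediately following a
/// backslash, and `quote` is the docstring's own delimiter (`"` or `'`).
/// Returns `true` only for an escaped triple quote of that same style.
fn is_escaped_triple_quote(after_escape: &str, quote: char) -> bool {
    // `\"""` form: one backslash followed by three plain quotes.
    let plain: String = [quote, quote, quote].iter().collect();
    // `\"\"\"` form: every quote carries its own backslash.
    let escaped: String = [quote, '\\', quote, '\\', quote].iter().collect();
    after_escape.starts_with(plain.as_str()) || after_escape.starts_with(escaped.as_str())
}
```

Parameterizing on the delimiter keeps a `\"""` inside a `'''` docstring from being treated as an escaped docstring, which may or may not be the behavior the rule should have.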
But D301 is based on double quotes. Do we need to cover single-quote docstrings here?
I haven't verified this myself, but the way I read the code, docstrings are extracted from any string literal:
ruff/crates/ruff_linter/src/docstrings/extraction.rs
Lines 6 to 15 in 7d16f83