-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flake8-implicit-str-concat
] Normalize octals before merging concatenated strings in single-line-implicit-string-concatenation
(ISC001
)
#13118
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very clever. Good job! Two small points:
- The
.to_string()
call on line 175 will clone the underlying data, which could be quite costly -- and is unnecessary in the happy path, since most strings don't have octal escapes in them. We can avoid this by using aCow
-- something like this? (The diff is relative to your branch)
--- a/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
+++ b/crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
@@ -1,3 +1,5 @@
+use std::borrow::Cow;
+
use itertools::Itertools;
use ruff_diagnostics::{Diagnostic, Edit, Fix, FixAvailability, Violation};
@@ -173,11 +175,11 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
}
let mut a_body =
- a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()].to_string();
+ Cow::Borrowed(&a_text[a_leading_quote.len()..a_text.len() - a_trailing_quote.len()]);
let b_body = &b_text[b_leading_quote.len()..b_text.len() - b_trailing_quote.len()];
if a_leading_quote.find(['r', 'R']).is_none() {
- a_body = normalize_ending_octal(&a_body);
+ normalize_ending_octal(&mut a_body);
}
let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}");
@@ -191,10 +193,10 @@ fn concatenate_strings(a_range: TextRange, b_range: TextRange, locator: &Locator
/// Pads an octal at the end of the string
/// to three digits, if necessary.
-fn normalize_ending_octal(text: &str) -> String {
+fn normalize_ending_octal(text: &mut Cow<'_, str>) {
// Early return for short strings
if text.len() < 2 {
- return text.to_string();
+ return;
}
let mut rev_bytes = text.bytes().rev();
@@ -202,20 +204,19 @@ fn normalize_ending_octal(text: &str) -> String {
// "\y" -> "\00y"
if has_odd_consecutive_backslashes(&rev_bytes) {
let prefix = &text[..text.len() - 2];
- return format!("{prefix}\\00{}", last_byte as char);
+ *text = Cow::Owned(format!("{prefix}\\00{}", last_byte as char));
}
// "\xy" -> "\0xy"
- if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
+ else if let Some(penultimate_byte @ b'0'..=b'7') = rev_bytes.next() {
if has_odd_consecutive_backslashes(&rev_bytes) {
let prefix = &text[..text.len() - 3];
- return format!(
+ *text = Cow::Owned(format!(
"{prefix}\\0{}{}",
penultimate_byte as char, last_byte as char
- );
+ ));
}
}
}
- text.to_string()
}
- I wonder if it's necessary to normalize the ending octal in the first string if the second string doesn't start with a digit. E.g. if I understand correctly, something like
"\12" "foo"
will be fixed as"\012foo"
according to the logic in your PR -- but I think it's safe to not apply the normalization logic in this case, and instead fix it as\12foo
? What do you think?
crates/ruff_linter/src/rules/flake8_implicit_str_concat/rules/implicit.rs
Outdated
Show resolved
Hide resolved
Re putting the The borrow checker complained when I tried to implement the fix you suggested (assuming I did it correctly), because I borrow let mut rev_bytes = text.bytes().rev(); and then try to modify it with More fundamentally, I'm hoping you could help with a Rust confusion here. I would've thought that the let concatenation = format!("{a_leading_quote}{a_body}{b_body}{a_trailing_quote}"); Maybe a more minimal version of my question is: Do you gain much in terms or memory/performance by doing let a = &"xyz";
let s = format!("{a}")
// then a drops out of scope vs let a = "xyz".to_string();
let s = format!("{a}")
// then a drops out of scope ? Or does the compiler end up making those roughly the same? (Obviously the first code is better in this contrived example, but the question remains.) Sorry for the long-ish question, and thanks for the very helpful review! |
Ummmm... I'm not entirely sure exactly what optimisations are permitted here! You may well be right that the compiler is smart enough to "see through" the allocation and optimise it away -- but I'm not sure if that's a permitted optimisation, or if it's one the compiler's smart enough to make. I could go and try to investigate exactly whether it does make this optimisation or not -- but I'm not sure the exact optimisations the compiler is likely to make here are things we should be relying on anyway ;) So I think it's better to use a |
I pushed the change I was suggesting to your PR branch in 25805a2 :-) the key to avoiding the borrow-checker complaints is to have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again!
Makes sense, and thank you that was very helpful! |
single-line-implicit-string-concatenation (ISC001)
flake8-implicit-str-concat
] Normalize octals before merging concatenated strings in single-line-implicit-string-concatenation
(ISC001
)
This PR pads the last octal (if there is one) to three digits before concatenating strings in the fix for
ISC001
.For example:
"\12""0"
is fixed to"\0120"
.Closes #12936.