Improve performance of `commented-out-code` (~50-80%) #7706
CodSpeed Performance Report: Merging #7706 will improve performance by 3.56%.

Think I need to run the benchmarks locally with just this rule or something.

PR Check Results (ecosystem): ✅ ecosystem check detected no changes.
@@ -62,16 +64,11 @@ pub(crate) fn comment_contains_code(line: &str, task_tags: &[String]) -> bool {
        return false;
    }

    // Check that this is possibly code.
    if CODE_INDICATORS.iter().all(|symbol| !line.contains(symbol)) {
Not sure if it helps performance, but some of the above regexes could be combined with https://docs.rs/regex/latest/regex/struct.RegexSet.html. We may also be able to join the regexes by simply ORing them together (`A|B|C`) instead of having multiple separate regexes.
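The hunk above keeps a cheap substring test in front of the regex checks, so most prose comments are rejected before any regex runs. A minimal std-only sketch of that ordering follows; `looks_like_code` is a hypothetical function, the indicator list is illustrative (not the rule's actual `CODE_INDICATORS`), and a closure stands in for the regex matching:

```rust
/// Illustrative indicator substrings; a hypothetical stand-in, not ruff's actual `CODE_INDICATORS`.
const CODE_INDICATORS: &[&str] = &[
    "(", ")", "[", "]", "{", "}", ":", "=", "%", "print", "return", "import",
];

/// Hypothetical sketch: trim once, reject cheaply, and only then run the
/// expensive check (a regex in the real rule, a closure here).
fn looks_like_code(comment: &str, expensive_check: impl Fn(&str) -> bool) -> bool {
    // Strip the leading `#`s and surrounding whitespace once, so later
    // patterns don't need their own leading or trailing `\s*`.
    let line = comment
        .trim_start_matches(|c: char| c == '#' || c.is_whitespace())
        .trim_end();
    if line.is_empty() {
        return false;
    }

    // Cheap prefilter: if no indicator substring appears, this can't be code.
    if !CODE_INDICATORS.iter().any(|symbol| line.contains(symbol)) {
        return false;
    }

    // Only lines that survive the prefilter pay for the expensive check.
    expensive_check(line)
}
```

The ordering pays off because most comments are plain prose: the substring scan is linear and allocation-free, so the expensive path runs only on the minority of lines that contain code-like symbols.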
@@ -150,7 +125,6 @@ mod tests {
    fn comment_contains_code_with_print() {
        assert!(comment_contains_code("#print", &[]));
        assert!(comment_contains_code("#print(1)", &[]));
        assert!(comment_contains_code("#print 1", &[]));
This is Python 2 code.
@@ -127,7 +103,6 @@ mod tests {
        assert!(!comment_contains_code("# 123", &[]));
        assert!(!comment_contains_code("# 123.1", &[]));
        assert!(!comment_contains_code("# 1, 2, 3", &[]));
        assert!(!comment_contains_code("x = 1 # x = 1", &[]));
The method assumes you pass a comment, so this test doesn't really make sense.
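The reasoning here is about the input contract: `comment_contains_code` expects the text of a comment, not a whole source line, so `x = 1 # x = 1` is not a meaningful input. A small sketch of that contract, using a hypothetical `comment_body` helper (not part of ruff) that checks the precondition in debug builds and strips the comment marker:

```rust
/// Hypothetical helper illustrating the input contract: callers pass the text
/// of a comment (starting with `#`), never a whole line of source code.
fn comment_body(comment: &str) -> &str {
    // Catch misuse such as `comment_body("x = 1 # x = 1")` in debug builds.
    debug_assert!(comment.starts_with('#'), "expected a comment, got {comment:?}");
    // Drop the `#` marker(s) and the surrounding whitespace.
    comment.trim_start_matches('#').trim()
}
```

Locating the comment within a line (so that only `# x = 1` would be passed for `x = 1 # x = 1`) is left to the caller.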
    if PARTIAL_DICTIONARY_REGEX.is_match(&line) {
    // If the comment matches any of the specified positive cases, assume it's code.
    if POSITIVE_CASES.is_match(line) {
Interestingly, combining `HASH_NUMBER` and `ALLOWLIST_REGEX` into a single regex ended up hurting performance, so I left those as two separate passes.
crates/ruff_linter/Cargo.toml (outdated)
wsl = { version = "0.1.0" }
aho-corasick = "1.1.1"
Which of these styles do we want? (My preference is `<pkg> = "x.y.z"`.)
Oh, I prefer the `version` one, but I broke my rule.
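The `aho-corasick` dependency added here lets the rule look for all of its indicator substrings in one pass over the line, instead of calling `line.contains(symbol)` once per symbol. The sketch below is a minimal std-only Aho-Corasick automaton that illustrates the technique: a trie of the patterns plus failure links, so the scan never re-reads input. The `MiniAhoCorasick` type is hypothetical and is not the crate's API.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal Aho-Corasick automaton (hypothetical sketch, not the `aho-corasick`
/// crate's API). It answers "does the haystack contain any pattern?" in one
/// left-to-right pass over the haystack's bytes. Patterns are assumed non-empty.
struct MiniAhoCorasick {
    /// Trie transitions: `goto[state][byte]` is the next state, if any.
    goto: Vec<HashMap<u8, usize>>,
    /// Failure links: the state for the longest proper suffix of the current
    /// path that is also a prefix of some pattern.
    fail: Vec<usize>,
    /// True if a pattern ends at this state or anywhere on its failure chain.
    matched: Vec<bool>,
}

impl MiniAhoCorasick {
    fn new(patterns: &[&str]) -> Self {
        // State 0 is the root of the trie.
        let mut goto: Vec<HashMap<u8, usize>> = vec![HashMap::new()];
        let mut matched = vec![false];

        // 1. Insert every pattern into the trie.
        for pattern in patterns {
            let mut state = 0;
            for &byte in pattern.as_bytes() {
                let existing = goto[state].get(&byte).copied();
                state = match existing {
                    Some(next) => next,
                    None => {
                        goto.push(HashMap::new());
                        matched.push(false);
                        let next = goto.len() - 1;
                        goto[state].insert(byte, next);
                        next
                    }
                };
            }
            matched[state] = true;
        }

        // 2. Compute failure links breadth-first, so every shallower state is
        //    finished before its descendants need it.
        let mut fail = vec![0; goto.len()];
        let mut queue: VecDeque<usize> = goto[0].values().copied().collect();
        while let Some(state) = queue.pop_front() {
            for (&byte, &child) in &goto[state] {
                let mut f = fail[state];
                while f != 0 && !goto[f].contains_key(&byte) {
                    f = fail[f];
                }
                fail[child] = goto[f].get(&byte).copied().unwrap_or(0);
                // Inherit matches of shorter patterns that end at the failure state.
                let inherited = matched[fail[child]];
                matched[child] |= inherited;
                queue.push_back(child);
            }
        }

        Self { goto, fail, matched }
    }

    fn is_match(&self, haystack: &str) -> bool {
        let mut state = 0;
        for &byte in haystack.as_bytes() {
            // On a mismatch, follow failure links instead of re-reading input.
            while state != 0 && !self.goto[state].contains_key(&byte) {
                state = self.fail[state];
            }
            state = self.goto[state].get(&byte).copied().unwrap_or(0);
            if self.matched[state] {
                return true;
            }
        }
        false
    }
}
```

For example, `MiniAhoCorasick::new(&["abcd", "bc"]).is_match("xabce")` finds `bc` even though the scan is following the longer `abcd` branch at that point. In a real rule, the automaton would be built once and reused for every comment; rebuilding it per call would throw away most of the benefit.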
Summary
This PR implements a variety of optimizations to improve the performance of the Eradicate rule, which always shows up in all-rules benchmarks and bothers me. (These improvements are not hugely important, but it was kind of a fun Friday thing to spend a bit of time on.)
The improvements include:
- Using `aho-corasick` to speed up an exact substring search.
- Combining multiple regular expressions into a `RegexSet`.
- Removing `\s*` and other pieces from the regular expressions (since we already trim strings before matching on them).

Test Plan
I benchmarked this function in a standalone crate using a variety of cases. Criterion reports that this version is up to 80% faster, and almost every case is at least 50% faster: