Improve performance of `commented-out-code` (~50-80%) #7706

charliermarsh · 2023-09-29T03:21:13Z

Summary

This PR implements a variety of optimizations to improve performance of the Eradicate rule, which always shows up in all-rules benchmarks and bothers me. (These improvements are not hugely important, but it was kind of a fun Friday thing to spent a bit of time on.)

The improvements include:

Doing cheaper work first (checking for some explicit substrings upfront).
Using aho-corasick to speed an exact substring search.
Merging multiple regular expressions using a RegexSet.
Removing some unnecessary \s* and other pieces from the regular expressions (since we already trim strings before matching on them).

Test Plan

I benchmarked this function in a standalone crate using a variety of cases. Criterion reports that this version is up to 80% faster, and almost every case is at least 50% faster:

Eradicate/Detection/# Warn if we are installing over top of an existing installation. This can
                        time:   [101.84 ns 102.32 ns 102.82 ns]
                        change: [-77.166% -77.062% -76.943%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Eradicate/Detection/#from foo import eradicate
                        time:   [74.872 ns 75.096 ns 75.314 ns]
                        change: [-84.180% -84.131% -84.079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Eradicate/Detection/# encoding: utf8
                        time:   [46.522 ns 46.862 ns 47.237 ns]
                        change: [-29.408% -28.918% -28.471%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe
Eradicate/Detection/# Issue #999
                        time:   [16.942 ns 16.994 ns 17.058 ns]
                        change: [-57.243% -57.064% -56.815%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Eradicate/Detection/# type: ignore
                        time:   [43.074 ns 43.163 ns 43.262 ns]
                        change: [-17.614% -17.390% -17.152%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Eradicate/Detection/# user_content_type, _ = TimelineEvent.objects.using(db_alias).get_or_create(
                        time:   [209.40 ns 209.81 ns 210.23 ns]
                        change: [-32.806% -32.630% -32.470%] (p = 0.00 < 0.05)
                        Performance has improved.
Eradicate/Detection/# this is = to that :(
                        time:   [72.659 ns 73.068 ns 73.473 ns]
                        change: [-68.884% -68.775% -68.655%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe
Eradicate/Detection/#except Exception:
                        time:   [92.063 ns 92.366 ns 92.691 ns]
                        change: [-64.204% -64.052% -63.909%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Eradicate/Detection/#print(1)
                        time:   [68.359 ns 68.537 ns 68.725 ns]
                        change: [-72.424% -72.356% -72.278%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
Eradicate/Detection/#'key': 1 + 1,
                        time:   [79.604 ns 79.865 ns 80.135 ns]
                        change: [-69.787% -69.667% -69.549%] (p = 0.00 < 0.05)
                        Performance has improved.

crates/ruff_linter/src/rules/eradicate/detection.rs

codspeed-hq · 2023-09-29T03:31:14Z

CodSpeed Performance Report

Merging #7706 will improve performances by 3.56%

_{Comparing charlie/eradicate (91aede8) with main (e9f8b91)}

Summary

⚡ 1 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`charlie/eradicate`	Change
⚡	`linter/default-rules[large/dataset.py]`	95.7 ms	92.4 ms	+3.56%

charliermarsh · 2023-09-29T03:32:46Z

Think I need to run the benchmarks locally with just this rule or something.

github-actions · 2023-09-29T03:40:45Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

MichaReiser · 2023-09-29T08:35:27Z

crates/ruff_linter/src/rules/eradicate/detection.rs

@@ -62,16 +64,11 @@ pub(crate) fn comment_contains_code(line: &str, task_tags: &[String]) -> bool {
        return false;
    }

-    // Check that this is possibly code.
-    if CODE_INDICATORS.iter().all(|symbol| !line.contains(symbol)) {


Not sure if it helps performance but some of the above regex could be combined with https://docs.rs/regex/latest/regex/struct.RegexSet.html . But we may also be able to join the Regex by simply ORing them together (A | B | C) instead of having multiple regex expressions

charliermarsh · 2023-09-29T19:18:58Z

crates/ruff_linter/src/rules/eradicate/detection.rs

@@ -150,7 +125,6 @@ mod tests {
    fn comment_contains_code_with_print() {
        assert!(comment_contains_code("#print", &[]));
        assert!(comment_contains_code("#print(1)", &[]));
-        assert!(comment_contains_code("#print 1", &[]));


This is Python 2 code.

charliermarsh · 2023-09-29T19:19:11Z

crates/ruff_linter/src/rules/eradicate/detection.rs

@@ -127,7 +103,6 @@ mod tests {
        assert!(!comment_contains_code("# 123", &[]));
        assert!(!comment_contains_code("# 123.1", &[]));
        assert!(!comment_contains_code("# 1, 2, 3", &[]));
-        assert!(!comment_contains_code("x = 1  # x = 1", &[]));


The method assumes you pass a comment, so this test doesn't really make sense.

charliermarsh · 2023-09-29T19:19:38Z

crates/ruff_linter/src/rules/eradicate/detection.rs

-
-    if PARTIAL_DICTIONARY_REGEX.is_match(&line) {
+    // If the comment matches any of the specified positive cases, assume it's code.
+    if POSITIVE_CASES.is_match(line) {


Interestingly, combining HASH_NUMBER and ALLOWLIST_REGEX into a single regex ended up hurting performance, so I left those as two separate passes.

konstin · 2023-09-29T19:43:56Z

crates/ruff_linter/Cargo.toml

 wsl = { version = "0.1.0" }
+aho-corasick = "1.1.1"


which of these styles do we want (my preference is <pkg> = "x.y.z")?

Oh I prefer the version one but I broke my rule.

charliermarsh commented Sep 29, 2023

View reviewed changes

crates/ruff_linter/src/rules/eradicate/detection.rs Show resolved Hide resolved

charliermarsh commented Sep 29, 2023

View reviewed changes

crates/ruff_linter/src/rules/eradicate/detection.rs Show resolved Hide resolved

charliermarsh commented Sep 29, 2023

View reviewed changes

crates/ruff_linter/src/rules/eradicate/detection.rs Show resolved Hide resolved

MichaReiser reviewed Sep 29, 2023

View reviewed changes

charliermarsh force-pushed the charlie/eradicate branch from 90128ef to 32c6c2a Compare September 29, 2023 19:16

charliermarsh marked this pull request as ready for review September 29, 2023 19:18

charliermarsh force-pushed the charlie/eradicate branch from 32c6c2a to 54f98f6 Compare September 29, 2023 19:18

charliermarsh commented Sep 29, 2023

View reviewed changes

charliermarsh force-pushed the charlie/eradicate branch from 54f98f6 to e84b724 Compare September 29, 2023 19:43

charliermarsh added the performance Potential performance improvement label Sep 29, 2023

konstin approved these changes Sep 29, 2023

View reviewed changes

charliermarsh added 2 commits September 29, 2023 16:03

Add some opts

6a54962

Improve performance of commented-out-code rule

91aede8

charliermarsh force-pushed the charlie/eradicate branch from e84b724 to 91aede8 Compare September 29, 2023 20:05

charliermarsh enabled auto-merge (squash) September 29, 2023 20:06

charliermarsh changed the title ~~Improve performance of commented-out-code rule~~ Improve performance of commented-out-code rule by 50-80% Sep 29, 2023

charliermarsh changed the title ~~Improve performance of commented-out-code rule by 50-80%~~ Improve performance of commented-out-code (50-80% improvement) Sep 29, 2023

charliermarsh changed the title ~~Improve performance of commented-out-code (50-80% improvement)~~ Improve performance of commented-out-code (~50-80%) Sep 29, 2023

charliermarsh disabled auto-merge September 29, 2023 20:06

charliermarsh enabled auto-merge (squash) September 29, 2023 20:06

charliermarsh merged commit 8c8988e into main Sep 29, 2023

charliermarsh deleted the charlie/eradicate branch September 29, 2023 20:13

branchvincent mentioned this pull request Oct 2, 2023

ruff 0.0.292 Homebrew/homebrew-core#149170

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve performance of `commented-out-code` (~50-80%) #7706

Improve performance of `commented-out-code` (~50-80%) #7706

charliermarsh commented Sep 29, 2023 •

edited

Loading

codspeed-hq bot commented Sep 29, 2023 •

edited

Loading

charliermarsh commented Sep 29, 2023

github-actions bot commented Sep 29, 2023 •

edited

Loading

MichaReiser Sep 29, 2023 •

edited

Loading

charliermarsh Sep 29, 2023

charliermarsh Sep 29, 2023

charliermarsh Sep 29, 2023

konstin Sep 29, 2023 •

edited

Loading

charliermarsh Sep 29, 2023

Improve performance of commented-out-code (~50-80%) #7706

Improve performance of commented-out-code (~50-80%) #7706

Conversation

charliermarsh commented Sep 29, 2023 • edited Loading

Summary

Test Plan

codspeed-hq bot commented Sep 29, 2023 • edited Loading

CodSpeed Performance Report

Merging #7706 will improve performances by 3.56%

Summary

Benchmarks breakdown

charliermarsh commented Sep 29, 2023

github-actions bot commented Sep 29, 2023 • edited Loading

PR Check Results

Ecosystem

MichaReiser Sep 29, 2023 • edited Loading

Choose a reason for hiding this comment

charliermarsh Sep 29, 2023

Choose a reason for hiding this comment

charliermarsh Sep 29, 2023

Choose a reason for hiding this comment

charliermarsh Sep 29, 2023

Choose a reason for hiding this comment

konstin Sep 29, 2023 • edited Loading

Choose a reason for hiding this comment

charliermarsh Sep 29, 2023

Choose a reason for hiding this comment

Improve performance of `commented-out-code` (~50-80%) #7706

Improve performance of `commented-out-code` (~50-80%) #7706

charliermarsh commented Sep 29, 2023 •

edited

Loading

codspeed-hq bot commented Sep 29, 2023 •

edited

Loading

github-actions bot commented Sep 29, 2023 •

edited

Loading

MichaReiser Sep 29, 2023 •

edited

Loading

konstin Sep 29, 2023 •

edited

Loading