perf(pycodestyle): Remove regex captures #3735

Merged
MichaReiser merged 1 commit into main from logical-lines_Avoid_Regex_captures
Mar 28, 2023

@MichaReiser MichaReiser commented Mar 26, 2023

From the regex documentation:

Advice: Prefer in this order: is_match, find, captures.

source

This PR replaces our usages of captures_iter with find_iter and implements the whitespace testing manually.
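As a rough illustration of what "manual" whitespace testing can look like, the sketch below classifies the whitespace on either side of a byte range in a line. It is not the code from this PR: the `Whitespace` enum and the `whitespace_before` / `whitespace_after` helpers are hypothetical names, and the sketch assumes the byte offsets come from a `Match` yielded by `Regex::find_iter` (via `Match::start` and `Match::end`), so no capture groups are needed.

```rust
/// Kind of whitespace found next to a token.
/// Hypothetical type, introduced only for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Whitespace {
    None,
    Single,
    Many,
    Tab,
}

/// Classify a run of spaces/tabs taken from the front of `chars`.
fn classify(chars: impl Iterator<Item = char>) -> Whitespace {
    let mut count = 0usize;
    let mut has_tab = false;
    for c in chars {
        match c {
            ' ' => count += 1,
            '\t' => {
                has_tab = true;
                count += 1;
            }
            // Stop at the first non-whitespace character.
            _ => break,
        }
    }
    if has_tab {
        Whitespace::Tab
    } else {
        match count {
            0 => Whitespace::None,
            1 => Whitespace::Single,
            _ => Whitespace::Many,
        }
    }
}

/// Whitespace immediately before byte offset `start`
/// (assumed to lie on a char boundary, e.g. `Match::start()`).
fn whitespace_before(line: &str, start: usize) -> Whitespace {
    classify(line[..start].chars().rev())
}

/// Whitespace immediately after byte offset `end`
/// (assumed to lie on a char boundary, e.g. `Match::end()`).
fn whitespace_after(line: &str, end: usize) -> Whitespace {
    classify(line[end..].chars())
}
```

With helpers like these, the regex only has to locate the operator or keyword; the surrounding whitespace is inspected with plain string slicing, which avoids the extra work the regex engine does to track capture groups.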

Performance

  • no-logical: pycodestyle rules disabled.
  • pr3715: The base version with logical lines enabled.
  • pr3735: This PR with logical lines enabled.
group                                      no-logical                             pr3735                                  pr3715
-----                                      ----------                             -----                                  ------
linter/all-rules/large/dataset.py          1.00      8.5±0.01ms     4.8 MB/sec    1.10      9.4±0.05ms     4.3 MB/sec    1.10      9.4±0.15ms     4.3 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      2.1±0.01ms     7.8 MB/sec    1.09      2.3±0.00ms     7.1 MB/sec    1.15      2.5±0.00ms     6.8 MB/sec
linter/all-rules/numpy/globals.py          1.00    247.9±3.25µs    11.9 MB/sec    1.10    272.6±1.25µs    10.8 MB/sec    1.16    286.9±3.46µs    10.3 MB/sec
linter/all-rules/pydantic/types.py         1.00      3.7±0.04ms     7.0 MB/sec    1.11      4.1±0.07ms     6.3 MB/sec    1.19      4.4±0.02ms     5.9 MB/sec
linter/default-rules/large/dataset.py      1.00      4.7±0.01ms     8.7 MB/sec    1.19      5.6±0.11ms     7.3 MB/sec    1.24      5.8±0.08ms     7.0 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00    999.5±5.57µs    16.7 MB/sec    1.22   1216.5±2.35µs    13.7 MB/sec    1.31  1304.4±16.73µs    12.8 MB/sec
linter/default-rules/numpy/globals.py      1.00    101.0±0.41µs    29.2 MB/sec    1.36    137.5±1.36µs    21.5 MB/sec    1.44    145.7±1.31µs    20.2 MB/sec
linter/default-rules/pydantic/types.py     1.00      2.1±0.01ms    11.9 MB/sec    1.19      2.5±0.06ms    10.0 MB/sec    1.26      2.7±0.03ms     9.5 MB/sec

We're getting closer, but a 36% regression in some cases (default rules on numpy/globals.py) is still heavy.
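For reference, the 36% figure can be recomputed from the mean timings in the table. The helper below is a hypothetical illustration, not part of the PR or the benchmark tooling; it only does the arithmetic.

```rust
/// Relative slowdown of `candidate` over `baseline`, in percent.
/// Both arguments are mean timings in the same unit (here: microseconds).
fn regression_percent(baseline: f64, candidate: f64) -> f64 {
    (candidate / baseline - 1.0) * 100.0
}
```

For linter/default-rules/numpy/globals.py, `regression_percent(101.0, 137.5)` is roughly 36 (this PR) and `regression_percent(101.0, 145.7)` is roughly 44 (pr3715), matching the 1.36 and 1.44 ratios in the table.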

github-actions bot commented Mar 26, 2023

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
linter/all-rules/large/dataset.py          1.00     17.5±0.10ms     2.3 MB/sec    1.00     17.5±0.07ms     2.3 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.02      4.6±0.04ms     3.6 MB/sec    1.00      4.5±0.02ms     3.7 MB/sec
linter/all-rules/numpy/globals.py          1.00    620.0±1.39µs     4.8 MB/sec    1.00    621.6±1.77µs     4.7 MB/sec
linter/all-rules/pydantic/types.py         1.01      7.8±0.07ms     3.3 MB/sec    1.00      7.7±0.05ms     3.3 MB/sec
linter/default-rules/large/dataset.py      1.01      9.2±0.05ms     4.4 MB/sec    1.00      9.2±0.06ms     4.4 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00      2.1±0.01ms     8.1 MB/sec    1.00      2.1±0.01ms     8.1 MB/sec
linter/default-rules/numpy/globals.py      1.00    229.8±1.34µs    12.8 MB/sec    1.00    228.8±0.64µs    12.9 MB/sec
linter/default-rules/pydantic/types.py     1.00      4.3±0.02ms     5.9 MB/sec    1.00      4.3±0.04ms     5.9 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
linter/all-rules/large/dataset.py          1.01     15.0±0.03ms     2.7 MB/sec    1.00     14.9±0.03ms     2.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      4.1±0.02ms     4.1 MB/sec    1.00      4.0±0.02ms     4.1 MB/sec
linter/all-rules/numpy/globals.py          1.03    453.6±3.19µs     6.5 MB/sec    1.00    439.7±3.14µs     6.7 MB/sec
linter/all-rules/pydantic/types.py         1.01      6.7±0.02ms     3.8 MB/sec    1.00      6.6±0.02ms     3.9 MB/sec
linter/default-rules/large/dataset.py      1.04      8.3±0.03ms     4.9 MB/sec    1.00      8.0±0.03ms     5.1 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.02   1752.4±5.15µs     9.5 MB/sec    1.00   1722.7±8.36µs     9.7 MB/sec
linter/default-rules/numpy/globals.py      1.06    185.1±1.48µs    15.9 MB/sec    1.00    174.4±3.12µs    16.9 MB/sec
linter/default-rules/pydantic/types.py     1.03      3.8±0.02ms     6.7 MB/sec    1.00      3.7±0.02ms     6.9 MB/sec

@MichaReiser MichaReiser force-pushed the logical-lines-perf-improvements branch from f86cb60 to 2c8fbb7 Compare March 26, 2023 10:29
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from 7c82846 to 6ca18cc Compare March 26, 2023 10:29
@MichaReiser MichaReiser changed the title Avoid using Regex captures perf(pycodestyle): Remove regex captures Mar 26, 2023
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from 6ca18cc to 5f238d7 Compare March 26, 2023 11:17
@MichaReiser MichaReiser marked this pull request as ready for review March 26, 2023 11:29
@MichaReiser MichaReiser force-pushed the logical-lines-perf-improvements branch from 2c8fbb7 to 58b2a7a Compare March 26, 2023 18:34
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from 5f238d7 to ba4add9 Compare March 26, 2023 18:34
@MichaReiser MichaReiser force-pushed the logical-lines-perf-improvements branch from 58b2a7a to 8a08a96 Compare March 27, 2023 06:27
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from ba4add9 to 59e7f10 Compare March 27, 2023 06:27
@charliermarsh charliermarsh left a comment

I trust you to resolve any necessary issues based on discussion around the deviations, but just tag me if you want eyes on anything as you follow up.

@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from 59e7f10 to d8c87f9 Compare March 27, 2023 20:59
@@ -9,10 +9,10 @@ expression: diagnostics
   fixable: false
   location:
     row: 28
-    column: 1
+    column: 2

MichaReiser (author) commented:

I checked these numbers and they match pycodestyle:

crates/ruff/resources/test/fixtures/pycodestyle/E27.py:28:2: E274 tab before keyword
crates/ruff/resources/test/fixtures/pycodestyle/E27.py:30:5: E274 tab before keyword

The other tests are fixed (very hacky solution but the next PR replaces it anyway)

@MichaReiser MichaReiser force-pushed the logical-lines-perf-improvements branch from 8a08a96 to f62f750 Compare March 27, 2023 21:08
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch 2 times, most recently from 02f1bbd to b161131 Compare March 27, 2023 21:15
@MichaReiser MichaReiser force-pushed the logical-lines-perf-improvements branch from f62f750 to 7389522 Compare March 28, 2023 06:45
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from b161131 to d28f60f Compare March 28, 2023 06:45
Base automatically changed from logical-lines-perf-improvements to main March 28, 2023 07:09
@MichaReiser MichaReiser force-pushed the logical-lines_Avoid_Regex_captures branch from d28f60f to fea2767 Compare March 28, 2023 07:14
@MichaReiser MichaReiser merged commit 1d724b1 into main Mar 28, 2023
@MichaReiser MichaReiser deleted the logical-lines_Avoid_Regex_captures branch March 28, 2023 07:50