Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CompactString for Identifier #12101

Merged
merged 4 commits into from
Jul 1, 2024
Merged

Use CompactString for Identifier #12101

merged 4 commits into from
Jul 1, 2024

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Jun 29, 2024

This PR introduces a small string create for ExprName and Identifier to reduce the number of allocations.

  • moves the Name struct from red_knot_python_semantic to ruff_python_ast
  • Change ExprName and Identifier to store a Name instead of a String
  • Change Name to use CompactString which shows better performance than smol_str

Performance improvement

  • Less time spent allocating
  • Less time spent deallocating (drop). This especially evident in the linter benchmarks

This change should also reduce peak-memory usage.

Why CompactString

CompactString shows better performance in the read path than smol_str. The only disadvantage compared to smol_str is that smol_str supports O(1) cloning. I had to update the red-knot symbol table to store references to avoid allocating new strings (it actually already allocated new strings, but we could have removed those allocations when the AST stores smol_str).

@MichaReiser MichaReiser changed the title Use CompactString for Identifie Use CompactString for Identifier Jun 29, 2024
Copy link

codspeed-hq bot commented Jun 29, 2024

CodSpeed Performance Report

Merging #12101 will improve performances by 7.54%

Comparing identifier-compact_str (43ea86c) with main (db6ee74)

Summary

⚡ 6 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

Benchmark main identifier-compact_str Change
lexer[large/dataset.py] 1.1 ms 1.1 ms +6.06%
lexer[numpy/ctypeslib.py] 227.2 µs 216.3 µs +5.06%
lexer[pydantic/types.py] 506.2 µs 481.6 µs +5.11%
lexer[unicode/pypinyin.py] 77.4 µs 74.2 µs +4.3%
linter/default-rules[pydantic/types.py] 1.9 ms 1.8 ms +7.54%
parser[numpy/ctypeslib.py] 953.7 µs 915 µs +4.23%

Copy link
Contributor

github-actions bot commented Jun 29, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -1 violations, +0 -0 fixes in 1 projects; 1 project error; 48 projects unchanged)

python/typeshed (+0 -1 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select E,F,FA,I,PYI,RUF,UP,W

- stdlib/_collections_abc.pyi:10:5: PYI057 Do not use `typing.ByteString`, which has unclear semantics and is deprecated

demisto/content (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
PYI057 1 0 1 0 0

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

demisto/content (error)

ruff format --preview --exclude Packs/ThreatQ/Integrations/ThreatQ/ThreatQ.py

warning: The top-level linter settings are deprecated in favour of their counterparts in the `lint` section. Please update the following options in `pyproject.toml`:
  - 'ignore' -> 'lint.ignore'
  - 'select' -> 'lint.select'
  - 'unfixable' -> 'lint.unfixable'
  - 'per-file-ignores' -> 'lint.per-file-ignores'
warning: `PGH001` has been remapped to `S307`.
warning: `PGH002` has been remapped to `G010`.
warning: `PLR1701` has been remapped to `SIM101`.
ruff failed
  Cause: Selection of deprecated rule `E999` is not allowed when preview is enabled.

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/gpt_actions_library/.gpt_action_getting_started.ipynb:11:1:1: Expected an expression
error: Failed to parse examples/gpt_actions_library/gpt_action_bigquery.ipynb:13:1:1: Expected an expression

@MichaReiser MichaReiser added the performance Potential performance improvement label Jun 29, 2024
@MichaReiser MichaReiser force-pushed the identifier-compact_str branch 3 times, most recently from 954f54b to dc7cc94 Compare June 29, 2024 13:40
@MichaReiser MichaReiser marked this pull request as ready for review June 29, 2024 13:40
@charliermarsh
Copy link
Member

Nice!

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I'm a fan of this, and it's a great sign that we can now do it and see such a significant, measurable impact.

@charliermarsh
Copy link
Member

How did you determine that CompactString shows better results than smol_str? (It seems like a totally reasonable conclusion to me, just curious.)

@MichaReiser
Copy link
Member Author

MichaReiser commented Jun 29, 2024

Sorry, I should have mentioned this in the PR description.

I first started by using smol_str because the O(1) cloning is nice, see #12099. The PR looked good at first because it significantly improved performance. But that was too good to be true and I realised that it mainly was because of an unnecessary allocation in parse_identifier. I also noticed that the lexer and parser benchmarks improved across the board, but that many linter benchmarks regressed, probably because accessing a string now required more branching. That's when I started to try out CompactString which showed better improvements in the Lexer and Parser benchmarks, without regressing the Linter benchmarks as much.

I rebased and reopened #12099 for a direct comparison.

Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks for doing this.

crates/ruff_python_ast/src/name.rs Outdated Show resolved Hide resolved
Comment on lines 238 to 241
pub(crate) fn try_make_suggestion(
&self,
name: String,
name: Name,
iter: &Expr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be useful to use Into<Name> in the functions like this? That way one can directly pass in String / &str and not worry about constructing the Name. If it's too much work (I guess it is?), it's fine not to do it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. I'm somewhat concerned about accidental monomorphization because some of the methods aren't trivial.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to not do it, just wondered if it would be useful.

@MichaReiser MichaReiser force-pushed the identifier-compact_str branch from dc7cc94 to 4ffd07e Compare July 1, 2024 07:33
@MichaReiser MichaReiser requested a review from carljm as a code owner July 1, 2024 07:48
@MichaReiser MichaReiser merged commit 5109b50 into main Jul 1, 2024
20 checks passed
@MichaReiser MichaReiser deleted the identifier-compact_str branch July 1, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants