-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(toml): Decouple parsing from interning system #12881
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
r? @ehuss (rustbot has picked a reviewer for you, use r? to override) |
rustbot
added
A-cli
Area: Command-line interface, option parsing, etc.
A-manifest
Area: Cargo.toml issues
A-profiles
Area: profiles
A-workspaces
Area: workspaces
Command-remove
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
Oct 25, 2023
☔ The latest upstream changes (presumably #12884) made this pull request unmergeable. Please resolve the merge conflicts. |
To have a separate manifest API (rust-lang#12801), we can't rely on interning because it might be used in longer-lifetime applications. I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance. Originally, I hoped to entirely replacing string interning. For that, I was looking at `arcstr` as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back. Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room and `hipstr` was the fastest of them, so I went with that. I then double checked macro benchmarks, and realized `hipstr` made no difference and switched to `String` to keep things simple / with lower dependencies. When doing this, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). With just using `String`, I went ahead and dropped that. I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. ```console $ ../dump/cargo-12801-bench.rs run Finished dev [unoptimized + debuginfo] target(s) in 0.07s Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run` warning: `package.edition` is unspecified, defaulting to `2021` Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run` $ hyperfine "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 119.3 ms ± 3.2 ms [User: 98.6 ms, System: 20.3 ms] Range (min … max): 115.6 ms … 124.3 ms 24 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 119.4 ms ± 2.4 ms [User: 98.0 ms, System: 21.1 ms] Range (min … max): 115.7 ms … 123.6 ms 24 runs Summary ../cargo-old check ran 1.00 ± 0.03 times faster than ../cargo-new check $ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 20.249 s ± 0.392 s [User: 157.719 s, System: 22.771 s] Range (min … max): 19.605 s … 21.123 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 20.123 s ± 0.212 s [User: 156.156 s, System: 22.325 s] Range (min … max): 19.764 s … 20.420 s 10 runs Summary ../cargo-new check ran 1.01 ± 0.02 times faster than ../cargo-old check $ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check" Benchmark 1: ../cargo-old check Time (mean ± σ): 21.105 s ± 0.465 s [User: 156.482 s, System: 22.799 s] Range (min … max): 20.156 s … 22.010 s 10 runs Benchmark 2: ../cargo-new check Time (mean ± σ): 21.358 s ± 0.538 s [User: 156.187 s, System: 22.979 s] Range (min … max): 20.703 s … 22.462 s 10 runs Summary ../cargo-old check ran 1.01 ± 0.03 times faster than ../cargo-new check ```
epage
force-pushed
the
intern
branch
2 times, most recently
from
October 28, 2023 02:15
905d1d5
to
acc52f3
Compare
Thanks! @bors r+ |
bors
added
S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
and removed
S-waiting-on-review
Status: Awaiting review from the assignee but also interested parties.
labels
Oct 28, 2023
☀️ Test successful - checks-actions |
8 tasks
20 tasks
bors
added a commit
to rust-lang-ci/rust
that referenced
this pull request
Oct 31, 2023
Update cargo 7 commits in 708383d620e183a9ece69b8fe930c411d83dee27..b4d18d4bd3db6d872892f6c87c51a02999b80802 2023-10-27 21:09:26 +0000 to 2023-10-31 18:19:10 +0000 - refactor(toml): Cleanup noticed on the way to rust-lang/cargo#12801 (rust-lang/cargo#12902) - feat(trim-paths): set env `CARGO_TRIM_PATHS` for build scripts (rust-lang/cargo#12900) - feat: implement RFC 3127 `-Ztrim-paths` (rust-lang/cargo#12625) - docs: clarify config to use vendored source is printed to stdout (rust-lang/cargo#12893) - Improve the margin calculation for the search command's UI (rust-lang/cargo#12890) - Add new packages to [workspace.members] automatically (rust-lang/cargo#12779) - refactor(toml): Decouple parsing from interning system (rust-lang/cargo#12881) r? ghost
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-cli
Area: Command-line interface, option parsing, etc.
A-manifest
Area: Cargo.toml issues
A-profiles
Area: profiles
A-workspaces
Area: workspaces
Command-remove
S-waiting-on-bors
Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR try to resolve?
To have a separate manifest API (#12801), we can't rely on interning because it might be used in longer-lifetime applications.
To keep this limited in scope, this only removes
InternedString
from manifest parsing. Everything else still usesInternedString
.How should we test and review this PR?
I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. See my comment for the script that managed the benchmarks.
Benchmarks:
Additional information
I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance.
Originally, I hoped to entirely replacing string interning. For that, I was looking at
arcstr
as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of theCopy
s intoClone
s), I decided to scale back.Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like
hipstr
,compact_str
,flexstr
, andecow
.The first three give us more wiggle room and
hipstr
was the fastest of them, so I went with that.I then double checked macro benchmarks, and realized
hipstr
made no difference and switched toString
to keep things simple / with lower dependencies.When doing this, I had created a type alias (
TomlStr
) for the string type so I could more easily swap it out if needed(and not have to always write out a lifetime).
With just using
String
, I went ahead and dropped that.