refactor(toml): Decouple parsing from interning system #12881

epage · 2023-10-25T20:02:40Z

What does this PR try to resolve?

To have a separate manifest API (#12801), we can't rely on interning, because the API might be used in longer-running applications, where interned strings (which are never freed) would keep accumulating.

To keep this limited in scope, this only removes `InternedString` from manifest parsing. Everything else still uses `InternedString`.
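To illustrate why interning is a poor fit for a long-lived process, here is a simplified, hypothetical interner in the spirit of Cargo's `InternedString` (not its actual implementation): each distinct string is leaked with `Box::leak` so it can be handed out as `&'static str`, and that memory is never reclaimed.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified interner (illustration only, not Cargo's code).
/// Every distinct string is leaked exactly once and lives until process exit.
struct Interner {
    seen: HashSet<&'static str>,
}

impl Interner {
    fn new() -> Self {
        Interner { seen: HashSet::new() }
    }

    fn intern(&mut self, s: &str) -> &'static str {
        // Already interned: hand back the existing leaked string.
        if let Some(&existing) = self.seen.get(s) {
            return existing;
        }
        // Box::leak yields a &'static str; this memory is never reclaimed,
        // even if the Interner itself is dropped.
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.seen.insert(leaked);
        leaked
    }

    fn leaked_strings(&self) -> usize {
        self.seen.len()
    }
}

fn main() {
    let mut interner = Interner::new();
    // A long-running tool re-parsing manifests keeps seeing new strings;
    // each distinct one stays allocated for the rest of the process.
    for version in 0..1000 {
        let name = format!("my-crate-{version}");
        interner.intern(&name);
    }
    println!("{} strings leaked", interner.leaked_strings());
}
```

With a plain `String`, each `name` above would simply be freed at the end of its loop iteration.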

How should we test and review this PR?

I had problems getting the cargo benchmarks running, so I did a quick-and-dirty end-to-end benchmark covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. See my comment for the script that managed the benchmarks.

Benchmarks:

```console
$ ../dump/cargo-12801-bench.rs run
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run`
warning: `package.edition` is unspecified, defaulting to `2021`
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run`
$ hyperfine "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     119.3 ms ±   3.2 ms    [User: 98.6 ms, System: 20.3 ms]
  Range (min … max):   115.6 ms … 124.3 ms    24 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     119.4 ms ±   2.4 ms    [User: 98.0 ms, System: 21.1 ms]
  Range (min … max):   115.7 ms … 123.6 ms    24 runs

Summary
  ../cargo-old check ran
    1.00 ± 0.03 times faster than ../cargo-new check
$ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     20.249 s ±  0.392 s    [User: 157.719 s, System: 22.771 s]
  Range (min … max):   19.605 s … 21.123 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     20.123 s ±  0.212 s    [User: 156.156 s, System: 22.325 s]
  Range (min … max):   19.764 s … 20.420 s    10 runs

Summary
  ../cargo-new check ran
    1.01 ± 0.02 times faster than ../cargo-old check
$ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     21.105 s ±  0.465 s    [User: 156.482 s, System: 22.799 s]
  Range (min … max):   20.156 s … 22.010 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     21.358 s ±  0.538 s    [User: 156.187 s, System: 22.979 s]
  Range (min … max):   20.703 s … 22.462 s    10 runs

Summary
  ../cargo-old check ran
    1.01 ± 0.03 times faster than ../cargo-new check
```
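To read the `Summary` lines above: hyperfine reports the ratio of the two mean times together with an uncertainty. The sketch below assumes that uncertainty comes from standard first-order error propagation for a quotient (relative standard deviations added in quadrature); with the means and σ values from the three runs above it reproduces all three summary figures. `relative_speed` is a hypothetical helper, not hyperfine's code.

```rust
/// Ratio of slower/faster mean time plus its propagated standard deviation.
/// Assumption: first-order error propagation for a quotient.
fn relative_speed(fast_mean: f64, fast_sd: f64, slow_mean: f64, slow_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    // Relative errors of numerator and denominator, added in quadrature.
    let rel_err = ((fast_sd / fast_mean).powi(2) + (slow_sd / slow_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}

fn main() {
    // (scenario, faster mean, faster σ, slower mean, slower σ), from the runs above.
    let runs = [
        ("fresh check (ms)", 119.3, 3.2, 119.4, 2.4),
        ("after cargo clean (s)", 20.123, 0.212, 20.249, 0.392),
        ("after cargo clean + rm Cargo.lock (s)", 21.105, 0.465, 21.358, 0.538),
    ];
    for (label, fm, fs, sm, ss) in runs {
        let (ratio, err) = relative_speed(fm, fs, sm, ss);
        // Prints 1.00 ± 0.03, 1.01 ± 0.02 and 1.01 ± 0.03 respectively.
        println!("{label}: {ratio:.2} ± {err:.2}");
    }
}
```

In every scenario the uncertainty is larger than the ratio's distance from 1.00, so none of the runs shows a measurable difference between the old and new builds.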

Additional information

I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding which string type to use for performance.

Originally, I hoped to replace string interning entirely. For that, I was looking at `arcstr`, as it has a fast equality operator. However, that only helps when the two strings being compared came from the same original allocation (i.e., one was cloned from the other). Unsure how often that would happen (and daunted by converting all of the `Copy`s into `Clone`s), I decided to scale back.
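The fast-equality idea can be sketched with the standard library's `Arc<str>` standing in for `arcstr::ArcStr` (an assumption for illustration; this is not arcstr's code): if both handles point at the same allocation, equality is decided by a pointer comparison; otherwise it falls back to comparing bytes. A string cloned from an existing handle takes the fast path, while an equal string produced independently does not.

```rust
use std::sync::Arc;

/// Compares two shared strings, returning `(equal, used_pointer_fast_path)`.
/// `Arc<str>` stands in for `arcstr::ArcStr` here (illustrative assumption).
fn fast_eq(a: &Arc<str>, b: &Arc<str>) -> (bool, bool) {
    if Arc::ptr_eq(a, b) {
        // Same allocation: equal without looking at the bytes.
        return (true, true);
    }
    // Different allocations: fall back to an O(n) byte comparison.
    (a.as_bytes() == b.as_bytes(), false)
}

fn main() {
    let parsed: Arc<str> = Arc::from("serde");
    // Cloning bumps a reference count and shares the allocation.
    let cloned = Arc::clone(&parsed);
    // Same text, but produced independently (e.g. from another manifest).
    let reparsed: Arc<str> = Arc::from("serde");

    println!("{:?}", fast_eq(&parsed, &cloned)); // (true, true)
    println!("{:?}", fast_eq(&parsed, &reparsed)); // (true, false)
}
```

The explicit `Arc::clone` also shows the other cost mentioned above: an interned handle is `Copy`, while an `Arc`-based string has to be cloned wherever it gains a new owner.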

Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like `hipstr`, `compact_str`, `flexstr`, and `ecow`. The first three give us more wiggle room, and `hipstr` was the fastest of them, so I went with that.
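For readers unfamiliar with small-string optimization, the sketch below shows the core idea with a hypothetical `SmallStr` type: strings up to a fixed inline capacity are stored inside the value itself, so creating them needs no heap allocation, and only longer strings fall back to a heap-allocated `String`. The 22-byte capacity is an arbitrary choice for this sketch; `hipstr`, `compact_str`, `flexstr`, and `ecow` each use their own, more compact layouts.

```rust
/// Inline capacity chosen arbitrarily for this sketch.
const INLINE_CAP: usize = 22;

/// Hypothetical small-string type (not the layout of any real crate).
enum SmallStr {
    /// Short strings: bytes stored in the value itself, no heap allocation.
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    /// Long strings: ordinary heap-allocated `String`.
    Heap(String),
}

impl SmallStr {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            SmallStr::Heap(s.to_owned())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            // The bytes were copied from a valid &str, so they are valid UTF-8.
            SmallStr::Inline { len, buf } => std::str::from_utf8(&buf[..*len as usize]).unwrap(),
            SmallStr::Heap(s) => s.as_str(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}

fn main() {
    // Many manifest keys and crate names are short enough to stay inline.
    for s in ["serde", "edition", "a-much-longer-dependency-name-than-usual"] {
        let small = SmallStr::new(s);
        println!("{:?} inline={}", small.as_str(), small.is_inline());
    }
}
```

Whether avoiding these small allocations matters in practice is an empirical question, which is what the macro benchmarks answer.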

I then double-checked with macro benchmarks, found that `hipstr` made no difference, and switched to `String` to keep things simple and reduce dependencies.

While experimenting, I had created a type alias (`TomlStr`) for the string type so I could more easily swap it out if needed (and not have to always write out a lifetime). Since we now just use `String`, I went ahead and dropped the alias.
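Although the alias was dropped, the pattern is easy to picture. A minimal sketch, using hypothetical `Package`/`Manifest` structs rather than Cargo's real manifest types: every field names `TomlStr`, so changing the underlying string type means editing one line, and a string type that carries a lifetime parameter could be spelled out once in the alias (e.g. `type TomlStr = OtherStr<'static>;`, where `OtherStr` is a placeholder) instead of at every use site.

```rust
use std::collections::BTreeMap;

/// Single place that picks the string type for the (hypothetical) structs below.
type TomlStr = String;

#[derive(Debug, Default)]
struct Package {
    name: TomlStr,
    version: TomlStr,
}

#[derive(Debug, Default)]
struct Manifest {
    package: Package,
    // Dependency name -> version requirement.
    dependencies: BTreeMap<TomlStr, TomlStr>,
}

fn main() {
    let mut m = Manifest::default();
    m.package.name = TomlStr::from("my-package");
    m.package.version = TomlStr::from("0.1.0");
    m.dependencies.insert(TomlStr::from("serde"), TomlStr::from("1.0"));
    println!("{m:?}");
}
```

With `TomlStr` being just `String`, the indirection buys nothing, which is why dropping it keeps the code simpler.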

rustbot · 2023-10-25T20:02:45Z

r? @ehuss

(rustbot has picked a reviewer for you, use r? to override)

bors · 2023-10-27T17:07:45Z

☔ The latest upstream changes (presumably #12884) made this pull request unmergeable. Please resolve the merge conflicts.


ehuss · 2023-10-28T20:40:21Z

Thanks!

@bors r+

bors · 2023-10-28T20:40:22Z

📌 Commit acc52f3 has been approved by ehuss

It is now in the queue for this repository.

bors · 2023-10-28T20:41:31Z

⌛ Testing commit acc52f3 with merge d1830f6...

bors · 2023-10-28T21:24:36Z

☀️ Test successful - checks-actions
Approved by: ehuss
Pushing d1830f6 to master...

Update cargo

7 commits in 708383d620e183a9ece69b8fe930c411d83dee27..b4d18d4bd3db6d872892f6c87c51a02999b80802
2023-10-27 21:09:26 +0000 to 2023-10-31 18:19:10 +0000

- refactor(toml): Cleanup noticed on the way to rust-lang/cargo#12801 (rust-lang/cargo#12902)
- feat(trim-paths): set env `CARGO_TRIM_PATHS` for build scripts (rust-lang/cargo#12900)
- feat: implement RFC 3127 `-Ztrim-paths` (rust-lang/cargo#12625)
- docs: clarify config to use vendored source is printed to stdout (rust-lang/cargo#12893)
- Improve the margin calculation for the search command's UI (rust-lang/cargo#12890)
- Add new packages to [workspace.members] automatically (rust-lang/cargo#12779)
- refactor(toml): Decouple parsing from interning system (rust-lang/cargo#12881)

r? ghost

rustbot assigned ehuss Oct 25, 2023

rustbot added A-cli Area: Command-line interface, option parsing, etc. A-manifest Area: Cargo.toml issues A-profiles Area: profiles A-workspaces Area: workspaces Command-remove S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 25, 2023

epage force-pushed the intern branch from cec022e to d21bd2a October 27, 2023 15:28

epage force-pushed the intern branch 2 times, most recently from 905d1d5 to acc52f3 October 28, 2023 02:15

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2023

bors merged commit d1830f6 into rust-lang:master Oct 28, 2023
20 checks passed

bors mentioned this pull request Oct 28, 2023

feat: implement RFC 3127 -Ztrim-paths #12625

Merged


epage deleted the intern branch October 29, 2023 00:25

epage mentioned this pull request Oct 30, 2023

Official API for parsing Cargo.tomls schema #12801

Closed


weihanglo mentioned this pull request Oct 31, 2023

Update cargo rust-lang/rust#117462

Merged

ehuss added this to the 1.75.0 milestone Nov 6, 2023
