Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(toml): Decouple parsing from interning system #12881

Merged
merged 1 commit into from
Oct 28, 2023

Conversation

epage
Copy link
Contributor

@epage epage commented Oct 25, 2023

What does this PR try to resolve?

To have a separate manifest API (#12801), we can't rely on interning because it might be used in longer-lifetime applications.

To keep this limited in scope, this only removes InternedString from manifest parsing. Everything else still uses InternedString.

How should we test and review this PR?

I had problems getting the cargo benchmarks running, so I did a quick and dirty benchmark that is end-to-end, covering fresh builds, clean builds, and resolution. I ran these against a fresh clone of cargo's code base. See my comment for the script that managed the benchmarks.

Benchmarks:

$ ../dump/cargo-12801-bench.rs run
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run`
warning: `package.edition` is unspecified, defaulting to `2021`
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run`
$ hyperfine "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     119.3 ms ±   3.2 ms    [User: 98.6 ms, System: 20.3 ms]
  Range (min … max):   115.6 ms … 124.3 ms    24 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     119.4 ms ±   2.4 ms    [User: 98.0 ms, System: 21.1 ms]
  Range (min … max):   115.7 ms … 123.6 ms    24 runs

Summary
  ../cargo-old check ran
    1.00 ± 0.03 times faster than ../cargo-new check
$ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     20.249 s ±  0.392 s    [User: 157.719 s, System: 22.771 s]
  Range (min … max):   19.605 s … 21.123 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     20.123 s ±  0.212 s    [User: 156.156 s, System: 22.325 s]
  Range (min … max):   19.764 s … 20.420 s    10 runs

Summary
  ../cargo-new check ran
    1.01 ± 0.02 times faster than ../cargo-old check
$ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     21.105 s ±  0.465 s    [User: 156.482 s, System: 22.799 s]
  Range (min … max):   20.156 s … 22.010 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     21.358 s ±  0.538 s    [User: 156.187 s, System: 22.979 s]
  Range (min … max):   20.703 s … 22.462 s    10 runs

Summary
  ../cargo-old check ran
    1.01 ± 0.03 times faster than ../cargo-new check

Additional information

I consulted https://github.com/rosetta-rs/string-rosetta-rs when deciding on what string type to use for performance.

Originally, I hoped to entirely replacing string interning. For that, I was looking at arcstr as it had a fast equality operator. However, that is only helpful so long as the two strings we are comparing came from the original source. Unsure how likely that is to happen (and daunted by converting all of the Copys into Clones), I decided to scale back.

Concerned about all of the small allocations when parsing a manifest, I assumed I'd need a string type with small-string optimizations, like hipstr, compact_str, flexstr, and ecow.
The first three give us more wiggle room and hipstr was the fastest of them, so I went with that.

I then double checked macro benchmarks, and realized hipstr made no difference and switched to String to keep things simple / with lower dependencies.

When doing this, I had created a type alias (TomlStr) for the string type so I could more easily swap it out if needed
(and not have to always write out a lifetime).
With just using String, I went ahead and dropped that.

@rustbot
Copy link
Collaborator

rustbot commented Oct 25, 2023

r? @ehuss

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added A-cli Area: Command-line interface, option parsing, etc. A-manifest Area: Cargo.toml issues A-profiles Area: profiles A-workspaces Area: workspaces Command-remove S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 25, 2023
@bors
Copy link
Contributor

bors commented Oct 27, 2023

☔ The latest upstream changes (presumably #12884) made this pull request unmergeable. Please resolve the merge conflicts.

To have a separate manifest API (rust-lang#12801), we can't rely on interning
because it might be used in longer-lifetime applications.

I consulted https://github.com/rosetta-rs/string-rosetta-rs when
deciding on what string type to use for performance.

Originally, I hoped to entirely replacing string interning.  For that, I
was looking at `arcstr` as it had a fast equality operator.  However,
that is only helpful so long as the two strings we are comparing came
from the original source.  Unsure how likely that is to happen (and
daunted by converting all of the `Copy`s into `Clone`s), I decided to
scale back.

Concerned about all of the small allocations when parsing a manifest, I
assumed I'd need a string type with small-string optimizations, like
`hipstr`, `compact_str`, `flexstr`, and `ecow`.
The first three give us more wiggle room and `hipstr` was the fastest of
them, so I went with that.

I then double checked macro benchmarks, and realized `hipstr` made no
difference and switched to `String` to keep things simple / with lower
dependencies.

When doing this, I had created a type alias (`TomlStr`) for the string
type so I could more easily swap it out if needed
(and not have to always write out a lifetime).
With just using `String`, I went ahead and dropped that.

I had problems getting the cargo benchmarks running, so I did a quick
and dirty benchmark that is end-to-end, covering fresh builds, clean
builds, and resolution.  I ran these against a fresh clone of cargo's
code base.
```console
$ ../dump/cargo-12801-bench.rs run
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target/debug/cargo -Zscript -Zmsrv-policy ../dump/cargo-12801-bench.rs run`
warning: `package.edition` is unspecified, defaulting to `2021`
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `/home/epage/.cargo/target/0a/7f4c1ab500f045/debug/cargo-12801-bench run`
$ hyperfine "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     119.3 ms ±   3.2 ms    [User: 98.6 ms, System: 20.3 ms]
  Range (min … max):   115.6 ms … 124.3 ms    24 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     119.4 ms ±   2.4 ms    [User: 98.0 ms, System: 21.1 ms]
  Range (min … max):   115.7 ms … 123.6 ms    24 runs

Summary
  ../cargo-old check ran
    1.00 ± 0.03 times faster than ../cargo-new check
$ hyperfine --prepare "cargo clean" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     20.249 s ±  0.392 s    [User: 157.719 s, System: 22.771 s]
  Range (min … max):   19.605 s … 21.123 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     20.123 s ±  0.212 s    [User: 156.156 s, System: 22.325 s]
  Range (min … max):   19.764 s … 20.420 s    10 runs

Summary
  ../cargo-new check ran
    1.01 ± 0.02 times faster than ../cargo-old check
$ hyperfine --prepare "cargo clean && rm -f Cargo.lock" "../cargo-old check" "../cargo-new check"
Benchmark 1: ../cargo-old check
  Time (mean ± σ):     21.105 s ±  0.465 s    [User: 156.482 s, System: 22.799 s]
  Range (min … max):   20.156 s … 22.010 s    10 runs

Benchmark 2: ../cargo-new check
  Time (mean ± σ):     21.358 s ±  0.538 s    [User: 156.187 s, System: 22.979 s]
  Range (min … max):   20.703 s … 22.462 s    10 runs

Summary
  ../cargo-old check ran
    1.01 ± 0.03 times faster than ../cargo-new check
```
@epage epage force-pushed the intern branch 2 times, most recently from 905d1d5 to acc52f3 Compare October 28, 2023 02:15
@ehuss
Copy link
Contributor

ehuss commented Oct 28, 2023

Thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Oct 28, 2023

📌 Commit acc52f3 has been approved by ehuss

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 28, 2023
@bors
Copy link
Contributor

bors commented Oct 28, 2023

⌛ Testing commit acc52f3 with merge d1830f6...

@bors
Copy link
Contributor

bors commented Oct 28, 2023

☀️ Test successful - checks-actions
Approved by: ehuss
Pushing d1830f6 to master...

@bors bors merged commit d1830f6 into rust-lang:master Oct 28, 2023
20 checks passed
@bors bors mentioned this pull request Oct 28, 2023
8 tasks
@epage epage deleted the intern branch October 29, 2023 00:25
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 31, 2023
Update cargo

7 commits in 708383d620e183a9ece69b8fe930c411d83dee27..b4d18d4bd3db6d872892f6c87c51a02999b80802
2023-10-27 21:09:26 +0000 to 2023-10-31 18:19:10 +0000
- refactor(toml): Cleanup noticed on the way to rust-lang/cargo#12801 (rust-lang/cargo#12902)
- feat(trim-paths): set env `CARGO_TRIM_PATHS` for build scripts (rust-lang/cargo#12900)
- feat: implement RFC 3127 `-Ztrim-paths` (rust-lang/cargo#12625)
- docs: clarify config to use vendored source is printed to stdout (rust-lang/cargo#12893)
- Improve the margin calculation for the search command's UI (rust-lang/cargo#12890)
- Add new packages to [workspace.members] automatically (rust-lang/cargo#12779)
- refactor(toml): Decouple parsing from interning system (rust-lang/cargo#12881)

r? ghost
@ehuss ehuss added this to the 1.75.0 milestone Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli Area: Command-line interface, option parsing, etc. A-manifest Area: Cargo.toml issues A-profiles Area: profiles A-workspaces Area: workspaces Command-remove S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants