
toml_edit is pretty slow to compile #327

Closed
Tracked by #340 ...
ordian opened this issue Jul 31, 2022 · 12 comments · Fixed by #402
Labels
A-parse Area: Parsing TOML C-enhancement Category: Raise on the bar on expectations

Comments

@ordian
Member

ordian commented Jul 31, 2022

toml_edit is pretty slow to compile, one of the slowest dependencies of cargo. My suspicion is that the number of macros used for combine is the core problem (due to the extra parsing steps the compiler has to do). I want to explore using nom, which doesn't use macros, once some ergonomic improvements are made.

Originally posted by @epage in #323 (comment)

@ordian
Member Author

ordian commented Jul 31, 2022

It's funny: I migrated the parser from nom to combine a while ago (#26) because of too many macros; combine was mainly based on traits at the time. It would be ironic to come full circle.

Did some profiling:
[Screenshot from 2022-07-31 17-42-35]

$ cargo +nightly rustc -- -Z self-profile
$ summarize summarize toml_edit-554834.mm_profdata | head -20
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                             | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_passes                                      | 17.92s    | 18.750          | 17.95s   | 1          | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| finish_ongoing_codegen                           | 16.27s    | 17.022          | 16.28s   | 1          | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_codegen_emit_obj                     | 16.20s    | 16.949          | 16.20s   | 121        | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_lto_optimize                                | 13.10s    | 13.699          | 13.10s   | 121        | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_module_optimize                             | 12.63s    | 13.216          | 12.63s   | 121        | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| LLVM_thin_lto_import                             | 3.39s     | 3.545           | 3.39s    | 121        | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                           | 3.15s     | 3.296           | 6.77s    | 1516       | 10.39ms                         |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_module_optimize                          | 1.86s     | 1.949           | 14.50s   | 121        | 0.00ns                          |
+--------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| codegen_module_perform_lto                       | 1.86s     | 1.948           | 35.04s   | 121        | 0.00ns                          |

[Screenshot from 2022-07-31 17-28-08]
It spends a lot of time in monomorphization and linking, which suggests a generics explosion.

❯ cargo llvm-lines | head -20
  Lines          Copies        Function name
  -----          ------        -------------
  884919 (100%)  15631 (100%)  (TOTAL)
   81961 (9.3%)    120 (0.8%)  combine::parser::sequence::<impl combine::parser::Parser<Input> for (A,B)>::parse_mode_impl
   59663 (6.7%)     79 (0.5%)  <combine::parser::combinator::AndThen<P,F> as combine::parser::Parser<Input>>::parse_mode_impl
   56629 (6.4%)     59 (0.4%)  <(Y,Z) as combine::parser::choice::ChoiceParser<Input>>::parse_mode_choice
   51181 (5.8%)    335 (2.1%)  <combine::parser::combinator::Map<P,F> as combine::parser::Parser<Input>>::parse_mode_impl
   42252 (4.8%)     37 (0.2%)  combine::parser::sequence::<impl combine::parser::Parser<Input> for (A,B,C)>::parse_mode_impl
   31767 (3.6%)    256 (1.6%)  combine::error::ParseResult<T,E>::map
   26308 (3.0%)     62 (0.4%)  combine::parser::ParseMode::parse_committed
   25442 (2.9%)     50 (0.3%)  <combine::parser::range::RecognizeWithValue<P> as combine::parser::Parser<Input>>::parse_mode
   21348 (2.4%)     68 (0.4%)  <combine::parser::repeat::Iter<Input,P,S,M> as core::iter::traits::iterator::Iterator>::next
   20884 (2.4%)   1402 (9.0%)  combine::parser::Parser::parse_mode
   14444 (1.6%)    628 (4.0%)  <combine::parser::PartialMode as combine::parser::ParseMode>::parse
   14369 (1.6%)     67 (0.4%)  combine::parser::sequence::PartialState2<A,B>::add_errors
   12562 (1.4%)     17 (0.1%)  combine::parser::range::parse_partial_range
   12328 (1.4%)    134 (0.9%)  combine::parser::sequence::PartialState2<A,B>::add_errors::{{closure}}
   11365 (1.3%)      6 (0.0%)  <(V,X,Y,Z) as combine::parser::choice::ChoiceParser<Input>>::parse_mode_choice
   11342 (1.3%)      8 (0.1%)  <(X,Y,Z) as combine::parser::choice::ChoiceParser<Input>>::parse_mode_choice
   10836 (1.2%)    774 (5.0%)  <combine::parser::FirstMode as combine::parser::ParseMode>::parse

I'd say rewriting the parser manually is still on the table and might be beneficial long-term (e.g. precise control over errors). Short-term we could try to reduce the number of macros and generics in toml_edit itself.
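As a minimal stdlib-only sketch (not toml_edit's or combine's actual code) of why the `cargo llvm-lines` copy counts balloon: every distinct closure type handed to a generic combinator forces its own monomorphized copy of that combinator, so deep nesting multiplies generated code.

```rust
// Hypothetical hand-rolled combinators: `map` and `pair` are generic over
// their parser and closure types, so each call site with a new closure
// produces a fresh monomorphized copy in the generated LLVM IR.
type PResult<'a, T> = Option<(&'a str, T)>;

fn digit(input: &str) -> PResult<'_, char> {
    let c = input.chars().next()?;
    c.is_ascii_digit().then(|| (&input[c.len_utf8()..], c))
}

fn map<'a, A, B>(
    p: impl Fn(&'a str) -> PResult<'a, A>,
    f: impl Fn(A) -> B,
) -> impl Fn(&'a str) -> PResult<'a, B> {
    move |i| p(i).map(|(rest, a)| (rest, f(a)))
}

fn pair<'a, A, B>(
    p: impl Fn(&'a str) -> PResult<'a, A>,
    q: impl Fn(&'a str) -> PResult<'a, B>,
) -> impl Fn(&'a str) -> PResult<'a, (A, B)> {
    move |i| {
        let (i, a) = p(i)?;
        let (i, b) = q(i)?;
        Some((i, (a, b)))
    }
}

fn main() {
    // Two call sites with different closures => two copies of `map`.
    let as_digit = map(digit, |c| c.to_digit(10).unwrap());
    let two = map(pair(digit, digit), |(a, b)| format!("{a}{b}"));
    assert_eq!(as_digit("7x"), Some(("x", 7)));
    assert_eq!(two("42rest"), Some(("rest", "42".to_string())));
}
```

Running `cargo llvm-lines` on a parser built this way attributes most lines to a handful of generic functions with large copy counts, matching the shape of the table above.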

@epage
Member

epage commented Aug 1, 2022

It's funny: I migrated the parser from nom to combine a while ago (#26) because of too many macros; combine was mainly based on traits at the time. It would be ironic to come full circle.

And combine can still be light on macros except in some specific cases.

Short-term we could try to reduce the number of macros and generics in toml_edit itself.

I had looked at dropping the macros where possible, but combine's API is a mess to work with because everything is a combinator returning a generic parser, rather than nom's model of operating on the parser functions themselves (which also simplifies some parsing cases, in my experience).

I'd say rewriting the parser manually is still on the table and might be beneficial long-term (e.g. precise control over errors).

It's interesting to see what perspectives exist in the community. I've heard from some that the use of a parser library in toml_edit made it more favorable than toml-rs. I've also seen others say to just write the parser by hand. I lean toward preferring parser libraries, as I feel they make the parsing code more maintainable unless there is a very specific and critical need otherwise.

@ordian
Member Author

ordian commented Aug 1, 2022

I had looked at dropping the macros where possible, but combine's API is a mess to work with because everything is a combinator returning a generic parser, rather than nom's model of operating on the parser functions themselves (which also simplifies some parsing cases, in my experience).

Fair point, I haven't looked at the current state of nom. Feel free to explore this approach.

It's interesting to see what perspectives exist in the community. I've heard from some that the use of a parser library in toml_edit made it more favorable than toml-rs. I've also seen others say to just write the parser by hand. I lean toward preferring parser libraries, as I feel they make the parsing code more maintainable unless there is a very specific and critical need otherwise.

Interesting indeed. I do agree that using parser libraries results in more readable and maintainable code. The downside is that we lose precise control: over compile times, over error messages, and, when things don't work as expected, we're at the mercy of the library developers if the libraries are complex enough. Another concern is security/vulnerability response.

That being said, I'm not sure whether the problem is with combine or how we (ab)use it (probably a combination of both). But if you think nom's approach will have fewer downsides, go for it :)

@epage
Member

epage commented Aug 1, 2022

I want to explore using nom once some ergonomic improvements are made since it doesn't use macros.

As a follow-up to my earlier comment: like most parser libraries, nom's development has slowed down, so we do not yet have these ergonomic improvements. I am coordinating with the maintainer to get access to make these improvements. They will be back from vacation soon, which will hopefully get things moving forward.

@epage
Member

epage commented Aug 1, 2022

btw, I had forgotten to call out that I made a parser benchmark repo, in part to test my combine theory. While combine performs fairly well there, that is for an old version of combine (the only one I could find a JSON implementation for), and it only uses one macro rather than our approach of two layers of macros for every parser.

Also, for more background on why I had originally suspected combine and our use of it: when nnethercote was analyzing hot spots in compiling, he found that tt munchers were slow to compile because the compiler needs to re-parse the input at each layer of recursion to check which pattern it matches. While we aren't doing tt munching, we do put the entire body of every parser function through two layers of macros.

However, if all of that work happens during typecheck, then the numbers earlier in this thread blow that theory out of the water. Still, the generics that combine forces on us and the complexity of working only through combinators are both reasons to re-evaluate our use of combine.
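For readers unfamiliar with the term, a "tt muncher" is a recursive `macro_rules!` macro that peels one token tree off the front of its input per step; a hedged, minimal example (not toml_edit's macros) shows why expansion cost grows with input length:

```rust
// A classic tt muncher: counts token trees by recursing on the tail.
// At every recursion step the compiler re-matches the entire remaining
// token stream against the macro's rules, which is the compile-time
// cost nnethercote's analysis pointed at.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($rest:tt)*) => { 1usize + count_tts!($($rest)*) };
}

fn main() {
    // `(c d)` is a single token tree, so the count is 4, not 5.
    assert_eq!(count_tts!(a b (c d) e), 4);
}
```

The profile above shows the time going to LLVM rather than macro expansion or typeck, which is why the macro theory alone doesn't hold up.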

@epage epage added C-enhancement Category: Raise on the bar on expectations A-parse Area: Parsing TOML labels Sep 23, 2022
@epage
Member

epage commented Dec 22, 2022

I have a prototype of toml_edit on top of nom. I do not expect the remaining polishing work to significantly change build performance.

Running cargo clean -p toml_edit && time cargo check, I get 3s for combine and 0.5s for nom.

This is with Rust 1.66 on an i9-12900H processor.

@epage
Member

epage commented Dec 22, 2022

While it's too early to read into these numbers, I ran cargo bench -F perf:

  • cargo_manifest was 20% slower
  • map cases were 20% faster
  • array cases were 30% faster

@epage
Member

epage commented Dec 22, 2022

The parser is now fully compliant. The only regression is in error messages.

epage added a commit to epage/toml_edit that referenced this issue Dec 23, 2022
This is using a short-lived fork `nom` until we can get the changes
upstreamed.  Note that `combine` requires using `attempt` to backtrack
while `nom` backtracks by default and requires `cut` to not backtrack.

I find it much easier to understand what is happening in straight functions, and this allows us to intermix procedural with declarative logic (e.g. in string processing I skipped combinators to avoid requiring an allocation).

Regarding that allocation, we still do it but this opens the door for us
to use `InternalString` for user values which might give us more of a
performance boost (previously, the forced allocation made it a moot
point to measure it).

Running `cargo clean -p toml_edit && time cargo check`, I get 3s for building `toml_edit` with `combine` and 0.5s for `nom`.

For runtime performance
- Parsing `cargo init`'s generated manifest took 4% less time
- Parsing `cargo`'s manifest took 2% less time
- 10 tables took 37% less time
- 100 tables took 41% less time
- Array of 10 tables took 38% less time
- Array of 100 tables took 40% less time

This is with Rust 1.66 on a i9-12900H processor under WSL2

Fixes toml-rs#327
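The `attempt`/`cut` contrast in the commit message above can be sketched with a tiny hand-rolled combinator (hypothetical names, stdlib only; not the actual nom or combine API): a `Soft` error lets `alt` backtrack and try the next branch, while `cut` upgrades it to a `Hard` error that aborts the whole alternation.

```rust
// Two-tier error model: Soft = recoverable (alt may backtrack),
// Hard = committed (alt stops immediately). nom backtracks by default
// and uses `cut` to commit; combine commits by default and uses
// `attempt` to allow backtracking.
#[derive(Debug, PartialEq)]
enum PErr { Soft, Hard }
type PResult<'a, T> = Result<(&'a str, T), PErr>;

fn tag<'a>(t: &'static str) -> impl Fn(&'a str) -> PResult<'a, &'a str> {
    move |i| i.strip_prefix(t).map(|rest| (rest, t)).ok_or(PErr::Soft)
}

fn alt<'a, T>(
    p: impl Fn(&'a str) -> PResult<'a, T>,
    q: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |i| match p(i) {
        Err(PErr::Soft) => q(i), // backtrack: try the next branch
        other => other,          // success or Hard error: stop here
    }
}

fn cut<'a, T>(
    p: impl Fn(&'a str) -> PResult<'a, T>,
) -> impl Fn(&'a str) -> PResult<'a, T> {
    move |i| p(i).map_err(|_| PErr::Hard) // commit: no more backtracking
}

fn main() {
    // Soft failure of "[[" falls through to the "[" branch.
    let p = alt(tag("[["), tag("["));
    assert_eq!(p("[x]"), Ok(("x]", "[")));
    // `cut` commits to "[[", so the alternation fails outright.
    let committed = alt(cut(tag("[[")), tag("["));
    assert_eq!(committed("[x]"), Err(PErr::Hard));
}
```

This is only a model of the semantics; the real nom `cut` wraps its `Error` variant into `Failure` rather than using a two-variant enum.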
epage added a commit to epage/toml_edit that referenced this issue Dec 23, 2022
This is using a short-lived fork `nom` until we can get the changes
upstreamed.  Note that `combine` requires using `attempt` to backtrack
while `nom` backtracks by default and requires `cut` to not backtrack.

I find the straight functions much easier to understand what is
happening and this allows us to intermix procedural with declarative
logic (e.g. in string processing I skipped combinators to make it easier
to not require an allocation).

Regarding that allocation, we still do it but this opens the door for us
to use `InternalString` for user values which might give us more of a
performance boost (previously, the forced allocation made it a moot
point to measure it).

Running `cargo clean -p toml_edit && time cargo check`, I get 3s for building `toml_edit` with `combine` and 0.5s for `nom`.

For runtime performance
- Parsing `cargo init`s generated manifest took 4% less time
- Parsing `cargo`s manifest took 2% less time
- 10 tables took 37% less time
- 100 tables took 41% less time
- Array of 10 tables took 38% less time
- Array of 100 tables took 40% less time

This is with Rust 1.66 on a i9-12900H processor under WSL2

Fixes toml-rs#327