Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Reduce memory usage in mem2reg #7053

Merged
merged 10 commits into from
Jan 14, 2025
Merged

fix: Reduce memory usage in mem2reg #7053

merged 10 commits into from
Jan 14, 2025

Conversation

aakoshh
Copy link
Contributor

@aakoshh aakoshh commented Jan 14, 2025

Description

Problem*

Resolves #7001

Summary*

Changes mem2reg::block::Block to only use the mutable APIs of OrdMap if unifying the alias sets would cause something to change. That should save some memory, because doing self.aliases.entry(...).and_modify(...).or_insert_with(...) causes OrdMap::get_mut to be called, which calls Arc::make_mut to prepare the entry for modification, which involves making a clone and a new Arc if the Arc is shared, which it will be because we cloned a Block to get the folding started.

This resulted in the memory required for mem2reg to go down to ~1GB, with later peaks corresponding to convert_ssa_instruction during ACIR generation:
Screenshot 2025-01-14 at 14 26 14

Screenshot 2025-01-14 at 14 29 14

With #7057 merged the results are even better:

Screenshot 2025-01-14 at 15 02 37 Screenshot 2025-01-14 at 15 00 50

TODO in followup PR:

  • See if blocks and aliases can be freed up as we go
  • Pre-process functions before inlining them (Tom suggests: inline, unroll and mem2reg)

Additional Context

Profiling on Mac

I couldn't compile the heaptrack_gui. Instead tried cargo-instruments which requires Xcode to be installed, but it stopped working after a while, and I ended up using Instruments directly (a profiler tool on Mac, part of Xcode).

Executing the following command:

./target/release/nargo --program-dir ../aztec-packages/noir-projects/noir-protocol-circuits/crates/rollup-base-public compile  --silence-warnings --skip-underconstrained-check --skip-brillig-constraints-check --force

Compiled with:

cargo build -p nargo_cli --release

The following is needed for Instruments to be able to start the executable:

codesign -s - -v -f --entitlements ../debug.plist target/release/nargo

where debug.plist has the following content:

<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "https://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict><key>com.apple.security.get-task-allow</key><true/></dict></plist>

Memory profile

Original memory profile over time:

Screenshot 2025-01-14 at 01 22 26

EXPERIMENT: Changed AliasSet to use OrdSet instead of BTreeSet to utilise structural sharing.

When I tried this on aztec-packages it was promising but perhaps I didn't run it long enough I accidentally got the 2nd mem2reg commented out. Profiling again locally on noir revealed this version uses 21GB of memory 😖 :

Screenshot 2025-01-14 at 01 27 35

The increase seems to come from the allocation of a large number of Arcs:
Screenshot 2025-01-14 at 10 13 52

After having a second look at the size of the AliasSets, they seem to actually only contain a single item, so I was barking up the wrong tree.

Other tweaks I tried:

  • Replace Expression in aliases: OrdMap<Expression, AliasSet> with u64 calculated by fxhash::hash64, just to see the impact of not storing clones of Expressions; the result is the exact same memory usage, because as it turns out the memory allocated by OrdMap::get_mut call is still for Arcs, it has always been.
  • Replace aliases: OrdMap<Expression, AliasSet> with std::HashMap or std::BTreeMap; the memory usage in this case went from 5GB to 11GB; this is because of the increased cost of cloning these maps
  • Replace im with im_rc to see if Arc vs Rc makes any difference, although it's probably just faster than smaller. Can't because of rayon.

Documentation*

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist*

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

Copy link
Contributor

github-actions bot commented Jan 14, 2025

Compilation Report

Program Compilation Time %
sha256_regression 1.050s -17%
regression_4709 0.833s 5%
ram_blowup_regression 16.200s -5%
rollup-root 3.504s 5%
rollup-merge 1.844s 0%
rollup-block-merge 3.324s 0%
rollup-base-public 31.240s -14%
rollup-base-private 11.740s -7%
private-kernel-tail 0.940s -4%
private-kernel-reset 6.086s -3%
private-kernel-inner 2.148s 7%

Copy link
Contributor

github-actions bot commented Jan 14, 2025

Execution Report

Program Execution Time %
sha256_regression 0.052s 0%
regression_4709 0.001s 0%
ram_blowup_regression 0.598s -2%
rollup-root 0.104s 0%
rollup-merge 0.006s 0%
rollup-block-merge 0.104s 0%
rollup-base-public 1.222s 0%
rollup-base-private 0.462s 2%
private-kernel-tail 0.019s 0%
private-kernel-reset 0.310s -2%
private-kernel-inner 0.069s 1%

Copy link
Contributor

github-actions bot commented Jan 14, 2025

Execution Memory Report

Program Peak Memory
keccak256 74.67M
workspace 123.84M
regression_4709 316.00M
ram_blowup_regression 512.61M
rollup-base-public 478.87M
rollup-base-private 325.48M
private-kernel-tail 180.47M
private-kernel-reset 245.16M
private-kernel-inner 208.55M

Copy link
Contributor

github-actions bot commented Jan 14, 2025

Compilation Memory Report

Program Peak Memory %
keccak256 77.72M 0%
workspace 123.71M 0%
regression_4709 424.09M 0%
ram_blowup_regression 1.58G 0%
rollup-base-public 2.51G -34%
rollup-base-private 901.10M -9%
private-kernel-tail 207.18M 0%
private-kernel-reset 669.26M 0%
private-kernel-inner 294.40M 0%

@aakoshh aakoshh changed the title fix: Use OrdSet for AliasSet test: Try to reduce memory usage in mem2reg Jan 14, 2025
@aakoshh aakoshh changed the title test: Try to reduce memory usage in mem2reg fix: Try to reduce memory usage in mem2reg Jan 14, 2025
@TomAFrench
Copy link
Member

image

Nice!

@aakoshh aakoshh marked this pull request as ready for review January 14, 2025 15:03
@aakoshh aakoshh changed the title fix: Try to reduce memory usage in mem2reg fix: Reduce memory usage in mem2reg Jan 14, 2025
@aakoshh aakoshh requested a review from a team January 14, 2025 15:09
@TomAFrench TomAFrench added this pull request to the merge queue Jan 14, 2025
@TomAFrench TomAFrench removed this pull request from the merge queue due to a manual request Jan 14, 2025
@aakoshh
Copy link
Contributor Author

aakoshh commented Jan 14, 2025

Now merged #7058 the results are very similar 👍

@aakoshh aakoshh enabled auto-merge January 14, 2025 16:54
@TomAFrench
Copy link
Member

Sorry, didn't see the usage of AbstractVecSet when fixing those conflicts.

@aakoshh aakoshh added this pull request to the merge queue Jan 14, 2025
Merged via the queue into master with commit a0ffedf Jan 14, 2025
89 checks passed
@aakoshh aakoshh deleted the 7001-fix-mem2reg-mem-req branch January 14, 2025 17:13
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Jan 14, 2025
…oir-lang/noir#7066)

feat!: Handle generic fields in `StructDefinition::fields` and move old functionality to `StructDefinition::fields_as_written` (noir-lang/noir#7067)
chore(ci): Check various inliner aggressiveness setttings in Brillig reports (noir-lang/noir#7049)
chore: reenable reports on rollup root circuits (noir-lang/noir#7061)
fix: don't always select trait impl when verifying trait constraints (noir-lang/noir#7041)
chore: mark some critical libraries as good again (noir-lang/noir#7065)
fix: Reduce memory usage in mem2reg (noir-lang/noir#7053)
feat: Allow associated types to be ellided from trait constraints (noir-lang/noir#7026)
chore(perf): try using vec-collections's VecSet in AliasSet (noir-lang/noir#7058)
chore: reduce number of iterations of `acvm::compiler::compile` (noir-lang/noir#7050)
chore: add `noir_check_shuffle` as a critical library (noir-lang/noir#7056)
chore: clippy warning fix (noir-lang/noir#7051)
chore(ci): Unify compilation/execution report jobs that take averages with single runs  (noir-lang/noir#7048)
fix(nargo_fmt): let doc comment could come after regular comment (noir-lang/noir#7046)
fix(nargo_fmt): don't consider identifiers the same if they are equal… (noir-lang/noir#7043)
feat: auto-import traits when suggesting trait methods (noir-lang/noir#7037)
feat: avoid inserting `inc_rc` instructions into ACIR (noir-lang/noir#7036)
fix(lsp): suggest all possible trait methods, but only visible ones (noir-lang/noir#7027)
chore: add more protocol circuits to reports (noir-lang/noir#6977)
feat: avoid generating a new witness when checking if linear expression is zero (noir-lang/noir#7031)
feat: skip codegen of zero iteration loops (noir-lang/noir#7030)
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Jan 14, 2025
…ir#7066)

feat!: Handle generic fields in `StructDefinition::fields` and move old functionality to `StructDefinition::fields_as_written` (noir-lang/noir#7067)
chore(ci): Check various inliner aggressiveness setttings in Brillig reports (noir-lang/noir#7049)
chore: reenable reports on rollup root circuits (noir-lang/noir#7061)
fix: don't always select trait impl when verifying trait constraints (noir-lang/noir#7041)
chore: mark some critical libraries as good again (noir-lang/noir#7065)
fix: Reduce memory usage in mem2reg (noir-lang/noir#7053)
feat: Allow associated types to be ellided from trait constraints (noir-lang/noir#7026)
chore(perf): try using vec-collections's VecSet in AliasSet (noir-lang/noir#7058)
chore: reduce number of iterations of `acvm::compiler::compile` (noir-lang/noir#7050)
chore: add `noir_check_shuffle` as a critical library (noir-lang/noir#7056)
chore: clippy warning fix (noir-lang/noir#7051)
chore(ci): Unify compilation/execution report jobs that take averages with single runs  (noir-lang/noir#7048)
fix(nargo_fmt): let doc comment could come after regular comment (noir-lang/noir#7046)
fix(nargo_fmt): don't consider identifiers the same if they are equal… (noir-lang/noir#7043)
feat: auto-import traits when suggesting trait methods (noir-lang/noir#7037)
feat: avoid inserting `inc_rc` instructions into ACIR (noir-lang/noir#7036)
fix(lsp): suggest all possible trait methods, but only visible ones (noir-lang/noir#7027)
chore: add more protocol circuits to reports (noir-lang/noir#6977)
feat: avoid generating a new witness when checking if linear expression is zero (noir-lang/noir#7031)
feat: skip codegen of zero iteration loops (noir-lang/noir#7030)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Increase in memory used to compile rollup-base-public
3 participants