Increase in memory used to compile rollup-base-public #7001

Closed
aakoshh opened this issue Jan 9, 2025 · 12 comments · Fixed by #7053
Labels
bug Something isn't working

Comments

@aakoshh
Contributor

aakoshh commented Jan 9, 2025

Aim

Followup for #6972 added a new mem2reg pass. As a consequence we saw a 100% increase in memory used during the compilation of one of the protocol circuits in aztec-packages.

Expected Behavior

Didn't expect a significant increase in memory usage.

Bug

#6972 (comment)

To Reproduce

See how CI does it.

Workaround

None

Workaround Description

No response

Additional Context

No response

Project Impact

None

Blocker Context

No response

Nargo Version

nargo version = 1.0.0-beta.1 noirc version = 1.0.0-beta.1+bb8dd5ce43f0d89e393bd49f8415008826903652 (git version hash: 13b5871, is dirty: false)

NoirJS Version

No response

Proving Backend Tooling & Version

No response

Would you like to submit a PR for this Issue?

None

Support Needs

No response

@aakoshh aakoshh added the bug Something isn't working label Jan 9, 2025
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Noir Jan 9, 2025
@TomAFrench
Member

I think a contributing factor to this is how early we perform the inlining pass. Looking at the ordering of passes:

.run_pass(Ssa::remove_unreachable_functions, "Removing Unreachable Functions")
.run_pass(Ssa::defunctionalize, "Defunctionalization")
.run_pass(Ssa::remove_paired_rc, "Removing Paired rc_inc & rc_decs")
.run_pass(|ssa| ssa.inline_functions(options.inliner_aggressiveness), "Inlining (1st)")
// Run mem2reg with the CFG separated into blocks
.run_pass(Ssa::mem2reg, "Mem2Reg (1st)")
.run_pass(Ssa::simplify_cfg, "Simplifying (1st)")
.run_pass(Ssa::as_slice_optimization, "`as_slice` optimization")
.run_pass(Ssa::remove_unreachable_functions, "Removing Unreachable Functions")
.try_run_pass(
    Ssa::evaluate_static_assert_and_assert_constant,
    "`static_assert` and `assert_constant`",
)?
.run_pass(Ssa::loop_invariant_code_motion, "Loop Invariant Code Motion")
.try_run_pass(
    |ssa| ssa.unroll_loops_iteratively(options.max_bytecode_increase_percent),
    "Unrolling",
)?
.run_pass(Ssa::simplify_cfg, "Simplifying (2nd)")
.run_pass(Ssa::mem2reg, "Mem2Reg (2nd)")
.run_pass(Ssa::flatten_cfg, "Flattening")
.run_pass(Ssa::remove_bit_shifts, "Removing Bit Shifts")
// Run mem2reg once more with the flattened CFG to catch any remaining loads/stores
.run_pass(Ssa::mem2reg, "Mem2Reg (3rd)")
// Run the inlining pass again to handle functions with `InlineType::NoPredicates`.
// Before flattening is run, we treat functions marked with the `InlineType::NoPredicates` as an entry point.
// This pass must come immediately following `mem2reg` as the succeeding passes
// may create an SSA which inlining fails to handle.
.run_pass(
    |ssa| ssa.inline_functions_with_no_predicates(options.inliner_aggressiveness),
    "Inlining (2nd)",
)
.run_pass(Ssa::remove_if_else, "Remove IfElse")
.run_pass(Ssa::fold_constants, "Constant Folding")
.run_pass(Ssa::remove_enable_side_effects, "EnableSideEffectsIf removal")
.run_pass(Ssa::fold_constants_using_constraints, "Constraint Folding")
.run_pass(Ssa::dead_instruction_elimination, "Dead Instruction Elimination (1st)")
.run_pass(Ssa::simplify_cfg, "Simplifying:")
.run_pass(Ssa::array_set_optimization, "Array Set Optimizations")

You can see that we pretty much immediately inline all of the functions into the entrypoint function. This means that if we've got a function which is used in multiple places, we're going to have to do all the later passes N different times rather than fully simplifying the function on its own before we inline it.

Note that we can't just run all the various passes on every function before inlining. At the very least we'd need to tolerate loop unrolling failing, as loop bounds may come from function arguments. Some thought would need to go into flattening as well (maybe?).

@TomAFrench
Member

Image

Memory flamegraph showing that the blocks field of the PerFunctionContext within mem2reg is holding all the memory.

@aakoshh
Contributor Author

aakoshh commented Jan 13, 2025

This shows the heaviest stack trace in terms of memory allocations:
Image

It's also inside Block::unify, like in the flamegraph above, and it points at the im::OrdMap as the culprit, which is the data structure backing all fields in Block.

@jfecher
Contributor

jfecher commented Jan 13, 2025

The various maps in Block have been a known memory issue in the past. They were changed from hashmaps to OrdMaps since those used less memory in some tests in the past. I think further improvement will require more than just a container change. E.g. sacrificing optimizations by arbitrarily removing known values in a Block after some limit. Or rewriting mem2reg more thoroughly to use a different algorithm.

Edit: perhaps an easier change would be to add a check to drop any blocks we don't need any more (blocks whose successors are all finished as well).
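
A rough sketch of that last idea, not the actual mem2reg code: keep a per-block state only until every successor of the block has been processed, then drop it. Everything here is hypothetical (`BlockId`, `BlockState`, `peak_live_states`), and the per-block "state" is just a list of visited blocks standing in for the real maps.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real pass tracks much richer per-block state.
type BlockId = usize;
type BlockState = Vec<BlockId>;

/// Visits blocks in `order`, building each block's state from its predecessors,
/// and drops a block's state as soon as all of its successors are finished.
/// Returns the largest number of block states that were alive at the same time.
fn peak_live_states(order: &[BlockId], successors: &HashMap<BlockId, Vec<BlockId>>) -> usize {
    // Derive predecessor lists from the successor lists.
    let mut predecessors: HashMap<BlockId, Vec<BlockId>> = HashMap::new();
    for (&block, succs) in successors {
        for &succ in succs {
            predecessors.entry(succ).or_default().push(block);
        }
    }
    // Number of successor edges of each block that have not been processed yet.
    let mut pending: HashMap<BlockId, usize> = order
        .iter()
        .map(|&b| (b, successors.get(&b).map_or(0, |s| s.len())))
        .collect();

    let mut live: HashMap<BlockId, BlockState> = HashMap::new();
    let mut peak: usize = 0;
    for &block in order {
        let preds = predecessors.get(&block).cloned().unwrap_or_default();
        // "Unify" the predecessor states that are still available.
        let mut state = BlockState::new();
        for pred in &preds {
            if let Some(pred_state) = live.get(pred) {
                state.extend(pred_state);
            }
        }
        state.push(block);
        live.insert(block, state);
        peak = peak.max(live.len());

        // This block is finished: release predecessors with no successors left.
        for pred in &preds {
            if let Some(remaining) = pending.get_mut(pred) {
                *remaining -= 1;
                if *remaining == 0 {
                    live.remove(pred);
                }
            }
        }
        // A block whose successors are all done already (or that has none) can go too.
        if pending.get(&block) == Some(&0) {
            live.remove(&block);
        }
    }
    peak
}
```

On a straight chain of four blocks this keeps at most two states alive instead of four. How much it would save on a real CFG depends on the shape of the graph.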

@aakoshh
Contributor Author

aakoshh commented Jan 13, 2025

With a bit of refactoring I could at least narrow it down to the maintenance of aliases, rather than the other fields that use OrdMap:
Image

@aakoshh
Contributor Author

aakoshh commented Jan 13, 2025

A bit more digging shows that there is a Cambrian explosion in the number of blocks during Unrolling (I originally attributed this to the second simplification):

AFTER Removing Unreachable Functions: functions=366
    FUNCTION main: blocks=1
...
AFTER Inlining (1st): functions=33
    FUNCTION main: blocks=917
mem2reg:
    FUNCTION main: block_cnt=915 alias_set_cnt=53967 alias_cnt=53967  
AFTER Mem2Reg (1st): functions=33
    FUNCTION main: blocks=917
AFTER Simplifying (1st): functions=33
    FUNCTION main: blocks=848
...
AFTER Unrolling: functions=33
    FUNCTION main: blocks=62932
AFTER Simplifying (2nd): functions=33
    FUNCTION main: block_vec_cnt=61750, block_set_cnt=61750
mem2reg:
    FUNCTION main: block_cnt=61750 alias_set_cnt=65954275 alias_cnt=65954275
AFTER Mem2Reg (2nd): functions=33
    FUNCTION main: blocks=61758
AFTER Flattening: functions=33
    FUNCTION main: blocks=1
...

After Unrolling the number of blocks in the main function goes from 848 during previous passes to 62932. All of these blocks see their predecessors folded together, during which their aliases are unified, which involves set unions. In our case that means that towards the end of the 60K+ blocks we visit, each of them has 1K+ entries in aliases (i.e. different expression keys), which get copied for a total of 65 million alias set entries.
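
To put the counts from the log in proportion, here is a small helper that only does arithmetic on the printed numbers. The function names are made up for illustration; nothing here touches the compiler.

```rust
/// Average number of alias entries per block, from the counts printed in the log.
fn avg_aliases_per_block(alias_cnt: u64, block_cnt: u64) -> f64 {
    alias_cnt as f64 / block_cnt as f64
}

/// Growth factor between two counts.
fn growth(before: u64, after: u64) -> f64 {
    after as f64 / before as f64
}
```

Plugging in the log values: the first mem2reg run averages about 59 alias entries per block (53967 / 915), the second about 1068 (65954275 / 61750). The block count grows roughly 74x (848 to 62932) while the total number of alias entries grows roughly 1222x, so the per-block maps are getting bigger, not just more numerous.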

I wonder if there is a case here for trying to achieve more structural sharing around the data structures tracking aliases, since these blocks seem to hoard aliases as we move down the order of their topology.

EDIT: Only after this did I separate alias_cnt and alias_set_cnt. They seem to be equal, like nothing has multiple aliases, which is strange, but then the size wouldn't be about the AliasSet itself. Indeed looking one more level into the trace revealed it to be the Arcs used for the keys and values of the OrdMap.

I'll also have a look at why the simplification step adds so many new blocks.

@TomAFrench
Member

TomAFrench commented Jan 14, 2025

I'm not seeing simplification creating lots of blocks, it seems like they exist at the beginning of that pass so they're most likely from the loop unrolling pass.

tbh, the more I look at this the more it seems like anything but switching to a bottom-up approach to inlining is pointless. Consider the function below.... This is an implementation of the Serialize trait and all the various functions which get called by it.

This whole stack of SSA will eventually become some casts and a single make_array instruction but we end up waiting until it's part of the main function before we even start simplifying it down to that.

acir(inline) fn serialize f33 {
  b0(v36: Field, v37: u32, v38: Field, v39: Field, v40: Field, v41: Field, v42: Field, v43: u32, v44: Field, v45: u32, v46: Field, v47: u32, v48: Field, v49: u32, v50: Field, v51: Field, v52: Field, v53: Field, v54: u64, v55: Field, v56: Field, v57: Field, v58: Field, v59: Field, v60: Field):
    v62, v63 = call f34() -> ([Field; 25], u32)
    v64 = allocate -> &mut [Field; 25]
    store v62 at v64
    v65 = allocate -> &mut u32
    store v63 at v65
    v67 = call f36(v36, v37) -> [Field; 2]
    call f35(v64, v65, v67)
    v70 = call f38(v38, v39, v40, v41) -> [Field; 4]
    call f37(v64, v65, v70)
    v73 = call f40(v42, v43, v44, v45, v46, v47, v48, v49) -> [Field; 8]
    call f39(v64, v65, v73)
    v76 = call f42(v50, v51, v52, v53, v54, v55, v56, v57, v58) -> [Field; 9]
    call f41(v64, v65, v76)
    call f43(v64, v65, v59)
    call f43(v64, v65, v60)
    v80 = load v64 -> [Field; 25]
    v81 = load v65 -> u32
    v83 = call f44(v80, v81) -> [Field; 25]
    return v83
}

acir(inline) fn new f34 {
  b0():
    v37 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 25]
    return v37, u32 0
}
acir(inline) fn extend_from_array f35 {
  b0(v36: &mut [Field; 25], v37: &mut u32, v38: [Field; 2]):
    v40 = load v36 -> [Field; 25]
    v41 = load v37 -> u32
    v43 = add v41, u32 2
    v45 = lt u32 25, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 2
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 25]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 25]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f36 {
  b0(v36: Field, v37: u32):
    v38 = cast v37 as Field
    v39 = make_array [v36, v38] : [Field; 2]
    return v39
}
acir(inline) fn extend_from_array f37 {
  b0(v36: &mut [Field; 25], v37: &mut u32, v38: [Field; 4]):
    v40 = load v36 -> [Field; 25]
    v41 = load v37 -> u32
    v43 = add v41, u32 4
    v45 = lt u32 25, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 4
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 25]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 25]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f38 {
  b0(v36: Field, v37: Field, v38: Field, v39: Field):
    v41, v42 = call f57() -> ([Field; 4], u32)
    v43 = allocate -> &mut [Field; 4]
    store v41 at v43
    v44 = allocate -> &mut u32
    store v42 at v44
    call f58(v43, v44, v36)
    call f58(v43, v44, v37)
    call f58(v43, v44, v38)
    call f58(v43, v44, v39)
    v49 = load v43 -> [Field; 4]
    v50 = load v44 -> u32
    v52 = call f59(v49, v50) -> [Field; 4]
    return v52
}
acir(inline) fn extend_from_array f39 {
  b0(v36: &mut [Field; 25], v37: &mut u32, v38: [Field; 8]):
    v40 = load v36 -> [Field; 25]
    v41 = load v37 -> u32
    v43 = add v41, u32 8
    v45 = lt u32 25, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 8
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 25]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 25]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f40 {
  b0(v36: Field, v37: u32, v38: Field, v39: u32, v40: Field, v41: u32, v42: Field, v43: u32):
    v45, v46 = call f52() -> ([Field; 8], u32)
    v47 = allocate -> &mut [Field; 8]
    store v45 at v47
    v48 = allocate -> &mut u32
    store v46 at v48
    v50 = call f36(v36, v37) -> [Field; 2]
    call f53(v47, v48, v50)
    v53 = call f55(v38, v39, v40, v41, v42, v43) -> [Field; 6]
    call f54(v47, v48, v53)
    v55 = load v47 -> [Field; 8]
    v56 = load v48 -> u32
    v58 = call f56(v55, v56) -> [Field; 8]
    return v58
}
acir(inline) fn extend_from_array f41 {
  b0(v36: &mut [Field; 25], v37: &mut u32, v38: [Field; 9]):
    v40 = load v36 -> [Field; 25]
    v41 = load v37 -> u32
    v43 = add v41, u32 9
    v45 = lt u32 25, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 9
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 25]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 25]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f42 {
  b0(v36: Field, v37: Field, v38: Field, v39: Field, v40: u64, v41: Field, v42: Field, v43: Field, v44: Field):
    v46, v47 = call f45() -> ([Field; 9], u32)
    v48 = allocate -> &mut [Field; 9]
    store v46 at v48
    v49 = allocate -> &mut u32
    store v47 at v49
    call f46(v48, v49, v36)
    call f46(v48, v49, v37)
    call f46(v48, v49, v38)
    call f46(v48, v49, v39)
    v54 = cast v40 as Field
    call f46(v48, v49, v54)
    v57 = call f47(v41) -> Field
    call f46(v48, v49, v57)
    v60 = call f48(v42) -> Field
    call f46(v48, v49, v60)
    v63 = call f50(v43, v44) -> [Field; 2]
    call f49(v48, v49, v63)
    v65 = load v48 -> [Field; 9]
    v66 = load v49 -> u32
    v68 = call f51(v65, v66) -> [Field; 9]
    return v68
}
acir(inline) fn push f43 {
  b0(v36: &mut [Field; 25], v37: &mut u32, v38: Field):
    v39 = load v36 -> [Field; 25]
    v40 = load v37 -> u32
    v42 = lt v40, u32 25
    constrain v42 == u1 1, "push out of bounds"
    v44 = array_set v39, index v40, value v38
    v46 = unchecked_add v40, u32 1
    v47 = add v40, u32 1
    store v44 at v36
    store v47 at v37
    return
}
acir(inline) fn storage f44 {
  b0(v36: [Field; 25], v37: u32):
    return v36
}

acir(inline) fn new f45 {
  b0():
    v37 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 9]
    return v37, u32 0
}
acir(inline) fn push f46 {
  b0(v36: &mut [Field; 9], v37: &mut u32, v38: Field):
    v39 = load v36 -> [Field; 9]
    v40 = load v37 -> u32
    v42 = lt v40, u32 9
    constrain v42 == u1 1, "push out of bounds"
    v44 = array_set v39, index v40, value v38
    v46 = unchecked_add v40, u32 1
    v47 = add v40, u32 1
    store v44 at v36
    store v47 at v37
    return
}
acir(inline) fn to_field f47 {
  b0(v36: Field):
    return v36
}
acir(inline) fn to_field f48 {
  b0(v36: Field):
    return v36
}
acir(inline) fn extend_from_array f49 {
  b0(v36: &mut [Field; 9], v37: &mut u32, v38: [Field; 2]):
    v40 = load v36 -> [Field; 9]
    v41 = load v37 -> u32
    v43 = add v41, u32 2
    v45 = lt u32 9, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 2
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 9]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 9]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f50 {
  b0(v36: Field, v37: Field):
    v38 = make_array [v36, v37] : [Field; 2]
    return v38
}
acir(inline) fn storage f51 {
  b0(v36: [Field; 9], v37: u32):
    return v36
}
acir(inline) fn new f52 {
  b0():
    v37 = make_array [Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0, Field 0] : [Field; 8]
    return v37, u32 0
}
acir(inline) fn extend_from_array f53 {
  b0(v36: &mut [Field; 8], v37: &mut u32, v38: [Field; 2]):
    v40 = load v36 -> [Field; 8]
    v41 = load v37 -> u32
    v43 = add v41, u32 2
    v45 = lt u32 8, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 2
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 8]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 8]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn extend_from_array f54 {
  b0(v36: &mut [Field; 8], v37: &mut u32, v38: [Field; 6]):
    v40 = load v36 -> [Field; 8]
    v41 = load v37 -> u32
    v43 = add v41, u32 6
    v45 = lt u32 8, v43
    v46 = not v45
    constrain v45 == u1 0, "extend_from_array out of bounds"
    jmp b1(u32 0)
  b1(v39: u32):
    v49 = lt v39, u32 6
    jmpif v49 then: b3, else: b2
  b2():
    v50 = load v36 -> [Field; 8]
    v51 = load v37 -> u32
    store v50 at v36
    store v43 at v37
    return
  b3():
    v52 = load v36 -> [Field; 8]
    v53 = load v37 -> u32
    v54 = add v53, v39
    v55 = array_get v38, index v39 -> Field
    v56 = array_set v52, index v54, value v55
    v58 = unchecked_add v54, u32 1
    store v56 at v36
    store v53 at v37
    v59 = unchecked_add v39, u32 1
    jmp b1(v59)
}
acir(inline) fn serialize f55 {
  b0(v36: Field, v37: u32, v38: Field, v39: u32, v40: Field, v41: u32):
    v43 = call f36(v36, v37) -> [Field; 2]
    v45 = call f36(v38, v39) -> [Field; 2]
    v47 = call f36(v40, v41) -> [Field; 2]
    v49 = array_get v43, index u32 0 -> Field
    v51 = array_get v43, index u32 1 -> Field
    v52 = array_get v45, index u32 0 -> Field
    v53 = array_get v45, index u32 1 -> Field
    v54 = array_get v47, index u32 0 -> Field
    v55 = array_get v47, index u32 1 -> Field
    v56 = make_array [v49, v51, v52, v53, v54, v55] : [Field; 6]
    return v56
}
acir(inline) fn storage f56 {
  b0(v36: [Field; 8], v37: u32):
    return v36
}
acir(inline) fn new f57 {
  b0():
    v37 = make_array [Field 0, Field 0, Field 0, Field 0] : [Field; 4]
    return v37, u32 0
}
acir(inline) fn push f58 {
  b0(v36: &mut [Field; 4], v37: &mut u32, v38: Field):
    v39 = load v36 -> [Field; 4]
    v40 = load v37 -> u32
    v42 = lt v40, u32 4
    constrain v42 == u1 1, "push out of bounds"
    v44 = array_set v39, index v40, value v38
    v46 = unchecked_add v40, u32 1
    v47 = add v40, u32 1
    store v44 at v36
    store v47 at v37
    return
}
acir(inline) fn storage f59 {
  b0(v36: [Field; 4], v37: u32):
    return v36
}
acir(inline) fn array_concat f60 {
  b0(v36: [Field; 1], v37: [Field; 25]):
    v40 = array_get v36, index u32 0 -> Field
    v41 = make_array [v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40, v40] : [Field; 26]
    v42 = allocate -> &mut [Field; 26]
    store v41 at v42
    jmp b1(u32 0)
  b1(v38: u32):
    v44 = lt v38, u32 25
    jmpif v44 then: b3, else: b2
  b2():
    v45 = load v42 -> [Field; 26]
    return v45
  b3():
    v46 = load v42 -> [Field; 26]
    v48 = add v38, u32 1
    v49 = array_get v37, index v38 -> Field
    v50 = array_set v46, index v48, value v49
    v51 = unchecked_add v48, u32 1
    store v50 at v42
    v52 = unchecked_add v38, u32 1
    jmp b1(v52)
}
acir(inline) fn to_field f61 {
  b0(v36: u32):
    v37 = cast v36 as Field
    return v37
}
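
To illustrate the claim that this stack collapses into a few casts and one `make_array`, here is a reduced Rust model of the same pattern. It is not Noir and not the code from the listing: `Field` is modelled as `u64`, the struct is cut down to four values, and all names are assumptions chosen to mirror the SSA functions (`new`, `push`, `extend_from_array`, `storage`).

```rust
/// Reduced model of the bounded vector manipulated by the SSA above.
/// Assumption: `Field` is modelled as `u64`.
struct BoundedVec<const N: usize> {
    storage: [u64; N],
    len: usize,
}

impl<const N: usize> BoundedVec<N> {
    fn new() -> Self {
        Self { storage: [0; N], len: 0 }
    }

    fn push(&mut self, value: u64) {
        assert!(self.len < N, "push out of bounds");
        self.storage[self.len] = value;
        self.len += 1;
    }

    fn extend_from_array<const M: usize>(&mut self, values: [u64; M]) {
        assert!(self.len + M <= N, "extend_from_array out of bounds");
        for (i, &value) in values.iter().enumerate() {
            self.storage[self.len + i] = value;
        }
        self.len += M;
    }

    fn storage(self) -> [u64; N] {
        self.storage
    }
}

/// Mirrors the small `serialize` helper that casts a `u32` and builds a pair.
fn serialize_pair(a: u64, b: u32) -> [u64; 2] {
    [a, u64::from(b)]
}

/// The "long way": allocate, extend, push, then read the storage back.
fn serialize_via_vec(a: u64, b: u32, c: u64, d: u64) -> [u64; 4] {
    let mut out = BoundedVec::<4>::new();
    out.extend_from_array(serialize_pair(a, b));
    out.push(c);
    out.push(d);
    out.storage()
}

/// What the long way boils down to: one cast and one array literal.
fn serialize_flat(a: u64, b: u32, c: u64, d: u64) -> [u64; 4] {
    [a, u64::from(b), c, d]
}
```

Both functions return the same array for any input; the difference is how many loads, stores, loops and bounds checks have to be simplified away before that becomes visible.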

@TomAFrench
Member

My thoughts are that we should build a callgraph of all the functions in the program. As a first pass at this we can then:

  • Find all the functions which are called fewer than (some) N times.
  • Inline these functions into their callsites.
  • mem2reg everything in parallel.
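
A minimal sketch of the first step, assuming the callgraph is available as a map from caller to the list of callees (one entry per call site). `FunctionId`, `count_call_sites` and `inline_candidates` are hypothetical names, and the inlining and parallel mem2reg steps are not covered.

```rust
use std::collections::HashMap;

// Hypothetical identifier type; the compiler has its own.
type FunctionId = u32;

/// Counts how many call sites target each function.
/// `calls` maps a caller to its callees, with one entry per call site.
fn count_call_sites(calls: &HashMap<FunctionId, Vec<FunctionId>>) -> HashMap<FunctionId, usize> {
    let mut counts: HashMap<FunctionId, usize> = HashMap::new();
    for callees in calls.values() {
        for &callee in callees {
            *counts.entry(callee).or_insert(0) += 1;
        }
    }
    counts
}

/// Functions that are called at least once but fewer than `max_calls` times,
/// returned in ascending order so the result is deterministic.
fn inline_candidates(
    calls: &HashMap<FunctionId, Vec<FunctionId>>,
    max_calls: usize,
) -> Vec<FunctionId> {
    let mut candidates: Vec<FunctionId> = count_call_sites(calls)
        .into_iter()
        .filter(|&(_, n)| n < max_calls)
        .map(|(f, _)| f)
        .collect();
    candidates.sort_unstable();
    candidates
}
```

With a threshold of 3, a function called from five places stays out of the candidate list and could be simplified once on its own, while functions called once or twice get inlined right away.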

@aakoshh
Contributor Author

aakoshh commented Jan 14, 2025

I don't disagree with anything you said, I also thought that functions that end up being called could be massaged a little bit before inlining, although they appear multiple times because of the different parameters they are called with.

For the record the number of blocks in functions was printed like this:

fn run_pass<F>(mut self, pass: F, msg: &str) -> Self
    where
        F: FnOnce(Ssa) -> Ssa,
    {
        self.ssa = time(msg, self.print_codegen_timings, || pass(self.ssa));
        println!("AFTER {msg}: functions={}", self.ssa.functions.len());
        for f in self.ssa.functions.values() {
            let block_cnt = PostOrder::with_function(f).into_vec().len();
            println!("    FUNCTION {}: blocks={block_cnt}", f.name());
        }
        self.print(msg)
    }

If the increase in the number of blocks isn't coming from the pass it says then I'm not sure what could cause them. There is no concurrency here to mix up the print order.

@TomAFrench
Member

I'll have a look at replicating this, because if we're getting a blowup in blocks within the simplification pass then I'm missing something important.

@aakoshh
Contributor Author

aakoshh commented Jan 14, 2025

As noticed by @TomAFrench, the above count did not cover try_run_pass. After moving it into the print method the real culprit is indeed the unrolling pass:

AFTER Unrolling: functions=33
    FUNCTION main: blocks=62932
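
For reference, the shape of that fix as a self-contained sketch. The types are simplified stand-ins, not the real compiler ones: `Ssa` here is just a list of (function name, block count) pairs, the error type is a `String`, and the report is collected into a `Vec` instead of printed. The point is only that both `run_pass` and `try_run_pass` go through the same `print` helper, so no pass can be skipped by the counting.

```rust
/// Hypothetical stand-in for the SSA: (function name, block count) pairs.
struct Ssa {
    functions: Vec<(String, usize)>,
}

/// Simplified builder; the real one carries more fields.
struct SsaBuilder {
    ssa: Ssa,
    report: Vec<String>,
}

impl SsaBuilder {
    /// Runs an infallible pass, then records the counts.
    fn run_pass(mut self, pass: impl FnOnce(Ssa) -> Ssa, msg: &str) -> Self {
        self.ssa = pass(self.ssa);
        self.print(msg)
    }

    /// Runs a fallible pass, then records the counts through the same helper.
    fn try_run_pass(
        mut self,
        pass: impl FnOnce(Ssa) -> Result<Ssa, String>,
        msg: &str,
    ) -> Result<Self, String> {
        self.ssa = pass(self.ssa)?;
        Ok(self.print(msg))
    }

    /// Shared reporting step: one summary line plus one line per function.
    fn print(mut self, msg: &str) -> Self {
        self.report
            .push(format!("AFTER {msg}: functions={}", self.ssa.functions.len()));
        for (name, blocks) in &self.ssa.functions {
            self.report.push(format!("    FUNCTION {name}: blocks={blocks}"));
        }
        self
    }
}
```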

@aakoshh
Contributor Author

aakoshh commented Jan 14, 2025

Hah, got the -50% on rollup-base-public with df5b88d 🎉 😌
