Skip to content

Commit

Permalink
[mono][interp] Remove no_inlining functionality for dead bblocks (#11…
Browse files Browse the repository at this point in the history
…0468)

Many methods in the BCL, especially hwintrins related, contain a lot of code that is detected as dead during compilation. On mono, inlining happens during IL import and a lot of optimizations are run as later passes. This exposed the issue where we have a lot of dead code bloat from inlining, with optimizations later running on it.

A simple solution for this problem was tracking jump counts for each bblock (#97514), which are initialized when bblocks are first created, before IL import stage. Then a small set of IL import level optimizations were added, in order to reduce the jump targets of each bblock. As we were further importing IL, if we reached a bblock with 0 jump targets, we would disable inlining into it, in order to reduce code bloat. Disabling code emit altogether was too challenging. Another limitation of this approach was that we would fail to detect dead code if it was part of a loop. The results were good however, by reducing mem usage in `System.Numerics.Tensor.Tests` from 6GB to 600MB.

For an unrelated issue, the order in which we generate bblocks was redesigned in order to account for bblock stack state initialization in weird control flow scenarios (#108731). This was achieved by deferring IL import into bblocks that were not yet reached from other live bblocks. A side effect of this is that we no longer generate code at all in unreachable bblocks, completely superseding the previous approach while addressing both the problems of inlining into loops or generating IR for dead IL. In the previously mentioned test suite, this further reduced the memory usage to 300MB.

Remnants of the unnecessary `no_inlining` approach still lingered in the code, leading to disabling of inline optimization in some reachable code. This triggered a significant performance regression which this PR addresses.
  • Loading branch information
BrzVlad authored Dec 10, 2024
1 parent 5f41aab commit ad30479
Show file tree
Hide file tree
Showing 2 changed files with 7 additions and 44 deletions.
48 changes: 7 additions & 41 deletions src/mono/mono/mini/interp/transform.c
Original file line number Diff line number Diff line change
Expand Up @@ -760,8 +760,6 @@ handle_branch (TransformData *td, int long_op, int offset)
init_bb_stack_state (td, target_bb);

if (long_op != MINT_CALL_HANDLER) {
if (td->cbb->no_inlining)
target_bb->jump_targets--;
// We don't link finally blocks into the cfg (or other handler blocks for that matter)
interp_link_bblocks (td, td->cbb, target_bb);
}
Expand Down Expand Up @@ -803,8 +801,6 @@ one_arg_branch(TransformData *td, int mint_op, int offset, int inst_size)
return FALSE;
} else {
// branch condition always false, it is a NOP
int target = GPTRDIFF_TO_INT (td->ip + offset + inst_size - td->il_code);
td->offset_to_bb [target]->jump_targets--;
return TRUE;
}
} else {
Expand Down Expand Up @@ -901,8 +897,6 @@ two_arg_branch(TransformData *td, int mint_op, int offset, int inst_size)
return FALSE;
} else {
// branch condition always false, it is a NOP
int target = GPTRDIFF_TO_INT (td->ip + offset + inst_size - td->il_code);
td->offset_to_bb [target]->jump_targets--;
return TRUE;
}
} else {
Expand Down Expand Up @@ -2884,9 +2878,6 @@ interp_method_check_inlining (TransformData *td, MonoMethod *method, MonoMethodS
if (td->disable_inlining)
return FALSE;

if (td->cbb->no_inlining)
return FALSE;

// Exception handlers are always uncommon, with the exception of finally.
int inner_clause = td->clause_indexes [td->current_il_offset];
if (inner_clause != -1 && td->header->clauses [inner_clause].flags != MONO_EXCEPTION_CLAUSE_FINALLY)
Expand Down Expand Up @@ -4151,7 +4142,6 @@ get_basic_blocks (TransformData *td, MonoMethodHeader *header, gboolean make_lis
unsigned char *target;
ptrdiff_t cli_addr;
const MonoOpcode *opcode;
InterpBasicBlock *bb;

td->offset_to_bb = (InterpBasicBlock**)mono_mempool_alloc0 (td->mempool, (unsigned int)(sizeof (InterpBasicBlock*) * (end - start + 1)));
get_bb (td, start, make_list);
Expand All @@ -4160,21 +4150,18 @@ get_basic_blocks (TransformData *td, MonoMethodHeader *header, gboolean make_lis
MonoExceptionClause *c = header->clauses + i;
if (start + c->try_offset > end || start + c->try_offset + c->try_len > end)
return FALSE;
bb = get_bb (td, start + c->try_offset, make_list);
bb->jump_targets++;
get_bb (td, start + c->try_offset, make_list);
mono_bitset_set (il_targets, c->try_offset);
mono_bitset_set (il_targets, c->try_offset + c->try_len);
if (start + c->handler_offset > end || start + c->handler_offset + c->handler_len > end)
return FALSE;
bb = get_bb (td, start + c->handler_offset, make_list);
bb->jump_targets++;
get_bb (td, start + c->handler_offset, make_list);
mono_bitset_set (il_targets, c->handler_offset);
mono_bitset_set (il_targets, c->handler_offset + c->handler_len);
if (c->flags == MONO_EXCEPTION_CLAUSE_FILTER) {
if (start + c->data.filter_offset > end)
return FALSE;
bb = get_bb (td, start + c->data.filter_offset, make_list);
bb->jump_targets++;
get_bb (td, start + c->data.filter_offset, make_list);
mono_bitset_set (il_targets, c->data.filter_offset);
}
}
Expand Down Expand Up @@ -4207,8 +4194,7 @@ get_basic_blocks (TransformData *td, MonoMethodHeader *header, gboolean make_lis
target = start + cli_addr + 2 + (signed char)ip [1];
if (target > end)
return FALSE;
bb = get_bb (td, target, make_list);
bb->jump_targets++;
get_bb (td, target, make_list);
ip += 2;
get_bb (td, ip, make_list);
mono_bitset_set (il_targets, GPTRDIFF_TO_UINT32 (target - start));
Expand All @@ -4217,8 +4203,7 @@ get_basic_blocks (TransformData *td, MonoMethodHeader *header, gboolean make_lis
target = start + cli_addr + 5 + (gint32)read32 (ip + 1);
if (target > end)
return FALSE;
bb = get_bb (td, target, make_list);
bb->jump_targets++;
get_bb (td, target, make_list);
ip += 5;
get_bb (td, ip, make_list);
mono_bitset_set (il_targets, GPTRDIFF_TO_UINT32 (target - start));
Expand All @@ -4231,15 +4216,13 @@ get_basic_blocks (TransformData *td, MonoMethodHeader *header, gboolean make_lis
target = start + cli_addr;
if (target > end)
return FALSE;
bb = get_bb (td, target, make_list);
bb->jump_targets++;
get_bb (td, target, make_list);
mono_bitset_set (il_targets, GPTRDIFF_TO_UINT32 (target - start));
for (j = 0; j < n; ++j) {
target = start + cli_addr + (gint32)read32 (ip);
if (target > end)
return FALSE;
bb = get_bb (td, target, make_list);
bb->jump_targets++;
get_bb (td, target, make_list);
ip += 4;
mono_bitset_set (il_targets, GPTRDIFF_TO_UINT32 (target - start));
}
Expand Down Expand Up @@ -5446,13 +5429,6 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,

/* We are starting a new basic block. Change cbb and link them together */
if (link_bblocks) {
if (!new_bb->jump_targets && td->cbb->no_inlining) {
// This is a bblock that is not branched to and falls through from
// a dead predecessor. It means it is dead.
new_bb->no_inlining = TRUE;
if (td->verbose_level)
g_print ("Disable inlining in BB%d\n", new_bb->index);
}
/*
* By default we link cbb with the new starting bblock, unless the previous
* instruction is an unconditional branch (BR, LEAVE, ENDFINALLY)
Expand All @@ -5472,16 +5448,6 @@ generate_code (TransformData *td, MonoMethod *method, MonoMethodHeader *header,
}
// link_bblocks remains true, which is the default
} else {
if (!new_bb->jump_targets) {
// This is a bblock that is not branched to and it is not linked to the
// predecessor. It means it is dead.
new_bb->no_inlining = TRUE;
if (td->verbose_level)
g_print ("Disable inlining in BB%d\n", new_bb->index);
} else {
g_assert (new_bb->jump_targets > 0);
}

if (new_bb->stack_height >= 0) {
// This is relevant only for copying the vars associated with the values on the stack
memcpy (td->stack, new_bb->stack_state, new_bb->stack_height * sizeof(td->stack [0]));
Expand Down
3 changes: 0 additions & 3 deletions src/mono/mono/mini/interp/transform.h
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,6 @@ struct _InterpBasicBlock {
StackInfo *stack_state;

int index;
int jump_targets;

InterpBasicBlock *try_bblock;

Expand All @@ -160,8 +159,6 @@ struct _InterpBasicBlock {
// This block has special semantics and it shouldn't be optimized away
guint preserve : 1;
guint dead: 1;
// This bblock is detectead early as being dead, we don't inline into it
guint no_inlining: 1;
// If patchpoint is set we will store mapping information between native offset and bblock index within
// InterpMethod. In the unoptimized method we will map from native offset to the bb_index while in the
// optimized method we will map the bb_index to the corresponding native offset.
Expand Down

0 comments on commit ad30479

Please sign in to comment.