Optimize multi-dimensional array access (#70271)
Currently, multi-dimensional (MD) array access operations are opaque to most of the JIT; they pass through the optimization pipeline untouched. Lowering expands the `GT_ARR_ELEM` node (representing an `a[i,j]` operation, for example) to `GT_ARR_OFFSET` and `GT_ARR_INDEX` trees, to expand the register requirements of the operation. These are then directly used to generate code.

This change moves the expansion of `GT_ARR_ELEM` to a new pass that follows loop optimization but precedes Value Numbering, CSE, and the rest of the optimizer. This placement leaves room for a future loop cloning improvement that supports cloning loops with MD references, while still letting the optimizer work on the new expansion.

One nice feature of this change: there is no machine-dependent code required; all the nodes get lowered to machine-independent nodes before code generation.

The MDBenchI and MDBenchF micro-benchmarks (very targeted to this work) improve by about 10% to 60%, but there is one significant CQ regression in MDMulMatrix of over 20%. Future loop cloning, CSE, and/or LSRA work will be needed to get that back.

In this change, `GT_ARR_ELEM` nodes are morphed to appropriate trees. Note that an MD array `Get`, `Set`, or `Address` operation is imported as a call, and, if all required conditions are satisfied, is treated as an intrinsic and replaced by IR nodes, especially `GT_ARR_ELEM` nodes, in `impArrayAccessIntrinsic()`.

For example, a simple 2-dimensional array access like `a[i,j]` looks like:

```
\--* ARR_ELEM[,] byref
   +--* LCL_VAR ref V00 arg0
   +--* LCL_VAR int V01 arg1
   \--* LCL_VAR int V02 arg2
```

This is replaced by:

```
&a + offset + elemSize * ((i - a.GetLowerBound(0)) * a.GetLength(1) + (j - a.GetLowerBound(1)))
```

plus the appropriate `i` and `j` bounds checks.
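As a worked illustration of the address formula above, here is a minimal C++ sketch of the 2-D byte-offset arithmetic. It is not JIT code: `DimInfo` and `MdArrayByteOffset2D` are hypothetical names introduced only for this example, the data offset and element size are passed in as plain parameters, and bounds checks are left out on purpose.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one MD array dimension (illustration only).
struct DimInfo
{
    int32_t lowerBound; // lower bound of the dimension
    int32_t length;     // number of elements in the dimension
};

// Byte offset of a[i,j] from the array object, following
//   offset + elemSize * ((i - lb0) * len1 + (j - lb1))
// No bounds checks are performed in this sketch.
inline int64_t MdArrayByteOffset2D(const DimInfo& d0, const DimInfo& d1,
                                   int32_t i, int32_t j,
                                   int64_t dataOffset, int64_t elemSize)
{
    assert(elemSize > 0);

    // Effective indices: index minus lower bound. Unsigned 32-bit arithmetic
    // is used so the subtraction and multiply wrap instead of overflowing.
    const uint32_t effI = static_cast<uint32_t>(i) - static_cast<uint32_t>(d0.lowerBound);
    const uint32_t effJ = static_cast<uint32_t>(j) - static_cast<uint32_t>(d1.lowerBound);

    // Linearized (row-major) element index, then scale and add the data offset.
    const uint32_t linear = effI * static_cast<uint32_t>(d1.length) + effJ;
    return dataOffset + elemSize * static_cast<int64_t>(linear);
}
```

With an assumed data offset of 32 and element size of 4, a 3x4 array with zero lower bounds gives `MdArrayByteOffset2D({0, 3}, {0, 4}, 1, 2, 32, 4) == 56`, since the linearized index is `1 * 4 + 2 = 6`.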
In IR, the expanded address computation is:

```
* ADD byref
+--* ADD long
|  +--* MUL long
|  |  +--* CAST long <- uint
|  |  |  \--* ADD int
|  |  |     +--* MUL int
|  |  |     |  +--* COMMA int
|  |  |     |  |  +--* ASG int
|  |  |     |  |  |  +--* LCL_VAR int V04 tmp1
|  |  |     |  |  |  \--* SUB int
|  |  |     |  |  |     +--* LCL_VAR int V01 arg1
|  |  |     |  |  |     \--* MDARR_LOWER_BOUND int (0)
|  |  |     |  |  |        \--* LCL_VAR ref V00 arg0
|  |  |     |  |  \--* COMMA int
|  |  |     |  |     +--* BOUNDS_CHECK_Rng void
|  |  |     |  |     |  +--* LCL_VAR int V04 tmp1
|  |  |     |  |     |  \--* MDARR_LENGTH int (0)
|  |  |     |  |     |     \--* LCL_VAR ref V00 arg0
|  |  |     |  |     \--* LCL_VAR int V04 tmp1
|  |  |     |  \--* MDARR_LENGTH int (1)
|  |  |     |     \--* LCL_VAR ref V00 arg0
|  |  |     \--* COMMA int
|  |  |        +--* ASG int
|  |  |        |  +--* LCL_VAR int V05 tmp2
|  |  |        |  \--* SUB int
|  |  |        |     +--* LCL_VAR int V02 arg2
|  |  |        |     \--* MDARR_LOWER_BOUND int (1)
|  |  |        |        \--* LCL_VAR ref V00 arg0
|  |  |        \--* COMMA int
|  |  |           +--* BOUNDS_CHECK_Rng void
|  |  |           |  +--* LCL_VAR int V05 tmp2
|  |  |           |  \--* MDARR_LENGTH int (1)
|  |  |           |     \--* LCL_VAR ref V00 arg0
|  |  |           \--* LCL_VAR int V05 tmp2
|  |  \--* CNS_INT long 4
|  \--* CNS_INT long 32
\--* LCL_VAR ref V00 arg0
```

before being morphed by the usual morph transformations.

Some things to consider:

1. MD arrays have both a lower bound and length for each dimension (even if very few MD arrays actually have a non-zero lower bound).
2. The new `GT_MDARR_LOWER_BOUND(dim)` node represents the lower-bound value for a particular array dimension. The "effective index" for a dimension is the index minus the lower bound.
3. The new `GT_MDARR_LENGTH(dim)` node represents the length value (number of elements in a dimension) for a particular array dimension.
4. The effective index is bounds checked against the dimension length.
5. The lower bound and length values are 32-bit signed integers (`TYP_INT`).
6. After constructing a "linearized index", the index is scaled by the array element size, and the offset from the array object to the beginning of the array data is added.
7. Much of the complexity above is simply to assign temps to the various values that are used subsequently.
8. The index expressions are used exactly once. However, if they have side effects, they need to be copied early to preserve exception ordering.
9. Only the top-level operation adds the array object to the scaled, linearized index, to create the final address `byref`. As usual, we need to be careful not to create an illegal byref by adding any partial index calculation.
10. To avoid doing unnecessary work, the importer sets the global `OMF_HAS_MDARRAYREF` flag if there are any MD array expressions to expand. Also, the block flag `BBF_HAS_MDARRAYREF` is set on blocks where these exist, so only those blocks are processed.

Remaining work:

1. Implement `optEarlyProp` support for MD arrays.
2. Implement loop cloning support for MD arrays.
3. (optionally) Remove old `GT_ARR_OFFSET` and `GT_ARR_INDEX` nodes and related code, as well as `GT_ARR_ELEM` code used after the new expansion.
4. Implement improvements in CSE and LSRA to improve codegen for the MDMulMatrix benchmark.

The new early expansion is enabled by default. It can be disabled (currently even in Release) by setting `COMPlus_JitEarlyExpandMDArrays=0`. If disabled, it can be selectively enabled using `COMPlus_JitEarlyExpandMDArraysFilter=<method_set>` (e.g., as specified for `JitDump`).

Fixes #60785.
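To make the numbered considerations above more concrete (effective index, per-dimension bounds check, linearization, scaling), here is a rough C++ sketch that generalizes the computation to any rank. It is an illustration under stated assumptions, not the JIT's implementation: `MdDim` and `MdArrayElemOffset` are hypothetical names, a C++ exception stands in for the bounds-check failure, and the single unsigned comparison used for the range check is an assumption of this sketch rather than something the commit message spells out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical per-dimension metadata (not a runtime or JIT type).
struct MdDim
{
    int32_t lowerBound; // 32-bit signed, like TYP_INT
    int32_t length;     // number of elements in this dimension
};

// Returns the byte offset of the element from the array object.
// Throws std::out_of_range if any effective index fails its bounds check.
// The indices arrive already evaluated, which mirrors copying index
// expressions to temps before any check can throw.
inline int64_t MdArrayElemOffset(const std::vector<MdDim>& dims,
                                 const std::vector<int32_t>& indices,
                                 int64_t dataOffset, int64_t elemSize)
{
    assert(elemSize > 0);
    if (dims.empty() || dims.size() != indices.size())
        throw std::invalid_argument("rank mismatch");

    uint32_t linear = 0;
    for (std::size_t d = 0; d < dims.size(); ++d)
    {
        // Effective index: index minus lower bound (wrapping 32-bit subtract).
        const uint32_t eff = static_cast<uint32_t>(indices[d]) -
                             static_cast<uint32_t>(dims[d].lowerBound);

        // One unsigned compare rejects both "below the lower bound"
        // (wraps to a huge value) and "past the end".
        if (eff >= static_cast<uint32_t>(dims[d].length))
            throw std::out_of_range("MD array index out of range");

        // Row-major linearization: fold in one dimension at a time.
        linear = linear * static_cast<uint32_t>(dims[d].length) + eff;
    }

    // Scale by element size and add the offset to the start of the data.
    // The array object itself would be added only at the very end.
    return dataOffset + elemSize * static_cast<int64_t>(linear);
}
```

For a 3x4 array with lower bounds `1` and `-1`, an assumed data offset of 32, and an element size of 4, indices `{2, 1}` have effective indices `1` and `2`, so the result is `32 + 4 * (1 * 4 + 2) = 56`; indices `{0, 0}` fail the check on dimension 0.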
1 parent dfbc648, commit cc0ccbe
Showing 28 changed files with 1,188 additions and 316 deletions.