[opt] Eliminate redundant mod for SNode access under packed mode (#6444)
Issue: #6219

### Brief Summary

For the following example,

```python
ti.root.dense(ti.i, 10).dense(ti.i, 30).place(x)
```

Under packed mode, if we want to access `x[105]`, we will calculate `105 mod (10 * 30) div 30 = 3` for the coordinate in the first dense SNode, and `105 mod 30 = 15` for the coordinate in the second dense SNode. We can see that `105 mod (10 * 30)` is unnecessary because user coordinate (`105`) is always less than the total shape (`10 * 30`) of the axis. This PR eliminates such redundant `mod` upon first coordinate extraction on an axis.

On my local machine, the benchmark script in #6219 runs `0.030s` for `packed=False`, `0.039s` for `packed=True` before this PR, `0.007s` for `packed=True` after this PR (even faster than `packed=False` because `packed=False` still generates a `BitExtractStmt`).

This optimization ensures that no `mod` will be generated for accessing `x[i, j]` in common use cases like

```python
x = ti.field(ti.i32, shape=(100, 200), order='ji')
# or equivalently
x = ti.field(ti.i32)
ti.root.dense(ti.j, 200).dense(ti.i, 100).place(x)
```

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
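The index arithmetic from the summary can be checked with a small plain-Python sketch. It does not use Taichi or reflect its actual IR passes; the function names `coords_with_mod` and `coords_without_first_mod` are hypothetical and only model the per-level `mod`/`div` described above, assuming the user index is always within the total shape of the axis.

```python
from math import prod


def coords_with_mod(index, shapes):
    """Model of the extraction before the change: mod at every level."""
    coords = []
    remaining = prod(shapes)  # extent covered by this level and all below it
    for s in shapes:
        below = remaining // s  # extent covered by the levels below this one
        coords.append(index % remaining // below)
        remaining = below
    return coords


def coords_without_first_mod(index, shapes):
    """Model of the extraction after the change: skip the mod at the first level.

    Assumes 0 <= index < prod(shapes), so `index % prod(shapes)` equals `index`.
    """
    total = prod(shapes)
    assert 0 <= index < total, "index must lie within the total shape of the axis"
    coords = []
    remaining = total
    for level, s in enumerate(shapes):
        below = remaining // s
        if level == 0:
            coords.append(index // below)  # the mod here would be a no-op
        else:
            coords.append(index % remaining // below)
        remaining = below
    return coords


# The example from the summary: x[105] with dense(ti.i, 10).dense(ti.i, 30)
print(coords_with_mod(105, [10, 30]))           # [3, 15]
print(coords_without_first_mod(105, [10, 30]))  # [3, 15]
```

Both functions agree for every in-range index, which is the reason the first `mod` can be dropped without changing results.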