
[opt] Add pass `eliminate_immutable_local_vars` #6926

Merged 6 commits into taichi-dev:master on Dec 21, 2022

Conversation

@strongoier (Contributor) commented Dec 19, 2022

Issue: #6933

### Brief Summary

There are many redundant copies of local vars in the initial IR:

```
  <[Tensor (3, 3) f32]> $128 = [$103, $106, $109, $112, $115, $118, $121, $124, $127]
  $129 : local store [$100 <- $128]
  <[Tensor (3, 3) f32]> $130 = alloca
  $131 = local load [$100]
  $132 : local store [$130 <- $131]
  <[Tensor (3, 3) f32]> $133 = alloca
  $134 = local load [$130]
  $135 : local store [$133 <- $134]
  <[Tensor (3, 3) f32]> $136 = alloca
  $137 = local load [$133]
  $138 : local store [$136 <- $137]
// In fact, `$128` can be used wherever `$136` is loaded.
```

These can come from many places; one of the main sources is the pass-by-value convention of `ti.func`. The consequence is an unnecessarily large number of instructions, which significantly slows down compilation.
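To make the growth concrete, here is a minimal, hypothetical sketch of a by-value lowering. The tuple-based IR and all helper names are invented for illustration and do not match Taichi's real (C++) frontend; the point is only that each nested by-value call layer adds one `alloca`/`load`/`store` triple for a value that never changes.

```python
# Hypothetical toy lowering: each nested by-value call copies its argument
# into a fresh local, producing the alloca/load/store chains seen above.

def emit_by_value_arg(instrs, arg_var, fresh):
    """Copy arg_var into a fresh parameter slot, as a by-value convention does."""
    param = fresh()  # the callee's local parameter
    tmp = fresh()
    instrs.append(("alloca", param))
    instrs.append(("load", tmp, arg_var))   # read the caller's local
    instrs.append(("store", param, tmp))    # copy it into the parameter
    return param

def lower_nested_calls(depth):
    """Simulate passing the same matrix down `depth` nested by-value calls."""
    instrs = [("matrix", "$m"), ("alloca", "$a0"), ("store", "$a0", "$m")]
    counter = [0]

    def fresh():
        counter[0] += 1
        return f"$v{counter[0]}"

    var = "$a0"
    for _ in range(depth):
        var = emit_by_value_arg(instrs, var, fresh)  # one copy chain per layer
    return instrs
```

Three nested calls already emit nine forwarding instructions for a single value, which is exactly the shape of the chain in the IR dump above.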

My solution here is to identify and eliminate such redundant instructions in the first place, so all later passes take a much smaller number of instructions as input. These redundant local vars are essentially immutable ones: they are assigned only once, and only loaded after the assignment. In this PR, I add an optimization pass `eliminate_immutable_local_vars` as the first pass.
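The criterion (stored exactly once, never loaded before that store) can be sketched over a toy IR. This is an illustrative model, not Taichi's actual C++ pass: instructions are plain dicts, only `alloca`/`load`/`store` are handled, and a real pass would also have to rewrite the operands of every other instruction kind.

```python
def eliminate_immutable_local_vars(instrs):
    """Toy sketch: forward single-store locals whose loads all follow the store.

    Returns the surviving instructions and a map from eliminated load
    results to the value that should replace them.
    """
    stores = {}      # var -> indices of stores to it
    first_load = {}  # var -> index of its first load
    for i, ins in enumerate(instrs):
        if ins["op"] == "store":
            stores.setdefault(ins["dest"], []).append(i)
        elif ins["op"] == "load":
            first_load.setdefault(ins["src"], i)

    # Immutable: stored exactly once, and never loaded before that store.
    immutable = {
        v for v, idxs in stores.items()
        if len(idxs) == 1 and first_load.get(v, idxs[0] + 1) > idxs[0]
    }

    repl = {}    # eliminated value -> forwarded value
    stored = {}  # immutable var -> the (resolved) value stored into it

    def resolve(v):
        # Chase forwarding chains like $137 -> $134 -> $131 -> $128.
        while v in repl:
            v = repl[v]
        return v

    out = []
    for ins in instrs:
        op = ins["op"]
        if op == "alloca" and ins["dest"] in immutable:
            continue  # the local var itself disappears
        if op == "store" and ins["dest"] in immutable:
            stored[ins["dest"]] = resolve(ins["value"])
            continue  # the single store disappears
        if op == "load" and ins["src"] in immutable:
            repl[ins["dest"]] = stored[ins["src"]]
            continue  # every load is replaced by the stored value
        out.append(ins)
    return out, repl
```

Applied to the chain in the IR dump above, the whole `$130`/`$133`/`$136` chain collapses and the final load resolves straight to `$128`.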

(P.S. The type-check processes of `MatrixExpression` and `LocalLoadStmt` are fixed along the way to make the pass work properly.)

Let's study the effects in two cases: #6933 and [voxel-rt2](https://github.com/taichi-dev/voxel-rt2/blob/main/example7.py).

First, let's compare the number of instructions after the `scalarization` pass (which happens immediately after the first pass).

| Kernel | Before this PR | After this PR | Rate of decrease |
| ------ | ------ | ------ | ------ |
| `test` (#6933) | 45859 | 26452 | 42% |
| `spatial_GRIS` (voxel-rt2) | 48519 | 17713 | 63% |

Then, let's compare the total time of `compile()`.

| Case | Before this PR | After this PR | Rate of decrease |
| ------ | ------ | ------ | ------ |
| #6933 | 20.622s | 8.550s | 59% |
| voxel-rt2 | 27.676s | 9.495s | 66% |

@strongoier strongoier added the full-ci Run complete set of CI tests label Dec 19, 2022

@bobcao3 (Collaborator) left a comment


LGTM

@strongoier strongoier merged commit 19fce81 into taichi-dev:master Dec 21, 2022
@strongoier strongoier deleted the add-eli-pass branch December 21, 2022 03:14
quadpixels pushed a commit to quadpixels/taichi that referenced this pull request May 13, 2023