[RFC] [Lang] Representing matrices/vectors in Frontend and CHI IR to enable SIMD #5478

Open
8 of 11 tasks
AD1024 opened this issue Jul 20, 2022 · 0 comments
Labels
advanced optimization The issue or bug is related to advanced optimization

Comments

@AD1024
Contributor

AD1024 commented Jul 20, 2022

Background

Many matrix-matrix / matrix-vector operations, such as load/store, elementwise add, and dot product, could gain performance from SIMD instructions. The current implementation of Taichi matrices and vectors is visible only at the Python frontend: all operations over these data structures are unrolled into scalar operations before further transformations (e.g. optimizations on CHI IR), which blocks any further optimization. This issue proposes a plan to represent matrices and vectors in CHI IR and thereby expand the space of possible optimizations.

Use cases

Operations over matrices and vectors may benefit from the new implementation. For example,

import taichi as ti

ti.init(real_matrix=True)

@ti.kernel
def test_kernel(i: ti.i32):
    x = ti.Matrix(...)      # same interface as before
    weights = ...
    result = x <op> weights # same operator as before

Our design aims to stay compatible with the current user-level API while enabling SIMD optimizations depending on hardware capability. We will ensure that there is no performance regression before and after this optimization.
We will conduct a more detailed study of the performance benefits of this optimization later.

Representing local matrices at Python, Frontend IR, and CHI IR level

Here, we mainly consider two major objects: the Matrix and the indexed Matrix, both of which should stay whole until code generation:

  1. Python level:
    Add a flag real_matrix to ti.init(), which controls the transformation of local matrices during AST transformation.
  • Local Matrix: Transformed into FrontendAllocaStmt(DataType::Tensor) and FrontendAssignStmt during AST transformation.
  • Indexed Matrix: Transformed into IndexExpression.
  2. Frontend IR level:
  • Local Matrix: Verify FrontendAllocaStmt(DataType::Tensor) is supported.
  • Indexed Matrix: Verify IndexExpression(IdExpression, indices={}) is supported.
  3. CHI IR level:
  • Local Matrix: Adjust ops (Load/Store) to support operands of AllocaStmt(DataType::Tensor).
  • Indexed Matrix: Add a new statement, IndexStmt, and adjust existing ops accordingly.
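The two transformation paths above can be sketched with toy stand-in classes. Note this is a minimal illustration, not Taichi's actual implementation: the real FrontendAllocaStmt / FrontendAssignStmt / IndexExpression nodes live in Taichi's C++ core, and the `lower_matrix` helper here is hypothetical.

```python
from dataclasses import dataclass

# Toy stand-ins for the IR nodes named in this proposal.

@dataclass
class FrontendAllocaStmt:
    dtype: str            # e.g. "Tensor(3x3, f32)" or scalar "f32"

@dataclass
class FrontendAssignStmt:
    dest: FrontendAllocaStmt
    values: list

@dataclass
class IndexExpression:
    var: FrontendAllocaStmt
    indices: tuple

def lower_matrix(shape, values, real_matrix=True):
    """With real_matrix=True, a local matrix lowers to a single
    Tensor-typed alloca plus one assign, instead of being unrolled
    into shape[0]*shape[1] scalar allocas."""
    if real_matrix:
        alloca = FrontendAllocaStmt(dtype=f"Tensor({shape[0]}x{shape[1]}, f32)")
        return [alloca, FrontendAssignStmt(alloca, values)]
    # legacy path: unroll into one scalar alloca per element
    return [FrontendAllocaStmt(dtype="f32") for _ in range(shape[0] * shape[1])]

stmts = lower_matrix((3, 3), list(range(9)))
print(len(stmts))   # 2 statements instead of 9 scalar allocas
```

Keeping the matrix as one Tensor-typed value is what lets later passes see whole-matrix operations instead of an already-unrolled scalar soup.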

Lowering and Codegen

For prototyping purposes, the LLVM backend will be the major code-generation target. SIMD data types and instructions in LLVM are described in the LLVM documentation.
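As a rough illustration of the target output, a single LLVM vector instruction can replace a run of unrolled scalar instructions. The IR text below follows LLVM's documented vector syntax (`<N x ty>` operands to `fadd`), but the `emit_vector_add` helper is a hypothetical sketch, not Taichi's codegen:

```python
def emit_vector_add(lanes: int, elem_ty: str = "float") -> str:
    """Emit LLVM IR text for one SIMD add over a <lanes x elem_ty>
    vector, replacing `lanes` separate scalar fadd instructions."""
    vec_ty = f"<{lanes} x {elem_ty}>"
    return f"%sum = fadd {vec_ty} %a, %b"

print(emit_vector_add(4))   # %sum = fadd <4 x float> %a, %b
```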

Fallback

To stay compatible with targets that do not support vectorized instructions, we will provide fallback strategies (e.g. scalarizing the operations with a transformation pass). There may be other cases where falling back is preferable (e.g. when the matrix is too large). We need further experimentation to determine the preferable situations in which to apply this optimization.
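The scalarization fallback can be sketched as follows. This models values directly rather than IR statements, and the `scalarize` helper is hypothetical; the real pass would rewrite one TensorType statement into unrolled scalar statements:

```python
import operator

def scalarize(op: str, lhs: list, rhs: list) -> list:
    """Fallback: rewrite one tensor-level elementwise op into
    per-element scalar ops, recovering the pre-RFC unrolled behavior."""
    fn = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}[op]
    return [fn(a, b) for a, b in zip(lhs, rhs)]

# A 3-vector add falls back to three scalar adds:
print(scalarize("add", [1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```

Because the fallback reproduces the original unrolled semantics exactly, targets without SIMD support see no behavioral change.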

Implementation Plan (Tentative)

  • Make the representation visible in Frontend IR and CHI IR.
    • Add a global switch to turn the feature on/off (e.g. real_matrix=True/False)
    • Change unrolling to a single statement, i.e. allocate a TensorType value and assign it to a local variable (Frontend IR).
    • Change element fetching via indices to IndexExpression (Frontend IR)
    • Add indexing statements to CHI IR (e.g. IndexStmt)
  • Transformations & Codegen
    • Fill in implementation of lowering to CHI IR
    • Add support for the new constructs in transformation passes
    • Enable common operations (e.g. add, sub, mul, div, dot, etc.) to accept the new representation, including values with TensorType
    • (For LLVM backend) Generate SIMD instructions for constructs & operators supported (e.g. vectorized addition)
  • Fallback
    • Scalarize the operations with a transformation pass when needed; this pass falls back to the original implementation to ensure there is no performance regression.
  • Optimizations and "ablation study"
    • When to use the representation, and when not to use it?
  • Benchmarks
    • Ensure no performance regression (at least as well as scalarized operations)
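Before any benchmarking, the vectorized path must agree with the scalarized reference on the same inputs. A minimal correctness harness might look like the following; `check_no_regression` and the dot-product stand-in are hypothetical helpers, not part of Taichi's test suite:

```python
import random

def scalarized_dot(a, b):
    """Reference path: one scalar multiply-add per element."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s

def check_no_regression(vector_impl, trials=100, n=16):
    """Compare a candidate (possibly SIMD-backed) dot product against
    the scalarized reference on random inputs."""
    for _ in range(trials):
        a = [random.random() for _ in range(n)]
        b = [random.random() for _ in range(n)]
        assert abs(vector_impl(a, b) - scalarized_dot(a, b)) < 1e-9
    return True

# A plain Python reduction stands in for a vectorized implementation here:
print(check_no_regression(lambda a, b: sum(x * y for x, y in zip(a, b))))
```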

Thanks to @jim19930609 and @strongoier for their help editing this plan!

@AD1024 AD1024 added the advanced optimization The issue or bug is related to advanced optimization label Jul 20, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Jul 20, 2022
@neozhaoliang neozhaoliang moved this from Untriaged to In Progress in Taichi Lang Jul 22, 2022
AD1024 added a commit that referenced this issue Sep 14, 2022
Related issue = #5478 
A part of PR #5551 


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <[email protected]>
AD1024 added a commit that referenced this issue Sep 24, 2022
Related issue = #5478 
Refactored & Combined PR: #5797 and #5861 
A part of PR #5551

Co-authored-by: Yi Xu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
strongoier added a commit that referenced this issue Oct 19, 2022
Issue: #5478

### Brief Summary
Implement matrix/vector operations as standard libraries.

Co-authored-by: Yi Xu <[email protected]>
strongoier added a commit that referenced this issue Nov 4, 2022
Issue: #5478 #5819 

### Brief Summary

Co-authored-by: Yi Xu <[email protected]>
Co-authored-by: Zhanlue Yang <[email protected]>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023