[RFC] [Lang] Representing matrices/vectors in Frontend and CHI IR to enable SIMD #5478

Open
8 of 11 tasks
AD1024 opened this issue Jul 20, 2022 · 0 comments
Labels
advanced optimization The issue or bug is related to advanced optimization

Comments

@AD1024
Contributor

AD1024 commented Jul 20, 2022

Background

Many matrix-matrix / matrix-vector operations, such as load/store, elementwise add, and dot product, could gain performance from SIMD instructions. The current implementation of Taichi matrices and vectors is visible only at the Python frontend: all operations over these data structures are unrolled into scalar operations before further transformations (e.g. optimizations on CHI IR), which blocks any further optimization. This issue proposes a plan to represent matrices and vectors in CHI IR and thereby expand the space of possible optimizations.

Use cases

Operations over matrices and vectors may benefit from the new implementation. For example,

import taichi as ti

ti.init(real_matrix=True)

@ti.kernel
def test_kernel(i: ti.i32):
    x = ti.Matrix(...)      # same interface as before
    weights = ...
    result = x <op> weights # same operator as before

Our design aims to stay compatible with the current user-level API while enabling SIMD optimizations depending on hardware capability. We will ensure that there is no performance regression before and after this optimization.
We will conduct a more detailed study of the performance benefits of this optimization later.

Representing local matrices at Python, Frontend IR, and CHI IR level

Here, we mainly consider two major objects: the Matrix and the indexed Matrix, both of which should stay whole until code generation:

  1. Python level:
    Add a flag real_matrix to ti.init(), which controls the transformation of local matrices during AST transformation.
  • Local Matrix: Transformed into FrontendAllocaStmt(DataType::Tensor) and FrontendAssignStmt during AST transformation.
  • Indexed Matrix: Transformed into IndexExpression.
  2. Frontend IR level:
  • Local Matrix: Verify FrontendAllocaStmt(DataType::Tensor) is supported.
  • Indexed Matrix: Verify IndexExpression(IdExpression, indices={}) is supported.
  3. CHI IR level:
  • Local Matrix: Adjust ops (Load/Store) to support operands of AllocaStmt(DataType::Tensor).
  • Indexed Matrix: Add a new statement, IndexStmt, and adjust existing ops accordingly.
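The two transformation paths above can be sketched with toy stand-in classes. Note this is a minimal illustration, not Taichi's actual implementation: the real FrontendAllocaStmt / FrontendAssignStmt / IndexExpression nodes live in Taichi's C++ core, and the `lower_matrix` helper here is hypothetical.

```python
from dataclasses import dataclass

# Toy stand-ins for the IR nodes named in this proposal.

@dataclass
class FrontendAllocaStmt:
    dtype: str            # e.g. "Tensor(3x3, f32)" or scalar "f32"

@dataclass
class FrontendAssignStmt:
    dest: FrontendAllocaStmt
    values: list

@dataclass
class IndexExpression:
    var: FrontendAllocaStmt
    indices: tuple

def lower_matrix(shape, values, real_matrix=True):
    """With real_matrix=True, a local matrix lowers to a single
    Tensor-typed alloca plus one assign, instead of being unrolled
    into shape[0]*shape[1] scalar allocas."""
    if real_matrix:
        alloca = FrontendAllocaStmt(dtype=f"Tensor({shape[0]}x{shape[1]}, f32)")
        return [alloca, FrontendAssignStmt(alloca, values)]
    # legacy path: unroll into one scalar alloca per element
    return [FrontendAllocaStmt(dtype="f32") for _ in range(shape[0] * shape[1])]

stmts = lower_matrix((3, 3), list(range(9)))
print(len(stmts))   # 2 statements instead of 9 scalar allocas
```

Keeping the matrix as one Tensor-typed value is what lets later passes see whole-matrix operations instead of an already-unrolled scalar soup.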

Lowering and Codegen

For prototyping purposes, the LLVM backend will be the major code-generation target. SIMD data types and instructions in LLVM are described in the LLVM documentation.
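As a rough illustration of the target output, a single LLVM vector instruction can replace a run of unrolled scalar instructions. The IR text below follows LLVM's documented vector syntax (`<N x ty>` operands to `fadd`), but the `emit_vector_add` helper is a hypothetical sketch, not Taichi's codegen:

```python
def emit_vector_add(lanes: int, elem_ty: str = "float") -> str:
    """Emit LLVM IR text for one SIMD add over a <lanes x elem_ty>
    vector, replacing `lanes` separate scalar fadd instructions."""
    vec_ty = f"<{lanes} x {elem_ty}>"
    return f"%sum = fadd {vec_ty} %a, %b"

print(emit_vector_add(4))   # %sum = fadd <4 x float> %a, %b
```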

Fallback

To stay compatible with targets that do not support vectorized instructions, we will provide fallback strategies (e.g. scalarizing the operations with a transformation pass). There may be other cases where falling back is preferable (e.g. when the matrix is too large). We need further experimentation to determine the preferable situations in which to apply this optimization.
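The scalarization fallback can be sketched as follows. This models values directly rather than IR statements, and the `scalarize` helper is hypothetical; the real pass would rewrite one TensorType statement into unrolled scalar statements:

```python
import operator

def scalarize(op: str, lhs: list, rhs: list) -> list:
    """Fallback: rewrite one tensor-level elementwise op into
    per-element scalar ops, recovering the pre-RFC unrolled behavior."""
    fn = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}[op]
    return [fn(a, b) for a, b in zip(lhs, rhs)]

# A 3-vector add falls back to three scalar adds:
print(scalarize("add", [1, 2, 3], [10, 20, 30]))   # [11, 22, 33]
```

Because the fallback reproduces the original unrolled semantics exactly, targets without SIMD support see no behavioral change.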

Implementation Plan (Tentative)

  • Make the representation visible in Frontend IR and CHI IR.
    • Add a global switch to turn the feature on/off (e.g. real_matrix=True/False)
    • Change unrolling to a single statement, i.e. allocate a TensorType value and assign it to a local variable (Frontend IR).
    • Change element fetching via indices to IndexExpression (Frontend IR)
    • Add indexing statements to CHI IR (e.g. IndexStmt)
  • Transformations & Codegen
    • Fill in implementation of lowering to CHI IR
    • Add support for the new constructs in transformation passes
    • Enable common operations (e.g. add, sub, mul, div, dot, etc.) to accept the new representation, including values with TensorType
    • (For LLVM backend) Generate SIMD instructions for constructs & operators supported (e.g. vectorized addition)
  • Fallback
    • Scalarize the operations with a transformation pass when needed; this pass falls back to the original implementation to ensure there is no performance regression.
  • Optimizations and "ablation study"
    • When to use the representation, and when not to use it?
  • Benchmarks
    • Ensure no performance regression (at least as well as scalarized operations)
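Before any benchmarking, the vectorized path must agree with the scalarized reference on the same inputs. A minimal correctness harness might look like the following; `check_no_regression` and the dot-product stand-in are hypothetical helpers, not part of Taichi's test suite:

```python
import random

def scalarized_dot(a, b):
    """Reference path: one scalar multiply-add per element."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s

def check_no_regression(vector_impl, trials=100, n=16):
    """Compare a candidate (possibly SIMD-backed) dot product against
    the scalarized reference on random inputs."""
    for _ in range(trials):
        a = [random.random() for _ in range(n)]
        b = [random.random() for _ in range(n)]
        assert abs(vector_impl(a, b) - scalarized_dot(a, b)) < 1e-9
    return True

# A plain Python reduction stands in for a vectorized implementation here:
print(check_no_regression(lambda a, b: sum(x * y for x, y in zip(a, b))))
```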

Thanks to @jim19930609 and @strongoier for their help editing this plan!

@AD1024 AD1024 added the advanced optimization The issue or bug is related to advanced optimization label Jul 20, 2022
@taichi-gardener taichi-gardener moved this to Untriaged in Taichi Lang Jul 20, 2022
@neozhaoliang neozhaoliang moved this from Untriaged to In Progress in Taichi Lang Jul 22, 2022
AD1024 added a commit that referenced this issue Sep 14, 2022
Related issue = #5478 
A part of PR #5551 


Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Yi Xu <[email protected]>
AD1024 added a commit that referenced this issue Sep 24, 2022
Related issue = #5478 
Refactored & Combined PR: #5797 and #5861 
A part of PR #5551

Co-authored-by: Yi Xu <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
strongoier added a commit that referenced this issue Oct 19, 2022
Issue: #5478

### Brief Summary
Implement matrix/vector operations as standard libraries.

Co-authored-by: Yi Xu <[email protected]>
strongoier added a commit that referenced this issue Nov 4, 2022
Issue: #5478 #5819 

### Brief Summary

Co-authored-by: Yi Xu <[email protected]>
Co-authored-by: Zhanlue Yang <[email protected]>
quadpixels pushed a commit to quadpixels/taichi that referenced this issue May 13, 2023