[opt] Eliminate redundant mod for SNode access under packed mode #6444

Merged: 2 commits, Oct 27, 2022

Conversation

strongoier
Contributor

Issue: #6219

Brief Summary

For the following example,

ti.root.dense(ti.i, 10).dense(ti.i, 30).place(x)

Under packed mode, to access x[105] we compute (105 mod (10 * 30)) div 30 = 3 for the coordinate in the first dense SNode and 105 mod 30 = 15 for the coordinate in the second dense SNode. The 105 mod (10 * 30) is unnecessary because an in-bounds user coordinate (here 105) is always less than the total shape (10 * 30) of the axis. This PR eliminates that redundant mod for the first coordinate extraction on each axis.
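The arithmetic above can be checked with a small pure-Python sketch. It only models the index math described in this PR, not the compiler pass itself; the function name `extract_coords` and the `skip_first_mod` flag are hypothetical and exist only for illustration.

```python
def extract_coords(index, shapes, skip_first_mod=False):
    """Split a flat index along one axis into per-level coordinates.

    `shapes` lists the extent of each dense level on this axis,
    from outermost to innermost (e.g. [10, 30]).
    """
    coords = []
    for level in range(len(shapes)):
        # Stride of this level = product of the extents of all inner levels.
        stride = 1
        for extent in shapes[level + 1:]:
            stride *= extent
        span = stride * shapes[level]

        value = index
        # The mod at the outermost level is redundant for in-bounds indices,
        # because the index is already smaller than the total axis shape.
        if not (skip_first_mod and level == 0):
            value = value % span
        coords.append(value // stride)
    return coords


naive = extract_coords(105, [10, 30])
optimized = extract_coords(105, [10, 30], skip_first_mod=True)
print(naive, optimized)
```

Both variants agree for every in-bounds index (0 to 299 here), which is why the first mod can be dropped without changing results.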

On my local machine, the benchmark script in #6219 takes
0.030s with packed=False,
0.039s with packed=True before this PR,
0.007s with packed=True after this PR (even faster than packed=False, because packed=False still generates a BitExtractStmt).

This optimization ensures that no mod will be generated for accessing x[i, j] in common use cases like

x = ti.field(ti.i32, shape=(100, 200), order='ji')
# or equivalently
x = ti.field(ti.i32)
ti.root.dense(ti.j, 200).dense(ti.i, 100).place(x)
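To see why no mod remains in this case, here is a simplified counting model, assuming (as in the x[105] example) one mod per dense level on an axis before this PR and one fewer per axis afterwards. The helper `mods_needed` and the `(axis, size)` layout encoding are hypothetical and do not reflect how the compiler actually tracks this.

```python
from collections import Counter


def mods_needed(layout, optimized):
    """Count mod operations per axis for a chain of dense levels.

    `layout` is a list of (axis_name, size) pairs, outermost first.
    Simplified model: one mod per level on an axis; the optimization
    removes the mod of the first (outermost) level on each axis.
    """
    levels_per_axis = Counter(axis for axis, _ in layout)
    if optimized:
        return {axis: n - 1 for axis, n in levels_per_axis.items()}
    return dict(levels_per_axis)


# Layout of ti.root.dense(ti.j, 200).dense(ti.i, 100).place(x)
field_2d = [("j", 200), ("i", 100)]
# Layout of ti.root.dense(ti.i, 10).dense(ti.i, 30).place(x)
nested_1d = [("i", 10), ("i", 30)]

print(mods_needed(field_2d, optimized=True))
print(mods_needed(nested_1d, optimized=True))
```

In the 2D layout each axis has a single level, so the first extraction on the axis is also the only one and nothing is left to reduce; the nested 1D layout still needs one mod for its inner level.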


@strongoier strongoier merged commit 13b78ab into taichi-dev:master Oct 27, 2022
@strongoier strongoier deleted the packed-scalar-ptr branch October 27, 2022 05:13
strongoier added a commit that referenced this pull request Nov 1, 2022
…acked mode (#6485)

Issue: #6219

### Brief Summary

This PR adds an optimization similar to #6444 for non-packed mode so that
we can make fair performance comparisons. After this PR,
the benchmark script in #6219 runs in `0.007s` on my local machine
regardless of whether `packed=True` or `packed=False`.

The tests are fixed because they were invalid: the out-of-bounds access
used to be hidden by the `BitExtractStmt` that was always inserted before
this PR.
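A rough illustration of how a bit-extraction step can hide an out-of-bounds access, assuming it behaves like a shift-and-mask over a power-of-two extent. The helper `bit_extract` is a hypothetical stand-in written for this sketch, not the actual `BitExtractStmt` implementation.

```python
def bit_extract(index, num_bits, offset=0):
    """Keep `num_bits` bits of `index`, starting at bit `offset`."""
    return (index >> offset) & ((1 << num_bits) - 1)


size = 8        # extent of the axis (a power of two: 3 bits)
num_bits = 3
bad_index = 9   # out of bounds for an axis of size 8

# With the mask, the bad index silently wraps around to a valid slot.
masked = bit_extract(bad_index, num_bits)
in_bounds_after_mask = masked < size

# Without the mask, the raw index is used and the bug becomes visible.
in_bounds_raw = bad_index < size

print(masked, in_bounds_after_mask, in_bounds_raw)
```

Under this assumption, a test that indexes past the end would have appeared to pass while the mask was always applied, and would start failing once the mask is removed.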
strongoier added a commit that referenced this pull request Nov 23, 2022
…d mode (#6709)

Issue: #6660

### Brief Summary

This PR applies the same optimization as #6444 to the
`demote_dense_struct_fors` pass.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>