-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Unify packed mode and non-packed mode #6660
Comments
Was wondering what's the approximate diff in performance though. If the performance isn't too bad, would it be better if we turn the In this way, new bees will get notified if they're concerned with the performance, while advanced users should know what they're doing if they intend to break the limit. |
#6219 gives a ~2x result.
I think the core problem here is whether we want to force or simply advise users to follow the best practice. I agree that giving warnings instead of raising errors is a more gentle solution, which will not break potential existing code and can still achieve the purpose of guiding users towards best practice. @yuanming-hu WDYT? |
IMHO we could give a deprecation notice in v1.3 saying that we consider enforce power of 2 for second-time division in v1.4, if they have a valid use case they should open an issue to let us know. This way we can know better how heavily this feature is used atm and not limiting ourselves in unknown use cases. wdyt? |
Sounds good! |
…6718) Issue: #6660 ### Brief Summary After this PR, the two main goals in #6660, > - Make the runtime overhead of SNode access always the same for both modes if the limitation is obeyed. > - Make the runtime overhead of struct for on a SNode whose path to root contains no non-power-of-two division always the same for both modes. are both achieved. See comments in the code for implementation details. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: #6660 ### Brief Summary This PR does the following: - Provides missing part of packed mode support for Metal sparse; - Removes `ti.extension.packed`; - Sets `packed=True` by default; - Fixes illegal tests. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: #6660 ### Brief Summary We no longer need the concept of packed mode. Now we only have to suggest users using power-of-two block size in hierarchical fields.
…6753) Issue: #6660 ### Brief Summary Apart from the deprecated warning, the previously added limitation that "non-first division on an axis must be a power of two" is now made purely a warning because it can already let users learn the best practice. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: taichi-dev#6660 ### Brief Summary We no longer need the concept of packed mode. Now we only have to suggest users using power-of-two block size in hierarchical fields.
…aichi-dev#6753) Issue: taichi-dev#6660 ### Brief Summary Apart from the deprecated warning, the previously added limitation that "non-first division on an axis must be a power of two" is now made purely a warning because it can already let users learn the best practice. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…ower of two (taichi-dev#6690) Issue: taichi-dev#6660 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…d mode (taichi-dev#6709) Issue: taichi-dev#6660 ### Brief Summary This PR applies the same optimization in taichi-dev#6444 to the `demote_dense_struct_fors` pass. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…aichi-dev#6718) Issue: taichi-dev#6660 ### Brief Summary After this PR, the two main goals in taichi-dev#6660, > - Make the runtime overhead of SNode access always the same for both modes if the limitation is obeyed. > - Make the runtime overhead of struct for on a SNode whose path to root contains no non-power-of-two division always the same for both modes. are both achieved. See comments in the code for implementation details. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: taichi-dev#6660 ### Brief Summary This PR does the following: - Provides missing part of packed mode support for Metal sparse; - Removes `ti.extension.packed`; - Sets `packed=True` by default; - Fixes illegal tests. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Issue: taichi-dev#6660 ### Brief Summary We no longer need the concept of packed mode. Now we only have to suggest users using power-of-two block size in hierarchical fields.
…aichi-dev#6753) Issue: taichi-dev#6660 ### Brief Summary Apart from the deprecated warning, the previously added limitation that "non-first division on an axis must be a power of two" is now made purely a warning because it can already let users learn the best practice. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Motivation
Packed mode was introduced to allow users to trade running time for memory usage:
taichi/docs/lang/articles/basic/layout.md
Lines 362 to 380 in db6ab0d
A more detailed introduction can be found at https://github.com/taichi-dev/taichicon/releases/download/taichicon02/Packed.Mode-Effectively.Reducing.Memory.Costs.of.Non-power-of-two.Fields-Yi.Xu.pdf.
However, simply providing users a switch is far from perfect. The default automatic padding behavior is non-intuitive to newcomers, so it is hard for them to diagnose related problems (e.g. 2-10x larger memory usage than expected, or sparsity manipulation in a wrong way) and then find out this feature.
To improve user experience, I propose to unify packed mode and the default non-packed mode - making the memory allocation and coordinate mapping behavior of Taichi SNodes as straightforward to understand as under packed mode but the runtime performance in most cases as optimized as under non-packed mode.
Proposal
The main idea is to try our best to optimize the runtime overhead of packed mode and then make it the default mode.
#6444 has taken the first step. It has made overhead of SNode access under normal cases (no second-time division on one axis) the same for both packed mode and non-packed mode.
After a discussion with @yuanming-hu, we agree that when users make a second-time division on some axis, it is very likely that they are prioritizing runtime performance and in the process of fine-tuning memory access patterns. In this case, it makes no sense to introduce slow operations like mod or div in each access. Therefore, we can add a limitation on the definition of SNodes - the non-first-time division on one axis must be a power of two. With this limitation, we can make the runtime overhead of SNode access always the same for both modes.
The only remaining parts involving coordinate calculation are struct fors. As long as the struct for is operating on a SNode whose path to root contains no non-power-of-two division, we can make the runtime overhead of the struct for the same for both modes. Unfortunately, for the remaining case, where the struct for is operating on a SNode whose path to root does contain some non-power-of-two division, the runtime overhead of the struct for for packed mode will be higher. Luckily, the runtime overhead of a struct for only appears once in each thread and is unlikely to become the performance bottleneck. In the extreme case where it does become the bottleneck, the user can change that non-power-of-two division to power-of-two as an optimization strategy.
Plan
packed=False
as a fallback option in case the new packed mode causes trouble. If things go well, in v1.4.0 we will remove thepacked
switch and the non-packed mode.The text was updated successfully, but these errors were encountered: