Support basic sparsity SNode on Metal #593
Comments
I think it's a space-time-scalability tradeoff issue: [...]

My suggestion is to stick to your Metal implementation for now, and gradually switch to the LLVM style for consistent behavior on different backends. Meanwhile, I'll document the confusing [...]
I've implemented [...]. Here's the ported [...]. It looks a bit different from the original example; note the screen-burning effect (残影, i.e. "ghosting" or afterimage). I'm not sure if this is due to the difference between [...]. I'll do some cleanup and break down the implementation for review.

BTW, I have a question about non-power-of-two sizes. Say we have the following hierarchy: [...]

Inside Taichi, it will be padded to POT (power of two), i.e. [...]. Then, for an index at [...] (is this the expected behavior?)
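A minimal sketch of the kind of hierarchy being asked about (the concrete sizes and the exact question were lost in the import, so the numbers below are made up for illustration):

```python
import taichi as ti

ti.init(arch=ti.metal)  # or ti.cpu

x = ti.field(ti.f32)
# Hypothetical non-power-of-two hierarchy: 3 bitmasked blocks of 5 cells each.
blk = ti.root.bitmasked(ti.i, 3)
blk.dense(ti.i, 5).place(x)

# Internally each level's extent is rounded up to a power of two
# (3 -> 4 at the bitmasked level, 5 -> 8 at the dense level), so the
# underlying structure covers 4 * 8 = 32 cells along i even though
# only 3 * 5 = 15 cells were requested.
```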
Yes, it is expected. Note that `refine_coordinates` requires all the operands within it to be powers of two. This is for performance reasons; otherwise, we'd end up with very expensive integer division and mod...
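A hedged sketch of why the power-of-two restriction matters (this is not the actual `refine_coordinates` code, just the arithmetic it relies on): with power-of-two extents, refining a parent coordinate into (block, cell-within-block) needs only a shift and a mask, whereas arbitrary extents would need an integer divide and modulo.

```python
BLOCK_BITS = 3
BLOCK_SIZE = 1 << BLOCK_BITS  # 8 cells per block (power of two)

def refine_pot(i: int):
    # One shift + one AND: cheap on GPUs.
    return i >> BLOCK_BITS, i & (BLOCK_SIZE - 1)

def refine_non_pot(i: int, block_size: int = 5):
    # Integer division and modulo: much more expensive in GPU code.
    return i // block_size, i % block_size

assert refine_pot(13) == (1, 5)      # block 1, cell 5
assert refine_non_pot(13) == (2, 3)  # same idea, but needs div/mod
```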
Maybe the screen-burning is because bitmasked blocks are not filled with zeros when re-activated? In garbage collection, we zero-fill the deactivated blocks to make sure they are 0 when reactivated. GC is invoked after offloaded tasks that contain deactivation. (Note that we are assuming no activation/deactivation on the same SNode can happen within the same offloaded task... currently there's no compile-time check for this, though. #607)
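To make the expected semantics concrete, here is a hedged sketch using the Python-side sparse API on a backend that already supports `bitmasked`; the layout and values are made up, and the exact index conventions of `ti.deactivate` should be double-checked against the docs:

```python
import taichi as ti

ti.init(arch=ti.cpu)  # a backend with existing bitmasked support

x = ti.field(ti.f32)
blk = ti.root.bitmasked(ti.i, 4)  # 4 blocks, each a dense run of 4 cells
blk.dense(ti.i, 4).place(x)

@ti.kernel
def write():
    x[1] = 42.0  # writing activates block 0

@ti.kernel
def deactivate_block0():
    ti.deactivate(blk, [0])  # GC should zero-fill this block afterwards

@ti.kernel
def reactivate_and_read() -> ti.f32:
    x[2] = 1.0   # reactivates block 0
    return x[1]  # should be 0.0 again, not the stale 42.0

write()
deactivate_block0()
print(reactivate_and_read())  # expected: 0.0 if zero-filling works
```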
A few optimizations to consider: [...]
Some cleanups: [...]
Superseded by #678
**Concisely describe the proposed feature**

I'd like to support `bitmasked()` on the Metal backend. This decorator is the easiest sparsity feature to support, because it does not require dynamic memory allocation on the device side.
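For reference, this is the user-facing decorator in question; a minimal sketch (the field name and sizes are made up):

```python
import taichi as ti

ti.init(arch=ti.metal)

x = ti.field(ti.f32)
# Only blocks that are actually written to become active; inactive
# cells read as zero and are skipped by struct-for loops.
ti.root.bitmasked(ti.i, 16).dense(ti.i, 16).place(x)

@ti.kernel
def touch():
    x[3] = 1.0  # activates only the block containing index 3
```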
**Describe the solution you'd like (if any)**

I have a seemingly working solution in https://github.com/k-ye/taichi/tree/mtlbit. It (seems to) work on a toy example, but I need more tests (maybe it would be good to see if it can work on a modified `taichi_sparse.py`).

I tried to follow LLVM's runtime system as much as possible, e.g. the runtime's `listgen` and the SNode's `activate`, `is_active`, etc. One thing that trips me up is that it is still not entirely clear to me how the coordinate refinement works.

That said, I think I found a simpler approach that fits under the Metal backend's current implementation. Instead of storing `Element` inside the list for each SNode (the LLVM version is at taichi/taichi/runtime/llvm/runtime.cpp, lines 438 to 442, commit b6c6c1e), the `Element` for the Metal backend looks like this: [...]
Inside a `struct_for` kernel, we iterate through the `ListManager` of that SNode, and `loop_index` here can be used as the "thread_id". The rest of the stmt IRs already know how to map this `loop_index` into the correct leaf node through a series of `OffsetAndExtractBitsStmt` and `SNodeLookupStmt`. Do you see any problem with this approach?
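For context, this is the kind of user-level kernel the listgen/`loop_index` machinery serves (a hedged sketch; the field layout is made up): a struct-for over a sparse field visits only active cells, and each visited cell corresponds to one list element / thread.

```python
import taichi as ti

ti.init(arch=ti.metal)  # or ti.cpu

x = ti.field(ti.f32)
ti.root.bitmasked(ti.i, 8).dense(ti.i, 8).place(x)

@ti.kernel
def activate_some():
    x[3] = 1.0
    x[40] = 2.0

@ti.kernel
def scale():
    # struct-for: iterates only over active cells; each iteration maps
    # to one element of the SNode's generated list.
    for i in x:
        x[i] *= 10.0

activate_some()
scale()
```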
**Additional comments**