
Support basic sparsity SNode on Metal #593

Closed
k-ye opened this issue Mar 14, 2020 · 8 comments


k-ye commented Mar 14, 2020

Concisely describe the proposed feature

I'd like to support bitmasked() on the Metal backend. This decorator is the easiest sparsity feature to support, because it does not require dynamic memory allocation on the device side.

Describe the solution you'd like (if any)

I have a seemingly working solution in https://github.com/k-ye/taichi/tree/mtlbit. It seems to work on a toy example, but I need more tests (it may be good to see whether it can handle a modified taichi_sparse.py).

I tried to follow LLVM's runtime system as much as possible, e.g. the runtime's listgen and the SNode's activate, is_active, etc. One thing that trips me up is that it is still not entirely clear to me how coordinate refinement works.
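For concreteness, here is a minimal sketch of what the bitmasked activate / is_active primitives boil down to, assuming one activation bit per cell packed into 32-bit words (the names and signatures are illustrative, not the actual runtime API):

    #include <atomic>
    #include <cstdint>

    // Illustrative only: one activation bit per cell, packed into u32 words.
    inline bool is_active(const uint32_t *bitmask, int i) {
      return (bitmask[i >> 5] >> (i & 31)) & 1u;
    }

    inline void activate(std::atomic<uint32_t> *bitmask, int i) {
      // Atomic OR, so concurrent threads can activate cells in the same word.
      bitmask[i >> 5].fetch_or(1u << (i & 31), std::memory_order_relaxed);
    }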

That said, I think I found a simpler approach that fits the Metal backend's current implementation. Instead of storing an Element inside the list for each SNode:

    struct Element {
      Ptr element;
      int loop_bounds[2];
      PhysicalCoordinates pcoord;
    };

The Element for the Metal backend looks like this:

    struct ListgenElement {
      int32_t loop_index = 0;
      int32_t root_mem_offset = 0;  // used by is_active()
    };

Inside a struct_for kernel, we iterate through the ListManager of that SNode, and loop_index here can be used as the "thread_id". The rest of the stmt IRs already know how to map this loop_index to the correct leaf node through a series of OffsetAndExtractBitsStmt and SNodeLookupStmt. Do you see any problem with this approach?
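To make that concrete, a rough sketch of the dispatch (the ListManager interface below is hypothetical; the real one in my branch may differ):

    #include <cstdint>
    #include <vector>

    struct ListgenElement {
      int32_t loop_index = 0;
      int32_t root_mem_offset = 0;  // used by is_active()
    };

    // Hypothetical list interface: one element per (potentially) active cell.
    struct ListManager {
      std::vector<ListgenElement> elements;
    };

    // One GPU thread per list element. kernel_body stands in for the generated
    // struct_for body, which lowers loop_index to the leaf cell through the
    // existing OffsetAndExtractBitsStmt / SNodeLookupStmt chain.
    template <typename KernelBody>
    void struct_for_task(const ListManager &list, int thread_id,
                         KernelBody kernel_body) {
      if (thread_id >= static_cast<int>(list.elements.size())) return;
      const ListgenElement &e = list.elements[thread_id];
      kernel_body(e.loop_index, e.root_mem_offset);
    }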

Additional comments

  • The entire runtime's structs and methods are emitted as strings, which makes them very painful to maintain... I'll probably switch to a source-file-based solution eventually.
  • Be super careful about the coronavirus & please stay at home!!
@k-ye added the feature request label Mar 14, 2020
@k-ye self-assigned this Mar 14, 2020
@k-ye added the mac label Mar 14, 2020
yuanming-hu (Member) commented:

Inside a struct_for kernel, we iterate through the ListManager of that SNode, and loop_index here can be used as the "thread_id". The rest of the stmt IRs already know how to map this loop_index to the correct leaf node through a series of OffsetAndExtractBitsStmt and SNodeLookupStmt. Do you see any problem with this approach?

I think it's a space-time-scalability tradeoff issue:

  • The LLVM runtime approach stores an i32 per coordinate. Currently, Taichi supports up to 8-D tensors, so in the extreme case it can address 2^(32*8) = 2^256 (background) voxels. The downside is more storage/bandwidth consumption per list element.
  • Your Metal implementation saves space, but limits the total number of background voxels to 2^32. The struct_for loops will also have to recompute the physical coordinates from the compressed i32 loop_index (see the sketch after this list).
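For example, recovering one level's coordinate from the compressed i32 costs a shift and a mask per level (an illustrative sketch, assuming all extents are padded to powers of two; the Extractor layout here is hypothetical):

    #include <cstdint>

    // Hypothetical per-SNode metadata: where this level's bits live inside
    // the compressed loop_index (assumes POT-padded extents).
    struct Extractor {
      int start;     // bit offset of this level's coordinate
      int num_bits;  // log2 of this level's padded extent
    };

    inline int extract_coord(int32_t loop_index, Extractor e) {
      return (loop_index >> e.start) & ((1 << e.num_bits) - 1);
    }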

My suggestion is to stick to your Metal implementation for now, and gradually switch to the LLVM style for consistent behavior on different backends.

Meanwhile, I'll document the confusing refine_coordinates function (#597).


k-ye commented Mar 16, 2020

My suggestion is to stick to your Metal implementation for now, and gradually switch to the LLVM style for consistent behavior on different backends.

I've implemented refine_coordinates now that I understand what it's doing 😄

Here's the ported taichi_sparse.py using bitmasked().

[GIF: ported taichi_sparse demo running on Metal]

It looks a bit different from the original example; note the screen-burning effect (ghosting/afterimage). I'm not sure whether this is due to the difference between bitmasked and pointer, or a bug in my implementation. However, when I switched to x64 and ran it again, it just crashed...

I'll do some cleanup and break the implementation down for review.


BTW, I have a question about non-power-of-two sizes. Say we have the following hierarchy:

root.dense(ti.i, 3).dense(ti.i, 5)

Inside Taichi, it will be padded to powers of two (POT), i.e. S1.n=4, S2.n=8.

Then, for index 11, logically it should belong to (S1@2, S2@1), because 2 * 5 + 1 = 11. However, due to the padding, it is actually located at (S1@1, S2@3) (1 * 8 + 3 = 11). Is this expected?

yuanming-hu (Member) commented:

Then, for index 11, logically it should belong to (S1@2, S2@1), because 2 * 5 + 1 = 11. However, due to the padding, it is actually located at (S1@1, S2@3) (1 * 8 + 3 = 11). Is this expected?

Yes, it is expected. Note that refine_coordinates requires all of its operands to be powers of two. This is for performance reasons; otherwise, we'd end up with very expensive integer division and modulo...
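Concretely, with S2 padded to 8, the padded split is a shift and a mask, while the logical split would need division (a small illustrative snippet):

    #include <cassert>

    int main() {
      const int i = 11;
      // POT-padded decomposition: a shift and a mask.
      assert((i >> 3) == 1 && (i & 7) == 3);  // (S1@1, S2@3)
      // Logical (unpadded) decomposition: division and modulo.
      assert((i / 5) == 2 && (i % 5) == 1);   // (S1@2, S2@1)
      return 0;
    }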


yuanming-hu commented Mar 16, 2020

Maybe the screen-burning is because bitmasked blocks are not zero-filled when re-activated?

During garbage collection, we zero-fill deactivated blocks to make sure they read as zero when reactivated. GC is invoked after offloaded tasks that contain deactivation.

(Note that we are assuming no activation/deactivation on the same SNode can happen within the same offloaded task...Currently there's no compile-time check for this though. #607)
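The invariant is roughly this (a sketch; recycle_block is a hypothetical name, not the actual runtime function):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Zero-fill a deactivated block before it can be reused, so that a later
    // re-activation observes all-zero contents.
    void recycle_block(uint8_t *block_base, std::size_t block_bytes) {
      std::memset(block_base, 0, block_bytes);
      // ...then return the block to the free list (omitted).
    }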


k-ye commented Mar 16, 2020

Maybe the screen-burning is because bitmasked blocks are not zero-filled when re-activated?

You are right :)! I skipped GC tasks completely on Metal, but didn't realize GC also zero-fills the elements. After doing a .fill(0) it looks correct:

[GIF: corrected taichi_sparse demo on Metal]


k-ye commented Mar 18, 2020

A few optimizations to consider:

  • Generate refine_coordinates for each SNode. This way, all the bit-manipulation operands can be baked in at compile time. It also avoids accessing the extractors in global memory (see the sketch after this list).
  • Find a better way to run the hierarchical listgen. Right now each thread covers one SNode, so if that SNode has a big branch-out factor, we are likely under-utilizing the GPU. Figure out how to balance the workload in a more fine-grained way.
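A sketch of the first idea, with illustrative constants (a child SNode whose extent is padded to 8, i.e. 3 bits):

    // Generic version: num_bits/mask must be fetched from the extractor
    // table in global memory at run time.
    inline int refine_generic(int parent_coord, int child_index,
                              int num_bits, int mask) {
      return (parent_coord << num_bits) | (child_index & mask);
    }

    // Per-SNode specialization: the constants are baked in at compile time,
    // so there is no global-memory access for the extractors.
    inline int refine_snode_s2(int parent_coord, int child_index) {
      return (parent_coord << 3) | (child_index & 7);
    }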


k-ye commented Mar 27, 2020

Some cleanups:


k-ye commented Mar 29, 2020

Superseded by #678
