[Doc] Minor doc update #6952

Merged
merged 1 commit on Dec 21, 2022
2 changes: 1 addition & 1 deletion docs/lang/articles/basic/sparse.md
@@ -34,7 +34,7 @@ Sparse data structures are traditionally based on [Quadtrees](https://en.wikiped
[VDB](https://www.openvdb.org/) and [SPGrid](http://pages.cs.wisc.edu/~sifakis/papers/SPGrid.pdf) are such examples.
In Taichi, programmers can compose data structures similar to VDB and SPGrid with SNodes. The advantages of Taichi spatially sparse data structures include:

- - Array and List access time, which is the equivalent to accessing a dense data structure.
+ - Access with indices, just like accessing a dense data structure.
- Automatic parallelization when iterating.
- Automatic memory access optimization.

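For context, here is a minimal sketch of what index-based access into a spatially sparse field looks like; the `pointer`/`dense` layout and the field name `x` are illustrative assumptions, not part of this diff:

```python
import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(dtype=ti.f32)
# Two-level sparse layout: a pointer SNode over 4x4 blocks, each block
# holding a 4x4 dense grid of f32 cells (16x16 cells in total).
block = ti.root.pointer(ti.ij, (4, 4))
block.dense(ti.ij, (4, 4)).place(x)

@ti.kernel
def activate():
    # Writing through an index activates the enclosing block on demand,
    # with the same syntax as a dense field.
    x[2, 3] = 1.0

@ti.kernel
def total() -> ti.f32:
    s = 0.0
    # The struct-for visits only active cells and is parallelized automatically.
    for i, j in x:
        s += x[i, j]
    return s

activate()
print(total())  # 1.0
```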
2 changes: 1 addition & 1 deletion docs/lang/articles/performance_tuning/performance.md
@@ -134,7 +134,7 @@ hierarchy matches `ti.root.(sparse SNode)+.dense`), Taichi will assign one CUDA
thread block to each `dense` container (or `dense` block). BLS optimization works
specifically for such kinds of fields.

- BLS intends to enhance stencil computing processes by utilising CUDA shared memory. This optimization begins with users annotating the set of fields they want to cache using `ti.block_local`. At *compile time*, Taichi tries to identify the accessing range in relation to the `dense` block of these annotated fields. If Taichi is successful, it creates code that first loads all of the accessible data in range into a *block local* buffer (CUDA's shared memory), then replaces all accesses to the relevant slots into this buffer.
+ BLS intends to enhance stencil computing processes by utilizing CUDA shared memory. This optimization begins with users annotating the set of fields they want to cache using `ti.block_local`. At *compile time*, Taichi tries to identify the accessing range in relation to the `dense` block of these annotated fields. If Taichi is successful, it creates code that first loads all of the accessible data in range into a *block local* buffer (CUDA's shared memory), then replaces all accesses to the relevant slots into this buffer.

Here is an example illustrating the usage of BLS. `a` is a sparse field with a
block size of `4x4`.
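The diff view cuts off before the snippet itself; the following is a hedged sketch of BLS usage under the pattern described above (the second field `b`, the block counts, and the stencil body are assumptions for illustration, not the documentation's actual example):

```python
import taichi as ti

ti.init(arch=ti.cuda)

a = ti.field(dtype=ti.f32)
b = ti.field(dtype=ti.f32)
# The leaf layer is a 4x4 `dense` block under a sparse pointer SNode,
# matching the `ti.root.(sparse SNode)+.dense` pattern above.
block = ti.root.pointer(ti.ij, (64, 64))
block.dense(ti.ij, (4, 4)).place(a, b)

@ti.kernel
def populate():
    # Activate a few blocks so the stencil has data to read.
    for i, j in ti.ndrange(16, 16):
        a[i, j] = 1.0

@ti.kernel
def stencil():
    # Annotate `a` so Taichi caches it in a block-local buffer
    # (CUDA shared memory) for this kernel.
    ti.block_local(a)
    for i, j in a:
        # Neighboring reads of `a` are served from the cached block.
        b[i, j] = a[i - 1, j] + a[i + 1, j]

populate()
stencil()
```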