diff --git a/docs/lang/api/arithmetics.md b/docs/lang/api/arithmetics.md
new file mode 100644
index 0000000000000..fa39140859fa7
--- /dev/null
+++ b/docs/lang/api/arithmetics.md
@@ -0,0 +1,195 @@
+---
+sidebar_position: 4
+---
+
+# Scalar operations
+
+## Operators
+
+### Arithmetic operators
+
+|     Operation   |               Result            |
+| :-------------- | :-------------------------------|
+|      `-a`       |  `a` negated                    |
+|      `+a`       |  `a` unchanged                  |
+|      `a + b`    |  sum of `a` and `b`             |
+|      `a - b`    |  difference of `a` and `b`      |
+|      `a * b`    |  product of `a` and `b`         |
+|      `a / b`    |  quotient of `a` and `b`        |
+|      `a // b`   |  floored quotient of `a` and `b`|
+|      `a % b`    |  remainder of `a` / `b`         |
+|      `a ** b`   |  `a` to the power `b`           |
+
+:::note
+
+The `%` operator in Taichi follows the Python style instead of C style,
+e.g.,
+
+```python
+# In Taichi-scope or Python-scope:
+print(2 % 3)   # 2
+print(-2 % 3)  # 1
+```
+
+For C-style mod (`%`), please use `ti.raw_mod`:
+
+```python
+print(ti.raw_mod(2, 3))   # 2
+print(ti.raw_mod(-2, 3))  # -2
+```
+
+:::
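+
+The two remainder conventions can be compared in plain Python, without
+Taichi. This is a minimal sketch: `c_style_mod` is a hypothetical helper
+name used only for illustration. It mimics the C-style remainder described
+above and is not Taichi's implementation of `ti.raw_mod`.
+
+```python
+def c_style_mod(a: int, b: int) -> int:
+    """Remainder whose sign follows the dividend `a`, as in C (illustrative helper)."""
+    r = abs(a) % abs(b)
+    return -r if a < 0 else r
+
+# Python-style: a non-zero result takes the sign of the divisor.
+python_results = [2 % 3, -2 % 3]
+# C-style: a non-zero result takes the sign of the dividend.
+c_results = [c_style_mod(2, 3), c_style_mod(-2, 3)]
+print(python_results, c_results)
+```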
+
+:::note
+
+Python 3 distinguishes `/` (true division) from `//` (floor division).
+For example, `1.0 / 2.0 = 0.5`, `1 / 2 = 0.5`, `1 // 2 = 0`, and
+`4.2 // 2 = 2.0`. Taichi follows the same design:
+
+- **True divisions** on integral types first cast their
+  operands to the default floating-point type.
+- **Floor divisions** on floating-point types first cast their
+  operands to the default integer type.
+
+To avoid such implicit casting, you can manually cast your operands to
+desired types, using `ti.cast`. Please see
+[Default precisions](../articles/basic/type.md#default-precisions) for more details on
+default numerical types.
+:::
+
+### Logic operators
+
+| Operation          | Result                                                    |
+| :----------------- | :-------------------------------------------------------- |
+| `a == b`           | if `a` equals `b`, then True, else False                  |
+| `a != b`           | if `a` does not equal `b`, then True, else False          |
+| `a > b`            | if `a` is strictly greater than `b`, then True, else False |
+| `a < b`            | if `a` is strictly less than `b`, then True, else False   |
+| `a >= b`           | if `a` is greater than or equal to `b`, then True, else False |
+| `a <= b`           | if `a` is less than or equal to `b`, then True, else False |
+| `not a`            | if `a` is False, then True, else False                    |
+| `a or b`           | if `a` is False, then `b`, else `a`                       |
+| `a and b`          | if `a` is False, then `a`, else `b`                       |
+| `a if cond else b` | if `cond` is True, then `a`, else `b`                     |
+
+### Bitwise operators
+
+| Operation | Result                              |
+| :-------- | :---------------------------------- |
+| `~a`      | the bits of `a` inverted            |
+| `a & b`   | bitwise and of `a` and `b`          |
+| `a ^ b`   | bitwise exclusive or of `a` and `b` |
+| `a \| b`  | bitwise or of `a` and `b`           |
+
+## Functions
+
+### Trigonometric functions
+
+```python
+ti.sin(x)
+
+ti.cos(x)
+
+ti.tan(x)
+
+ti.asin(x)
+
+ti.acos(x)
+
+ti.atan2(x, y)
+
+ti.tanh(x)
+```
+
+### Other arithmetic functions
+
+```python
+ti.sqrt(x)
+
+ti.rsqrt(x)  # A fast version for `1 / ti.sqrt(x)`.
+
+ti.exp(x)
+
+ti.log(x)
+
+ti.floor(x)
+
+ti.ceil(x)
+```
+
+### Casting types
+
+```python
+ti.cast(x, dtype)
+```
+
+See [Type system](../articles/basic/type.md#type-system) for more details.
+
+```python
+int(x)
+```
+
+A shortcut for `ti.cast(x, int)`.
+
+```python
+float(x)
+```
+
+A shortcut for `ti.cast(x, float)`.
+
+### Builtin-alike functions
+
+```python
+abs(x)
+
+max(x, y, ...)
+
+min(x, y, ...)
+
+pow(x, y)  # Same as `x ** y`.
+```
+
+### Random number generator
+
+```python
+ti.random(dtype = float)
+```
+
+## Element-wise arithmetics for vectors and matrices
+
+When these scalar functions are applied to [Matrices](./matrix.md) and [Vectors](./vector.md), they operate element-wise. For example:
+
+```python
+B = ti.Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
+C = ti.Matrix([[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]])
+
+A = ti.sin(B)
+# is equivalent to
+for i in ti.static(range(2)):
+    for j in ti.static(range(3)):
+        A[i, j] = ti.sin(B[i, j])
+
+A = B ** 2
+# is equivalent to
+for i in ti.static(range(2)):
+    for j in ti.static(range(3)):
+        A[i, j] = B[i, j] ** 2
+
+A = B ** C
+# is equivalent to
+for i in ti.static(range(2)):
+    for j in ti.static(range(3)):
+        A[i, j] = B[i, j] ** C[i, j]
+
+A += 2
+# is equivalent to
+for i in ti.static(range(2)):
+    for j in ti.static(range(3)):
+        A[i, j] += 2
+
+A += B
+# is equivalent to
+for i in ti.static(range(2)):
+    for j in ti.static(range(3)):
+        A[i, j] += B[i, j]
+```
diff --git a/docs/lang/api/atomic.md b/docs/lang/api/atomic.md
new file mode 100644
index 0000000000000..e8396ebd530f2
--- /dev/null
+++ b/docs/lang/api/atomic.md
@@ -0,0 +1,94 @@
+---
+sidebar_position: 5
+---
+
+# Atomic operations
+
+In Taichi, augmented assignments (e.g., `x[i] += 1`) are automatically
+[atomic](https://en.wikipedia.org/wiki/Fetch-and-add).
+
+:::caution
+
+When modifying global variables in parallel, make sure you use atomic
+operations. For example, to sum up all the elements in `x`:
+
+    @ti.kernel
+    def sum():
+        for i in x:
+            # Approach 1: OK
+            total[None] += x[i]
+
+            # Approach 2: OK
+            ti.atomic_add(total[None], x[i])
+
+            # Approach 3: Wrong result since the operation is not atomic.
+            total[None] = total[None] + x[i]
+
+:::
+
+:::note
+
+When atomic operations are applied to local values, the Taichi compiler
+will try to demote these operations into their non-atomic counterparts.
+:::
+
+Apart from the augmented assignments, explicit atomic operations, such
+as `ti.atomic_add`, also do read-modify-write atomically. These
+operations additionally return the **old value** of the first argument.
+
+Below is a list of all explicit atomic operations:
+
+::: {.function}
+ti.atomic_add(x, y)
+:::
+
+::: {.function}
+ti.atomic_sub(x, y)
+
+Atomically compute `x + y` or `x - y` and store the result in `x`.
+
+return
+
+: The old value of `x`.
+
+For example:
+
+    x[i] = 3
+    y[i] = 4
+    z[i] = ti.atomic_add(x[i], y[i])
+    # now x[i] = 7, y[i] = 4, z[i] = 3
+
+:::
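+
+The "return the old value" behaviour can be modelled in plain Python. This
+is only a sketch of the semantics: `AtomicCell` and `fetch_add` are
+hypothetical names, and the lock stands in for whatever hardware atomics a
+backend actually uses.
+
+```python
+import threading
+
+
+class AtomicCell:
+    """Illustrative stand-in for a single field element (not a Taichi class)."""
+
+    def __init__(self, value):
+        self.value = value
+        self._lock = threading.Lock()
+
+    def fetch_add(self, delta):
+        # Read, modify, and write under one lock so no other thread can interleave.
+        with self._lock:
+            old = self.value
+            self.value = old + delta
+            return old
+
+
+x = AtomicCell(3)
+z = x.fetch_add(4)  # x now holds the sum; z receives the value x held before
+print(x.value, z)
+```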
+
+::: {.function}
+ti.atomic_and(x, y)
+:::
+
+::: {.function}
+ti.atomic_or(x, y)
+:::
+
+::: {.function}
+ti.atomic_xor(x, y)
+
+Atomically compute `x & y` (bitwise and), `x | y` (bitwise or), or
+`x ^ y` (bitwise xor), and store the result in `x`.
+
+return
+
+: The old value of `x`.
+:::
+
+:::note
+
+Supported atomic operations on each backend:
+
+| type | CPU/CUDA | OpenGL | Metal | C source |
+| ---- | -------- | ------ | ----- | -------- |
+| i32  | OK       | OK     | OK    | OK       |
+| f32  | OK       | OK     | OK    | OK       |
+| i64  | OK       | EXT    | N/A   | OK       |
+| f64  | OK       | EXT    | N/A   | OK       |
+
+(OK: supported; EXT: requires an extension; N/A: not available)
+:::
diff --git a/docs/lang/api/index.md b/docs/lang/api/index.md
new file mode 100644
index 0000000000000..6d6337bb4bb7a
--- /dev/null
+++ b/docs/lang/api/index.md
@@ -0,0 +1,10 @@
+---
+sidebar_position: 1
+---
+
+# API Docs
+
+:::danger WIP Notice
+Sorry for the inconvenience. The Taichi API docs are under construction and
+not yet in a stable state.
+:::
diff --git a/docs/lang/api/matrix.md b/docs/lang/api/matrix.md
new file mode 100644
index 0000000000000..11dcc5faebf5a
--- /dev/null
+++ b/docs/lang/api/matrix.md
@@ -0,0 +1,298 @@
+---
+sidebar_position: 3
+---
+
+# Matrices
+
+- `ti.Matrix` is for small matrices (e.g. `3x3`) only. If
+  you have `64x64` matrices, you should consider using a
+  2D scalar field.
+- `ti.Vector` is the same as `ti.Matrix`, except that it has only one
+  column.
+- Distinguish between the element-wise product `*` and the matrix product `@`.
+- Use `ti.Vector.field(n, dtype=ti.f32)` or
+  `ti.Matrix.field(n, m, dtype=ti.f32)` to create vector/matrix
+  fields.
+- `A.transpose()`
+- `R, S = ti.polar_decompose(A, ti.f32)`
+- `U, sigma, V = ti.svd(A, ti.f32)` (Note that `sigma` is a `3x3`
+  diagonal matrix)
+- `any(A)` (Taichi-scope only)
+- `all(A)` (Taichi-scope only)
+
+TODO: document this page in the same detail as the Vector page (work in progress).
+
+A matrix in Taichi can have two forms:
+
+- as a temporary local variable. An `n by m` matrix consists of
+  `n * m` scalar values.
+- as an element of a global field. In this case, the field is an
+  N-dimensional array of `n by m` matrices.
+
+## Declaration
+
+### As global matrix fields
+
+::: {.function}
+ti.Matrix.field(n, m, dtype, shape = None, offset = None)
+
+parameter n
+
+: (scalar) the number of rows in the matrix
+
+parameter m
+
+: (scalar) the number of columns in the matrix
+
+parameter dtype
+
+: (DataType) data type of the components
+
+parameter shape
+
+: (optional, scalar or tuple) shape of the matrix field, see
+[Scalar fields](./scalar_field.md)
+
+parameter offset
+
+: (optional, scalar or tuple) see
+[coordinate offset](../articles/advanced/offset.md)
+
+For example, this creates a 5x4 matrix field with each entry being a 3x3
+matrix:
+
+    # Python-scope
+    a = ti.Matrix.field(3, 3, dtype=ti.f32, shape=(5, 4))
+
+:::
+
+:::note
+
+In Python-scope, `ti.field` declares a scalar field
+([Scalar fields](./scalar_field.md)), while `ti.Matrix.field`
+declares a matrix field.
+:::
+
+### As a temporary local variable
+
+::: {.function}
+ti.Matrix(\[\[x, y, \...\], \[z, w, \...\], \...\])
+
+parameter x
+
+: (scalar) the first component of the first row
+
+parameter y
+
+: (scalar) the second component of the first row
+
+parameter z
+
+: (scalar) the first component of the second row
+
+parameter w
+
+: (scalar) the second component of the second row
+
+For example, this creates a 2x3 matrix with components (2, 3, 4) in the
+first row and (5, 6, 7) in the second row:
+
+    # Taichi-scope
+    a = ti.Matrix([[2, 3, 4], [5, 6, 7]])
+
+:::
+
+::: {.function}
+ti.Matrix.rows(\[v0, v1, v2, \...\])
+:::
+
+::: {.function}
+ti.Matrix.cols(\[v0, v1, v2, \...\])
+
+parameter v0
+
+: (vector) vector of elements forming first row (or column)
+
+parameter v1
+
+: (vector) vector of elements forming second row (or column)
+
+parameter v2
+
+: (vector) vector of elements forming third row (or column)
+
+For example, this creates a 3x3 matrix by concatenating vectors into
+rows (or columns):
+
+    # Taichi-scope
+    v0 = ti.Vector([1.0, 2.0, 3.0])
+    v1 = ti.Vector([4.0, 5.0, 6.0])
+    v2 = ti.Vector([7.0, 8.0, 9.0])
+
+    # to specify data in rows
+    a = ti.Matrix.rows([v0, v1, v2])
+
+    # to specify data in columns instead
+    a = ti.Matrix.cols([v0, v1, v2])
+
+    # lists can be used instead of vectors
+    a = ti.Matrix.rows([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
+
+:::
+
+## Accessing components
+
+### As global matrix fields
+
+::: {.attribute}
+a\[p, q, \...\]\[i, j\]
+
+parameter a
+
+: (ti.Matrix.field) the matrix field
+
+parameter p
+
+: (scalar) index of the first field dimension
+
+parameter q
+
+: (scalar) index of the second field dimension
+
+parameter i
+
+: (scalar) row index of the matrix
+
+parameter j
+
+: (scalar) column index of the matrix
+
+This extracts the first element in matrix `a[6, 3]`:
+
+    x = a[6, 3][0, 0]
+
+    # or
+    mat = a[6, 3]
+    x = mat[0, 0]
+
+:::
+
+:::note
+
+**Always** use two pairs of square brackets to access scalar elements
+from matrix fields.
+
+- The indices in the first pair of brackets locate the matrix inside
+  the matrix field;
+- The indices in the second pair of brackets locate the scalar
+  element inside the matrix.
+
+For 0-D matrix fields, indices in the first pair of brackets should be
+`[None]`.
+:::
+
+### As a temporary local variable
+
+::: {.attribute}
+a\[i, j\]
+
+parameter a
+
+: (Matrix) the matrix
+
+parameter i
+
+: (scalar) row index of the matrix
+
+parameter j
+
+: (scalar) column index of the matrix
+
+For example, this extracts the element in row 0, column 1 of matrix
+`a`:
+
+    x = a[0, 1]
+
+This sets the element in row 1, column 3 of `a` to 4:
+
+    a[1, 3] = 4
+
+:::
+
+## Methods
+
+::: {.function}
+a.transpose()
+
+parameter a
+
+: (ti.Matrix) the matrix
+
+return
+
+: (ti.Matrix) the transposed matrix of `a`.
+
+For example:
+
+    a = ti.Matrix([[2, 3], [4, 5]])
+    b = a.transpose()
+    # Now b = ti.Matrix([[2, 4], [3, 5]])
+
+:::note
+
+`a.transpose()` does not modify the data in `a`; it only returns the
+transposed result.
+:::
+:::
+
+::: {.function}
+a.trace()
+
+parameter a
+
+: (ti.Matrix) the matrix
+
+return
+
+: (scalar) the trace of matrix `a`.
+
+The return value can be computed as `a[0, 0] + a[1, 1] + ...`.
+:::
+
+::: {.function}
+a.determinant()
+
+parameter a
+
+: (ti.Matrix) the matrix
+
+return
+
+: (scalar) the determinant of matrix `a`.
+
+:::note
+
+The matrix must be 1x1, 2x2, 3x3 or 4x4 for now.
+
+This function only works in Taichi-scope for now.
+:::
+:::
+
+::: {.function}
+a.inverse()
+
+parameter a
+
+: (ti.Matrix) the matrix
+
+return
+
+: (ti.Matrix) the inverse of matrix `a`.
+
+:::note
+
+The matrix must be 1x1, 2x2, 3x3 or 4x4 for now.
+
+This function only works in Taichi-scope for now.
+:::
+:::
diff --git a/docs/lang/api/reference/field.md b/docs/lang/api/reference/field.md
new file mode 100644
index 0000000000000..c3217f42eefd2
--- /dev/null
+++ b/docs/lang/api/reference/field.md
@@ -0,0 +1,60 @@
+---
+title: Field
+docs:
+  desc: |
+    We provide interfaces to copy data between Taichi fields and NumPy arrays.
+  functions:
+    - name: to_numpy
+      desc: Converts a field object to a NumPy ndarray.
+      since: v0.5.14
+      static: false
+      tags: ["numpy"]
+      params:
+        - name: self
+          type: ti.field | ti.Vector.field | ti.Matrix.field
+          desc: The field.
+      returns:
+        - type: np.ndarray
+          desc: The NumPy array containing the current data in the field.
+    - name: from_numpy
+      desc: Loads data from a NumPy ndarray into the field.
+      since: v0.5.14
+      static: false
+      tags: ["numpy"]
+      params:
+        - name: self
+          type: ti.field | ti.Vector.field | ti.Matrix.field
+          desc: The field.
+        - name: array
+          type: np.ndarray
+      returns:
+        - type: None
+    - name: to_torch
+      desc: Converts a field object to a PyTorch tensor.
+      since: v0.5.14
+      static: false
+      tags: ["PyTorch"]
+      params:
+        - name: self
+          type: ti.field | ti.Vector.field | ti.Matrix.field
+          desc: The field.
+        - name: device
+          type: torch.device
+          desc: The device where the PyTorch tensor is stored.
+      returns:
+        - type: torch.Tensor
+          desc: The PyTorch tensor containing the data in the field.
+    - name: from_torch
+      desc: Loads data from a PyTorch tensor into the field.
+      since: v0.5.14
+      static: false
+      tags: ["PyTorch"]
+      params:
+        - name: self
+          type: ti.field | ti.Vector.field | ti.Matrix.field
+          desc: The field.
+        - name: torch.Tensor
+          type: The PyTorch tensor with data to initialize the field.
+      returns:
+        - type: None
+---
diff --git a/docs/lang/api/reference/ti.md b/docs/lang/api/reference/ti.md
new file mode 100644
index 0000000000000..75770787584f4
--- /dev/null
+++ b/docs/lang/api/reference/ti.md
@@ -0,0 +1,23 @@
+---
+title: ti
+docs:
+  desc: |
+    This page documents functions and properties under the `ti` namespace (assuming you have run `import taichi as ti`).
+  functions:
+    - name: block_dim
+      desc: |
+        A decorator that specifies the number of threads per block for the next parallel for-loop.
+
+        :::note
+        The argument `n` must be a power of two for now.
+        :::
+      since: v0.5.14
+      static: true
+      tags: ["decorator"]
+      params:
+        - name: n
+          type: int
+          desc: threads per block / block dimension.
+      returns:
+        - type: Callable
+---
diff --git a/docs/lang/api/scalar_field.md b/docs/lang/api/scalar_field.md
new file mode 100644
index 0000000000000..3ce5280d4678b
--- /dev/null
+++ b/docs/lang/api/scalar_field.md
@@ -0,0 +1,233 @@
+---
+id: scalar_field
+sidebar_position: 1
+---
+
+# Scalar fields
+
+**Taichi fields** are used to store data.
+
+Field **elements** can be scalars, vectors, or matrices (see
+[Matrices](../articles/basic/field.md)). This page covers only
+**scalar fields**, whose elements are simply scalars.
+
+Fields can have up to eight **dimensions**.
+
+- A 0D scalar field is simply a single scalar.
+- A 1D scalar field is a 1D linear array.
+- A 2D scalar field can be used to represent a 2D regular grid of
+  values. For example, a gray-scale image.
+- A 3D scalar field can be used for volumetric data.
+
+Fields can be either dense or sparse; see [Sparse Computation](../articles/advanced/sparse.md) for
+details on sparse fields. Only **dense fields** are covered on this
+page.
+
+:::note
+
+We once used the term **tensor** instead of **field**. **Tensor** will
+no longer be used.
+:::
+
+## Declaration
+
+::: {.function}
+ti.field(dtype, shape = None, offset = None)
+
+parameter dtype
+
+: (DataType) type of the field element
+
+parameter shape
+
+: (optional, scalar or tuple) the shape of the field
+
+parameter offset
+
+: (optional, scalar or tuple) see [coordinate offset](../articles/advanced/offset.md)
+
+For example, this creates a _dense_ field with four `int32`
+elements:
+
+    x = ti.field(ti.i32, shape=4)
+
+This creates a 4x3 _dense_ field with `float32` elements:
+
+    x = ti.field(ti.f32, shape=(4, 3))
+
+If shape is `()` (empty tuple), then a 0-D field (scalar) is created:
+
+    x = ti.field(ti.f32, shape=())
+
+Then access it by passing `None` as index:
+
+    x[None] = 2
+
+If shape is **not provided** or `None`, the user must manually `place`
+it afterwards:
+
+    x = ti.field(ti.f32)
+    ti.root.dense(ti.ij, (4, 3)).place(x)
+    # equivalent to: x = ti.field(ti.f32, shape=(4, 3))
+
+:::
+
+:::note
+
+Not providing `shape` allows you to _place_ the field in a layout other
+than the default _dense_, see [Advanced dense layouts](../articles/advanced/layout.md) for
+more details.
+:::
+
+:::caution
+
+All fields must be created and placed before any kernel is invoked and
+before any field is accessed from Python-scope. For example:
+
+```python
+x = ti.field(ti.f32)
+x[None] = 1 # ERROR: x not placed!
+```
+
+```python
+x = ti.field(ti.f32, shape=())
+@ti.kernel
+def func():
+    x[None] = 1
+
+func()
+y = ti.field(ti.f32, shape=())
+# ERROR: cannot create fields after kernel invocation!
+```
+
+```python
+x = ti.field(ti.f32, shape=())
+x[None] = 1
+y = ti.field(ti.f32, shape=())
+# ERROR: cannot create fields after any field accesses from the Python-scope!
+```
+
+:::
+
+## Accessing components
+
+You can access an element of the Taichi field by an index or indices.
+
+::: {.attribute}
+a\[p, q, \...\]
+
+parameter a
+
+: (ti.field) the scalar field
+
+parameter p
+
+: (scalar) index of the first field dimension
+
+parameter q
+
+: (scalar) index of the second field dimension
+
+return
+
+: (scalar) the element at `[p, q, ...]`
+
+This extracts the element value at index `[3, 4]` of field `a`:
+
+    x = a[3, 4]
+
+This sets the element value at index `2` of 1D field `b` to `5`:
+
+    b[2] = 5
+
+:::
+
+:::note
+In Python, `x[(exp1, exp2, …, expN)]` is equivalent to `x[exp1, exp2, …, expN]`; the latter is just syntactic sugar for the former.
+:::
+
+:::note
+The returned value can also be a `Vector` / `Matrix` if `a` is a vector/matrix field, see [Vectors](./vector.md) for more details.
+:::
+
+## Meta data
+
+::: {.attribute}
+a.shape
+
+parameter a
+
+: (ti.field) the field
+
+return
+
+: (tuple) the shape of field `a`
+
+    x = ti.field(ti.i32, (6, 5))
+    x.shape  # (6, 5)
+
+    y = ti.field(ti.i32, 6)
+    y.shape  # (6,)
+
+    z = ti.field(ti.i32, ())
+    z.shape  # ()
+
+:::
+
+::: {.attribute}
+a.dtype
+
+parameter a
+
+: (ti.field) the field
+
+return
+
+: (DataType) the data type of `a`
+
+    x = ti.field(ti.i32, (2, 3))
+    x.dtype  # ti.i32
+
+:::
+
+::: {.function}
+a.parent(n = 1)
+
+parameter a
+
+: (ti.field) the field
+
+parameter n
+
+: (optional, scalar) the number of parent steps, i.e. `n=1` for
+parent, `n=2` grandparent, etc.
+
+return
+
+: (SNode) the parent of `a`'s containing SNode
+
+    x = ti.field(ti.i32)
+    y = ti.field(ti.i32)
+    blk1 = ti.root.dense(ti.ij, (6, 5))
+    blk2 = blk1.dense(ti.ij, (3, 2))
+    blk1.place(x)
+    blk2.place(y)
+
+    x.parent()   # blk1
+    y.parent()   # blk2
+    y.parent(2)  # blk1
+
+See [Structural nodes (SNodes)](./snode.md) for more details.
+:::
diff --git a/docs/lang/api/snode.md b/docs/lang/api/snode.md
new file mode 100644
index 0000000000000..dee0c5c0f4f94
--- /dev/null
+++ b/docs/lang/api/snode.md
@@ -0,0 +1,357 @@
+---
+sidebar_position: 6
+---
+
+# Structural nodes (SNodes)
+
+After writing the computation code, the user needs to specify the
+internal data structure hierarchy. Specifying a data structure includes
+choices at both the macro level, dictating how the data structure
+components nest with each other and the way they represent sparsity, and
+the micro level, dictating how data are grouped together (e.g. structure
+of arrays vs. array of structures). Taichi provides _Structural Nodes
+(SNodes)_ to compose the hierarchy and particular properties. These
+constructs and their semantics are listed below:
+
+- dense: A fixed-length contiguous array.
+- bitmasked: This is similar to dense, but it also uses a mask to
+  maintain sparsity information, one bit per child.
+- pointer: Store pointers instead of the whole structure to save
+  memory and maintain sparsity.
+- dynamic: Variable-length array, with a predefined maximum length. It
+  serves the role of `std::vector` in C++ or `list` in Python, and can
+  be used to maintain objects (e.g. particles) contained in a block.
+
+:::note
+
+Supported SNode types on each backend:
+
+|   SNode   | CPU/CUDA | OpenGL | Metal | C source |
+| :-------: | :------: | :----: | :---: | :------: |
+|   dense   |    OK    |   OK   |  OK   |    OK    |
+| bitmasked |    OK    |  N/A   |  OK   |   N/A    |
+|  pointer  |    OK    |  N/A   |  N/A  |   N/A    |
+|  dynamic  |    OK    |  PAR   |  N/A  |   N/A    |
+
+(OK: supported; PAR: partial support; N/A: not available)
+:::
+
+See [Advanced dense layouts](../articles/advanced/layout.md) for more details. `ti.root`
+is the root node of the data structure.
+
+::: {.function}
+snode.place(x, \...)
+
+parameter snode
+
+: (SNode) where to place
+
+parameter x
+
+: (ti.field) field(s) to be placed
+
+return
+
+: (SNode) the `snode` itself
+
+The following code places two 0-D fields named `x` and `y`:
+
+    x = ti.field(dtype=ti.i32)
+    y = ti.field(dtype=ti.f32)
+    ti.root.place(x, y)
+    assert x.snode.parent() == y.snode.parent()
+
+:::
+
+::: {.function}
+field.shape
+
+parameter field
+
+: (ti.field)
+
+return
+
+: (tuple of integers) the shape of field
+
+Equivalent to `field.snode.shape`.
+
+For example,
+
+    ti.root.dense(ti.ijk, (3, 5, 4)).place(x)
+    x.shape  # returns (3, 5, 4)
+
+:::
+
+::: {.function}
+field.snode
+
+parameter field
+
+: (ti.field)
+
+return
+
+: (SNode) the structural node where `field` is placed
+
+    x = ti.field(dtype=ti.i32)
+    y = ti.field(dtype=ti.f32)
+    blk1 = ti.root.dense(ti.i, 4)
+    blk1.place(x, y)
+    assert x.snode == blk1
+
+:::
+
+::: {.function}
+snode.shape
+
+parameter snode
+
+: (SNode)
+
+return
+
+: (tuple) the size of the node along each axis
+
+    blk1 = ti.root
+    blk2 = blk1.dense(ti.i,  3)
+    blk3 = blk2.dense(ti.jk, (5, 2))
+    blk4 = blk3.dense(ti.k,  2)
+    blk1.shape  # ()
+    blk2.shape  # (3, )
+    blk3.shape  # (3, 5, 2)
+    blk4.shape  # (3, 5, 4)
+
+:::
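+
+The shapes in the example above are consistent with a simple rule: each
+nested level multiplies the size of the axes it names. The following plain
+Python sketch models only that arithmetic. `nested_shape` is a hypothetical
+helper, not part of Taichi, and it assumes axes are reported in `i`, `j`,
+`k` order.
+
+```python
+def nested_shape(levels):
+    """Multiply per-axis sizes across nested levels (illustrative model only)."""
+    sizes = {}
+    for level in levels:
+        for axis, n in level.items():
+            sizes[axis] = sizes.get(axis, 1) * n
+    # Report axes in i, j, k, ... order.
+    return tuple(sizes[axis] for axis in sorted(sizes))
+
+
+# Each list mirrors the chain of dense() calls leading to that block.
+blk2_levels = [{"i": 3}]
+blk3_levels = blk2_levels + [{"j": 5, "k": 2}]
+blk4_levels = blk3_levels + [{"k": 2}]
+print(nested_shape(blk2_levels), nested_shape(blk3_levels), nested_shape(blk4_levels))
+```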
+
+::: {.function}
+snode.parent(n = 1)
+
+parameter snode
+
+: (SNode)
+
+parameter n
+
+: (optional, scalar) the number of steps, i.e. `n=1` for parent, `n=2`
+grandparent, etc.
+
+return
+
+: (SNode) the parent node of `snode`
+
+    blk1 = ti.root.dense(ti.i, 8)
+    blk2 = blk1.dense(ti.j, 4)
+    blk3 = blk2.bitmasked(ti.k, 6)
+    blk1.parent()  # ti.root
+    blk2.parent()  # blk1
+    blk3.parent()  # blk2
+    blk3.parent(1) # blk2
+    blk3.parent(2) # blk1
+    blk3.parent(3) # ti.root
+    blk3.parent(4) # None
+
+:::
+
+## Node types
+
+::: {.function}
+snode.dense(indices, shape)
+
+parameter snode
+
+: (SNode) parent node where the child is derived from
+
+parameter indices
+
+: (Index or Indices) indices used for this node
+
+parameter shape
+
+: (scalar or tuple) shape of the field
+
+return
+
+: (SNode) the derived child node
+
+The following code places a 1-D field of size `3`:
+
+    x = ti.field(dtype=ti.i32)
+    ti.root.dense(ti.i, 3).place(x)
+
+The following code places a 2-D field of shape `(3, 4)`:
+
+    x = ti.field(dtype=ti.i32)
+    ti.root.dense(ti.ij, (3, 4)).place(x)
+
+:::note
+
+If `shape` is a scalar and there are multiple indices, then `shape` will
+be automatically expanded to fit the number of indices. For example,
+
+    snode.dense(ti.ijk, 3)
+
+is equivalent to
+
+    snode.dense(ti.ijk, (3, 3, 3))
+
+:::
+:::
+
+::: {.function}
+snode.dynamic(index, size, chunk_size = None)
+
+parameter snode
+
+: (SNode) parent node where the child is derived from
+
+parameter index
+
+: (Index) the `dynamic` node indices
+
+parameter size
+
+: (scalar) the maximum size of the dynamic node
+
+parameter chunk_size
+
+: (optional, scalar) the number of elements in each dynamic memory
+allocation chunk
+
+return
+
+: (SNode) the derived child node
+
+`dynamic` nodes act like `std::vector` in C++ or `list` in Python.
+Taichi's dynamic memory allocation system allocates its memory on the
+fly.
+
+The following places a 1-D dynamic field of maximum size `16`:
+
+    ti.root.dynamic(ti.i, 16).place(x)
+
+:::
+
+::: {.function}
+snode.bitmasked
+:::
+
+::: {.function}
+snode.pointer
+:::
+
+::: {.function}
+snode.hash
+
+TODO: add descriptions here
+:::
+
+## Working with `dynamic` SNodes
+
+::: {.function}
+ti.length(snode, indices)
+
+parameter snode
+
+: (SNode, dynamic)
+
+parameter indices
+
+: (scalar or tuple of scalars) the `dynamic` node indices
+
+return
+
+: (int32) the current size of the dynamic node
+:::
+
+::: {.function}
+ti.append(snode, indices, val)
+
+parameter snode
+
+: (SNode, dynamic)
+
+parameter indices
+
+: (scalar or tuple of scalars) the `dynamic` node indices
+
+parameter val
+
+: (depends on SNode data type) value to store
+
+return
+
+: (int32) the size of the dynamic node, before appending
+
+Inserts `val` into the `dynamic` node with indices `indices`.
+:::
+
+## Taichi fields like powers of two
+
+Non-power-of-two field dimensions are promoted into powers of two and
+thus these fields will occupy more virtual address space. For example, a
+(dense) field of size `(18, 65)` will be materialized as `(32, 128)`.
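+
+The promotion rule can be sketched in plain Python. The helper names
+`next_power_of_two` and `promoted_shape` are hypothetical and only
+illustrate the rounding; they are not Taichi functions.
+
+```python
+def next_power_of_two(n: int) -> int:
+    """Smallest power of two that is >= n, for n >= 1."""
+    return 1 << (n - 1).bit_length()
+
+
+def promoted_shape(shape):
+    """Round every dimension up to a power of two (illustrative only)."""
+    return tuple(next_power_of_two(n) for n in shape)
+
+
+print(promoted_shape((18, 65)))
+```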
+
+## Indices
+
+::: {.attribute}
+ti.i
+:::
+
+::: {.attribute}
+ti.j
+:::
+
+::: {.attribute}
+ti.k
+:::
+
+::: {.attribute}
+ti.ij
+:::
+
+::: {.attribute}
+ti.ji
+:::
+
+::: {.attribute}
+ti.jk
+:::
+
+::: {.attribute}
+ti.kj
+:::
+
+::: {.attribute}
+ti.ik
+:::
+
+::: {.attribute}
+ti.ki
+:::
+
+::: {.attribute}
+ti.ijk
+:::
+
+::: {.attribute}
+ti.ijkl
+:::
+
+::: {.function}
+ti.indices(a, b, \...)
+:::
+
+(TODO)
diff --git a/docs/lang/api/vector.md b/docs/lang/api/vector.md
new file mode 100644
index 0000000000000..93d702b0096a8
--- /dev/null
+++ b/docs/lang/api/vector.md
@@ -0,0 +1,432 @@
+---
+sidebar_position: 2
+---
+
+# Vectors
+
+A vector in Taichi can have two forms:
+
+- as a temporary local variable. An `n` component vector consists of
+  `n` scalar values.
+- as an element of a global field. In this case, the field is an
+  N-dimensional array of `n` component vectors.
+
+In fact, `Vector` is simply an alias of `Matrix`, just with `m = 1`. See
+[Matrices](./matrix.md) and [Scalar fields](./scalar_field.md)
+for more details.
+
+## Declaration
+
+### As global vector fields
+
+::: {.function}
+ti.Vector.field(n, dtype, shape = None, offset = None)
+
+parameter n
+
+: (scalar) the number of components in the vector
+
+parameter dtype
+
+: (DataType) data type of the components
+
+parameter shape
+
+: (optional, scalar or tuple) shape of the vector field, see
+[Scalar fields](./scalar_field.md)
+
+parameter offset
+
+: (optional, scalar or tuple) see
+[coordinate offset](../articles/advanced/offset.md)
+
+For example, this creates a 5x4 field of 3-D vectors:
+
+    # Python-scope
+    a = ti.Vector.field(3, dtype=ti.f32, shape=(5, 4))
+
+:::
+
+:::note
+
+In Python-scope, `ti.field` declares a scalar field
+([Scalar fields](./scalar_field.md)), while `ti.Vector.field`
+declares a vector field.
+:::
+
+### As a temporary local variable
+
+::: {.function}
+ti.Vector(\[x, y, \...\])
+
+parameter x
+
+: (scalar) the first component of the vector
+
+parameter y
+
+: (scalar) the second component of the vector
+
+For example, this creates a 3D vector with components (2, 3, 4):
+
+    # Taichi-scope
+    a = ti.Vector([2, 3, 4])
+
+:::
+
+## Accessing components
+
+### As global vector fields
+
+::: {.attribute}
+a\[p, q, \...\]\[i\]
+
+parameter a
+
+: (ti.Vector.field) the vector field
+
+parameter p
+
+: (scalar) index of the first field dimension
+
+parameter q
+
+: (scalar) index of the second field dimension
+
+parameter i
+
+: (scalar) index of the vector component
+
+This extracts the first component of vector `a[6, 3]`:
+
+    x = a[6, 3][0]
+
+    # or
+    vec = a[6, 3]
+    x = vec[0]
+
+:::
+
+:::note
+
+**Always** use two pairs of square brackets to access scalar elements
+from vector fields.
+
+- The indices in the first pair of brackets locate the vector inside
+  the vector field;
+- The indices in the second pair of brackets locate the scalar
+  element inside the vector.
+
+For 0-D vector fields, indices in the first pair of brackets should be
+`[None]`.
+:::
+
+### As a temporary local variable
+
+::: {.attribute}
+a\[i\]
+
+parameter a
+
+: (Vector) the vector
+
+parameter i
+
+: (scalar) index of the component
+
+For example, this extracts the first component of vector `a`:
+
+    x = a[0]
+
+This sets the second component of `a` to 4:
+
+    a[1] = 4
+
+TODO: add descriptions about `a(i, j)`
+:::
+
+### XYZW vector component accessors
+
+We also provide four handy accessors for the first four vector
+components:
+
+::: {.attribute}
+a.x
+
+Same as `a[0]`.
+:::
+
+::: {.attribute}
+a.y
+
+Same as `a[1]`.
+:::
+
+::: {.attribute}
+a.z
+
+Same as `a[2]`.
+:::
+
+::: {.attribute}
+a.w
+
+Same as `a[3]`.
+:::
+
+:::note
+
+XYZW accessors can be used for both reading and writing:
+
+    v = ti.Vector([2, 3, 4])
+    print(v.x)  # 2
+    print(v.y)  # 3
+    print(v.z)  # 4
+    v.y = 8
+    print(v.y)  # 8
+
+XYZW accessors can be used in both Taichi-scope and Python-scope.
+
+XYZW accessors don't work for `ti.Matrix`.
+
+For GLSL-like swizzling accessors, consider using
+[taichi_glsl](https://taichi-glsl.readthedocs.io):
+
+    import taichi_glsl as tl
+
+    v = tl.vec(2, 3, 4)
+    print(v.xy)  # [2 3]
+    print(v._xYzX_z)  # [0 2 -3 4 -2 0 4]
+
+:::
+
+## Methods
+
+::: {.function}
+a.norm(eps = 0)
+
+parameter a
+
+: (ti.Vector)
+
+parameter eps
+
+: (optional, scalar) a safe-guard value for `sqrt`, usually 0. See the
+note below.
+
+return
+
+: (scalar) the magnitude / length / norm of vector
+
+For example:
+
+    a = ti.Vector([3, 4])
+    a.norm() # sqrt(3*3 + 4*4 + 0) = 5
+
+`a.norm(eps)` is equivalent to `ti.sqrt(a.dot(a) + eps)`
+:::
+
+:::note
+
+To safeguard the operator's gradient on zero vectors during
+differentiable programming, set `eps` to a small, positive value such as
+`1e-5`.
+:::
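+
+To see why `eps` matters, here is a minimal plain-Python sketch (not Taichi
+code) of the formula `sqrt(a.dot(a) + eps)` and its analytic gradient
+`a / norm(a, eps)`. The helper names `norm` and `norm_grad` are hypothetical
+and exist only for this illustration. Plain Python raises an exception on the
+division by zero; a compiled kernel would more likely produce NaN or infinity.
+
+```python
+import math
+
+def norm(a, eps=0.0):
+    """Same formula as in the text: sqrt(a.dot(a) + eps)."""
+    return math.sqrt(sum(x * x for x in a) + eps)
+
+def norm_grad(a, eps=0.0):
+    """Analytic gradient of norm(a, eps) with respect to a: a / norm(a, eps)."""
+    n = norm(a, eps)
+    return [x / n for x in a]  # divides by zero when a is zero and eps == 0
+
+zero = [0.0, 0.0]
+
+# With a small positive eps the gradient at the zero vector is well defined.
+safe_grad = norm_grad(zero, eps=1e-5)
+
+# With eps == 0 the gradient at the zero vector is 0 / 0.
+try:
+    norm_grad(zero)
+    unsafe_ok = True
+except ZeroDivisionError:
+    unsafe_ok = False
+```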
+
+::: {.function}
+a.norm_sqr()
+
+parameter a
+
+: (ti.Vector)
+
+return
+
+: (scalar) the square of the magnitude / length / norm of vector
+
+For example:
+
+    a = ti.Vector([3, 4])
+    a.norm_sqr() # 3*3 + 4*4 = 25
+
+`a.norm_sqr()` is equivalent to `a.dot(a)`
+:::
+
+::: {.function}
+a.normalized()
+
+parameter a
+
+: (ti.Vector)
+
+return
+
+: (ti.Vector) the normalized / unit vector of `a`
+
+For example:
+
+    a = ti.Vector([3, 4])
+    a.normalized() # [3 / 5, 4 / 5]
+
+`a.normalized()` is equivalent to `a / a.norm()`.
+:::
+
+::: {.function}
+a.dot(b)
+
+parameter a
+
+: (ti.Vector)
+
+parameter b
+
+: (ti.Vector)
+
+return
+
+: (scalar) the dot (inner) product of `a` and `b`
+
+E.g.:
+
+    a = ti.Vector([1, 3])
+    b = ti.Vector([2, 4])
+    a.dot(b) # 1*2 + 3*4 = 14
+
+:::
+
+::: {.function}
+a.cross(b)
+
+parameter a
+
+: (ti.Vector, 2 or 3 components)
+
+parameter b
+
+: (ti.Vector of the same size as a)
+
+return
+
+: (scalar (for 2D inputs), or 3D Vector (for 3D inputs)) the cross
+product of `a` and `b`
+
+We use a right-handed coordinate system. E.g.:
+
+    a = ti.Vector([1, 2, 3])
+    b = ti.Vector([4, 5, 6])
+    c = ti.cross(a, b)
+    # c = [2*6 - 5*3, 4*3 - 1*6, 1*5 - 4*2] = [-3, 6, -3]
+
+    p = ti.Vector([1, 2])
+    q = ti.Vector([4, 5])
+    r = ti.cross(p, q)
+    # r = 1*5 - 4*2 = -3
+
+:::
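+
+The component formulas used in the comments above can be checked with a
+plain-Python sketch. This is only an illustration of the arithmetic, not
+Taichi's implementation; the function name `cross` below is a stand-in that
+operates on Python lists.
+
+```python
+def cross(a, b):
+    """Cross product for 2- or 3-component vectors given as lists."""
+    if len(a) == 2 and len(b) == 2:
+        # 2D inputs: scalar z-component of the 3D cross product
+        return a[0] * b[1] - a[1] * b[0]
+    if len(a) == 3 and len(b) == 3:
+        return [
+            a[1] * b[2] - a[2] * b[1],
+            a[2] * b[0] - a[0] * b[2],
+            a[0] * b[1] - a[1] * b[0],
+        ]
+    raise ValueError("cross expects two vectors of the same size (2 or 3)")
+
+c = cross([1, 2, 3], [4, 5, 6])  # 3D inputs give a 3D vector
+r = cross([1, 2], [4, 5])        # 2D inputs give a scalar
+```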
+
+::: {.function}
+a.outer_product(b)
+
+parameter a
+
+: (ti.Vector)
+
+parameter b
+
+: (ti.Vector)
+
+return
+
+: (ti.Matrix) the outer product of `a` and `b`
+
+E.g.:
+
+    a = ti.Vector([1, 2])
+    b = ti.Vector([4, 5, 6])
+    c = ti.outer_product(a, b) # NOTE: c[i, j] = a[i] * b[j]
+    # c = [[1*4, 1*5, 1*6], [2*4, 2*5, 2*6]]
+
+:::
+
+:::note
+
+The outer product should not be confused with the cross product
+(`ti.cross`). For example, `a` and `b` do not have to be 2- or
+3-component vectors for this function.
+:::
+
+::: {.function}
+a.cast(dt)
+
+parameter a
+
+: (ti.Vector)
+
+parameter dt
+
+: (DataType)
+
+return
+
+: (ti.Vector) vector with all components of `a` cast into type `dt`
+
+E.g.:
+
+    # Taichi-scope
+    a = ti.Vector([1.6, 2.3])
+    a.cast(ti.i32) # [1, 2] (float-to-integer casts truncate)
+
+See [Type system](../articles/basic/type.md) for more details.
+:::
+
+:::note
+
+Vectors are special matrices with only 1 column. In fact, `ti.Vector` is
+just an alias of `ti.Matrix`.
+:::
+
+## Metadata
+
+::: {.attribute}
+a.n
+
+parameter a
+
+: (ti.Vector or ti.Vector.field)
+
+return
+
+: (scalar) return the dimensionality of vector `a`
+
+E.g.:
+
+    # Taichi-scope
+    a = ti.Vector([1, 2, 3])
+    a.n  # 3
+
+    # Python-scope
+    a = ti.Vector.field(3, dtype=ti.f32, shape=(4, 5))
+    a.n  # 3
+
+See [Metaprogramming](../articles/advanced/meta.md) for more details.
+
+:::
+
+:::note
+
+When used as a global vector field, `a` additionally contains all the
+metadata that a scalar field would have, e.g.:
+
+    # Python-scope
+    a = ti.Vector.field(3, dtype=ti.f32, shape=(4, 5))
+    a.shape  # (4, 5)
+    a.dtype  # ti.f32
+
+:::
+
+## Element-wise operations (WIP)
+
+TODO: add element wise operations docs
diff --git a/docs/lang/articles/advanced/differentiable_programming.md b/docs/lang/articles/advanced/differentiable_programming.md
new file mode 100644
index 0000000000000..cc0f8ad7d3183
--- /dev/null
+++ b/docs/lang/articles/advanced/differentiable_programming.md
@@ -0,0 +1,253 @@
+---
+sidebar_position: 4
+---
+
+# Differentiable programming
+
+We suggest starting with `ti.Tape()` and then migrating to more
+advanced differentiable programming using the `kernel.grad()` syntax if
+necessary.
+
+## Introduction
+
+For example, suppose you have the following kernel:
+
+```python
+x = ti.field(float, ())
+y = ti.field(float, ())
+
+@ti.kernel
+def compute_y():
+    y[None] = ti.sin(x[None])
+```
+
+Now, if you want the derivative of `y` with respect to `x`, i.e.,
+dy/dx, you may want to implement the derivative kernel yourself:
+
+```python
+x = ti.field(float, ())
+y = ti.field(float, ())
+dy_dx = ti.field(float, ())
+
+@ti.kernel
+def compute_dy_dx():
+    dy_dx[None] = ti.cos(x[None])
+```
+
+But wait, what if you change the original `compute_y`? You will have to
+recalculate the derivative by hand and rewrite `compute_dy_dx` again,
+which is error-prone and inconvenient.
+
+If you run into this situation, don't worry! Taichi provides a handy autodiff system that can help you obtain the derivative of a kernel without any pain!
+
+## Using `ti.Tape()`
+
+Let's take the `compute_y` kernel from the example above again.
+What's the most convenient way to obtain a kernel that maps x to
+$dy/dx$?
+
+1.  Use the `needs_grad=True` option when declaring fields involved in
+    the derivative chain.
+2.  Use `with ti.Tape(y):` to wrap the invocation of the kernel(s) whose
+    derivative you want to compute.
+3.  Now `x.grad[None]` is the dy/dx value at the current x.
+
+```python
+x = ti.field(float, (), needs_grad=True)
+y = ti.field(float, (), needs_grad=True)
+
+@ti.kernel
+def compute_y():
+    y[None] = ti.sin(x[None])
+
+with ti.Tape(y):
+    compute_y()
+
+print('dy/dx =', x.grad[None])
+print('at x =', x[None])
+```
+
+It's equivalent to:
+
+```python
+x = ti.field(float, ())
+y = ti.field(float, ())
+dy_dx = ti.field(float, ())
+
+@ti.kernel
+def compute_dy_dx():
+    dy_dx[None] = ti.cos(x[None])
+
+compute_dy_dx()
+
+print('dy/dx =', dy_dx[None])
+print('at x =', x[None])
+```
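+
+As an independent sanity check of the hand-written derivative, the value
+`cos(x)` can be compared against a central finite difference of `sin(x)`.
+This is a plain-Python sketch, not Taichi code; `numerical_dy_dx` is a
+hypothetical helper and the step size `h` is an arbitrary choice.
+
+```python
+import math
+
+def compute_y(x):
+    """Plain-Python counterpart of the kernel above: y = sin(x)."""
+    return math.sin(x)
+
+def numerical_dy_dx(f, x, h=1e-6):
+    """Central finite-difference approximation of df/dx at x."""
+    return (f(x + h) - f(x - h)) / (2 * h)
+
+x = 0.5
+approx = numerical_dy_dx(compute_y, x)  # numerical estimate of dy/dx
+exact = math.cos(x)                     # hand-derived dy/dx
+```
+
+The two values agree up to the finite-difference error, which is the same
+relationship one would expect between `x.grad[None]` and `ti.cos(x[None])`.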
+
+### Usage example
+
+For a physical simulation, sometimes it could be easy to compute the
+energy but hard to compute the force on each particle.
+
+But recall that we can differentiate (negative) potential energy to get
+forces, i.e., $F_i = -dU / dx_i$. So once you've written a kernel that
+computes the potential energy, you may use Taichi's autodiff
+system to obtain its derivative and then the force on each
+particle.
+
+Take
+[examples/ad_gravity.py](https://github.com/taichi-dev/taichi/blob/master/examples/ad_gravity.py)
+as an example:
+
+```python
+import taichi as ti
+ti.init()
+
+N = 8
+dt = 1e-5
+
+x = ti.Vector.field(2, float, N, needs_grad=True)  # position of particles
+v = ti.Vector.field(2, float, N)  # velocity of particles
+U = ti.field(float, (), needs_grad=True)  # potential energy
+
+
+@ti.kernel
+def compute_U():
+    for i, j in ti.ndrange(N, N):
+        r = x[i] - x[j]
+        # r.norm(1e-3) is equivalent to ti.sqrt(r.norm()**2 + 1e-3)
+        # This is to prevent 1/0 error which can cause wrong derivative
+        U[None] += -1 / r.norm(1e-3)  # U += -1 / |r|
+
+
+@ti.kernel
+def advance():
+    for i in x:
+        v[i] += dt * -x.grad[i]  # dv/dt = -dU/dx
+    for i in x:
+        x[i] += dt * v[i]  # dx/dt = v
+
+
+def substep():
+    with ti.Tape(U):
+        # every kernel invocation within this indent scope
+        # will also be accounted into the partial derivative of U
+        # with respect to input variables like x.
+        compute_U()  # also computes dU/dx and saves it in x.grad
+    advance()
+
+
+@ti.kernel
+def init():
+    for i in x:
+        x[i] = [ti.random(), ti.random()]
+
+
+init()
+gui = ti.GUI('Autodiff gravity')
+while gui.running:
+    for i in range(50):
+        substep()
+    print('U = ', U[None])
+    gui.circles(x.to_numpy(), radius=3)
+    gui.show()
+```
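+
+The relation $F_i = -dU / dx_i$ used in this example can be checked without
+Taichi. The sketch below, under the assumption that `U` is the same softened
+pairwise potential as in `compute_U`, derives the gradient by hand and
+compares it against a finite difference. All helper names (`potential`,
+`analytic_grad`, `numeric_grad`) and the particle positions are made up for
+illustration.
+
+```python
+import math
+
+EPS = 1e-3  # same softening value as r.norm(1e-3) in the example
+
+def potential(xs):
+    """U = sum over all ordered pairs (i, j) of -1 / sqrt(|x_i - x_j|^2 + EPS)."""
+    u = 0.0
+    for xi in xs:
+        for xj in xs:
+            r2 = sum((a - b) ** 2 for a, b in zip(xi, xj))
+            u += -1.0 / math.sqrt(r2 + EPS)
+    return u
+
+def analytic_grad(xs, k):
+    """Hand-derived dU/dx_k; the factor 2 covers the pairs (k, j) and (j, k)."""
+    g = [0.0, 0.0]
+    for xj in xs:
+        r = [a - b for a, b in zip(xs[k], xj)]
+        r2 = sum(c * c for c in r)
+        for d in range(2):
+            g[d] += 2.0 * r[d] / (r2 + EPS) ** 1.5
+    return g
+
+def numeric_grad(xs, k, h=1e-6):
+    """Central finite-difference estimate of dU/dx_k."""
+    g = []
+    for d in range(2):
+        plus = [list(p) for p in xs]
+        minus = [list(p) for p in xs]
+        plus[k][d] += h
+        minus[k][d] -= h
+        g.append((potential(plus) - potential(minus)) / (2 * h))
+    return g
+
+xs = [[0.1, 0.2], [0.7, 0.4], [0.3, 0.9]]  # three particles in 2D
+grad = analytic_grad(xs, 0)
+force = [-g for g in grad]                  # F_0 = -dU/dx_0
+```
+
+Autodiff spares you from deriving `analytic_grad` by hand, which is exactly
+the step that becomes error-prone when the energy function changes.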
+
+:::note
+
+The argument `U` to `ti.Tape(U)` must be a 0D field.
+
+For using autodiff with multiple output variables, please see the
+`kernel.grad()` usage below.
+:::
+
+:::note
+
+`ti.Tape(U)` will automatically set _`U[None]`_ to 0 on
+start up.
+:::
+
+:::tip
+See
+[examples/mpm_lagrangian_forces.py](https://github.com/taichi-dev/taichi/blob/master/examples/mpm_lagrangian_forces.py)
+and
+[examples/fem99.py](https://github.com/taichi-dev/taichi/blob/master/examples/fem99.py)
+for examples on using autodiff for MPM and FEM.
+:::
+
+## Using `kernel.grad()`
+
+TODO: Documentation WIP.
+
+## Kernel Simplicity Rule
+
+Unlike tools such as TensorFlow where **immutable** output buffers are
+generated, the **imperative** programming paradigm adopted in Taichi
+allows programmers to freely modify global fields.
+
+To make automatic differentiation well-defined under this setting, we
+make the following assumption on Taichi programs for differentiable
+programming:
+
+**Global Data Access Rules:**
+
+- If a global field element is written more than once, then starting
+  from the second write, the write **must** come in the form of an
+  atomic add ("accumulation", using `ti.atomic_add` or simply
+  `+=`).
+- No read accesses happen to a global field element, until its
+  accumulation is done.
+
+**Kernel Simplicity Rule:** The kernel body consists of multiple _simply
+nested_ for-loops. I.e., each for-loop can either contain
+exactly one (nested) for-loop (and no other statements), or a group of
+statements without loops.
+
+Example:
+
+```python
+@ti.kernel
+def differentiable_task():
+    for i in x:
+        x[i] = y[i]
+
+    for i in range(10):
+        for j in range(20):
+            for k in range(300):
+                ...  # do whatever you want, as long as there are no loops
+
+    # Not allowed. The outer for loop contains two for loops
+    for i in range(10):
+        for j in range(20):
+            ...
+        for j in range(20):
+            ...
+```
+
+Taichi programs that violate this rule will result in an error.
+
+:::note
+**Static for-loops** (e.g. `for i in ti.static(range(4))`) are
+unrolled by the Python frontend preprocessor and therefore do not
+count as a level of loop.
+:::
+
+## DiffTaichi
+
+The [DiffTaichi repo](https://github.com/yuanming-hu/difftaichi)
+contains 10 differentiable physical simulators built with Taichi
+differentiable programming. A few examples with neural network
+controllers optimized using differentiable simulators and brute-force
+gradient descent:
+
+![image](https://github.com/yuanming-hu/public_files/raw/master/learning/difftaichi/ms3_final-cropped.gif)
+
+![image](https://github.com/yuanming-hu/public_files/raw/master/learning/difftaichi/rb_final2.gif)
+
+![image](https://github.com/yuanming-hu/public_files/raw/master/learning/difftaichi/diffmpm3d.gif)
+
+:::tip
+Check out [the DiffTaichi paper](https://arxiv.org/pdf/1910.00935.pdf)
+and [video](https://www.youtube.com/watch?v=Z1xvAZve9aE) to learn more
+about Taichi differentiable programming.
+:::
diff --git a/docs/lang/articles/advanced/layout.md b/docs/lang/articles/advanced/layout.md
new file mode 100644
index 0000000000000..6ce8f69a6487d
--- /dev/null
+++ b/docs/lang/articles/advanced/layout.md
@@ -0,0 +1,274 @@
+---
+sidebar_position: 2
+---
+
+# Advanced dense layouts
+
+Fields ([Scalar fields](../../api/scalar_field.md)) can be _placed_
+in a specific shape and _layout_. Defining a proper layout can be
+critical to performance, especially for memory-bound applications. A
+carefully designed data layout can significantly improve cache/TLB-hit
+rates and cacheline utilization. However, when performance is not the
+first priority, you probably don't have to worry about it.
+
+Taichi decouples algorithms from data layouts, and the Taichi compiler
+automatically optimizes data accesses on a specific data layout. These
+Taichi features allow programmers to quickly experiment with different
+data layouts and figure out the most efficient one on a specific task
+and computer architecture.
+
+In Taichi, the layout is defined in a recursive manner. See
+[Structural nodes (SNodes)](../../api/snode.md) for more details about how this
+works. We suggest starting with the default layout specification (simply
+by specifying `shape` when creating fields using
+`ti.field/ti.Vector.field/ti.Matrix.field`), and then migrating to more
+advanced layouts using the `ti.root.X` syntax if necessary.
+
+## From `shape` to `ti.root.X`
+
+For example, this declares a 0-D field:
+
+```python {1-2}
+x = ti.field(ti.f32)
+ti.root.place(x)
+# is equivalent to:
+x = ti.field(ti.f32, shape=())
+```
+
+This declares a 1D field of size `3`:
+
+```python {1-2}
+x = ti.field(ti.f32)
+ti.root.dense(ti.i, 3).place(x)
+# is equivalent to:
+x = ti.field(ti.f32, shape=3)
+```
+
+This declares a 2D field of shape `(3, 4)`:
+
+```python {1-2}
+x = ti.field(ti.f32)
+ti.root.dense(ti.ij, (3, 4)).place(x)
+# is equivalent to:
+x = ti.field(ti.f32, shape=(3, 4))
+```
+
+You may wonder, why not simply specify the `shape` of the field? Why
+bother using the more complex version? Good question, let's move forward and figure out why.
+
+## Row-major versus column-major
+
+Let's start with the simplest layout.
+
+Since address spaces are linear in modern computers, for 1D Taichi
+fields, the address of the `i`-th element is simply `base + i`.
+
+To store a multi-dimensional field, however, it has to be flattened, in
+order to fit into the 1D address space. For example, to store a 2D field
+of size `(3, 2)`, there are two ways to do this:
+
+1.  The address of the `(i, j)`-th element is `base + i * 2 + j` (row-major).
+2.  The address of the `(i, j)`-th element is `base + j * 3 + i` (column-major).
+
+To specify which layout to use in Taichi:
+
+```python
+ti.root.dense(ti.i, 3).dense(ti.j, 2).place(x)    # row-major (default)
+ti.root.dense(ti.j, 2).dense(ti.i, 3).place(y)    # column-major
+```
+
+Both `x` and `y` have the same shape of `(3, 2)`, and they can be
+accessed in the same manner: `x[i, j]` and `y[i, j]`, where
+`0 <= i < 3 && 0 <= j < 2`. However, they
+have very different memory layouts:
+
+```
+#     address low ........................... address high
+# x:  x[0,0]   x[0,1] | x[1,0]   x[1,1] | x[2,0]   x[2,1]
+# y:  y[0,0]   y[1,0]   y[2,0] | y[0,1]   y[1,1]   y[2,1]
+```
+
+What do we find here? Walking from low to high addresses, `x` varies its
+second index fastest (i.e. row-major), while `y` varies its first index
+fastest (i.e. column-major).
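+
+The two address formulas given earlier (`i * 2 + j` and `j * 3 + i`) can be
+checked with a short plain-Python sketch. It is independent of Taichi; the
+helper names `row_major` and `col_major` are made up for this illustration,
+and `base` is taken to be 0.
+
+```python
+ROWS, COLS = 3, 2  # field shape (3, 2), as in the text
+
+def row_major(i, j):
+    """Offset of element (i, j) when j varies fastest."""
+    return i * COLS + j
+
+def col_major(i, j):
+    """Offset of element (i, j) when i varies fastest."""
+    return j * ROWS + i
+
+indices = [(i, j) for i in range(ROWS) for j in range(COLS)]
+
+# Elements listed in increasing address order for each layout
+x_order = sorted(indices, key=lambda ij: row_major(*ij))
+y_order = sorted(indices, key=lambda ij: col_major(*ij))
+```
+
+Here `x_order` walks through `(0, 0), (0, 1), (1, 0), ...` while `y_order`
+walks through `(0, 0), (1, 0), (2, 0), (0, 1), ...`.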
+
+:::note
+
+For those coming from C/C++, here's what they look like:
+
+```c
+int x[3][2];  // row-major
+int y[2][3];  // column-major
+
+for (int i = 0; i < 3; i++) {
+    for (int j = 0; j < 2; j++) {
+        do_something( x[i][j] );
+        do_something( y[j][i] );
+    }
+}
+```
+
+:::
+
+## Array of Structures (AoS), Structure of Arrays (SoA)
+
+Fields of the same size can be placed together.
+
+For example, this places two 1D fields of size `3` together (array of structures, AoS):
+
+```python
+ti.root.dense(ti.i, 3).place(x, y)
+```
+
+Their memory layout:
+
+```
+#  address low ............. address high
+#  x[0]   y[0] | x[1]  y[1] | x[2]   y[2]
+```
+
+By contrast, this places the two fields separately (structure of arrays, SoA):
+
+```python
+ti.root.dense(ti.i, 3).place(x)
+ti.root.dense(ti.i, 3).place(y)
+```
+
+Now, their memory layout:
+
+```
+#  address low ............. address high
+#  x[0]  x[1]   x[2] | y[0]   y[1]   y[2]
+```
+
+Normally, you don't have to worry about the performance nuances between
+different layouts, and should just define the simplest layout as a
+start. However, locality sometimes has a significant impact on
+performance, especially when the field is huge.
+
+**To improve spatial locality of memory accesses (i.e. cache hit rate /
+cacheline utilization), it's sometimes helpful to place the data
+elements within relatively close storage locations if they are often
+accessed together.** Take a simple 1D wave equation solver for example:
+
+```python
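+N = 200000
+dt = 1e-3  # time step (example value)
+k = 1.0    # stiffness (example value)
+pos = ti.field(ti.f32)
+vel = ti.field(ti.f32)
+ti.root.dense(ti.i, N).place(pos)
+ti.root.dense(ti.i, N).place(vel)
+
+@ti.kernel
+def step():
+    for i in pos:  # parallel loop over all elements
+        pos[i] += vel[i] * dt
+        vel[i] += -k * pos[i] * dt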
+```
+
+Here, we placed `pos` and `vel` separately. So the distance in address
+space between `pos[i]` and `vel[i]` is `200000` elements. This results in
+poor spatial locality and lots of cache misses, which hurts
+performance. A better placement is to place them together:
+
+```python
+ti.root.dense(ti.i, N).place(pos, vel)
+```
+
+Then `vel[i]` is placed right next to `pos[i]`, which can increase the
+cache-hit rate and therefore the performance.
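+
+The difference between the two placements can be made concrete with a
+plain-Python sketch that computes element offsets. It assumes, for
+illustration only, that in the separate (SoA) placement the `vel` array
+starts right after the `pos` array; the helper names are hypothetical.
+
+```python
+N = 200000
+
+def soa_offsets(i):
+    """Separate placement: all pos elements first, then all vel elements."""
+    return i, N + i          # (offset of pos[i], offset of vel[i])
+
+def aos_offsets(i):
+    """Joint placement: pos[i] and vel[i] are interleaved."""
+    return 2 * i, 2 * i + 1  # (offset of pos[i], offset of vel[i])
+
+pos_off, vel_off = soa_offsets(42)
+soa_distance = vel_off - pos_off  # elements between pos[42] and vel[42]
+
+pos_off, vel_off = aos_offsets(42)
+aos_distance = vel_off - pos_off  # elements between pos[42] and vel[42]
+```
+
+In the interleaved layout, the two values touched by one loop iteration are
+adjacent and will typically share a cacheline.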
+
+## Flat layouts versus hierarchical layouts
+
+By default, when allocating a `ti.field`, it follows the simplest data
+layout.
+
+```python
+val = ti.field(ti.f32, shape=(32, 64, 128))
+```
+
+C++ equivalent:
+
+```cpp
+float val[32][64][128];
+```
+
+However, at times this data layout can be suboptimal for certain types
+of computer graphics tasks. For example, `val[i, j, k]` and
+`val[i + 1, j, k]` are very far away (`32 KB`) from each other, which
+leads to poor access locality under certain computation tasks.
+Specifically, in tasks such as texture trilinear interpolation, the two
+elements are not even within the same `4KB` page, creating huge
+cache/TLB pressure.
+
+A better layout might be
+
+```python
+val = ti.field(ti.f32)
+ti.root.dense(ti.ijk, (8, 16, 32)).dense(ti.ijk, (4, 4, 4)).place(val)
+```
+
+This organizes `val` in `4x4x4` blocks, so that with high probability
+`val[i, j, k]` and its neighbours are close to each other (i.e., in the
+same cacheline or memory page).
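+
+To illustrate the effect of blocking, the following plain-Python sketch
+computes the byte distance between `val[0, 0, 0]` and `val[1, 0, 0]` under
+both layouts. It assumes row-major ordering at every level and 4 bytes per
+`f32`; the functions `flat_address` and `blocked_address` are hypothetical
+helpers, not part of Taichi.
+
+```python
+SHAPE = (32, 64, 128)  # full field shape
+GRID = (8, 16, 32)     # number of blocks along each axis
+BLOCK = (4, 4, 4)      # elements per block along each axis
+BYTES = 4              # size of one f32
+
+def flat_address(i, j, k):
+    """Element offset in the default (flat, row-major) layout."""
+    return (i * SHAPE[1] + j) * SHAPE[2] + k
+
+def blocked_address(i, j, k):
+    """Element offset when the field is stored as 4x4x4 blocks."""
+    bi, bj, bk = i // BLOCK[0], j // BLOCK[1], k // BLOCK[2]  # which block
+    ii, ij, ik = i % BLOCK[0], j % BLOCK[1], k % BLOCK[2]     # position inside it
+    block_id = (bi * GRID[1] + bj) * GRID[2] + bk
+    inner_id = (ii * BLOCK[1] + ij) * BLOCK[2] + ik
+    return block_id * (BLOCK[0] * BLOCK[1] * BLOCK[2]) + inner_id
+
+flat_gap = (flat_address(1, 0, 0) - flat_address(0, 0, 0)) * BYTES
+blocked_gap = (blocked_address(1, 0, 0) - blocked_address(0, 0, 0)) * BYTES
+```
+
+Under these assumptions the gap shrinks from 32 KB to 64 bytes whenever the
+two neighbours fall inside the same block.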
+
+## Struct-fors on advanced dense data layouts
+
+Struct-fors on nested dense data structures will automatically follow
+their data order in memory. For example, if 2D scalar field `A` is
+stored in row-major order,
+
+```python
+for i, j in A:
+    A[i, j] += 1
+```
+
+will iterate over elements of `A` following row-major order. If `A` is
+column-major, then the iteration follows the column-major order.
+
+If `A` is hierarchical, it will be iterated level by level. This
+maximizes the memory bandwidth utilization in most cases.
+
+Struct-for loops on sparse fields follow the same philosophy, and will
+be discussed further in [Sparse computation](./sparse.md).
+
+## Examples
+
+2D matrix, row-major
+
+```python
+A = ti.field(ti.f32)
+ti.root.dense(ti.ij, (256, 256)).place(A)
+```
+
+2D matrix, column-major
+
+```python
+A = ti.field(ti.f32)
+ti.root.dense(ti.ji, (256, 256)).place(A) # Note ti.ji instead of ti.ij
+```
+
+_8x8_ blocked 2D array of size _1024x1024_
+
+```python
+density = ti.field(ti.f32)
+ti.root.dense(ti.ij, (128, 128)).dense(ti.ij, (8, 8)).place(density)
+```
+
+3D Particle positions and velocities, AoS
+
+```python
+pos = ti.Vector.field(3, dtype=ti.f32)
+vel = ti.Vector.field(3, dtype=ti.f32)
+ti.root.dense(ti.i, 1024).place(pos, vel)
+# equivalent to
+ti.root.dense(ti.i, 1024).place(pos(0), pos(1), pos(2), vel(0), vel(1), vel(2))
+```
+
+3D Particle positions and velocities, SoA
+
+```python
+pos = ti.Vector.field(3, dtype=ti.f32)
+vel = ti.Vector.field(3, dtype=ti.f32)
+for i in range(3):
+    ti.root.dense(ti.i, 1024).place(pos(i))
+for i in range(3):
+    ti.root.dense(ti.i, 1024).place(vel(i))
+```
diff --git a/docs/lang/articles/advanced/meta.md b/docs/lang/articles/advanced/meta.md
new file mode 100644
index 0000000000000..3e182f452e563
--- /dev/null
+++ b/docs/lang/articles/advanced/meta.md
@@ -0,0 +1,174 @@
+---
+sidebar_position: 1
+---
+
+# Metaprogramming
+
+Taichi provides metaprogramming infrastructures. Metaprogramming can
+
+- Unify the development of dimensionality-dependent code, such as
+  2D/3D physical simulations
+- Improve run-time performance by moving run-time costs to compile time
+- Simplify the development of the Taichi standard library
+
+Taichi kernels are _lazily instantiated_ and a lot of computation can
+happen at _compile-time_. Every kernel in Taichi is a template kernel,
+even if it has no template arguments.
+
+## Template metaprogramming
+
+You may use `ti.template()` as a type hint to pass a field as an
+argument. For example:
+
+```python {2}
+@ti.kernel
+def copy(x: ti.template(), y: ti.template()):
+    for i in x:
+        y[i] = x[i]
+
+a = ti.field(ti.f32, 4)
+b = ti.field(ti.f32, 4)
+c = ti.field(ti.f32, 12)
+d = ti.field(ti.f32, 12)
+copy(a, b)
+copy(c, d)
+```
+
+As shown in the example above, template programming may enable us to
+reuse our code and provide more flexibility.
+
+## Dimensionality-independent programming using grouped indices
+
+However, the `copy` template shown above is not perfect. For example, it
+can only be used to copy 1D fields. What if we want to copy 2D fields?
+Do we have to write another kernel?
+
+```python
+@ti.kernel
+def copy2d(x: ti.template(), y: ti.template()):
+    for i, j in x:
+        y[i, j] = x[i, j]
+```
+
+:tada: Not necessary! Taichi provides `ti.grouped` syntax which enables you to
+pack loop indices into a grouped vector to unify kernels of different
+dimensionalities. For example:
+
+```python {3-10,15-16}
+@ti.kernel
+def copy(x: ti.template(), y: ti.template()):
+    for I in ti.grouped(y):
+        # I is a vector with the same dimensionality as y and data type i32
+        # If y is 0D, then I = ti.Vector([]), which is equivalent to `None` when used in y[I]
+        # If y is 1D, then I = ti.Vector([i])
+        # If y is 2D, then I = ti.Vector([i, j])
+        # If y is 3D, then I = ti.Vector([i, j, k])
+        # ...
+        y[I] = x[I]
+
+@ti.kernel
+def array_op(x: ti.template(), y: ti.template()):
+    # if field x is 2D:
+    for I in ti.grouped(x): # I is simply a 2D vector with data type i32
+        y[I + ti.Vector([0, 1])] = I[0] + I[1]
+
+    # then it is equivalent to:
+    for i, j in x:
+        y[i, j + 1] = i + j
+```
+
+## Field metadata
+
+Sometimes it is useful to get the data type (`field.dtype`) and shape
+(`field.shape`) of fields. These attributes can be accessed in both
+Taichi- and Python-scopes.
+
+```python {2-6}
+@ti.func
+def print_field_info(x: ti.template()):
+    print('Field dimensionality is', len(x.shape))
+    for i in ti.static(range(len(x.shape))):
+        print('Size along dimension', i, 'is', x.shape[i])
+    ti.static_print('Field data type is', x.dtype)
+```
+
+See [Scalar fields](../../api/scalar_field.md) for more details.
+
+:::note
+For sparse fields, the full domain shape will be returned.
+:::
+
+## Matrix & vector metadata
+
+Getting the number of matrix columns and rows will allow you to write
+dimensionality-independent code. For example, this can be used to unify
+2D and 3D physical simulators.
+
+`matrix.m` equals the number of columns of a matrix, while `matrix.n`
+equals the number of rows of a matrix. Since vectors are considered
+as matrices with one column, `vector.n` is simply the dimensionality of
+the vector.
+
+```python {4-5,7-8}
+@ti.kernel
+def foo():
+    matrix = ti.Matrix([[1, 2], [3, 4], [5, 6]])
+    print(matrix.n)  # 3
+    print(matrix.m)  # 2
+    vector = ti.Vector([7, 8, 9])
+    print(vector.n)  # 3
+    print(vector.m)  # 1
+```
+
+## Compile-time evaluations
+
+Using compile-time evaluation will allow certain computations to happen
+when kernels are being instantiated. This saves the overhead of those
+computations at runtime.
+
+- Use `ti.static` for compile-time branching (for those who come from
+  C++17, this is [if
+  constexpr](https://en.cppreference.com/w/cpp/language/if)):
+
+```python {5}
+enable_projection = True
+
+@ti.kernel
+def static():
+  if ti.static(enable_projection): # No runtime overhead
+    x[0] = 1
+```
+
+- Use `ti.static` for forced loop unrolling:
+
+```python {3}
+@ti.kernel
+def func():
+  for i in ti.static(range(4)):
+      print(i)
+
+  # is equivalent to:
+  print(0)
+  print(1)
+  print(2)
+  print(3)
+```
+
+## When to use for loops with `ti.static`
+
+There are several reasons why `ti.static` for loops should be used.
+
+- Loop unrolling for performance.
+- Loop over vector/matrix elements. Indices into Taichi matrices must be compile-time constants, whereas indices into Taichi fields can be run-time variables. For example, an element of a vector field `x` is accessed as `x[field_index][vector_component_index]`: the first index can be a variable, yet the second must be a constant.
+
+For example, code for resetting this vector field should be
+
+```python {4}
+@ti.kernel
+def reset():
+  for i in x:
+    for j in ti.static(range(x.n)):
+      # The inner loop must be unrolled since j is a vector index instead
+      # of a global field index.
+      x[i][j] = 0
+```
diff --git a/docs/lang/articles/advanced/odop.md b/docs/lang/articles/advanced/odop.md
new file mode 100644
index 0000000000000..8b186690ecd78
--- /dev/null
+++ b/docs/lang/articles/advanced/odop.md
@@ -0,0 +1,90 @@
+---
+sidebar_position: 6
+---
+
+# Objective data-oriented programming
+
+Taichi is a
+[data-oriented](https://en.wikipedia.org/wiki/Data-oriented_design)
+programming (DOP) language. However, simple DOP makes modularization
+hard.
+
+To allow modularized code, Taichi borrows some concepts from
+object-oriented programming (OOP).
+
+For convenience, let's call the hybrid scheme **objective data-oriented
+programming** (ODOP).
+
+:::note
+More documentation on this topic is on the way ...
+:::
+
+A brief example:
+
+```python
+import taichi as ti
+
+ti.init()
+
+@ti.data_oriented
+class Array2D:
+  def __init__(self, n, m, increment):
+    self.n = n
+    self.m = m
+    self.val = ti.field(ti.f32)
+    self.total = ti.field(ti.f32)
+    self.increment = increment
+    ti.root.dense(ti.ij, (self.n, self.m)).place(self.val)
+    ti.root.place(self.total)
+
+  @staticmethod
+  @ti.func
+  def clamp(x):  # Clamp to [0, 1)
+      return max(0, min(1 - 1e-6, x))
+
+  @ti.kernel
+  def inc(self):
+    for i, j in self.val:
+      ti.atomic_add(self.val[i, j], self.increment)
+
+  @ti.kernel
+  def inc2(self, increment: ti.i32):
+    for i, j in self.val:
+      ti.atomic_add(self.val[i, j], increment)
+
+  @ti.kernel
+  def reduce(self):
+    for i, j in self.val:
+      ti.atomic_add(self.total, self.val[i, j] * 4)
+
+arr = Array2D(128, 128, 3)
+
+double_total = ti.field(ti.f32, shape=())
+
+ti.root.lazy_grad()
+
+arr.inc()
+arr.inc.grad()
+assert arr.val[3, 4] == 3
+arr.inc2(4)
+assert arr.val[3, 4] == 7
+
+with ti.Tape(loss=arr.total):
+  arr.reduce()
+
+for i in range(arr.n):
+  for j in range(arr.m):
+    assert arr.val.grad[i, j] == 4
+
+@ti.kernel
+def double():
+  double_total[None] = 2 * arr.total
+
+with ti.Tape(loss=double_total):
+  arr.reduce()
+  double()
+
+for i in range(arr.n):
+  for j in range(arr.m):
+    assert arr.val.grad[i, j] == 8
+```
diff --git a/docs/lang/articles/advanced/offset.md b/docs/lang/articles/advanced/offset.md
new file mode 100644
index 0000000000000..18ceccb037393
--- /dev/null
+++ b/docs/lang/articles/advanced/offset.md
@@ -0,0 +1,40 @@
+---
+sidebar_position: 7
+---
+
+# Coordinate offsets
+
+- A Taichi field can be defined with **coordinate offsets**. The
+  offsets will move field bounds so that field origins are no longer
+  zero vectors. A typical use case is to support voxels with negative
+  coordinates in physical simulations.
+- For example, a matrix of `32x64` elements with coordinate offset
+  `(-16, 8)` can be defined as the following:
+
+```python
+a = ti.Matrix.field(2, 2, dtype=ti.f32, shape=(32, 64), offset=(-16, 8))
+```
+
+In this way, the field's indices are from `(-16, 8)` to `(16, 72)` (exclusive).
+
+```python
+a[-16, 8]   # lower left corner
+a[15, 8]    # lower right corner
+a[-16, 71]  # upper left corner
+a[15, 71]   # upper right corner
+```
+
+:::note
+The dimensionality of field shapes should **be consistent** with that of
+the offset. Otherwise, an `AssertionError` will be raised.
+:::
+
+```python
+a = ti.Matrix.field(2, 3, dtype=ti.f32, shape=(32,), offset=(-16, ))          # Works!
+b = ti.Vector.field(3, dtype=ti.f32, shape=(16, 32, 64), offset=(7, 3, -4))   # Works!
+c = ti.Matrix.field(2, 1, dtype=ti.f32, shape=None, offset=(32,))             # AssertionError
+d = ti.Matrix.field(3, 2, dtype=ti.f32, shape=(32, 32), offset=(-16, ))       # AssertionError
+e = ti.field(dtype=ti.i32, shape=16, offset=-16)                              # Works!
+f = ti.field(dtype=ti.i32, shape=None, offset=-16)                            # AssertionError
+g = ti.field(dtype=ti.i32, shape=(16, 32), offset=-16)                        # AssertionError
+```
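+
+Conceptually, an offset just shifts each coordinate before the element is
+looked up. The plain-Python sketch below illustrates this mapping and the
+dimensionality check; it is an assumption-laden illustration and not how
+Taichi implements offsets internally. The function `to_storage_index` is
+hypothetical.
+
+```python
+def to_storage_index(index, shape, offset):
+    """Map an offset coordinate to a zero-based position within `shape`."""
+    # The shape and offset must have the same dimensionality.
+    assert len(shape) == len(offset), "shape and offset dimensionality differ"
+    assert len(index) == len(shape), "index dimensionality differs from shape"
+    local = tuple(i - o for i, o in zip(index, offset))
+    for l, s in zip(local, shape):
+        if not 0 <= l < s:
+            raise IndexError(f"index {index} is out of bounds")
+    return local
+
+shape, offset = (32, 64), (-16, 8)
+first = to_storage_index((-16, 8), shape, offset)  # smallest valid coordinate
+last = to_storage_index((15, 71), shape, offset)   # largest valid coordinate
+```
+
+Coordinates such as `(16, 8)` fall outside the field because the upper
+bound `(16, 72)` is exclusive.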
diff --git a/docs/lang/articles/advanced/performance.md b/docs/lang/articles/advanced/performance.md
new file mode 100644
index 0000000000000..b7b85f2348c8d
--- /dev/null
+++ b/docs/lang/articles/advanced/performance.md
@@ -0,0 +1,70 @@
+---
+sidebar_position: 5
+---
+
+# Performance tuning
+
+## For-loop decorators
+
+In Taichi kernels, for-loops in the outermost scope are automatically
+parallelized.
+
+However, there are some implementation details about **how it is
+parallelized**.
+
+Taichi provides some APIs to modify these parameters. This allows
+advanced users to manually fine-tune the performance.
+
+For example, specifying a suitable `ti.block_dim` could yield an almost
+3x performance boost in
+[examples/mpm3d.py](https://github.com/taichi-dev/taichi/blob/master/examples/mpm3d.py).
+
+:::note
+For performance profiling utilities, see [**Profiler** section of the Contribution Guide](../misc/profiler.md).
+:::
+
+### Thread hierarchy of GPUs
+
+GPUs have a **thread hierarchy**.
+
+From small to large, the computation units are: **iteration** \<
+**thread** \< **block** \< **grid**.
+
+- **iteration**: An iteration is the **body of a for-loop**. Each
+  iteration corresponds to a specific `i` value in the for-loop.
+- **thread**: Iterations are grouped into threads. Threads are the
+  minimal unit that is parallelized. All iterations within a thread
+  are executed in **serial**. We usually use 1 iteration per thread
+  for maximizing parallel performance.
+- **block**: Threads are grouped into blocks. All threads within a
+  block are executed in **parallel**. Threads within the same block
+  can share their **block local storage**.
+- **grid**: Blocks are grouped into grids. A grid is the minimal unit
+  that is **launched** from the host. All blocks within a grid are
+  executed in **parallel**. In Taichi, each **parallelized for-loop**
+  is a grid.
+
+For more details, please see [the CUDA C programming
+guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy).
+The OpenGL and Metal backends follow a similar thread hierarchy.
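+
+As a rough mental model of this hierarchy, the plain-Python sketch below
+maps a loop iteration to a block and a thread, assuming one iteration per
+thread. The actual scheduling is decided by the backend and may differ;
+`num_blocks` and `locate` are hypothetical helpers used only here.
+
+```python
+def num_blocks(num_iterations, block_dim):
+    """Blocks needed to cover all iterations (ceiling division)."""
+    return (num_iterations + block_dim - 1) // block_dim
+
+def locate(i, block_dim):
+    """Return (block index, thread index within the block) for iteration i."""
+    return divmod(i, block_dim)
+
+blocks = num_blocks(8192, 128)  # grid size for a loop of 8192 iterations
+where = locate(300, 128)        # which block and thread run iteration 300
+```
+
+With 8192 iterations and 128 threads per block, the grid consists of 64
+blocks, and iteration 300 lands on thread 44 of block 2.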
+
+### API reference
+
+Programmers may **prepend** some decorator(s) to tweak the property of a
+for-loop, e.g.:
+
+```python
+@ti.kernel
+def func():
+    for i in range(8192):  # no decorator, use default settings
+        ...
+
+    ti.block_dim(128)      # change the property of next for-loop:
+    for i in range(8192):  # will be parallelized with block_dim=128
+        ...
+
+    for i in range(8192):  # no decorator, use default settings
+        ...
+```
+
+For details, check [The list of available decorators in API references](../../api/reference/ti.md#block_dim)
diff --git a/docs/lang/articles/advanced/sparse.md b/docs/lang/articles/advanced/sparse.md
new file mode 100644
index 0000000000000..6bb7ed63df85f
--- /dev/null
+++ b/docs/lang/articles/advanced/sparse.md
@@ -0,0 +1,15 @@
+---
+sidebar_position: 3
+---
+
+# Sparse computation
+
+The LLVM backends (CPU/CUDA) and the Metal backend offer the full functionality of spatially sparse computation in Taichi.
+
+Please read our [paper](https://yuanming.taichi.graphics/publication/2019-taichi/taichi-lang.pdf),
+watch the [introduction video](https://www.youtube.com/watch?v=wKw8LMF3Djo), or check out
+the SIGGRAPH Asia 2019 [slides](https://yuanming.taichi.graphics/publication/2019-taichi/taichi-lang-slides.pdf)
+for more details on sparse computation.
+
+[Taichi elements](https://github.com/taichi-dev/taichi_elements) implements a high-performance
+MLS-MPM solver on Taichi's sparse grids.
diff --git a/docs/lang/articles/advanced/syntax_sugars.md b/docs/lang/articles/advanced/syntax_sugars.md
new file mode 100644
index 0000000000000..7ae8d4e944441
--- /dev/null
+++ b/docs/lang/articles/advanced/syntax_sugars.md
@@ -0,0 +1,69 @@
+---
+sidebar_position: 7
+---
+
+# Syntax sugars
+
+## Aliases
+
+Creating aliases for global variables and functions with cumbersome
+names can sometimes improve readability. In Taichi, this can be done by
+assigning kernel and function local variables with `ti.static()`, which
+forces Taichi to use standard Python pointer assignment.
+
+For example, consider the simple kernel:
+
+```python
+@ti.kernel
+def my_kernel():
+    for i, j in field_a:
+        field_b[i, j] = some_function(field_a[i, j])
+```
+
+The fields and function can be aliased to new names with `ti.static`:
+
+```python {3}
+@ti.kernel
+def my_kernel():
+    a, b, fun = ti.static(field_a, field_b, some_function)
+    for i, j in a:
+        b[i, j] = fun(a[i, j])
+```
+
+Aliases can also be created for class members and methods, which can
+help prevent cluttering objective data-oriented programming code with
+`self`.
+
+For example, consider a class kernel that computes the 2D Laplacian of some field:
+
+```python
+@ti.kernel
+def compute_laplacian(self):
+  for i, j in self.a:
+    self.b[i, j] = (self.a[i + 1, j] - 2.0*self.a[i, j] + self.a[i-1, j])/(self.dx**2) \
+                 + (self.a[i, j + 1] - 2.0*self.a[i, j] + self.a[i, j-1])/(self.dy**2)
+```
+
+Using `ti.static()`, it can be simplified to:
+
+```python {3-6}
+@ti.kernel
+def compute_laplacian(self):
+    a, b, dx, dy = ti.static(self.a, self.b, self.dx, self.dy)
+    for i, j in a:
+        b[i, j] = (a[i+1, j] - 2.0*a[i, j] + a[i-1, j])/(dx**2) \
+                + (a[i, j+1] - 2.0*a[i, j] + a[i, j-1])/(dy**2)
+```
+
+:::note
+`ti.static` can also be used in combination with:
+
+- `if` (compile-time
+  branching) and
+- `for` (compile-time unrolling)
+
+See [Metaprogramming](./meta.md) for more details.
+
+Here, we are using it for _compile-time const values_, i.e. the
+**field/function handles** are constants at compile time.
+:::
diff --git a/docs/lang/articles/basic/external.md b/docs/lang/articles/basic/external.md
new file mode 100644
index 0000000000000..fef58e6a38692
--- /dev/null
+++ b/docs/lang/articles/basic/external.md
@@ -0,0 +1,137 @@
+---
+sidebar_position: 4
+---
+
+# Interacting with external arrays
+
+Although Taichi fields are mainly used in Taichi-scope, in some cases
+efficiently manipulating Taichi field data in Python-scope could also be
+helpful.
+
+We provide various interfaces to copy the data between Taichi fields and
+external arrays. The most typical case may be copying between Taichi
+fields and NumPy arrays. Let's take a look at two examples below.
+
+**Export data in Taichi fields to a NumPy array** via `to_numpy()`. This
+allows us to export computation results to other Python packages that
+support NumPy, e.g. `matplotlib`.
+
+```python {8}
+@ti.kernel
+def my_kernel():
+   for i in x:
+      x[i] = i * 2
+
+x = ti.field(ti.f32, 4)
+my_kernel()
+x_np = x.to_numpy()
+print(x_np)  # np.array([0, 2, 4, 6])
+```
+
+**Import data from NumPy array to Taichi fields** via `from_numpy()`.
+This allows people to initialize Taichi fields via NumPy arrays. E.g.,
+
+```python {3}
+x = ti.field(ti.f32, 4)
+x_np = np.array([1, 7, 3, 5])
+x.from_numpy(x_np)
+print(x[0])  # 1
+print(x[1])  # 7
+print(x[2])  # 3
+print(x[3])  # 5
+```
+
+## API reference
+
+We provide interfaces to copy data between Taichi fields and **external
+arrays**. External arrays refer to NumPy arrays or PyTorch tensors.
+
+We suggest that most users start with NumPy arrays.
+
+For details, check [Field in API references](../../api/reference/field.md)
+
+## External array shapes
+
+Shapes of Taichi fields (see [Scalar fields](../../api/scalar_field.md)) and those of corresponding NumPy arrays are closely
+connected via the following rules:
+
+- For scalar fields, **the shape of NumPy array is exactly the same as
+  the Taichi field**:
+
+```python
+field = ti.field(ti.i32, shape=(233, 666))
+field.shape  # (233, 666)
+
+array = field.to_numpy()
+array.shape  # (233, 666)
+
+field.from_numpy(array)  # the input array must be of shape (233, 666)
+```
+
+- For vector fields, if the vector is `n`-D, then **the shape of NumPy
+  array should be** `(*field_shape, vector_n)`:
+
+```python
+field = ti.Vector.field(3, ti.i32, shape=(233, 666))
+field.shape  # (233, 666)
+field.n      # 3
+
+array = field.to_numpy()
+array.shape  # (233, 666, 3)
+
+field.from_numpy(array)  # the input array must be of shape (233, 666, 3)
+```
+
+- For matrix fields, if the matrix is `n*m`, then **the shape of NumPy
+  array should be** `(*field_shape, matrix_n, matrix_m)`:
+
+```python
+field = ti.Matrix.field(3, 4, ti.i32, shape=(233, 666))
+field.shape  # (233, 666)
+field.n      # 3
+field.m      # 4
+
+array = field.to_numpy()
+array.shape  # (233, 666, 3, 4)
+
+field.from_numpy(array)  # the input array must be of shape (233, 666, 3, 4)
+```
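+
+The three rules above share one pattern: the NumPy shape is the field shape followed by the element shape. Below is a minimal plain-Python sketch of that pattern. The helper `external_shape` is hypothetical and not part of Taichi; it only models the shapes, it does not touch any field.
+
+```python
+def external_shape(field_shape, element_shape=()):
+    """Expected NumPy shape: the field shape followed by the element shape."""
+    return tuple(field_shape) + tuple(element_shape)
+
+
+print(external_shape((233, 666)))          # scalar field: (233, 666)
+print(external_shape((233, 666), (3,)))    # 3-D vector field: (233, 666, 3)
+print(external_shape((233, 666), (3, 4)))  # 3x4 matrix field: (233, 666, 3, 4)
+```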
+
+## Using external arrays as Taichi kernel arguments
+
+Use the type hint `ti.ext_arr()` for passing external arrays as kernel
+arguments. For example:
+
+```python {12}
+import taichi as ti
+import numpy as np
+
+ti.init()
+
+n = 4
+m = 7
+
+val = ti.field(ti.i32, shape=(n, m))
+
+@ti.kernel
+def test_numpy(arr: ti.ext_arr()):
+  for i in range(n):
+    for j in range(m):
+      arr[i, j] += i + j
+
+a = np.empty(shape=(n, m), dtype=np.int32)
+
+for i in range(n):
+  for j in range(m):
+    a[i, j] = i * j
+
+test_numpy(a)
+
+for i in range(n):
+  for j in range(m):
+    assert a[i, j] == i * j + i + j
+```
+
+:::note
+Struct-for's are not supported on external arrays.
+:::
diff --git a/docs/lang/articles/basic/field.md b/docs/lang/articles/basic/field.md
new file mode 100644
index 0000000000000..3a17e13728a42
--- /dev/null
+++ b/docs/lang/articles/basic/field.md
@@ -0,0 +1,89 @@
+---
+sidebar_position: 3
+---
+
+# Fields
+
+Fields are global variables provided by Taichi. Currently, they can only be defined before launching any Taichi kernel. Fields can be either
+sparse or dense. An element of a field can be either a scalar or a
+vector/matrix. This term is borrowed from mathematics and physics. If you
+are already familiar with a [scalar field](https://en.wikipedia.org/wiki/Scalar_field) (e.g., a heat field) or a vector field (e.g., a [gravitational field](https://en.wikipedia.org/wiki/Gravitational_field)) in mathematics and physics, the fields in Taichi will be straightforward to understand.
+
+:::note
+Matrices can be used as field elements, so you can have fields with each
+element being a matrix.
+:::
+
+## Scalar fields
+
+A simple example might help you understand scalar fields. Assume you have a rectangular wok on the top of a fire. At each point of the wok, there would be a temperature. The surface of the wok forms a heat field. The width and height of the wok are similar to the `shape` of the Taichi scalar field. The temperature (0-D scalar) is like the element of the Taichi scalar field. We could use the following field to represent the
+heat field on the wok:
+
+``` python
+heat_field = ti.field(dtype=ti.f32, shape=(width_wok, height_wok))
+```
+
+- Every global variable is an N-dimensional field.
+
+  - Global `scalars` are treated as 0-D scalar fields.
+
+- Fields are always accessed by indices
+
+  - E.g. `x[i, j, k]` if `x` is a 3D scalar field.
+  - Even when accessing 0-D field `x`, use `x[None] = 0` instead of `x = 0`. Please **always** use indexing to access entries in fields. A 0-D field looks like `energy = ti.field(dtype=ti.f32, shape=())`.
+- Field values are initially zero.
+
+- Sparse fields are initially inactive.
+
+- See [Scalar fields](../../api/scalar_field.md) for more details.
+
+## Vector fields
+We all live in a gravitational field, which is a vector field. At each position in 3D space, there is a gravity force vector. The gravitational field could be represented with:
+```python
+gravitational_field = ti.Vector.field(n=3, dtype=ti.f32, shape=(x, y, z))
+```
+`x, y, z` are the sizes of each dimension of the 3D space, respectively. `n` is the number of elements of the gravity force vector.
+
+- See [Vector](../../api/vector.md) for more details.
+
+## Matrix fields
+
+Field elements can also be matrices. In continuum mechanics, each
+infinitesimal point in a material has a strain tensor and a stress tensor. Each of them is a 3 by 3 matrix in 3D space. To represent such a tensor field we could use:
+```python
+strain_tensor_field = ti.Matrix.field(n=3, m=3, dtype=ti.f32, shape=(x, y, z))
+```
+
+`x, y, z` are the sizes of each dimension of the 3D material, respectively. `n, m` are the dimensions of the strain tensor.
+
+In the general case, suppose you have a `128 x 64` field called `A`, and each element contains
+a `3 x 2` matrix. To allocate a `128 x 64` matrix field which has a
+`3 x 2` matrix for each of its entries, use the statement
+`A = ti.Matrix.field(3, 2, dtype=ti.f32, shape=(128, 64))`.
+
+- If you want to get the matrix of grid node `i, j`, please use
+  `mat = A[i, j]`. `mat` is simply a `3 x 2` matrix.
+- To get the element on the first row and second column of that
+  matrix, use `mat[0, 1]` or `A[i, j][0, 1]`.
+- As you may have noticed, there are **two** indexing operators `[]`
+  when you load a matrix element from a global matrix field: the
+  first is for field indexing, the second for matrix indexing.
+- `ti.Vector` is simply an alias of `ti.Matrix`.
+- See [Matrices](../../api/matrix.md) for more on matrices.
+
+### Matrix size
+
+For performance reasons, matrix operations are unrolled at compile time, so we
+suggest using only small matrices. For example, `2x1`, `3x3`, `4x4`
+matrices are fine, yet `32x6` is probably too big as a matrix size.
+
+:::caution
+Due to the unrolling mechanisms, operating on large matrices (e.g.
+`32x128`) can lead to a very long compilation time and low performance.
+:::
+
+If you have a dimension that is too large (e.g. `64`), it's better to
+make it a field dimension of size `64`. E.g., instead of declaring
+`ti.Matrix.field(64, 32, dtype=ti.f32, shape=(3, 2))`, declare
+`ti.Matrix.field(3, 2, dtype=ti.f32, shape=(64, 32))`. Try to put large
+dimensions in fields instead of matrices.
diff --git a/docs/lang/articles/basic/overview.md b/docs/lang/articles/basic/overview.md
new file mode 100644
index 0000000000000..80bb9d7c9abd1
--- /dev/null
+++ b/docs/lang/articles/basic/overview.md
@@ -0,0 +1,23 @@
+---
+sidebar_position: 0
+---
+
+# Why a new programming language
+
+Taichi is a high-performance programming language for computer graphics
+applications. The design goals are
+
+- Productivity
+- Performance
+- Portability
+- Spatially sparse computation
+- Differentiable programming
+- Metaprogramming
+
+## Design decisions
+
+- Decouple computation from data structures
+- Domain-specific compiler optimizations
+- Megakernels
+- Two-scale automatic differentiation
+- Embedding in Python
diff --git a/docs/lang/articles/basic/syntax.md b/docs/lang/articles/basic/syntax.md
new file mode 100644
index 0000000000000..d6767f7537060
--- /dev/null
+++ b/docs/lang/articles/basic/syntax.md
@@ -0,0 +1,277 @@
+---
+sidebar_position: 1
+---
+
+# Kernels and functions
+
+## Taichi-scope vs Python-scope
+
+Code decorated by `@ti.kernel` or `@ti.func` is in the **Taichi-scope**.
+
+It is compiled and executed on CPU or GPU devices with high
+parallelization performance, at the cost of less flexibility.
+
+:::note
+For people from CUDA, Taichi-scope = **device** side.
+:::
+
+Code outside `@ti.kernel` or `@ti.func` is in the **Python-scope**.
+
+It is not compiled by the Taichi compiler and has lower performance,
+but offers a richer type system and better flexibility.
+
+:::note
+For people from CUDA, Python-scope = **host** side.
+:::
+
+## Kernels
+
+A Python function decorated by `@ti.kernel` is a **Taichi kernel**:
+
+```python {1}
+@ti.kernel
+def my_kernel():
+    ...
+
+my_kernel()
+```
+
+Kernels should be called from **Python-scope**.
+
+:::note
+For people from CUDA, Taichi kernels = `__global__` functions.
+:::
+
+### Arguments
+
+Kernels can have at most 8 parameters, which lets you pass values from
+Python-scope to Taichi-scope easily.
+
+Kernel arguments must be type-hinted:
+
+```python {2}
+@ti.kernel
+def my_kernel(x: ti.i32, y: ti.f32):
+    print(x + y)
+
+my_kernel(2, 3.3)  # prints: 5.3
+```
+
+:::note
+
+For now, we only support scalars as arguments. Specifying `ti.Matrix` or
+`ti.Vector` as argument is not supported. For example:
+
+```python {2,6}
+@ti.kernel
+def bad_kernel(v: ti.Vector):
+    ...
+
+@ti.kernel
+def good_kernel(vx: ti.f32, vy: ti.f32):
+    v = ti.Vector([vx, vy])
+    ...
+```
+
+:::
+
+### Return value
+
+A kernel may or may not have a **scalar** return value. If it does, the
+type of return value must be hinted:
+
+```python {2}
+@ti.kernel
+def my_kernel() -> ti.f32:
+    return 233.33
+
+print(my_kernel())  # 233.33
+```
+
+The return value will be automatically cast into the hinted type. e.g.,
+
+```python {2-3,5}
+@ti.kernel
+def my_kernel() -> ti.i32:  # int32
+    return 233.33
+
+print(my_kernel())  # 233, since return type is ti.i32
+```
+
+:::note
+
+For now, a kernel can only have one scalar return value. Returning
+`ti.Matrix` or `ti.Vector` is not supported. Python-style tuple return
+is not supported either. For example:
+
+```python {3,9}
+@ti.kernel
+def bad_kernel() -> ti.Matrix:
+    return ti.Matrix([[1, 0], [0, 1]])  # Error
+
+@ti.kernel
+def bad_kernel() -> (ti.i32, ti.f32):
+    x = 1
+    y = 0.5
+    return x, y  # Error
+```
+
+:::
+
+### Advanced arguments
+
+We also support **template arguments** (see
+[Template metaprogramming](../advanced/meta.md#template-metaprogramming)) and **external
+array arguments** (see [Interacting with external arrays](./external.md)) in
+Taichi kernels. Use `ti.template()` or `ti.ext_arr()` as their
+type-hints respectively.
+
+:::note
+
+When using differentiable programming, there are a few more constraints
+on kernel structures. See the [**Kernel Simplicity Rule**](../advanced/differentiable_programming.md#kernel-simplicity-rule).
+
+Also, please do not use kernel return values in differentiable
+programming, since the return value will not be tracked by automatic
+differentiation. Instead, store the result into a global variable (e.g.
+`loss[None]`).
+:::
+
+## Functions
+
+A Python function decorated by `@ti.func` is a **Taichi function**:
+
+```python {8,11}
+@ti.func
+def my_func():
+    ...
+
+@ti.kernel
+def my_kernel():
+    ...
+    my_func()  # call functions from Taichi-scope
+    ...
+
+my_kernel()    # call kernels from Python-scope
+```
+
+Taichi functions should be called from **Taichi-scope**.
+
+:::note
+For people from CUDA, Taichi functions = `__device__` functions.
+:::
+
+:::note
+Taichi functions can be nested.
+:::
+
+:::caution
+Currently, all functions are force-inlined. Therefore, no recursion is
+allowed.
+:::
+
+### Arguments and return values
+
+Functions can have multiple arguments and return values. Unlike kernels,
+arguments in functions don't need to be type-hinted:
+
+```python
+@ti.func
+def my_add(x, y):
+    return x + y
+
+
+@ti.kernel
+def my_kernel():
+    ...
+    ret = my_add(2, 3.3)
+    print(ret)  # 5.3
+    ...
+```
+
+Function arguments are passed by value. So changes made inside function
+scope won't affect the outside value in the caller:
+
+```python {3,9,11}
+@ti.func
+def my_func(x):
+    x = x + 1  # won't change the original value of x
+
+
+@ti.kernel
+def my_kernel():
+    ...
+    x = 233
+    my_func(x)
+    print(x)  # 233
+    ...
+```
+
+### Advanced arguments
+
+You may use `ti.template()` as type-hint to force arguments to be passed
+by reference:
+
+```python {3,9,11}
+@ti.func
+def my_func(x: ti.template()):
+    x = x + 1  # will change the original value of x
+
+
+@ti.kernel
+def my_kernel():
+    ...
+    x = 233
+    my_func(x)
+    print(x)  # 234
+    ...
+```
+
+:::note
+
+Unlike kernels, functions **do support vectors or matrices as arguments
+and return values**:
+
+```python {2,6}
+@ti.func
+def sdf(u):  # functions support matrices and vectors as arguments. No type-hints needed.
+    return u.norm() - 1
+
+@ti.kernel
+def render(d_x: ti.f32, d_y: ti.f32):  # kernels do not support vector/matrix arguments yet. We have to use a workaround.
+    d = ti.Vector([d_x, d_y])
+    p = ti.Vector([0.0, 0.0])
+    t = sdf(p)
+    p += d * t
+    ...
+```
+
+:::
+
+:::caution
+
+Functions with multiple `return` statements are not supported for now.
+Use a **local** variable to store the results, so that you end up with
+only one `return` statement:
+
+```python {1,5,7,9,17}
+# Bad function - two return statements
+@ti.func
+def safe_sqrt(x):
+  if x >= 0:
+    return ti.sqrt(x)
+  else:
+    return 0.0
+
+# Good function - single return statement
+@ti.func
+def safe_sqrt(x):
+  ret = 0.0
+  if x >= 0:
+    ret = ti.sqrt(x)
+  else:
+    ret = 0.0
+  return ret
+```
+
+:::
diff --git a/docs/lang/articles/basic/type.md b/docs/lang/articles/basic/type.md
new file mode 100644
index 0000000000000..c63232b48baf8
--- /dev/null
+++ b/docs/lang/articles/basic/type.md
@@ -0,0 +1,186 @@
+---
+sidebar_position: 2
+---
+
+# Type system
+
+Taichi supports common numerical data types. Each type is denoted as a
+character indicating its _category_ and a number of _precision bits_,
+e.g., `i32` and `f64`.
+
+The _category_ can be one of:
+
+- `i` for signed integers, e.g. 233, -666
+- `u` for unsigned integers, e.g. 233, 666
+- `f` for floating point numbers, e.g. 2.33, 1e-4
+
+The number of _precision bits_ can be one of:
+
+- `8`
+- `16`
+- `32`
+- `64`
+
+It represents how many **bits** are used in storing the data. The larger
+the bit number, the higher the precision is.
+
+For example, the two most commonly used types:
+
+- `i32` represents a 32-bit signed integer.
+- `f32` represents a 32-bit floating-point number.
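+
+To make the naming scheme concrete, here is a small plain-Python sketch that splits a type name into its category and bit width and derives the value range of integer types from that width. The helpers `describe` and `integer_range` are hypothetical and not part of Taichi; the ranges assume the usual two's-complement representation.
+
+```python
+def describe(type_name):
+    """Split a type name such as 'i32' into its category and bit width."""
+    categories = {
+        "i": "signed integer",
+        "u": "unsigned integer",
+        "f": "floating point",
+    }
+    return categories[type_name[0]], int(type_name[1:])
+
+
+def integer_range(type_name):
+    """Smallest and largest value of an integer type, derived from its bit width."""
+    category, bits = type_name[0], int(type_name[1:])
+    if category == "i":
+        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
+    if category == "u":
+        return 0, 2 ** bits - 1
+    raise ValueError("integer_range only applies to 'i' and 'u' types")
+
+
+print(describe("f64"))       # ('floating point', 64)
+print(integer_range("i8"))   # (-128, 127)
+print(integer_range("u16"))  # (0, 65535)
+```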
+
+## Supported types
+
+Currently, supported basic types in Taichi are
+
+- int8 `ti.i8`
+- int16 `ti.i16`
+- int32 `ti.i32`
+- int64 `ti.i64`
+- uint8 `ti.u8`
+- uint16 `ti.u16`
+- uint32 `ti.u32`
+- uint64 `ti.u64`
+- float32 `ti.f32`
+- float64 `ti.f64`
+
+:::note
+
+Supported types on each backend:
+
+| type | CPU/CUDA | OpenGL | Metal | C source |
+| ---- | -------- | ------ | ----- | -------- |
+| i8   | OK       | N/A    | OK    | OK       |
+| i16  | OK       | N/A    | OK    | OK       |
+| i32  | OK       | OK     | OK    | OK       |
+| i64  | OK       | EXT    | N/A   | OK       |
+| u8   | OK       | N/A    | OK    | OK       |
+| u16  | OK       | N/A    | OK    | OK       |
+| u32  | OK       | N/A    | OK    | OK       |
+| u64  | OK       | N/A    | N/A   | OK       |
+| f32  | OK       | OK     | OK    | OK       |
+| f64  | OK       | OK     | N/A   | OK       |
+
+(OK: supported, EXT: require extension, N/A: not available)
+:::
+
+:::note
+Boolean types are represented using `ti.i32`.
+:::
+
+## Type promotion
+
+Binary operations on different types will give you a promoted type,
+following the C programming language convention, e.g.:
+
+- `i32 + f32 = f32` (integer + float = float)
+- `i32 + i64 = i64` (less-bits + more-bits = more-bits)
+
+Basically it will try to choose the more precise type to contain the
+result value.
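+
+The two rules above can be written down as a toy model on type names. This is only an illustrative sketch in plain Python, not Taichi's implementation: the function `promote` is hypothetical, it returns the float operand unchanged for any integer/float mix (an assumption based on the C convention), and it deliberately refuses signed/unsigned mixes, which the text above does not describe.
+
+```python
+def promote(a, b):
+    """Toy model of the promotion rules above, on type names such as 'i32'."""
+    cat_a, bits_a = a[0], int(a[1:])
+    cat_b, bits_b = b[0], int(b[1:])
+    if cat_a == cat_b:
+        # Same category: the type with more bits wins.
+        return cat_a + str(max(bits_a, bits_b))
+    if cat_a == "f":
+        # float + integer = float (assumed to keep the float operand's width)
+        return a
+    if cat_b == "f":
+        return b
+    # Mixing signed and unsigned integers is not covered by the text above.
+    raise NotImplementedError("signed/unsigned promotion is not modeled")
+
+
+print(promote("i32", "f32"))  # f32
+print(promote("i32", "i64"))  # i64
+```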
+
+## Default precisions
+
+By default, all numerical literals have 32-bit precision. For example,
+`42` has type `ti.i32` and `3.14` has type `ti.f32`.
+
+Default integer and floating-point precisions (`default_ip` and
+`default_fp`) can be specified when initializing Taichi:
+
+```python
+ti.init(default_fp=ti.f32)
+ti.init(default_fp=ti.f64)
+
+ti.init(default_ip=ti.i32)
+ti.init(default_ip=ti.i64)
+```
+
+Also note that you may use `float` or `int` in type definitions as
+aliases for default precisions, e.g.:
+
+```python
+ti.init(default_ip=ti.i64, default_fp=ti.f32)
+
+x = ti.field(float, 5)
+y = ti.field(int, 5)
+# is equivalent to:
+x = ti.field(ti.f32, 5)
+y = ti.field(ti.i64, 5)
+
+def func(a: float) -> int:
+    ...
+
+# is equivalent to:
+def func(a: ti.f32) -> ti.i64:
+    ...
+```
+
+## Type casts
+
+### Implicit casts
+
+:::caution
+The type of a variable is **determined at its initialization**.
+:::
+
+When a _low-precision_ variable is assigned to a _high-precision_
+variable, it will be implicitly promoted to the _high-precision_ type
+and no warning will be raised:
+
+```python {3}
+a = 1.7
+a = 1
+print(a)  # 1.0
+```
+
+When a _high-precision_ variable is assigned to a _low-precision_ type,
+it will be implicitly down-cast into the _low-precision_ type and Taichi
+will raise a warning:
+
+```python {3}
+a = 1
+a = 1.7
+print(a)  # 1
+```
+
+### Explicit casts
+
+You may use `ti.cast` to explicitly cast scalar values between different
+types:
+
+```python {2-3}
+a = 1.7
+b = ti.cast(a, ti.i32)  # 1
+c = ti.cast(b, ti.f32)  # 1.0
+```
+
+Equivalently, use `int()` and `float()` to convert values to integer
+or floating-point types of default precisions:
+
+```python {2-3}
+a = 1.7
+b = int(a)    # 1
+c = float(b)  # 1.0
+```
+
+### Casting vectors and matrices
+
+Type casts applied to vectors/matrices are element-wise:
+
+```python {2,4}
+u = ti.Vector([2.3, 4.7])
+v = int(u)              # ti.Vector([2, 4])
+# If you are using ti.i32 as default_ip, this is equivalent to:
+v = ti.cast(u, ti.i32)  # ti.Vector([2, 4])
+```
+
+### Bit casting
+
+Use `ti.bit_cast` to bit-cast a value into another data type. The
+underlying bits will be preserved in this cast. The new type must have
+the same width as the old type. For example, bit-casting `i32` to
+`f64` is not allowed. Use this operation with caution.
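+
+To see what "preserving the underlying bits" means, the following sketch contrasts a value cast with a bit cast. It uses Python's standard `struct` module rather than Taichi, so it only illustrates the concept; the helper names are hypothetical.
+
+```python
+import struct
+
+
+def value_cast_i32_to_f32(value):
+    """Ordinary cast: keeps the numeric value, changes the bits."""
+    return float(value)
+
+
+def bit_cast_i32_to_f32(value):
+    """Bit cast: keeps the 32 bits, reinterprets them as an IEEE-754 float32."""
+    return struct.unpack("<f", struct.pack("<i", value))[0]
+
+
+bits = 0x3F800000  # the bit pattern of 1.0 in float32
+print(value_cast_i32_to_f32(bits))  # 1065353216.0
+print(bit_cast_i32_to_f32(bits))    # 1.0
+```
+
+Both types are 32 bits wide here, which is why the reinterpretation is well defined; this matches the same-width requirement stated above.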
+
+:::note
+For people from C++, `ti.bit_cast` is equivalent to `reinterpret_cast`.
+:::
diff --git a/docs/lang/articles/contribution/compilation.md b/docs/lang/articles/contribution/compilation.md
new file mode 100644
index 0000000000000..35c3203c6b5b6
--- /dev/null
+++ b/docs/lang/articles/contribution/compilation.md
@@ -0,0 +1,128 @@
+---
+sidebar_position: 8
+---
+
+# Life of a Taichi kernel
+
+Sometimes it is helpful to understand the life cycle of a Taichi kernel.
+In short, compilation will only happen on the first invocation of an
+instance of a kernel.
+
+The life cycle of a Taichi kernel has the following stages:
+
+- Kernel registration
+- Template instantiation and caching
+- Python AST transforms
+- Taichi IR compilation, optimization, and executable generation
+- Launching
+
+![image](https://raw.githubusercontent.com/taichi-dev/public_files/fa03e63ca4e161318c8aa9a5db7f4a825604df88/taichi/life_of_kernel.png)
+
+Let's consider the following simple kernel:
+
+```python
+@ti.kernel
+def add(field: ti.template(), delta: ti.i32):
+    for i in field:
+        field[i] += delta
+```
+
+We allocate two 1D fields to simplify discussion:
+
+```python
+x = ti.field(dtype=ti.f32, shape=128)
+y = ti.field(dtype=ti.f32, shape=16)
+```
+
+## Kernel registration
+
+When the `ti.kernel` decorator is executed, a kernel named `add` is
+registered. Specifically, the Python Abstract Syntax Tree (AST) of the
+`add` function will be memorized. No compilation will happen until the
+first invocation of `add`.
+
+## Template instantiation and caching
+
+```python
+add(x, 42)
+```
+
+When `add` is called for the first time, the Taichi frontend compiler
+will instantiate the kernel.
+
+When you have a second call with the same **template signature**
+(explained later), e.g.,
+
+```python
+add(x, 1)
+```
+
+Taichi will directly reuse the previously compiled binary.
+
+Arguments hinted with `ti.template()` are template arguments, and will
+incur template instantiation. For example,
+
+```python
+add(y, 42)
+```
+
+will lead to a new instantiation of **add**.
+
+:::note
+**Template signatures** are what distinguish different instantiations of
+a kernel template. The signature of `add(x, 42)` is `(x, ti.i32)`, which
+is the same as that of `add(x, 1)`. Therefore, the latter can reuse the
+previously compiled binary. The signature of `add(y, 42)` is
+`(y, ti.i32)`, a different value from the previous signature, hence a
+new kernel will be instantiated and compiled.
+:::
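+
+The caching behavior described above can be modeled in a few lines of plain Python. This is a toy sketch, not Taichi's actual implementation: the class `ToyKernel`, its `template_slots` parameter, and the use of Python lists in place of fields are all assumptions made for illustration. Template arguments contribute their identity to the signature, while other arguments contribute only their type.
+
+```python
+class ToyKernel:
+    """Toy model of instantiation caching; not Taichi's actual implementation."""
+
+    def __init__(self, func, template_slots):
+        self.func = func
+        self.template_slots = template_slots  # positions of template arguments
+        self.cache = {}  # template signature -> "compiled" kernel
+        self.compilations = 0
+
+    def signature(self, args):
+        # Template arguments are identified by object, other arguments by type.
+        return tuple(
+            id(arg) if pos in self.template_slots else type(arg).__name__
+            for pos, arg in enumerate(args)
+        )
+
+    def __call__(self, *args):
+        key = self.signature(args)
+        if key not in self.cache:
+            self.compilations += 1  # stands in for the expensive compilation
+            self.cache[key] = self.func
+        return self.cache[key](*args)
+
+
+def add_body(field, delta):
+    for i in range(len(field)):
+        field[i] += delta
+
+
+add = ToyKernel(add_body, template_slots={0})
+x = [0.0] * 128
+y = [0.0] * 16
+
+add(x, 42)  # first call: new instantiation
+add(x, 1)   # same signature: the cached kernel is reused
+add(y, 42)  # different template argument: new instantiation
+print(add.compilations)  # 2
+```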
+
+:::note
+Many basic operations in the Taichi standard library are implemented
+as Taichi kernels with metaprogramming tricks. Invoking them will
+incur **implicit kernel instantiations**.
+
+Examples include `x.to_numpy()` and `y.from_torch(torch_tensor)`. When
+you invoke these functions, you will see kernel instantiations, as
+Taichi kernels will be generated to offload the hard work to multiple
+CPU cores/GPUs.
+
+As mentioned before, the second time you call the same operation, the
+cached compiled kernel will be reused and no further compilation is
+needed.
+:::
+
+## Code transformation and optimizations
+
+When a new instantiation happens, the Taichi frontend compiler (i.e.,
+the `ASTTransformer` Python class) will transform the kernel body AST
+into a Python script, which, when executed, emits a Taichi frontend AST.
+Basically, some patches are applied to the Python AST so that the Taichi
+frontend can recognize it.
+
+The Taichi AST lowering pass translates Taichi frontend IR into
+hierarchical static single assignment (SSA) IR, which allows a series of
+further IR passes to happen, such as
+
+- Loop vectorization
+- Type inference and checking
+- General simplifications such as common subexpression elimination
+  (CSE), dead instruction elimination (DIE), constant folding, and
+  store forwarding
+- Access lowering
+- Data access optimizations
+- Reverse-mode automatic differentiation (if using differentiable
+  programming)
+- Parallelization and offloading
+- Atomic operation demotion
+
+## The just-in-time (JIT) compilation engine
+
+Finally, the optimized SSA IR is fed into backend compilers such as LLVM
+or Apple Metal/OpenGL shader compilers. The backend compilers then
+generate high-performance executable CPU/GPU programs.
+
+## Kernel launching
+
+Taichi kernels will be ultimately launched as multi-threaded CPU tasks
+or GPU kernels.
diff --git a/docs/lang/articles/contribution/contributor_guide.md b/docs/lang/articles/contribution/contributor_guide.md
new file mode 100644
index 0000000000000..7c1adb64cb0fc
--- /dev/null
+++ b/docs/lang/articles/contribution/contributor_guide.md
@@ -0,0 +1,395 @@
+---
+sidebar_position: 2
+---
+
+# Contribution guidelines
+
+First of all, thank you for contributing! We welcome contributions of
+all forms, including but not limited to
+
+- Bug fixes
+- Proposing and implementing new features
+- Documentation improvements and translations
+- Improved error messages that are more user-friendly
+- New test cases
+- New examples
+- Compiler performance enhancements
+- High-quality blog posts and tutorials
+- Participation in the [Taichi forum](https://forum.taichi.graphics/)
+- Introducing Taichi to your friends or simply starring [the
+  project on GitHub](https://github.com/taichi-dev/taichi)
+- Typo fixes in the documentation, code or comments (please go ahead and
+  make a pull request for minor issues like these)
+
+:::tip reminder
+Please take some time to familiarize yourself with this contribution guide before making any changes.
+:::
+
+## How to contribute bug fixes and new features
+
+Issues marked with ["good first
+issue"](https://github.com/taichi-dev/taichi/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)
+are great chances for starters.
+
+- Please first leave a note (e.g. _I know how to fix this and would
+  like to help!_) on the issue, so that people know someone is already
+  working on it. This helps prevent redundant work;
+- If no core developer has commented and described a potential
+  solution on the issue, please briefly describe your plan, and wait
+  for a core developer to reply before you start. This helps keep
+  implementations simple and effective.
+
+Issues marked with ["welcome
+contribution"](https://github.com/taichi-dev/taichi/issues?q=is%3Aopen+is%3Aissue+label%3A%22welcome+contribution%22)
+are slightly more challenging but still friendly to beginners.
+
+## High-level guidelines
+
+- Be pragmatic: practically solving problems is our ultimate goal.
+- No overkill: always use _easy_ solutions to solve easy problems, so
+  that you have time and energy for the really hard ones.
+- Almost every design decision has pros and cons. A decision is
+  *good* if its pros outweigh its cons. Always think about
+  both sides.
+- Debugging is hard. Changesets should be small so that sources of
+  bugs can be easily pinpointed.
+- Unit/integration tests are our friends.
+
+:::note
+"There are two ways of constructing a software design: One way is to
+make it so simple that there are obviously no deficiencies, and the
+other way is to make it so complicated that there are no obvious
+deficiencies. _The first method is far more difficult_."
+— [C.A.R. Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)
+:::
+
+One thing to keep in mind is that Taichi was originally born as an
+academic research project. This usually means that some parts did not
+have the luxury to go through a solid design. While we are always trying
+to improve the code quality, it doesn't mean that the project is free
+from technical debt. Some places may be confusing or overly
+complicated. Whenever you spot one, you are more than welcome to shoot
+us a PR! :-)
+
+## Effective communication
+
+A few tips for effective communication in the Taichi community:
+
+- How much information one effectively conveys is far more important
+  than how many words one types.
+- Be constructive. Be polite. Be organized. Be concise.
+- Bulleted lists are our friends.
+- Proofread before you post: if you are the reader, can you understand
+  what you typed?
+- If you are not a native speaker, consider using a spell checker such
+  as [Grammarly](https://app.grammarly.com/).
+
+Please base your discussion and feedback on facts, and not personal
+feelings. It is very important for all of us to maintain a friendly and
+blame-free community. Some examples:
+
+:::tip Acceptable :-)
+This design could be confusing to new Taichi users.
+:::
+
+:::danger Not Acceptable
+This design is terrible.
+:::
+
+## Making good pull requests
+
+- PRs with **small** changesets are preferred. A PR should ideally
+  address **only one issue**.
+  - It is fine to include off-topic **trivial** refactoring such as
+    typo fixes;
+  - The reviewers reserve the right to ask PR authors to remove
+    off-topic **non-trivial** changes.
+- All commits in a PR will always be **squashed and merged into master
+  as a single commit**.
+- PR authors **should not squash commits on their own**;
+- When implementing a complex feature, consider breaking it down into
+  small PRs, to keep a more detailed development history and to
+  interact with core developers more frequently.
+- If you want early feedback from core developers
+  - Open a PR in
+    [Draft](https://github.blog/2019-02-14-introducing-draft-pull-requests/)
+    state on GitHub so that you can share your progress;
+  - Make sure you @ the corresponding developer in the comments or
+    request the review.
+- If you are making multiple PRs
+  - Independent PRs should be based on **different** branches
+    forking from `master`;
+  - PRs with dependencies should be raised only after all
+    prerequisite PRs are merged into `master`.
+- All PRs should ideally come with corresponding **tests**;
+- All PRs should come with **documentation updates**, except for
+  internal compiler implementations;
+- All PRs must pass **continuous integration tests** before they get
+  merged;
+- PR titles should follow [PR tag rules](./contributor_guide#pr-title-format-and-tags);
+- A great article from Google on [how to have your PR merged
+  quickly](https://testing.googleblog.com/2017/06/code-health-too-many-comments-on-your.html).
+  [\[PDF\]](https://github.com/yuanming-hu/public_files/blob/master/graphics/taichi/google_review_comments.pdf)
+
+## Reviewing & PR merging
+
+- Please try to follow these tips from Google
+  - [Code Health: Understanding Code In
+    Review](https://testing.googleblog.com/2018/05/code-health-understanding-code-in-review.html);
+    [\[PDF\]](https://github.com/yuanming-hu/public_files/blob/master/graphics/taichi/google_understanding_code.pdf)
+  - [Code Health: Respectful Reviews == Useful
+    Reviews](https://testing.googleblog.com/2019/11/code-health-respectful-reviews-useful.html).
+    [\[PDF\]](https://github.com/yuanming-hu/public_files/blob/master/graphics/taichi/google_respectful_reviews.pdf)
+- The merger should always **squash and merge** PRs into the master
+  branch;
+- The master branch is required to have a **linear history**;
+- Make sure the PR passes **continuous integration tests**, except for
+  cases like documentation updates;
+- Make sure the title follows [PR tag rules](./contributor_guide#pr-title-format-and-tags).
+
+## Using continuous integration
+
+- Continuous Integration (CI) will **build** and **test** your
+  commits in a PR in multiple environments.
+- Currently, Taichi uses [GitHub Actions](https://github.com/features/actions)
+  (for macOS and Linux) and [AppVeyor](https://www.appveyor.com) (for Windows).
+- CI will be triggered every time you push commits to an open PR.
+- You can prepend `[skip ci]` to your commit message to avoid
+  triggering CI, e.g. `[skip ci] This commit will not trigger CI`.
+- A tick to the right of the commit hash means CI passed; a cross means CI
+  failed.
+
+## Enforcing code style
+
+- Locally, you can run `ti format` in the command line to re-format
+  code style. Note that you have to install `clang-format-6.0` and
+  `yapf v0.29.0` locally before you use `ti format`.
+
+- If you don't have these formatting tools locally, feel free to
+  leverage GitHub Actions: simply comment `/format` in a PR
+  (e.g., [#2481](https://github.com/taichi-dev/taichi/pull/2481#issuecomment-872226701))
+  and then [Taichi Gardener](https://github.com/taichi-gardener)
+  will automatically format the code for you.
+
+## PR title format and tags
+
+PR titles will be part of the commit history reflected in the `master`
+branch, therefore it is important to keep PR titles readable.
+
+- Please always prepend **at least one tag** such as `[Lang]` to PR
+  titles:
+  - When using multiple tags, make sure there is exactly one
+    space between tags;
+  - E.g., "[Lang][refactor]" (no space) should be replaced
+    by "[Lang] [refactor]";
+- The first letter of the PR title body should be capitalized:
+  - E.g., `[Doc] improve documentation` should be replaced by
+    `[Doc] Improve documentation`;
+  - `[Lang] "ti.sqr(x)" is now deprecated` is fine because `"`
+    is a symbol.
+- Please do not include back quotes ("`") in PR titles.
+- For example, "[Metal] Support bitmasked SNode", "[Vulkan]
+  ti.atomic_min/max support", or "[Opt] [ir] Enhanced intra-function optimizations".
+
+Frequently used tags:
+
+- `[CPU]`, `[CUDA]`, `[Metal]`, `[Vulkan]`, `[OpenGL]`: backends;
+- `[LLVM]`: the LLVM backend shared by CPUs and CUDA;
+- `[Lang]`: frontend language features, including syntactic sugar;
+- `[Std]`: standard library, e.g. `ti.Matrix` and `ti.Vector`;
+- `[Sparse]`: sparse computation;
+- `[IR]`: intermediate representation;
+- `[Opt]`: IR optimization passes;
+- `[GUI]`: the built-in GUI system;
+- `[Refactor]`: code refactoring;
+- `[CLI]`: command-line interfaces, e.g. the `ti` command;
+- `[Doc]`: documentation under `docs/`;
+- `[Example]`: examples under `examples/`;
+- `[Test]`: adding or improving tests under `tests/`;
+- `[Linux]`: Linux platform;
+- `[Mac]`: macOS platform;
+- `[Windows]`: Windows platform;
+- `[Perf]`: performance improvements;
+- `[CI]`: CI/CD workflow;
+- `[Misc]`: something that doesn't belong to any category, such as
+  version bump, reformatting;
+- `[Bug]`: bug fixes;
+- Check out more tags in
+  [misc/prtags.json](https://github.com/taichi-dev/taichi/blob/master/misc/prtags.json).
+- When introducing a new tag, please update the list in
+  `misc/prtags.json` in the first PR with that tag, so that people can
+  follow.
+
+:::note
+
+We do appreciate all kinds of contributions, yet we should not expose
+the title of every PR to end-users. Therefore the changelog will
+distinguish *what the user should know* from *what the
+developers are doing*. This is done by **capitalizing PR
+tags**:
+
+- PRs with visible/notable features to the users should be marked
+  with tags starting with **the first letter capitalized**, e.g.
+  `[Metal]`, `[Vulkan]`, `[IR]`, `[Lang]`, `[CLI]`. When releasing a new
+  version, a script (`python/taichi/make_changelog.py`) will
+  generate a changelog with these changes (PR title) highlighted.
+  Therefore it is **important** to make sure the end-users can
+  understand what your PR does, **based on your PR title**.
+- Other PRs (underlying development/intermediate implementation)
+  should use tags with **everything in lowercase letters**: e.g.
+  `[metal]`, `[vulkan]`, `[ir]`, `[lang]`, `[cli]`.
+- Because of the way the release changelog is generated, there
+  should be **at most one capitalized tag** in a PR title to prevent
+  duplicate PR highlights. For example,
+  `[GUI] [Mac] Support modifier keys` ([#1189](https://github.com/taichi-dev/taichi/pull/1189)) is a bad example; we
+  should have used `[gui] [Mac] Support modifier keys in GUI` instead.
+  Please capitalize the tag that is the *most* relevant to the PR.
+:::
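+
+The title rules above can be checked mechanically. The following is a minimal sketch of such a check, not a tool that ships with Taichi: `check_pr_title` is a hypothetical helper written for illustration. It assumes that a first word containing non-letter characters (such as `ti.atomic_min/max`) is a code identifier and is therefore exempt from the capitalization rule.
+
+```python
+import re
+
+
+def check_pr_title(title):
+    """Return a list of rule violations; an empty list means the title looks fine."""
+    # Leading run of "[tag]" groups together with any whitespace after each one.
+    match = re.match(r"(\[[^\]]+\]\s*)+", title)
+    if not match:
+        return ["prepend at least one tag such as [Lang]"]
+
+    prefix = match.group(0)
+    body = title[len(prefix):]
+    tags = re.findall(r"\[([^\]]+)\]", prefix)
+    problems = []
+
+    # Tags must be separated from each other (and from the body) by one space.
+    if prefix != "".join("[{}] ".format(tag) for tag in tags):
+        problems.append("use exactly one space after each tag")
+
+    # Assumption: only a purely alphabetic first word must be capitalized.
+    first_word = body.split(" ", 1)[0]
+    if first_word.isalpha() and first_word[0].islower():
+        problems.append("capitalize the first letter of the title body")
+
+    if "`" in title:
+        problems.append("do not use back quotes")
+
+    # Capitalized tags are highlighted in the changelog, so allow at most one.
+    if sum(tag[0].isupper() for tag in tags) > 1:
+        problems.append("use at most one capitalized tag")
+
+    return problems
+```
+
+With this sketch, `check_pr_title("[Opt] [ir] Enhanced intra-function optimizations")` returns an empty list, while `check_pr_title("[Doc] improve documentation")` reports the capitalization problem.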
+
+## C++ and Python standards
+
+The C++ part of Taichi is written in C++17, and the Python part in Python 3.6+.
+You can assume that C++17 and Python 3.6 features are always available.
+
+## Tips on the Taichi compiler development
+
+[Life of a Taichi kernel](./compilation.md) may be worth checking out. It
+explains the whole compilation process.
+
+See also [Benchmarking and regression tests](./utilities.md#benchmarking-and-regression-tests) if your work involves
+IR optimization.
+
+When creating a Taichi program using
+`ti.init(arch=desired_arch, **kwargs)`, pass in the following parameters
+to make the Taichi compiler print out IR:
+
+- `print_preprocessed=True`: print results of the frontend Python
+  AST transform. The resulting scripts will generate a Taichi Frontend
+  AST when executed.
+- `print_ir=True`: print the Taichi IR transformation process of
+  kernel (excluding accessors) compilation.
+- `print_accessor_ir=True`: print the IR transformation process of
+  data accessors, which are special and simple kernels. (This is
+  rarely used, unless you are debugging the compilation of data
+  accessors.)
+- `print_struct_llvm_ir=True`: save the LLVM IR emitted by the Taichi
+  struct compilers.
+- `print_kernel_llvm_ir=True`: save the LLVM IR emitted by the Taichi
+  kernel compilers.
+- `print_kernel_llvm_ir_optimized=True`: save the optimized LLVM IR
+  of each kernel.
+- `print_kernel_nvptx=True`: save the emitted NVPTX of each kernel
+  (CUDA only).
+
+:::note
+Data accessors in Python-scope are implemented as special Taichi
+kernels. For example, `x[1, 2, 3] = 3` will call the writing accessor
+kernel of `x`, and `print(y[42])` will call the reading accessor kernel
+of `y`.
+:::
+
+## Folder structure
+
+Key folders are:
+
+_(the following chart can be generated by [`tree . -L 2`](https://linux.die.net/man/1/tree))_
+
+```
+.
+├── benchmarks              # Performance benchmarks
+├── docs                    # Documentation
+├── examples                # Examples
+├── external                # External libraries
+├── misc                    # Random yet useful files
+├── python                  # Python frontend implementation
+│   ├── core                # Loading & interacting with Taichi core
+│   ├── lang                # Python-embedded Taichi language & syntax (major)
+│   ├── snode               # Structure nodes
+│   ├── tools               # Handy end-user tools
+│   └── misc                # Miscellaneous utilities
+├── taichi                  # The core compiler implementation
+│   ├── analysis            # Static analysis passes
+│   ├── backends            # Device-dependent code generators/runtime environments
+│   ├── codegen             # Code generation base classes
+│   ├── common              # Common headers
+│   ├── gui                 # GUI system
+│   ├── inc                 # Small definition files to be included repeatedly
+│   ├── ir                  # Intermediate representation
+│   ├── jit                 # Just-In-Time compilation base classes
+│   ├── llvm                # LLVM utilities
+│   ├── math                # Math utilities
+│   ├── platform            # Platform supports
+│   ├── program             # Top-level constructs
+│   ├── python              # C++/Python interfaces
+│   ├── runtime             # LLVM runtime environments
+│   ├── struct              # Struct compiler base classes
+│   ├── system              # OS-related infrastructure
+│   ├── transforms          # IR transform passes
+│   └── util                # Miscellaneous utilities
+└── tests                   # Functional tests
+    ├── cpp                 # C++ tests
+    └── python              # Python tests (major)
+```
+
+## Testing
+
+Tests should be added to `tests/`.
+
+### Command-line tools
+
+- Use `ti test` to run all the tests.
+- Use `ti test -v` for verbose outputs.
+- Use `ti test -C` to run tests and record code coverage. See
+  [Code coverage](./utilities.md#coverage) for more information.
+- Use `ti test -a <arch(s)>` to test against the specified backend(s),
+  e.g. `ti test -a cuda,metal`.
+- Use `ti test -na <arch(s)>` to test all architectures except the
+  specified ones, e.g. `ti test -na opengl,x64`.
+- Use `ti test <filename(s)>` to run the tests in specific files, e.g.
+  `ti test numpy_io` will run all tests in
+  `tests/python/test_numpy_io.py`.
+- Use `ti test -c` to run only the C++ tests, e.g.
+  `ti test -c alg_simp` will run `tests/cpp/test_alg_simp.cpp`.
+- Use `ti test -k <key>` to run tests that match the specified key,
+  e.g. `ti test linalg -k "cross or diag"` will run `test_cross`
+  and `test_diag` in `tests/python/test_linalg.py`.
+
+For more options, see `ti test -h`.
+
+For more details on how to write a test case, see
+[Workflow for writing a Python test](./write_test.md).
+
+## Documentation
+
+Documentation source files are under the `docs/` folder of [**the main Taichi repo**](https://github.com/taichi-dev/taichi).
+An automatic service syncs the updated content with our [documentation repo](https://github.com/taichi-dev/docs.taichi.graphics) and deploys the documentation at [the Taichi documentation site](https://docs.taichi.graphics).
+
+We use [Markdown](https://www.markdownguide.org/getting-started/) (.md) to write documentation.
+Please see [the documentation writing guide](./doc_writing) for more tips.
+
+To set up a local server and preview your documentation edits in real time,
+see instructions for [Local Development](https://github.com/taichi-dev/docs.taichi.graphics#local-development).
+
+## Efficient code navigation across Python/C++
+
+If you work on the language frontend (the Python/C++ interface),
+[ffi-navigator](https://github.com/tqchen/ffi-navigator) helps you
+navigate the code base by letting you jump from Python bindings to
+their definitions in C++. Please follow its README to set up your
+editor.
+
+## Upgrading CUDA
+
+Right now we are targeting CUDA 10. Since we use run-time loaded
+[CUDA driver APIs](https://docs.nvidia.com/cuda/cuda-driver-api/index.html)
+which are relatively stable across CUDA versions, a compiled Taichi binary
+should work for all CUDA versions >= 10. When upgrading the CUDA version, the
+file `external/cuda_libdevice/slim_libdevice.10.bc` should also be
+replaced with a newer version.
+
+To generate the slimmed version of libdevice based on a full
+`libdevice.X.bc` file from a CUDA installation, use:
+
+```bash
+ti task make_slim_libdevice [libdevice.X.bc file]
+```
diff --git a/docs/lang/articles/contribution/cpp_style.md b/docs/lang/articles/contribution/cpp_style.md
new file mode 100644
index 0000000000000..1f8bc5140909a
--- /dev/null
+++ b/docs/lang/articles/contribution/cpp_style.md
@@ -0,0 +1,57 @@
+---
+sidebar_position: 4
+---
+
+# C++ style
+
+We generally follow [Google C++ Style
+Guide](https://google.github.io/styleguide/cppguide.html).
+
+## Naming
+
+- Variable names should consist of lowercase words connected by
+  underscores, e.g. `llvm_context`.
+
+- Class and struct names should consist of words with first letters
+  capitalized, e.g. `CodegenLLVM`.
+
+- Macro names should be in uppercase and start with `TI`, such as `TI_INFO`
+  and `TI_IMPLEMENTATION`.
+
+  - We do not encourage the use of macros, although there are cases
+    where macros are inevitable.
+
+- Filenames should consist of lowercase words connected by
+  underscores, e.g. `ir_printer.cpp`.
+
+## Dos
+
+- Use `auto` for local variables when appropriate.
+- Mark `override` and `const` when necessary.
+
+## Don'ts
+
+- C language legacies:
+
+  - `printf` (Use `fmtlib::print` instead).
+  - `new` and `free`. (Use the smart pointers `std::unique_ptr` and
+    `std::shared_ptr` instead for ownership management.)
+  - `#include <math.h>` (Use `#include <cmath>` instead).
+
+- Exceptions (We are on our way to **remove** all C++ exception usages
+  in Taichi).
+
+- Prefix member functions with `m_` or `_`.
+
+- Virtual function calls in constructors/destructors.
+
+- `NULL` (Use `nullptr` instead).
+
+- `using namespace std;` in the global scope.
+
+- `typedef` (Use `using` instead).
+
+## Automatic code formatting
+
+- Please run `ti format`.
diff --git a/docs/lang/articles/contribution/dev_install.md b/docs/lang/articles/contribution/dev_install.md
new file mode 100644
index 0000000000000..7e41ee54fe599
--- /dev/null
+++ b/docs/lang/articles/contribution/dev_install.md
@@ -0,0 +1,445 @@
+---
+sidebar_position: 3
+---
+
+# Developer installation
+This section documents how to configure the Taichi development environment and build Taichi from source, and is intended for compiler developers. The installation instructions vary considerably between operating systems. We also provide a Dockerfile that may help set up a containerized Taichi development environment with CUDA support, based on the Ubuntu base Docker image.
+
+[Developer installation for Linux](#linux)
+
+[Developer installation for macOS](#macos)
+
+[Developer installation for Windows](#windows)
+
+[Developer installation for Docker](#docker)
+
+:::note
+End users should use the pip packages instead of building from source.
+:::
+
+## Linux
+### Installing Dependencies
+1. Make sure you are using Python 3.6/3.7/3.8
+
+- Install Python dependencies:
+
+  ```bash
+  python3 -m pip install --user setuptools astor pybind11 pylint sourceinspect
+  python3 -m pip install --user pytest pytest-rerunfailures pytest-xdist yapf
+  python3 -m pip install --user numpy GitPython coverage colorama autograd
+  ```
+
+
+
+2.  Make sure you have `clang` with version \>= 7:
+
+  - On Ubuntu, execute `sudo apt install libtinfo-dev clang-8`.
+  - On Arch Linux, execute `sudo pacman -S clang`. (This is
+    `clang-10`).
+  - On other Linux distributions, please search [this
+    site](https://pkgs.org) for clang version \>= 7.
+
+:::note
+Note that on Linux, `clang` is the **only** supported compiler for
+compiling the Taichi compiler.
+:::
+
+3. Make sure you have LLVM 10.0.0. Note that Taichi uses a **customized
+  LLVM** so the pre-built binaries from the LLVM official website or
+  other sources probably won't work. Here we provide LLVM binaries
+  customized for Taichi, which may or may not work depending on your
+  system environment:
+  - [LLVM 10.0.0 for
+    Linux](https://github.com/taichi-dev/taichi_assets/releases/download/llvm10/taichi-llvm-10.0.0-linux.zip)
+
+- If the downloaded LLVM does not work, please build from source:
+
+    ```bash
+    wget https://github.com/llvm/llvm-project/releases/download/llvmorg-10.0.0/llvm-10.0.0.src.tar.xz
+    tar xvJf llvm-10.0.0.src.tar.xz
+    cd llvm-10.0.0.src
+    mkdir build
+    cd build
+    cmake .. -DLLVM_ENABLE_RTTI:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_TERMINFO=OFF
+    # If you are building on NVIDIA Jetson TX2, use -DLLVM_TARGETS_TO_BUILD="ARM;NVPTX"
+    # If you are building Taichi for a PyPI release, add -DLLVM_ENABLE_Z3_SOLVER=OFF to reduce the library dependency.
+
+    make -j 8
+    sudo make install
+
+    # Check your LLVM installation
+    llvm-config --version  # You should get 10.0.0
+    ```
+### Setting up CUDA (optional)
+:::note
+To build with NVIDIA GPU support, CUDA 10.0+ is needed. This
+installation guide works for Ubuntu 16.04+.
+:::
+
+If you don't have CUDA, go to [this
+website](https://developer.nvidia.com/cuda-downloads) and download the
+installer.
+
+- To check if CUDA is installed, run `nvcc --version` or
+  `cat /usr/local/cuda/version.txt`.
+- On **Ubuntu** we recommend choosing `deb (local)` as **Installer
+  Type**.
+- On **Arch Linux**, you can easily install CUDA via `pacman -S cuda`
+  without downloading the installer manually.
+
+### Setting up Taichi for development
+
+1. Set up environment variables for Taichi:
+
+  - Please add the following script to your rc file
+    (`~/.bashrc`, `~/.zshrc`, etc.; the same applies to other occurrences in
+    this documentation):
+
+    ```bash
+    export TAICHI_REPO_DIR=/path/to/taichi  # Path to your taichi repository
+    export PYTHONPATH=$TAICHI_REPO_DIR/python:$PYTHONPATH
+    export PATH=$TAICHI_REPO_DIR/bin:$PATH
+    # export CXX=/path/to/clang  # Uncomment if you encounter compiler issues in the next step.
+    # export PATH=/opt/llvm/bin:$PATH  # Uncomment if your llvm or clang is installed in /opt
+    ```
+
+  - To reload shell config, please execute
+
+    ```bash
+    source ~/.bashrc
+    ```
+
+    :::note
+    If you're using fish, use `set -x NAME VALUES`, otherwise it
+    won't be loaded by child processes.
+    :::
+
+2. Clone the Taichi repo **recursively**, and build:
+
+  ```bash
+  git clone https://github.com/taichi-dev/taichi --depth=1 --branch=master
+  cd taichi
+  git submodule update --init --recursive --depth=1
+  mkdir build
+  cd build
+  cmake ..
+  # On Linux, if you do not set clang as the default compiler
+  # use the line below:
+  #   cmake .. -DCMAKE_CXX_COMPILER=clang
+  #
+  # Alternatively, if you would like to set clang as the default compiler
+  # On Unix CMake honors environment variables $CC and $CXX upon deciding which C and C++ compilers to use
+  make -j 8
+  ```
+
+3. Check out `examples` for runnable examples. Run them with commands
+  like `python3 examples/mpm128.py`.
+
+4. Execute `python3 -m taichi test` to run all the tests. It may take
+  up to 5 minutes to run all tests.
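+
+If the build or the tests cannot find Taichi, the environment variables from step 1 are a common culprit. The sketch below shows one way to sanity-check them; `check_dev_env` is a hypothetical helper written for illustration, not a command provided by Taichi, and it only verifies the three variables set in the rc-file snippet above.
+
+```python
+import os
+
+
+def check_dev_env(environ):
+    """Return a list of problems found in the given environment mapping."""
+    repo = environ.get("TAICHI_REPO_DIR")
+    if not repo:
+        return ["TAICHI_REPO_DIR is not set"]
+
+    problems = []
+    # PYTHONPATH should contain <repo>/python and PATH should contain <repo>/bin.
+    for var, subdir in (("PYTHONPATH", "python"), ("PATH", "bin")):
+        expected = os.path.normpath(os.path.join(repo, subdir))
+        entries = [
+            os.path.normpath(entry)
+            for entry in environ.get(var, "").split(os.pathsep)
+            if entry
+        ]
+        if expected not in entries:
+            problems.append("{} does not contain {}".format(var, expected))
+    return problems
+
+
+if __name__ == "__main__":
+    # Check the environment of the current shell.
+    for problem in check_dev_env(os.environ) or ["environment looks fine"]:
+        print(problem)
+```
+
+Run the script in a new shell (or after `source ~/.bashrc`) so that it sees the updated variables.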
+
+## macOS
+### Installing Dependencies
+1. Make sure you are using Python 3.6/3.7/3.8
+
+- Install Python dependencies:
+
+  ```bash
+  python3 -m pip install --user setuptools astor pybind11 pylint sourceinspect
+  python3 -m pip install --user pytest pytest-rerunfailures pytest-xdist yapf
+  python3 -m pip install --user numpy GitPython coverage colorama autograd
+  ```
+
+2. Make sure you have LLVM 10.0.0. Note that Taichi uses a **customized
+  LLVM** so the pre-built binaries from the LLVM official website or
+  other sources probably won't work. Here we provide LLVM binaries
+  customized for Taichi, which may or may not work depending on your
+  system environment:
+  - [LLVM 10.0.0 for macOS](https://github.com/taichi-dev/taichi_assets/releases/download/llvm10/taichi-llvm-10.0.0-macos.zip)
+
+- If the downloaded LLVM does not work, please build from source:
+    ```bash
+    wget https://github.com/llvm/llvm-project/releases/download/llvmorg-10.0.0/llvm-10.0.0.src.tar.xz
+    tar xvJf llvm-10.0.0.src.tar.xz
+    cd llvm-10.0.0.src
+    mkdir build
+    cd build
+    cmake .. -DLLVM_ENABLE_RTTI:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_TERMINFO=OFF
+    # If you are building on NVIDIA Jetson TX2, use -DLLVM_TARGETS_TO_BUILD="ARM;NVPTX"
+
+    make -j 8
+    sudo make install
+
+    # Check your LLVM installation
+    llvm-config --version  # You should get 10.0.0
+    ```
+### Setting up Taichi for development
+
+1. Set up environment variables for Taichi:
+
+  - Please add the following script to your rc file
+    (`~/.bashrc`, `~/.zshrc`, etc.; the same applies to other occurrences in
+    this guide):
+
+    ```bash
+    export TAICHI_REPO_DIR=/path/to/taichi  # Path to your taichi repository
+    export PYTHONPATH=$TAICHI_REPO_DIR/python:$PYTHONPATH
+    export PATH=$TAICHI_REPO_DIR/bin:$PATH
+    # export CXX=/path/to/clang  # Uncomment if you encounter compiler issues in the next step.
+    # export PATH=/opt/llvm/bin:$PATH  # Uncomment if your llvm or clang is installed in /opt
+    ```
+
+  - To reload shell config, please execute
+
+    ```bash
+    source ~/.bashrc
+    ```
+
+    :::note
+    If you're using fish, use `set -x NAME VALUES`, otherwise it
+    won't be loaded by child processes.
+    :::
+
+2. Clone the Taichi repo **recursively**, and build:
+
+  ```bash
+  git clone https://github.com/taichi-dev/taichi --depth=1 --branch=master
+  cd taichi
+  git submodule update --init --recursive --depth=1
+  mkdir build
+  cd build
+  cmake ..
+  # On macOS, if you do not set clang as the default compiler
+  # use the line below:
+  #   cmake .. -DCMAKE_CXX_COMPILER=clang
+  #
+  # Alternatively, if you would like to set clang as the default compiler
+  # On Unix CMake honors environment variables $CC and $CXX upon deciding which C and C++ compilers to use
+  make -j 8
+  ```
+
+3. Check out `examples` for runnable examples. Run them with commands
+  like `python3 examples/mpm128.py`.
+
+4. Execute `python3 -m taichi test` to run all the tests. It may take
+  up to 5 minutes to run all tests.
+
+## Windows
+### Setting up Taichi for development
+For precise build instructions on Windows, please check out
+[appveyor.yml](https://github.com/taichi-dev/taichi/blob/master/appveyor.yml),
+which does basically the same thing as the following instructions. We
+use MSBUILD.exe to build the generated project. Please note that Windows
+could have multiple instances of MSBUILD.exe shipped with different
+products. Please make sure you add the path for MSBUILD.exe within your
+MSVS directory and make it a higher priority (for instance than the one
+shipped with .NET).
+
+:::note
+On Windows, MSVC is the only supported compiler.
+:::
+
+- On Windows, please add these variables by accessing your system
+    settings:
+
+    1.  Add `TAICHI_REPO_DIR` whose value is the path to your taichi
+        repository so that Taichi knows you're a developer.
+    2.  Add or append `PYTHONPATH` with `%TAICHI_REPO_DIR%/python`
+        so that Python imports Taichi from the local repo.
+    3.  Add or append `PATH` with `%TAICHI_REPO_DIR%/bin` so that
+        you can use `ti` command.
+    4.  Add or append `PATH` with the path to the LLVM binary directory
+        (see the Installing Dependencies section below).
+
+### Installing Dependencies
+1. Make sure you are using Python 3.6/3.7/3.8
+
+- Install Python dependencies:
+
+  ```bash
+  python3 -m pip install --user setuptools astor pybind11 pylint sourceinspect
+  python3 -m pip install --user pytest pytest-rerunfailures pytest-xdist yapf
+  python3 -m pip install --user numpy GitPython coverage colorama autograd
+  ```
+2. Make sure you have `clang` with version \>= 7:
+Download [clang-10](https://github.com/taichi-dev/taichi_assets/releases/download/llvm10/clang-10.0.0-win.zip). Make sure you add the `bin` folder containing `clang.exe` to the `PATH` environment variable.
+
+3. Make sure you have LLVM 10.0.0. Note that Taichi uses a **customized
+  LLVM** so the pre-built binaries from the LLVM official website or
+  other sources probably won't work. Here we provide LLVM binaries
+  customized for Taichi, which may or may not work depending on your
+  system environment:
+  - [LLVM 10.0.0 for Windows MSVC
+    2019](https://github.com/taichi-dev/taichi_assets/releases/download/llvm10/taichi-llvm-10.0.0-msvc2019.zip)
+
+:::note
+On Windows, if you use the pre-built LLVM for Taichi, please add
+`$LLVM_FOLDER/bin` to `PATH`. Later, when you build Taichi using
+`CMake`, set `LLVM_DIR` to `$LLVM_FOLDER/lib/cmake/llvm`.
+:::
+- If the downloaded LLVM does not work, please build from source:
+  ```bash
+    # LLVM 10.0.0 + MSVC 2019
+    cmake .. -G"Visual Studio 16 2019" -A x64 -DLLVM_ENABLE_RTTI:BOOL=ON -DBUILD_SHARED_LIBS:BOOL=OFF -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_ENABLE_ASSERTIONS=ON -Thost=x64 -DLLVM_BUILD_TESTS:BOOL=OFF -DCMAKE_INSTALL_PREFIX=installed
+    ```
+
+    - Then open `LLVM.sln` and use Visual Studio 2017+ to build.
+    - Please make sure you are using the `Release` configuration,
+      then build the `INSTALL` project (under the folder
+      `CMakePredefinedTargets` in the Solution Explorer window).
+    - If you use MSVC 2019, **make sure you use C++17** for the
+      `INSTALL` project.
+    - After the build is complete, find your LLVM binaries and
+      headers in `build/installed`.
+
+    Please add `build/installed/bin` to `PATH`. Later, when you
+    build Taichi using `CMake`, set `LLVM_DIR` to
+    `build/installed/lib/cmake/llvm`.
+
+### Setting up CUDA (optional)
+If you don't have CUDA, go to [this
+website](https://developer.nvidia.com/cuda-downloads) and download the
+installer.
+
+- To check if CUDA is installed, run `nvcc --version`.
+
+
+## Docker
+
+For those who prefer to use Docker, we also provide a Dockerfile which
+helps set up the Taichi development environment with CUDA support based
+on the Ubuntu Docker image.
+
+:::note
+In order to follow the instructions in this section, please make sure
+you have [Docker Desktop (or Engine for
+Linux)](https://www.docker.com/products/docker-desktop) installed and
+set up properly.
+:::
+
+### Build the Docker Image
+
+From within the root directory of the taichi Git repository, execute
+`docker build -t taichi:latest .` to build a Docker image based off the
+local master branch tagged with _latest_. Since this builds the image
+from source, please expect up to 40 mins build time if you don't have
+cached Docker image layers.
+
+:::note
+
+In order to save time on building Docker images, you can always
+visit our [Docker Hub
+repository](https://hub.docker.com/r/taichidev/taichi) and pull the
+versions of pre-built images you would like to use. Currently the builds
+are triggered per Taichi GitHub release.
+
+For example, to pull an image built from release v0.6.17, run
+`docker pull taichidev/taichi:v0.6.17`
+:::
+
+:::caution
+
+By design, changes made to the file system inside a Docker container
+are not preserved once you exit the container. If you want to use
+Docker as a persistent development
+environment, we recommend you [mount the Taichi Git repository to the
+container as a volume](https://docs.docker.com/storage/volumes/) and set
+the Python path to the mounted directory.
+:::
+
+### Use Docker Image on macOS (cpu only)
+
+1.  Make sure `XQuartz` and `socat` are installed:
+
+```bash
+brew install --cask xquartz
+brew install socat
+```
+
+2.  Temporarily disable the xhost access control: `xhost +`
+3.  Start the Docker container with
+    `docker run -it -e DISPLAY=$(ipconfig getifaddr en0):0 taichidev/taichi:v0.6.17`
+4.  Do whatever you want within the container, e.g. run the tests
+    with `ti test` or an example with `ti example mpm88`
+5.  Exit from the container with `exit` or `ctrl+D`
+6.  \[To keep your xhost safe\] Re-enable the xhost access-control:
+    `xhost -`
+
+### Use Docker Image on Ubuntu (with CUDA support)
+
+1.  Make sure your host machine has CUDA properly installed and
+    configured. Usually you could verify it by running `nvidia-smi`
+2.  Make sure [NVIDIA Container
+    Toolkit](https://github.com/NVIDIA/nvidia-docker) is properly
+    installed:
+
+```bash
+distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+
+sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
+sudo systemctl restart docker
+```
+
+3.  Make sure `xorg` is installed: `sudo apt-get install xorg`
+4.  Temporarily disable the xhost access control: `xhost +`
+5.  Start the Docker container with
+    `sudo docker run -it --gpus all -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix taichidev/taichi:v0.6.17`
+6.  Do whatever you want within the container, e.g. run the tests
+    with `ti test` or an example with `ti example mpm88`
+7.  Exit from the container with `exit` or `ctrl+D`
+8.  **[To keep your xhost safe]** Re-enable the xhost access-control:
+    `xhost -`
+
+
+## Troubleshooting Developer Installation
+
+- If `make` fails to compile and reports
+  `fatal error: 'spdlog/XXX.h' file not found`, please try running
+  `git submodule update --init --recursive --depth=1`.
+
+- If importing Taichi causes
+
+  ```
+  FileNotFoundError: [Errno 2] No such file or directory: '/root/taichi/python/taichi/core/../lib/taichi_core.so' -> '/root/taichi/python/taichi/core/../lib/libtaichi_core.so'
+  ```
+
+  Please try adding `TAICHI_REPO_DIR` to environment variables, see
+  [dev_env_settings](/docs/lang/articles/misc/global_settings).
+
+- If the build succeeded but running any Taichi code results in errors
+  like
+
+  ```
+  Bitcode file (/tmp/taichi-tero94pl/runtime//runtime_x64.bc) not found
+  ```
+
+  please double check `clang` is in your `PATH`:
+
+  ```bash
+  clang --version
+  # version should be >= 7
+  ```
+
+  and our **Taichi configured** `llvm-as`:
+
+  ```bash
+  llvm-as --version
+  # version should be >= 8
+  which llvm-as
+  # should be /usr/local/bin/llvm-as or /opt/XXX/bin/llvm-as, which is our configured installation
+  ```
+
+  If not, please install `clang` and **build LLVM from source** with
+  instructions above in [dev_install](#installing-dependencies-1),
+  then add their path to environment variable `PATH`.
+
+- If you encounter other issues, feel free to report (please include the details) by [opening an
+  issue on
+  GitHub](https://github.com/taichi-dev/taichi/issues/new?labels=potential+bug&template=bug_report.md).
+  We are willing to help!
+
+- See also [Installation Troubleshooting](../misc/install.md) for issues
+  that may be shared with end-user installation.
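+
+The `clang` and `llvm-as` checks above can also be scripted. The following is a rough sketch under stated assumptions: `parse_major_version` and `check_tool` are hypothetical helpers written for illustration, and the parser assumes the tool prints the word `version` followed by a dotted number, as `clang --version` and `llvm-as --version` typically do.
+
+```python
+import re
+import shutil
+import subprocess
+
+
+def parse_major_version(version_output):
+    """Extract the major version from text such as 'clang version 10.0.0'."""
+    match = re.search(r"version\s+(\d+)", version_output)
+    return int(match.group(1)) if match else None
+
+
+def check_tool(name, minimum_major):
+    """Describe whether `name` is in PATH and at least `minimum_major`."""
+    path = shutil.which(name)
+    if path is None:
+        return "{}: not found in PATH".format(name)
+    result = subprocess.run(
+        [path, "--version"],
+        stdout=subprocess.PIPE,
+        stderr=subprocess.STDOUT,
+        universal_newlines=True,
+    )
+    major = parse_major_version(result.stdout)
+    if major is None:
+        return "{}: could not parse the version ({})".format(name, path)
+    status = "OK" if major >= minimum_major else "too old"
+    return "{}: version {} at {} ({})".format(name, major, path, status)
+
+
+if __name__ == "__main__":
+    # Minimum versions taken from the troubleshooting steps above.
+    print(check_tool("clang", 7))
+    print(check_tool("llvm-as", 8))
+```
+
+The reported path for `llvm-as` is worth a second look: it should point to the Taichi-configured LLVM installation rather than a system-wide one.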
diff --git a/docs/lang/articles/contribution/doc_writing.md b/docs/lang/articles/contribution/doc_writing.md
new file mode 100644
index 0000000000000..1f68a8666fd4c
--- /dev/null
+++ b/docs/lang/articles/contribution/doc_writing.md
@@ -0,0 +1,268 @@
+---
+sidebar_position: 6
+---
+
+# Documentation writing guide
+
+Thank you for your contribution! This article briefly introduces syntax that will help you write documentation on this website. Note that the documentation is written in an extended version of [Markdown](https://daringfireball.net/projects/markdown/syntax), so most of the time you don't need any special syntax besides the basic Markdown syntax.
+
+## 1. Insert code blocks
+
+This site supports inserting code blocks with highlighted lines. For example, the following:
+
+````md
+```python {1-2,4,6} title=snippet.py
+@ti.kernel
+def paint(t: float):
+    for i, j in pixels:  # Parallelized over all pixels
+        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
+        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
+        iterations = 0
+        while z.norm() < 20 and iterations < 50:
+            z = complex_sqr(z) + c
+            iterations += 1
+        pixels[i, j] = 1 - iterations * 0.02
+```
+````
+
+will result in a code block like:
+
+```python {1-2,4,6} title=snippet.py
+@ti.kernel
+def paint(t: float):
+    for i, j in pixels:  # Parallelized over all pixels
+        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
+        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
+        iterations = 0
+        while z.norm() < 20 and iterations < 50:
+            z = complex_sqr(z) + c
+            iterations += 1
+        pixels[i, j] = 1 - iterations * 0.02
+```
+
+## 2. Insert tables
+
+```md
+| Some Table Col 1 | Some Table Col 2 |
+| :--------------: | :--------------: |
+|       Val1       |       Val4       |
+|       Val2       |       Val5       |
+|       Val3       |       Val6       |
+```
+
+| Some Table Col 1 | Some Table Col 2 |
+| :--------------: | :--------------: |
+|       Val1       |       Val4       |
+|       Val2       |       Val5       |
+|       Val3       |       Val6       |
+
+:::tip TIP
+It's worth mentioning that [Tables Generator](https://www.tablesgenerator.com/markdown_tables) is a great tool for generating and re-formatting markdown tables.
+:::
+
+## 3. Cross-reference and anchor
+
+To link to another section within the same article, you would use `[Return to ## 1. Insert code blocks](#1-insert-code-blocks)`: [Return to ## 1. Insert code blocks](#1-insert-code-blocks).
+
+We follow the best practices suggested by [Docusaurus](https://docusaurus.io/docs/docs-markdown-features#referencing-other-documents) to cross-reference other documents, so to link to sections in other articles, please use the following relative-path based syntax, which
+is docs-versioning and IDE/GitHub friendly:
+
+- `[Return to Contribution guidelines](./contributor_guide.md)`: [Return to Contribution guidelines](./contributor_guide.md)
+- `[Return to The Documentation which is at root](/docs/#portability)`: [Return to The Documentation](/docs/#portability)
+
+## 4. Centered text block
+
+To make a text or image block centered, use:
+
+```md
+<center>
+
+Centered Text Block!
+
+</center>
+```
+
+<center>
+
+Centered Text Block!
+
+</center>
+
+:::danger NOTE
+You **HAVE TO** insert blank lines to make them work:
+
+```md
+<center>
+
+![](./some_pic.png)
+
+</center>
+```
+
+:::
+
+## 5. Text with color background
+
+You could use the following to highlight your text:
+
+```html
+<span id="inline-blue"> Text with blue background </span>,
+<span id="inline-purple"> Text with purple background </span>,
+<span id="inline-yellow"> Text with yellow background </span>,
+<span id="inline-green"> Text with green background </span>
+```
+
+<span id="inline-blue"> Text with blue background </span>,
+<span id="inline-purple"> Text with purple background </span>,
+<span id="inline-yellow"> Text with yellow background </span>,
+<span id="inline-green"> Text with green background </span>
+
+## 6. Custom containers
+
+As we have already seen in several places in this guide, you can add custom containers:
+
+```md
+:::tip
+This is a tip without title!
+:::
+```
+
+:::tip
+This is a tip without title!
+:::
+
+```md
+:::tip TITLE
+This is a tip with a title!
+:::
+```
+
+:::tip TITLE
+This is a tip with a title!
+:::
+
+```md
+:::note
+This is a note!
+:::
+```
+
+:::note
+This is a note!
+:::
+
+```md
+:::caution WARNING
+This is a warning!
+:::
+```
+
+:::caution WARNING
+This is a warning!
+:::
+
+```md
+:::danger DANGER
+This is a danger!
+:::
+```
+
+:::danger DANGER
+This is a danger!
+:::
+
+## 7. Code groups
+
+You could also insert tab-based code groups:
+
+```markdown
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<Tabs
+  defaultValue="apple"
+  values={[
+    {label: 'Apple', value: 'apple'},
+    {label: 'Orange', value: 'orange'},
+    {label: 'Banana', value: 'banana'},
+  ]}>
+  <TabItem value="apple">This is an apple 🍎</TabItem>
+  <TabItem value="orange">This is an orange 🍊</TabItem>
+  <TabItem value="banana">This is a banana 🍌</TabItem>
+</Tabs>
+```
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+<Tabs
+  defaultValue="apple"
+  values={[
+    {label: 'Apple', value: 'apple'},
+    {label: 'Orange', value: 'orange'},
+    {label: 'Banana', value: 'banana'},
+  ]}>
+  <TabItem value="apple">This is an apple 🍎</TabItem>
+  <TabItem value="orange">This is an orange 🍊</TabItem>
+  <TabItem value="banana">This is a banana 🍌</TabItem>
+</Tabs>
+
+## 8. Footnotes
+
+It is important to cite your references. To do so, use the `markdown-it` footnote syntax:
+
+```md
+This sentence has a footnote[^1]. (See footnote at the bottom of this guide.)
+
+[^1]: I'm a footnote!
+```
+
+which results in:
+
+---
+
+This sentence has a footnote[^1]. (See footnote at the bottom of this guide.)
+
+[^1]: I'm a footnote!
+
+---
+
+You can also write inline footnotes, which are easier to write because you don't need to keep track of footnote numbers:
+
+```md
+This sentence has another footnote ^[I'm another footnote] (See footnote at the bottom of this page.)
+```
+
+which has the same effect:
+
+---
+
+This sentence has another footnote ^[I'm another footnote] (See footnote at the bottom of this page.)
+
+---
+
+## 9. Insert images
+
+Inserting images is as straightforward as using the ordinary markdown syntax:
+
+```md
+![kernel](./life_of_kernel_lowres.jpg)
+```
+
+![kernel](./life_of_kernel_lowres.jpg)
+
+## 10. Insert Table of Contents (ToC)
+
+You could use:
+
+```md
+import TOCInline from '@theme/TOCInline';
+
+<TOCInline toc={toc} />
+```
+
+to insert in-line ToC:
+
+import TOCInline from '@theme/TOCInline';
+
+<TOCInline toc={toc} />
diff --git a/docs/lang/articles/contribution/utilities.md b/docs/lang/articles/contribution/utilities.md
new file mode 100644
index 0000000000000..6d01930f2d865
--- /dev/null
+++ b/docs/lang/articles/contribution/utilities.md
@@ -0,0 +1,222 @@
+---
+sidebar_position: 7
+---
+
+# Developer utilities
+
+This section provides a detailed description of some commonly used
+utilities for Taichi developers.
+
+## Logging
+
+Taichi uses [spdlog](https://github.com/gabime/spdlog) as its logging
+system. Logs have different levels. From low to high, they are:
+
+| LEVELS |
+| ------ |
+| trace  |
+| debug  |
+| info   |
+| warn   |
+| error  |
+
+The higher the level is, the more critical the message is.
+
+The default logging level is `info`. You may override the default
+logging level by:
+
+1.  Setting the environment variable like `export TI_LOG_LEVEL=warn`.
+2.  Setting the log level from Python side:
+    `ti.set_logging_level(ti.WARN)`.
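+
+As a rough illustration of how a level threshold behaves, here is a plain-Python sketch. The `LEVELS` list and the `is_logged` helper are hypothetical names used only for this example and are not part of the Taichi or spdlog APIs. The sketch assumes that messages below the active level are suppressed.
+
+```python
+# Log levels from low to high, as listed in the table above.
+LEVELS = ["trace", "debug", "info", "warn", "error"]
+
+
+def is_logged(message_level: str, active_level: str = "info") -> bool:
+    """Return True if a message at message_level passes the active level."""
+    return LEVELS.index(message_level) >= LEVELS.index(active_level)
+
+
+print(is_logged("debug"))         # False: below the default `info` level
+print(is_logged("warn"))          # True
+print(is_logged("info", "warn"))  # False once the level is raised to `warn`
+```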
+
+In **Python**, you may write logs using the `ti.*` interface:
+
+```python
+# Python
+ti.trace("Hello world!")
+ti.debug("Hello world!")
+ti.info("Hello world!")
+ti.warn("Hello world!")
+ti.error("Hello world!")
+```
+
+In **C++**, you may write logs using the `TI_*` interface:
+
+```cpp
+// C++
+TI_TRACE("Hello world!");
+TI_DEBUG("Hello world!");
+TI_INFO("Hello world!");
+TI_WARN("Hello world!");
+TI_ERROR("Hello world!");
+```
+
+If a message of level `error` is raised, Taichi will **terminate**
+immediately, resulting in a `RuntimeError` on the Python
+side.
+
+```cpp
+// C++
+void func(void *p) {
+  if (p == nullptr)
+    TI_ERROR("The pointer cannot be null!");
+
+  // will not reach here if p == nullptr
+  do_something(p);
+}
+```
+
+:::note
+For those coming from Linux kernel development, `TI_ERROR` behaves like `panic`.
+:::
+
+You may also simplify the above code by using `TI_ASSERT_INFO` or `TI_ASSERT`:
+
+```cpp
+void func(void *p) {
+  TI_ASSERT_INFO(p != nullptr, "The pointer cannot be null!");
+  // or
+  // TI_ASSERT(p != nullptr);
+
+  // will not reach here if p == nullptr
+  do_something(p);
+}
+```
+
+## Benchmarking and regression tests
+
+- Run `ti benchmark` to run tests in benchmark mode. This will record
+  the performance of `ti test`, and save it in `benchmarks/output`.
+- Run `ti regression` to show the difference between the current
+  result and the previous result in `benchmarks/baseline`, so that you
+  can see whether performance improved or regressed after your
+  commits. This is really helpful when your work is related to IR
+  optimizations.
+- Run `ti baseline` to save the benchmark result to
+  `benchmarks/baseline` for future comparison. This may be executed on
+  performance-related PRs before they are merged into master.
+
+For example, this is part of the output by `ti regression` after
+enabling constant folding optimization pass:
+
+```
+linalg__________________polar_decomp______________________________
+codegen_offloaded_tasks                       37 ->    39    +5.4%
+codegen_statements                          3179 ->  3162    -0.5%
+codegen_kernel_statements                   2819 ->  2788    -1.1%
+codegen_evaluator_statements                   0 ->    14    +inf%
+
+linalg__________________init_matrix_from_vectors__________________
+codegen_offloaded_tasks                       37 ->    39    +5.4%
+codegen_statements                          3180 ->  3163    -0.5%
+codegen_kernel_statements                   2820 ->  2789    -1.1%
+codegen_evaluator_statements                   0 ->    14    +inf%
+```
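+
+The percentages in this output look consistent with a plain relative change, `(new - old) / old`. As a minimal sketch, the figures can be reproduced in plain Python. The helper names `relative_change` and `format_change` are hypothetical, and the formula is inferred from the sample numbers above rather than taken from the `ti regression` source.
+
+```python
+import math
+
+
+def relative_change(old: float, new: float) -> float:
+    """Return the relative change from old to new, in percent."""
+    if old == 0:
+        # A zero baseline has no finite relative change.
+        return math.copysign(math.inf, new) if new != 0 else 0.0
+    return (new - old) / old * 100.0
+
+
+def format_change(old: float, new: float) -> str:
+    """Format the change with an explicit sign and one decimal place."""
+    return f"{relative_change(old, new):+.1f}%"
+
+
+print(format_change(37, 39))      # +5.4%
+print(format_change(3179, 3162))  # -0.5%
+print(format_change(2819, 2788))  # -1.1%
+print(format_change(0, 14))       # +inf%
+```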
+
+:::note
+Currently `ti benchmark` only supports benchmarking the number of
+statements. Time benchmarking is not included, since it depends on
+hardware performance and is therefore hard to compare if the baseline
+is from another machine. We plan to purchase a fixed-performance machine as
+a time benchmark server at some point. See the detailed discussion in [GitHub Issue #948](https://github.com/taichi-dev/taichi/issues/948).
+:::
+
+The suggested workflow for the performance-related PR author to run the
+regression tests is:
+
+- Run `ti benchmark && ti baseline` in `master` to save the current
+  performance as a baseline.
+- Run `git checkout -b your-branch-name`.
+- Work on the issue (stage 1).
+- Run `ti benchmark && ti regression` to obtain the result.
+- (If the result is bad) Make further improvements until the result is
+  satisfactory.
+- (If the result is OK) Run `ti baseline` to save the stage 1 performance as a
+  baseline.
+- Move on to stages 2, 3, ..., applying the same workflow.
+
+## (Linux only) Trigger `gdb` when programs crash
+
+```python
+# Python
+ti.set_gdb_trigger(True)
+```
+
+```cpp
+// C++
+CoreState::set_trigger_gdb_when_crash(true);
+```
+
+```bash
+# Shell
+export TI_GDB_TRIGGER=1
+```
+
+:::note
+**Quickly pinpointing segmentation faults/assertion failures using**
+`gdb`: When Taichi crashes, `gdb` will be triggered and attach to the
+current thread. You might be prompted to enter the sudo password required
+for attaching `gdb` to the thread. After entering `gdb`, check the stack backtrace
+with the command `bt` (`backtrace`), then find the line of code triggering
+the error.
+:::
+
+## Code coverage
+
+To ensure that our tests cover every situation, we need a
+**coverage report**, which measures the percentage of code lines
+that are executed by the tests.
+
+- Generally, the higher the coverage percentage is, the stronger our
+  tests are.
+- When making a PR, we want to **ensure that it comes with
+  corresponding tests**. Otherwise, code coverage will decrease.
+- Code coverage statuses are visible at
+  [Codecov](https://codecov.io/gh/taichi-dev/taichi).
+- Currently, the Taichi coverage report is only set up for Python code,
+  not for C++ yet.
+
+```bash
+ti test -C       # run tests and save results to .coverage
+coverage report  # generate a coverage report on terminal output
+coverage html    # generate an HTML report in htmlcov/index.html
+```
+
+## Serialization (legacy)
+
+The serialization module of Taichi allows you to serialize/deserialize
+objects into/from binary strings.
+
+You can use `TI_IO` macros to explicitly define fields necessary in
+Taichi.
+
+```cpp
+// TI_IO_DEF
+struct Particle {
+    Vector3f position, velocity;
+    real mass;
+    string name;
+
+    TI_IO_DEF(position, velocity, mass, name);
+};
+
+// TI_IO_DECL
+struct Particle {
+    Vector3f position, velocity;
+    real mass;
+    bool has_name;
+    string name;
+
+    TI_IO_DECL() {
+        TI_IO(position);
+        TI_IO(velocity);
+        TI_IO(mass);
+        TI_IO(has_name);
+        // More flexibility:
+        if (has_name) {
+            TI_IO(name);
+        }
+    }
+};
+
+// TI_IO_DEF_VIRT();
+```
diff --git a/docs/lang/articles/contribution/versioning_releases.md b/docs/lang/articles/contribution/versioning_releases.md
new file mode 100644
index 0000000000000..29709163cea6a
--- /dev/null
+++ b/docs/lang/articles/contribution/versioning_releases.md
@@ -0,0 +1,72 @@
+---
+sidebar_position: 10
+---
+
+# Versioning and releases
+
+## Pre-1.0 versioning
+
+Taichi follows [Semantic Versioning 2.0.0](https://semver.org/).
+
+Since Taichi is still under version 1.0.0, we use minor version bumps
+(e.g., `0.6.17->0.7.0`) for breaking API changes, and patch version
+bumps (e.g., `0.6.9->0.6.10`) for backward-compatible changes.
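+
+The bump rule above can be sketched in a few lines of plain Python. The `bump_version` helper is a hypothetical name used only for illustration and is not part of the Taichi tooling.
+
+```python
+def bump_version(version: str, breaking: bool) -> str:
+    """Return the next pre-1.0 version string for a given change type."""
+    major, minor, patch = (int(part) for part in version.split("."))
+    if breaking:
+        # Breaking API change: bump the minor version and reset the patch.
+        return f"{major}.{minor + 1}.0"
+    # Backward-compatible change: bump only the patch version.
+    return f"{major}.{minor}.{patch + 1}"
+
+
+print(bump_version("0.6.17", breaking=True))  # 0.7.0
+print(bump_version("0.6.9", breaking=False))  # 0.6.10
+```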
+
+## Workflow: releasing a new version
+
+- Trigger a Linux build on
+  [Jenkins](http://f11.csail.mit.edu:8080/job/taichi/) to see if
+  CUDA passes all tests. Note that Jenkins is the only build bot we
+  have that tests CUDA. (This may take half an hour.)
+
+- Create a branch for the release PR, forking from the latest commit
+  of the `master` branch.
+
+  - Update Taichi version number at the beginning of
+    `CMakeLists.txt`. For example, change
+    `SET(TI_VERSION_PATCH 9)` to `SET(TI_VERSION_PATCH 10)` for
+    a patch release.
+  - Commit with the message "[release] vX.Y.Z", e.g.
+    "[release] v0.6.10".
+  - You should see two changes in this commit: one line in
+    `CMakeLists.txt` and one line in `docs/version`.
+  - Execute `ti changelog` and save its outputs. You will need
+    this later.
+
+- Open a PR titled "[release] vX.Y.Z" with the branch and commit
+  you just created.
+
+  - Use the `ti changelog` output you saved in the previous step
+    as the content of the PR description.
+  - Wait for all the checks and build bots to complete. (This step
+    may take up to two hours).
+
+- Squash and merge the PR.
+
+- Trigger the Linux build on Jenkins, again, so that Linux packages
+  are uploaded to PyPI.
+
+- Wait for all build bots to finish. This step uploads PyPI packages
+  for macOS and Windows. You may have to wait for up to two hours.
+
+- Update the `stable` branch so that the head of that branch is your
+  release commit on `master`.
+
+- Draft a new release
+  [(here)](https://github.com/taichi-dev/taichi/releases):
+
+  - The title should be "vX.Y.Z".
+  - The tag should be "vX.Y.Z".
+  - Target should be "recent commit" -> the release commit.
+  - The release description should be copy-pasted from the release
+    PR description.
+  - Click the "Publish release" button.
+
+## Release cycle
+
+Taichi releases new versions twice per week:
+
+- The first release happens on Wednesdays.
+- The second release happens on Saturdays.
+
+Additional releases may happen if anything needs an urgent fix.
diff --git a/docs/lang/articles/contribution/write_test.md b/docs/lang/articles/contribution/write_test.md
new file mode 100644
index 0000000000000..229c9aa65c659
--- /dev/null
+++ b/docs/lang/articles/contribution/write_test.md
@@ -0,0 +1,243 @@
+---
+sidebar_position: 5
+---
+
+# Workflow for writing a Python test
+
+Normally we write functional tests in Python.
+
+- We use [pytest](https://github.com/pytest-dev/pytest) for our Python
+  test infrastructure.
+- Python tests should be added to `tests/python/test_xxx.py`.
+
+For example, you've just added a utility function `ti.log10`. Now you
+want to write a **test** to ensure that it functions properly.
+
+## Adding a new test case
+
+Look into `tests/python`, see if there is already a file suitable for your
+test. If not, create a new file for it. In this case,
+let's create a new file `tests/python/test_logarithm.py` for
+simplicity.
+
+Add a function whose name **must** start with `test_` so
+that `pytest` can find it, e.g.:
+
+```python {3}
+import taichi as ti
+
+def test_log10():
+    pass
+```
+
+Add some simple code that makes use of `ti.log10` to ensure it works
+well. Hint: You may pass/return values to/from Taichi-scope using 0-D
+fields, i.e. `r[None]`.
+
+```python
+import taichi as ti
+
+def test_log10():
+    ti.init(arch=ti.cpu)
+
+    r = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.log10(r[None])
+
+    r[None] = 100
+    foo()
+    assert r[None] == 2
+```
+
+Execute `ti test logarithm`, and the functions starting with `test_` in
+`tests/python/test_logarithm.py` will be executed.
+
+## Testing against multiple backends
+
+The line `ti.init(arch=ti.cpu)` in the test above means that it will only test on the CPU backend. In order to test against multiple backends, please use the `@ti.test` decorator, as illustrated below:
+
+```python
+import taichi as ti
+
+# will test against both CPU and CUDA backends
+@ti.test(ti.cpu, ti.cuda)
+def test_log10():
+    r = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.log10(r[None])
+
+    r[None] = 100
+    foo()
+    assert r[None] == 2
+```
+
+And you may test against **all backends** by simply not specifying the
+argument:
+
+```python
+import taichi as ti
+
+# will test against all backends available on your end
+@ti.test()
+def test_log10():
+    r = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.log10(r[None])
+
+    r[None] = 100
+    foo()
+    assert r[None] == 2
+```
+
+## Using `ti.approx` for comparison with tolerance
+
+Sometimes the precision of math operations could be relatively low on certain backends such as OpenGL,
+e.g. `ti.log10(100)` may return `2.001` or `1.999` in this case.
+
+Adding tolerance with `ti.approx` can be helpful to mitigate
+such errors on different backends, for example `2.001 == ti.approx(2)`
+will return `True` on the OpenGL backend.
+
+```python
+import taichi as ti
+
+# will test against all backends available on your end
+@ti.test()
+def test_log10():
+    r = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.log10(r[None])
+
+    r[None] = 100
+    foo()
+    assert r[None] == ti.approx(2)
+```
+
+:::caution
+Simply using `pytest.approx` won't work well here, since its
+tolerance won't vary among different Taichi backends. It'll likely
+fail on the OpenGL backend.
+
+`ti.approx` also correctly treats boolean types, e.g.:
+`2 == ti.approx(True)`.
+:::
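+
+To see why a tolerance helps, here is a plain-Python sketch based on the standard library's `math.isclose`. It is only an analogy for tolerance-based comparison and does not show how `ti.approx` is implemented. The helper name `approx_equal` and the relative tolerance of `1e-3` are arbitrary choices for this example.
+
+```python
+import math
+
+
+def approx_equal(result: float, expected: float, rel_tol: float = 1e-3) -> bool:
+    """Compare two floats, allowing a small relative error."""
+    return math.isclose(result, expected, rel_tol=rel_tol)
+
+
+for result in (2.001, 1.999):
+    # Exact comparison fails, while the tolerant comparison passes.
+    print(result, result == 2.0, approx_equal(result, 2.0))
+```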
+
+## Parametrize test inputs
+
+In the test above, `r[None] = 100` means that it will only test that `ti.log10` works correctly for the input `100`. In order to test against different input values, you may use the `@pytest.mark.parametrize` decorator:
+
+```python {5}
+import taichi as ti
+import pytest
+import math
+
+@pytest.mark.parametrize('x', [1, 10, 100])
+@ti.test()
+def test_log10(x):
+    r = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.log10(r[None])
+
+    r[None] = x
+    foo()
+    assert r[None] == math.log10(x)
+```
+
+Use a comma-separated list for multiple input values:
+
+```python
+import taichi as ti
+import pytest
+import math
+
+@pytest.mark.parametrize('x,y', [(1, 2), (1, 3), (2, 1)])
+@ti.test()
+def test_atan2(x, y):
+    r = ti.field(ti.f32, ())
+    s = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.atan2(r[None], s[None])
+
+    r[None] = x
+    s[None] = y
+    foo()
+    assert r[None] == math.atan2(x, y)
+```
+
+Use two separate `parametrize` to test **all combinations** of input
+arguments:
+
+```python {5-6}
+import taichi as ti
+import pytest
+import math
+
+@pytest.mark.parametrize('x', [1, 2])
+@pytest.mark.parametrize('y', [1, 2])
+# same as:  .parametrize('x,y', [(1, 1), (1, 2), (2, 1), (2, 2)])
+@ti.test()
+def test_atan2(x, y):
+    r = ti.field(ti.f32, ())
+    s = ti.field(ti.f32, ())
+
+    @ti.kernel
+    def foo():
+        r[None] = ti.atan2(r[None], s[None])
+
+    r[None] = x
+    s[None] = y
+    foo()
+    assert r[None] == math.atan2(x, y)
+```
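+
+The set of argument pairs produced by stacking two `parametrize` decorators is the Cartesian product of the two input lists. A minimal plain-Python sketch with `itertools.product` enumerates that set. It makes no claim about the order in which `pytest` runs the cases, so the pairs are sorted for display.
+
+```python
+import itertools
+
+xs = [1, 2]
+ys = [1, 2]
+
+# Every (x, y) pair that the stacked decorators cover.
+combinations = sorted(itertools.product(xs, ys))
+print(combinations)  # [(1, 1), (1, 2), (2, 1), (2, 2)]
+```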
+
+## Specifying `ti.init` configurations
+
+You may specify keyword arguments to `ti.init()` in `ti.test()`, e.g.:
+
+```python {1}
+@ti.test(ti.cpu, debug=True, log_level=ti.TRACE)
+def test_debugging_utils():
+    # ... (some tests have to be done in debug mode)
+```
+
+is the same as:
+
+```python {2}
+def test_debugging_utils():
+    ti.init(arch=ti.cpu, debug=True, log_level=ti.TRACE)
+    # ... (some tests have to be done in debug mode)
+```
+
+## Exclude some backends from test
+
+Some backends are not capable of executing certain tests, so you may have to
+exclude them from the test in order to move forward:
+
+```python
+# Run this test on all backends except for OpenGL
+@ti.test(excludes=[ti.opengl])
+def test_sparse_field():
+    # ... (some tests that require the sparse feature, which is not supported by OpenGL)
+```
+
+You may also use the `extensions` keyword to exclude backends without
+a specific feature:
+
+```python
+# Run this test only on backends that support the sparse extension
+@ti.test(extensions=[ti.extension.sparse])
+def test_sparse_field():
+    # ... (some tests that require the sparse feature, which is not supported by OpenGL)
+```
diff --git a/docs/lang/articles/faq.md b/docs/lang/articles/faq.md
new file mode 100755
index 0000000000000..20705cd3c9b8d
--- /dev/null
+++ b/docs/lang/articles/faq.md
@@ -0,0 +1,46 @@
+---
+sidebar_position: 9999
+---
+
+# Frequently Asked Questions
+
+### Why does my `pip` complain `package not found` when installing Taichi?
+
+You may have a Python interpreter with an unsupported version. Currently, Taichi only supports Python 3.6/3.7/3.8 (64-bit). For more information about installation-related issues, please check [Installation Troubleshooting](./misc/install.md).
+
+### Does Taichi provide built-in constants such as `ti.pi`?
+
+There is no built-in constant such as `pi`. We recommend using `math.pi` directly.
+
+### Outer-most loops in Taichi kernels are by default parallel. How can I **serialize** one of them?
+
+A solution is to add an additional *ghost* loop with only one iteration outside the loop you want to serialize.
+
+```python {1}
+for _ in range(1):  # This "ghost" loop will be "parallelized", but with only one thread. Therefore, the containing loop below is serialized.
+    for i in range(100):  # The loop you want to serialize
+        ...
+```
+
+### What is the most convenient way to load images into Taichi fields?
+
+One feasible solution is `field.from_numpy(ti.imread('filename.png'))`.
+
+### Can Taichi interact with **other Python packages** such as `matplotlib`?
+
+Yes, Taichi supports various popular Python packages. Please check out [Interacting with other Python packages](/docs/#interacting-with-other-python-packages).
+
+### How do I declare a field with a **dynamic length**?
+
+The `dynamic` SNode supports variable-length fields. It acts similarly to `std::vector` in C++ or `list` in Python. Please check out [Working with dynamic SNodes](../api/snode.md#working-with-dynamic-snodes) for more details.
+
+:::tip
+An alternative solution is to allocate a large enough `dense` field, with a corresponding 0-D field
+`field_len[None]` tracking its length. In practice, programs allocating memory using `dynamic`
+SNodes may be less efficient than using `dense` SNodes, due to dynamic data structure
+maintenance overheads.
+:::
+
+### How do I program on less structured data structures (such as graphs and tetrahedral meshes) in Taichi?
+
+These structures have to be decomposed into 1D Taichi fields. For example, when representing a graph, you can allocate two fields, one for the vertices and the other for the edges. You can then traverse the elements using `for v in vertices` or `for v in range(n)`.
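+
+As a plain-Python analogy of this decomposition, the sketch below uses ordinary lists in place of Taichi fields, with one list for vertex data and one for edges stored as pairs of vertex indices. All names here (`vertex_pos`, `edges`, `edge_lengths`) are hypothetical and chosen only for illustration.
+
+```python
+import math
+
+# One list per "field": vertex positions and edges (pairs of vertex indices).
+vertex_pos = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
+edges = [(0, 1), (1, 2), (2, 0)]
+
+
+def edge_lengths(positions, edge_list):
+    """Traverse the edges by index and compute the length of each edge."""
+    lengths = []
+    for e in range(len(edge_list)):
+        a, b = edge_list[e]
+        (ax, ay), (bx, by) = positions[a], positions[b]
+        lengths.append(math.hypot(bx - ax, by - ay))
+    return lengths
+
+
+print(edge_lengths(vertex_pos, edges))  # [3.0, 4.0, 5.0]
+```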
diff --git a/docs/lang/articles/get-started.md b/docs/lang/articles/get-started.md
new file mode 100644
index 0000000000000..0c257008f8c3c
--- /dev/null
+++ b/docs/lang/articles/get-started.md
@@ -0,0 +1,390 @@
+---
+sidebar_position: 1
+slug: /
+---
+
+# Getting Started
+
+Welcome to the Taichi Language documentation!
+
+## Installation
+
+To get started with the Taichi Language, simply install it with `pip`:
+
+```shell
+python3 -m pip install taichi
+```
+
+:::note
+Currently, Taichi only supports Python 3.6/3.7/3.8 (64-bit).
+:::
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+There are a few extra requirements depending on which operating system you are using:
+
+<Tabs
+  defaultValue="ubuntu"
+  values={[
+    {label: 'Ubuntu', value: 'ubuntu'},
+    {label: 'Arch Linux', value: 'arch-linux'},
+    {label: 'Windows', value: 'windows'},
+  ]}>
+
+  <TabItem value="ubuntu">
+
+  On Ubuntu 19.04+, you need to install `libtinfo5`:
+
+  ```sudo apt install libtinfo5```
+
+  </TabItem>
+  <TabItem value="arch-linux">
+
+  On Arch Linux, you need to install `ncurses5-compat-libs` package from the Arch User Repository:
+
+  ```yaourt -S ncurses5-compat-libs```
+
+  </TabItem>
+  <TabItem value="windows">
+
+  On Windows, please install [Microsoft Visual C++
+  Redistributable](https://aka.ms/vs/16/release/vc_redist.x64.exe) if you haven't done so.
+
+  </TabItem>
+</Tabs>
+
+Please refer to the [Installation Troubleshooting](./misc/install.md) section if you run into any issues when installing Taichi.
+
+## Hello, world!
+
+We introduce the Taichi programming language through a very basic _fractal_ example.
+
+Running the Taichi code below using either `python3 fractal.py` or `ti example fractal` _(you can find more information about the Taichi CLI in the [Command line utilities](./misc/cli_utilities.md) section)_ will give you an animation of the [Julia set](https://en.wikipedia.org/wiki/Julia_set):
+
+<center>
+
+![image](https://raw.githubusercontent.com/taichi-dev/public_files/master/taichi/fractal.gif)
+
+</center>
+
+```python title=fractal.py
+import taichi as ti
+
+ti.init(arch=ti.gpu)
+
+n = 320
+pixels = ti.field(dtype=float, shape=(n * 2, n))
+
+@ti.func
+def complex_sqr(z):
+    return ti.Vector([z[0]**2 - z[1]**2, z[1] * z[0] * 2])
+
+@ti.kernel
+def paint(t: float):
+    for i, j in pixels:  # Parallelized over all pixels
+        c = ti.Vector([-0.8, ti.cos(t) * 0.2])
+        z = ti.Vector([i / n - 1, j / n - 0.5]) * 2
+        iterations = 0
+        while z.norm() < 20 and iterations < 50:
+            z = complex_sqr(z) + c
+            iterations += 1
+        pixels[i, j] = 1 - iterations * 0.02
+
+gui = ti.GUI("Julia Set", res=(n * 2, n))
+
+for i in range(1000000):
+    paint(i * 0.03)
+    gui.set_image(pixels)
+    gui.show()
+```
+
+Let's dive into this simple Taichi program.
+
+### import taichi as ti
+
+Taichi is a domain-specific language (DSL) embedded in Python.
+
+To make Taichi as easy to use as a Python package, we have done heavy
+engineering with this goal in mind - letting every Python programmer
+write Taichi programs with minimal learning effort.
+
+You can even use your favorite Python package management system, Python IDEs and other Python packages in conjunction with Taichi.
+
+```python
+# Run on GPU, automatically detect backend
+ti.init(arch=ti.gpu)
+
+# Run on GPU, with the NVIDIA CUDA backend
+ti.init(arch=ti.cuda)
+# Run on GPU, with the OpenGL backend
+ti.init(arch=ti.opengl)
+# Run on GPU, with the Apple Metal backend, if you are on macOS
+ti.init(arch=ti.metal)
+
+# Run on CPU (default)
+ti.init(arch=ti.cpu)
+```
+
+:::info
+
+Supported backends on different platforms:
+
+| **platform** | **CPU** | **CUDA** | **OpenGL** | **Metal** | **C source** |
+| :----------: | :-----: | :------: | :--------: | :-------: | :----------: |
+|   Windows    |   OK    |    OK    |     OK     |    N/A    |     N/A      |
+|    Linux     |   OK    |    OK    |     OK     |    N/A    |      OK      |
+|    macOS     |   OK    |   N/A    |    N/A     |    OK     |     N/A      |
+
+(OK: supported; N/A: not available)
+
+With `arch=ti.gpu`, Taichi will first try to run with CUDA. If CUDA is
+not supported on your machine, Taichi will fall back on Metal or OpenGL.
+If no GPU backend (CUDA, Metal, or OpenGL) is supported, Taichi will
+fall back on CPUs.
+:::
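+
+The fallback order described above can be sketched in plain Python. The `pick_backend` helper and the string backend names are hypothetical and do not reflect Taichi's internal implementation. The relative order of Metal and OpenGL is an assumption, although per the table above the two are never available on the same platform.
+
+```python
+def pick_backend(available: set) -> str:
+    """Pick the first available GPU backend, falling back on the CPU."""
+    # Assumed preference order: CUDA first, then Metal or OpenGL.
+    for backend in ("cuda", "metal", "opengl"):
+        if backend in available:
+            return backend
+    return "cpu"
+
+
+print(pick_backend({"cuda", "opengl"}))  # cuda
+print(pick_backend({"metal"}))           # metal
+print(pick_backend(set()))               # cpu
+```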
+
+:::note
+
+When used with the CUDA backend on Windows or ARM devices (e.g., NVIDIA
+Jetson), Taichi allocates 1 GB GPU memory for field storage by default.
+
+You can override this behavior by initializing with
+`ti.init(arch=ti.cuda, device_memory_GB=3.4)` to allocate `3.4` GB GPU
+memory, or `ti.init(arch=ti.cuda, device_memory_fraction=0.3)` to
+allocate `30%` of the total GPU memory.
+
+On other platforms, Taichi will make use of its on-demand memory
+allocator to allocate memory adaptively.
+:::
+
+### Fields
+
+Taichi is a **data**-oriented programming language where dense or
+spatially-sparse fields are the first-class citizens. See [Scalar fields](../api/scalar_field.md) for more details on fields.
+
+In the code above, `pixels = ti.field(dtype=float, shape=(n * 2, n))`
+allocates a 2D dense field named `pixels` of size `(640, 320)` and
+element data type `float`.
+
+### Functions and kernels
+
+Computation resides in Taichi **kernels** and Taichi **functions**.
+
+Taichi **kernels** are defined with the decorator `@ti.kernel`. They can
+be called from Python to perform computation. Kernel arguments must be
+type-hinted (if any).
+
+Taichi **functions** are defined with the decorator `@ti.func`. They can
+**only** be called by Taichi kernels or other Taichi functions.
+
+See [syntax](./basic/syntax.md) for more details about Taichi
+kernels and functions.
+
+The language used in Taichi kernels and functions looks exactly like
+Python, yet the Taichi frontend compiler converts it into a language
+that is **compiled, statically-typed, lexically-scoped, parallel and
+differentiable**.
+
+:::info
+
+**Taichi-scopes vs. Python-scopes**:
+
+Everything decorated with `@ti.kernel` and `@ti.func` is in Taichi-scope
+and hence will be compiled by the Taichi compiler.
+
+Everything else is in Python-scope and is simply native Python code.
+:::
+
+:::caution
+
+Taichi kernels must be called from the Python-scope. Taichi functions
+must be called from the Taichi-scope.
+:::
+
+:::tip
+
+For those who come from the world of CUDA, `ti.func` corresponds to
+`__device__` while `ti.kernel` corresponds to `__global__`.
+:::
+
+:::note
+
+Nested kernels are **not supported**.
+
+Nested functions are **supported**.
+
+Recursive functions are **not supported for now**.
+:::
+
+### Parallel for-loops
+
+For loops at the outermost scope in a Taichi kernel are **automatically
+parallelized**. For loops can have two forms, i.e. _range-for
+loops_ and _struct-for loops_.
+
+**Range-for loops** are no different from Python for loops, except that
+they will be parallelized when used at the outermost scope. Range-for
+loops can be nested.
+
+```python {3,7,14-15}
+@ti.kernel
+def fill():
+    for i in range(10): # Parallelized
+        x[i] += i
+
+        s = 0
+        for j in range(5): # Serialized in each parallel thread
+            s += j
+
+        y[i] = s
+
+@ti.kernel
+def fill_3d():
+    # Parallelized for all 3 <= i < 8, 1 <= j < 6, 0 <= k < 9
+    for i, j, k in ti.ndrange((3, 8), (1, 6), 9):
+        x[i, j, k] = i + j + k
+```
+
+:::note
+
+It is the loop **at the outermost scope** that gets parallelized, not
+the outermost loop.
+
+```python {3,9}
+@ti.kernel
+def foo():
+    for i in range(10): # Parallelized :-)
+        ...
+
+@ti.kernel
+def bar(k: ti.i32):
+    if k > 42:
+        for i in range(10): # Serial :-(
+            ...
+```
+
+:::
+
+**Struct-for loops** are particularly useful when iterating over
+(sparse) field elements. In the `fractal.py` above, `for i, j in pixels` loops
+over all the pixel coordinates, i.e.,
+`(0, 0), (0, 1), (0, 2), ... , (0, 319), (1, 0), ..., (639, 319)`.
+
+:::note
+
+Struct-for is the key to [sparse computation](./advanced/sparse.md) in
+Taichi, as it will only loop over active elements in a sparse field. In
+dense fields, all elements are active.
+:::
+
+:::caution
+
+Struct-for loops must live at the outermost scope of kernels. Unlike
+range-for loops, they cannot be nested inside another statement such as `if`:
+
+```python
+@ti.kernel
+def foo():
+    for i in x:
+        ...
+
+@ti.kernel
+def bar(k: ti.i32):
+    # The outermost scope is an `if` statement
+    if k > 42:
+        for i in x: # Not allowed. Struct-fors must live in the outermost scope.
+            ...
+```
+
+:::
+
+:::caution
+
+`break` **is not supported in parallel loops**:
+
+```python {5,9,16}
+@ti.kernel
+def foo():
+  for i in x:
+      ...
+      break # Error!
+
+  for i in range(10):
+      ...
+      break # Error!
+
+@ti.kernel
+def bar():
+  for i in x:
+      for j in range(10):
+          ...
+          break # OK!
+```
+
+:::
+
+### Interacting with other Python packages
+
+#### Python-scope data access
+
+Everything outside Taichi-scopes (`ti.func` and `ti.kernel`) is simply
+Python code. In Python-scopes, you can access Taichi field elements
+using plain indexing syntax. For example, to access a single pixel of
+the rendered image in Python-scope, you can simply use:
+
+```python
+import taichi as ti
+ti.init()
+pixels = ti.field(ti.f32, (1024, 512))
+
+pixels[42, 11] = 0.7  # store data into pixels
+print(pixels[42, 11]) # prints approximately 0.7 (the value is stored as f32)
+```
+
+#### Sharing data with other packages
+
+Taichi provides helper functions such as `from_numpy` and `to_numpy` to
+transfer data between Taichi fields and NumPy arrays, so that you can
+also use your favorite Python packages (e.g., `numpy`, `pytorch`,
+`matplotlib`) together with Taichi as below:
+
+```python
+import taichi as ti
+ti.init()
+pixels = ti.field(ti.f32, (1024, 512))
+
+import numpy as np
+arr = np.random.rand(1024, 512)
+pixels.from_numpy(arr)   # load numpy data into taichi fields
+
+import matplotlib.pyplot as plt
+arr = pixels.to_numpy()  # store taichi data into numpy arrays
+plt.imshow(arr)
+plt.show()
+
+import matplotlib.cm as cm
+cmap = cm.get_cmap('magma')
+gui = ti.GUI('Color map')
+while gui.running:
+    render_pixels()  # your own Taichi kernel that writes into `pixels`
+    arr = pixels.to_numpy()
+    gui.set_image(cmap(arr))
+    gui.show()
+```
+
+See [Interacting with external arrays](./basic/external.md#interacting-with-external-arrays) for more details.
+
+## What's next?
+
+Now that we have gone through the core features of the
+Taichi programming language using the fractal example,
+feel free to dive into the language concepts in
+the next section, or jump to advanced topics such as [Metaprogramming](./advanced/meta.md) or [Differentiable programming](./advanced/differentiable_programming.md). Remember that you can
+use the search bar at the top right corner to search for topics or keywords
+at any time!
+
+If you are interested in joining the Taichi community, we strongly recommend you take some time to
+familiarize yourself with our [contribution guide](./contribution/contributor_guide.md).
+
+We hope you enjoy your adventure with Taichi!
diff --git a/docs/lang/articles/misc/cli_utilities.md b/docs/lang/articles/misc/cli_utilities.md
new file mode 100644
index 0000000000000..fcb8893b93265
--- /dev/null
+++ b/docs/lang/articles/misc/cli_utilities.md
@@ -0,0 +1,71 @@
+---
+sidebar_position: 6
+---
+
+# Command line utilities
+
+A successful installation of Taichi adds a CLI (Command-Line
+Interface) to your system, which helps you perform several routine
+tasks quickly. To invoke the CLI, run `ti` or
+`python3 -m taichi`.
+
+## Examples
+
+Taichi provides a set of bundled examples. Run `ti example -h`
+to print the help message and get a list of available example names.
+
+For instance, to run the basic `fractal` example, run `ti example fractal`
+from your shell (`ti example fractal.py` should also work).
+
+You can print the source code of an example by running
+`ti example -p fractal`, or `ti example -P fractal` to print it with
+syntax highlighting.
+
+You can also save the example to the current working directory by running
+`ti example -s fractal`.
+
+## Changelog
+
+Sometimes it's convenient to view the changelog of the current version
+of Taichi. To do so, you could run `ti changelog` in your shell.
+
+## REPL Shell
+
+Sometimes it's convenient to start a Python shell with
+`import taichi as ti` as a pre-loaded module for fast testing and
+confirmation. To do so from your shell, you could run `ti repl`.
+
+## System information
+
+When you report a potential bug in an issue, please consider
+running `ti diagnose` and attaching its output. This
+helps maintainers learn more about the context and the system
+information of your environment, which makes debugging more
+efficient and helps solve your issue sooner.
+
+:::caution
+**Before posting it, please review the output and make sure it contains no sensitive information about your data or yourself.**
+:::
+
+## Converting PNGs to video
+
+Sometimes it's convenient to convert a series of `png` files into a
+single video when showing your result to others.
+
+For example, suppose you have `000000.png`, `000001.png`, ... generated
+according to [Export your results](./export_results.md) in the
+**current working directory**.
+
+Then you could run `ti video` to create a file `video.mp4` containing
+all these images as frames (sorted by file name).
+
+Use `ti video -f40` for creating a video with 40 FPS.
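+
+Because frames are ordered by file name, zero-padded names keep them in numeric order. The following pure-Python sketch illustrates why; the helper `frame_name` is hypothetical and not part of Taichi:
+
+```python
+def frame_name(index, digits=6):
+    """Return a zero-padded PNG file name such as 000042.png."""
+    return f"{index:0{digits}d}.png"
+
+
+frame_ids = (0, 2, 10, 100)
+padded = [frame_name(i) for i in frame_ids]
+unpadded = [f"{i}.png" for i in frame_ids]
+
+print(sorted(padded) == padded)  # True: name order matches frame order
+print(sorted(unpadded))          # ['0.png', '10.png', '100.png', '2.png']
+```
+
+Without padding, frame 2 would be sorted after frame 100, so the resulting video would play out of order.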
+
+## Converting video to GIF
+
+Sometimes we need `gif` images in order to post the result on forums.
+
+To do so, you could run `ti gif -i video.mp4`, where `video.mp4` is the
+`mp4` video (generated with instructions above).
+
+Use `ti gif -i video.mp4 -f40` for creating a GIF with 40 FPS.
diff --git a/docs/lang/articles/misc/debugging.md b/docs/lang/articles/misc/debugging.md
new file mode 100644
index 0000000000000..8d6b5cb9a6d85
--- /dev/null
+++ b/docs/lang/articles/misc/debugging.md
@@ -0,0 +1,410 @@
+---
+sidebar_position: 2
+---
+
+# Debugging
+
+Debugging a parallel program is not easy, so Taichi provides built-in
+utilities to help you debug your Taichi programs.
+
+## Run-time `print` in kernels
+
+```python
+print(arg1, ..., sep=' ', end='\n')
+```
+
+Debug your program with `print()` in Taichi-scope. For example:
+
+```python {1}
+@ti.kernel
+def inside_taichi_scope():
+    x = 233
+    print('hello', x)
+    #=> hello 233
+
+    print('hello', x * 2 + 200)
+    #=> hello 666
+
+    print('hello', x, sep='')
+    #=> hello233
+
+    print('hello', x, sep='', end='')
+    print('world', x, sep='')
+    #=> hello233world233
+
+    m = ti.Matrix([[2, 3, 4], [5, 6, 7]])
+    print('m =', m)
+    #=> m = [[2, 3, 4], [5, 6, 7]]
+
+    v = ti.Vector([3, 4])
+    print('v =', v)
+    #=> v = [3, 4]
+```
+
+For now, Taichi-scope `print` supports string, scalar, vector, and
+matrix expressions as arguments. `print` in Taichi-scope may be a little
+different from `print` in Python-scope. Please see details below.
+
+:::caution
+For the **CPU and CUDA backends**, `print` will not work in Graphical
+Python Shells including IDLE and Jupyter notebook. This is because these
+backends print the outputs to the console instead of the GUI. Use the
+**OpenGL or Metal backend** if you wish to use `print` in IDLE /
+Jupyter.
+:::
+
+:::caution
+
+For the **CUDA backend**, the printed result will not show up until
+`ti.sync()` is called:
+
+```python
+import taichi as ti
+ti.init(arch=ti.cuda)
+
+@ti.kernel
+def kern():
+    print('inside kernel')
+
+print('before kernel')
+kern()
+print('after kernel')
+ti.sync()
+print('after sync')
+```
+
+results in:
+
+```
+before kernel
+after kernel
+inside kernel
+after sync
+```
+
+Note that host access or program end will also implicitly invoke
+`ti.sync()`.
+:::
+
+:::note
+Note that `print` in Taichi-scope can only receive **comma-separated
+parameters**. Neither f-string nor formatted string should be used. For
+example:
+
+```python {9-11}
+import taichi as ti
+ti.init(arch=ti.cpu)
+a = ti.field(ti.f32, 4)
+
+
+@ti.kernel
+def foo():
+    a[0] = 1.0
+    print('a[0] = ', a[0]) # right
+    print(f'a[0] = {a[0]}') # wrong, f-string is not supported
+    print("a[0] = %f" % a[0]) # wrong, formatted string is not supported
+
+foo()
+```
+
+:::
+
+## Compile-time `ti.static_print`
+
+Sometimes it is useful to print Python-scope objects and constants like
+data types or SNodes in Taichi-scope. So, similar to `ti.static`, Taichi
+provides `ti.static_print` to print compile-time constants, which is similar
+to Python-scope `print`:
+
+```python
+x = ti.field(ti.f32, (2, 3))
+y = 1
+
+@ti.kernel
+def inside_taichi_scope():
+    ti.static_print(y)
+    # => 1
+    ti.static_print(x.shape)
+    # => (2, 3)
+    ti.static_print(x.dtype)
+    # => DataType.float32
+    for i in range(4):
+        ti.static_print(i.dtype)
+        # => DataType.int32
+        # will only print once
+```
+
+Unlike `print`, `ti.static_print` will only print the expression once at
+compile-time, and therefore it has no runtime cost.
+
+## Serial execution
+
+The automatic parallelization feature of Taichi may lead to
+nondeterministic behaviors. For debugging purposes, it may be useful to
+serialize program execution to get repeatable results and to diagnose
+data races. When running your Taichi program on CPUs, you can initialize
+Taichi to use a single thread with `cpu_max_num_threads=1`, so that the
+whole program becomes serial and deterministic. For example,
+
+```
+ti.init(arch=ti.cpu, cpu_max_num_threads=1)
+```
+
+If your program works well in serial but not in parallel, check
+parallelization-related issues such as data races.
+
+## Runtime `assert` in kernel
+
+Programmers may use `assert` statements in Taichi-scope. When the
+assertion condition fails, a `RuntimeError` is raised to indicate
+the error.
+
+:::note
+`assert` is currently supported on the CPU, CUDA, and Metal backends.
+:::
+
+For performance reasons, `assert` only works when `debug` mode
+is on. For example:
+
+```python
+ti.init(arch=ti.cpu, debug=True)
+
+x = ti.field(ti.f32, 128)
+
+@ti.kernel
+def do_sqrt_all():
+    for i in x:
+        assert x[i] >= 0
+        x[i] = ti.sqrt(x[i])
+```
+
+When you are done with debugging, simply set `debug=False`. Now `assert`
+will be ignored and there will be no runtime overhead.
+
+## Compile-time `ti.static_assert`
+
+```python
+ti.static_assert(cond, msg=None)
+```
+
+Like `ti.static_print`, Taichi also provides a static version of `assert`:
+`ti.static_assert`. It can be useful to make assertions on data types,
+dimensionality, and shapes. It works whether `debug=True` is specified
+or not. When an assertion fails, it will raise an `AssertionError`, just
+like a Python-scope `assert`.
+
+For example:
+
+```python
+@ti.func
+def copy(dst: ti.template(), src: ti.template()):
+    ti.static_assert(dst.shape == src.shape, "copy() needs src and dst fields to be same shape")
+    for I in ti.grouped(src):
+        dst[I] = src[I]
+```
+
+## Pretty Taichi-scope traceback
+
+Sometimes the Python stack tracebacks resulting from **Taichi-scope** errors
+can be too complicated to read. For example:
+
+```python
+import taichi as ti
+ti.init()
+
+@ti.func
+def func3():
+    ti.static_assert(1 + 1 == 3)
+
+@ti.func
+def func2():
+    func3()
+
+@ti.func
+def func1():
+    func2()
+
+@ti.kernel
+def func0():
+    func1()
+
+func0()
+```
+
+The above snippet would result in an `AssertionError`:
+
+```
+Traceback (most recent call last):
+  File "misc/demo_excepthook.py", line 20, in <module>
+    func0()
+  File "/root/taichi/python/taichi/lang/kernel.py", line 559, in wrapped
+    return primal(*args, **kwargs)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 488, in __call__
+    self.materialize(key=key, args=args, arg_features=arg_features)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 367, in materialize
+    taichi_kernel = taichi_kernel.define(taichi_ast_generator)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 364, in taichi_ast_generator
+    compiled()
+  File "misc/demo_excepthook.py", line 18, in func0
+    func1()
+  File "/root/taichi/python/taichi/lang/kernel.py", line 39, in decorated
+    return fun.__call__(*args)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 79, in __call__
+    ret = self.compiled(*args)
+  File "misc/demo_excepthook.py", line 14, in func1
+    func2()
+  File "/root/taichi/python/taichi/lang/kernel.py", line 39, in decorated
+    return fun.__call__(*args)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 79, in __call__
+    ret = self.compiled(*args)
+  File "misc/demo_excepthook.py", line 10, in func2
+    func3()
+  File "/root/taichi/python/taichi/lang/kernel.py", line 39, in decorated
+    return fun.__call__(*args)
+  File "/root/taichi/python/taichi/lang/kernel.py", line 79, in __call__
+    ret = self.compiled(*args)
+  File "misc/demo_excepthook.py", line 6, in func3
+    ti.static_assert(1 + 1 == 3)
+  File "/root/taichi/python/taichi/lang/error.py", line 14, in wrapped
+    return foo(*args, **kwargs)
+  File "/root/taichi/python/taichi/lang/impl.py", line 252, in static_assert
+    assert cond
+AssertionError
+```
+
+Many of the stack frames are Taichi compiler implementation details, which
+make the traceback noisy. You can elide them by using
+`ti.init(excepthook=True)`, which _hooks_ on the exception handler and makes
+the stack traceback from Taichi-scope more intuitive:
+
+```python {2}
+import taichi as ti
+ti.init(excepthook=True)
+...
+```
+
+which makes the result look like:
+
+```python
+========== Taichi Stack Traceback ==========
+In <module>() at misc/demo_excepthook.py:21:
+--------------------------------------------
+@ti.kernel
+def func0():
+    func1()
+
+func0()  <--
+--------------------------------------------
+In func0() at misc/demo_excepthook.py:19:
+--------------------------------------------
+    func2()
+
+@ti.kernel
+def func0():
+    func1()  <--
+
+func0()
+--------------------------------------------
+In func1() at misc/demo_excepthook.py:15:
+--------------------------------------------
+    func3()
+
+@ti.func
+def func1():
+    func2()  <--
+
+@ti.kernel
+--------------------------------------------
+In func2() at misc/demo_excepthook.py:11:
+--------------------------------------------
+    ti.static_assert(1 + 1 == 3)
+
+@ti.func
+def func2():
+    func3()  <--
+
+@ti.func
+--------------------------------------------
+In func3() at misc/demo_excepthook.py:7:
+--------------------------------------------
+ti.enable_excepthook()
+
+@ti.func
+def func3():
+    ti.static_assert(1 + 1 == 3)  <--
+
+@ti.func
+--------------------------------------------
+AssertionError
+```
+
+:::note
+For IPython / Jupyter notebook users, the IPython stack traceback hook
+will be overridden by the Taichi one when the Taichi hook is enabled
+(via `ti.init(excepthook=True)` or `ti.enable_excepthook()`).
+:::
+
+## Debugging Tips
+
+Debugging a Taichi program can be hard even with the built-in tools above.
+Here we showcase some common bugs that you may encounter in a
+Taichi program.
+
+### Static type system
+
+Python code in Taichi-scope is translated into a statically typed
+language for high performance. This means code in Taichi-scope can have
+a different behavior compared with that in Python-scope, especially when
+it comes to types.
+
+The type of a variable is simply **determined at its initialization and
+never changes later**.
+
+Although Taichi's static type system provides better performance, it
+may lead to bugs if programmers use the wrong types. For
+example:
+
+```python
+@ti.kernel
+def buggy():
+    ret = 0  # 0 is an integer, so `ret` is typed as int32
+    for i in range(3):
+        ret += 0.1 * i  # i32 += f32, the result is still stored in int32!
+    print(ret)  # will show 0
+
+buggy()
+```
+
+The code above shows a common bug caused by misusing Taichi's static type system.
+The Taichi compiler should show a warning like:
+
+```
+[W 06/27/20 21:43:51.853] [type_check.cpp:visit@66] [$19] Atomic add (float32 to int32) may lose precision.
+```
+
+This means that Taichi cannot store a `float32` result precisely to
+`int32`. The solution is to initialize `ret` as a floating-point value:
+
+```python
+@ti.kernel
+def not_buggy():
+    ret = 0.0  # 0.0 is a floating-point number, so `ret` is typed as float32
+    for i in range(3):
+        ret += 0.1 * i  # f32 += f32. OK!
+    print(ret)  # will show approximately 0.3 (0.0 + 0.1 + 0.2)
+
+not_buggy()
+```
+
+### Advanced Optimization
+
+By default, Taichi runs a handful of advanced IR optimizations to make your
+Taichi kernels as performant as possible. Unfortunately, advanced
+optimization may occasionally lead to compilation errors, such as the following:
+
+`RuntimeError: [verify.cpp:basic_verify@40] stmt 8 cannot have operand 7.`
+
+You can turn off the advanced optimizations with
+`ti.init(advanced_optimization=False)` and see if it makes a difference. If
+the issue persists, please feel free to report this bug on
+[GitHub](https://github.com/taichi-dev/taichi/issues/new?labels=potential+bug&template=bug_report.md).
diff --git a/docs/lang/articles/misc/export_kernels.md b/docs/lang/articles/misc/export_kernels.md
new file mode 100644
index 0000000000000..c30431737edf1
--- /dev/null
+++ b/docs/lang/articles/misc/export_kernels.md
@@ -0,0 +1,213 @@
+---
+sidebar_position: 8
+---
+
+# Export Taichi kernels to C source
+
+The C backend of Taichi allows you to **export Taichi kernels to C
+source**.
+
+The exported Taichi program consists purely of C99-compatible code and
+does not require Python. This allows you to use the exported code in a
+C/C++ project, or even to further compile it to JavaScript/WebAssembly
+via Emscripten.
+
+Each C function corresponds to one Taichi kernel. For example,
+`Tk_init_c6_0()` may correspond to `init()` in `mpm88.py`.
+
+The exported C code is self-contained for portability. Required Taichi
+runtime functions are included in the code.
+
+For example, this allows programmers to distribute Taichi programs in a
+binary format, by compiling and linking exported C code to their
+project.
+
+:::caution
+Currently, this feature is only officially supported on the C backend on
+Linux. In the future, we will support macOS and Windows.
+:::
+
+## The workflow of exporting
+
+Use `ti.core.start_recording` in the Taichi program you want to export.
+
+Suppose you want to export
+[examples/mpm88.py](https://github.com/taichi-dev/taichi/blob/master/examples/mpm88.py),
+here is the workflow:
+
+### Export YAML
+
+First, modify `mpm88.py` as shown below:
+
+```python
+import taichi as ti
+
+ti.core.start_recording('mpm88.yml')
+ti.init(arch=ti.cc)
+
+... # your program
+```
+
+Then execute `mpm88.py`. Close the GUI window once the particles show
+up correctly.
+
+This will save all the kernels in `mpm88.py` to `mpm88.yml`:
+
+```yaml
+- action: "compile_kernel"
+  kernel_name: "init_c6_0"
+  kernel_source: "void Tk_init_c6_0(struct Ti_Context *ti_ctx) {\n  for (Ti_i32 tmp0 = 0; tmp0 < 8192...\n"
+- action: "launch_kernel"
+  kernel_name: "init_c6_0"
+  ...
+```
+
+:::note
+
+Equivalently, you may specify these two settings through environment
+variables on Unix-like systems:
+
+```bash
+TI_ARCH=cc TI_ACTION_RECORD=mpm88.yml python mpm88.py
+```
+
+:::
+
+### Compose YAML into a single C file
+
+Now, all necessary information is saved in `mpm88.yml`, in the form of
+multiple separate records. You may want to **compose** the separate
+kernels into **one single file** for more portability.
+
+We provide a useful CLI tool to do this:
+
+```bash
+python3 -m taichi cc_compose mpm88.yml mpm88.c mpm88.h
+```
+
+This composes all the kernels and runtimes in `mpm88.yml` into a single
+C source file `mpm88.c`:
+
+```c
+...
+
+Ti_i8 Ti_gtmp[1048576];
+union Ti_BitCast Ti_args[8];
+Ti_i32 Ti_earg[8 * 8];
+
+struct Ti_Context Ti_ctx = {  // statically-allocated context for convenience!
+  &Ti_root, Ti_gtmp, Ti_args, Ti_earg,
+};
+
+void Tk_init_c6_0(struct Ti_Context *ti_ctx) {
+  for (Ti_i32 tmp0 = 0; tmp0 < 8192; tmp0 += 1) {
+    Ti_i32 tmp1 = tmp0;
+    Ti_f32 tmp2 = Ti_rand_f32();
+    Ti_f32 tmp3 = Ti_rand_f32();
+    Ti_f32 tmp4 = 0.4;
+    Ti_f32 tmp5 = tmp2 * tmp4;
+
+    ...
+```
+
+... and a C header file `mpm88.h` that declares the data structures and
+functions (Taichi kernels) defined in `mpm88.c`.
+
+:::note
+
+The generated C source is promised to be C99 compatible.
+
+It should also be functional when compiled using a C++ compiler.
+:::
+
+## Calling the exported kernels
+
+Then, link the C file (`mpm88.c`) against your C/C++ project. Include
+the header file (`mpm88.h`) when Taichi kernels are called.
+
+For example, calling kernel `init_c6_0` can be implemented as follows:
+
+```cpp
+#include "mpm88.h"
+
+int main(void) {
+    ...
+    Tk_init_c6_0(&Ti_ctx);
+    ...
+}
+```
+
+Alternatively, if you need multiple Taichi contexts within one program:
+
+```cpp
+extern "C" {  // if you use mpm88.c instead of renaming it to mpm88.cpp
+#include "mpm88.h"
+}
+
+class MyRenderer {
+  ...
+  struct Ti_Context per_renderer_taichi_context;
+  ...
+};
+
+MyRenderer::MyRenderer() {
+  // allocate buffers on your own:
+  per_renderer_taichi_context.root = malloc(...);
+  ...
+  Tk_init_c6_0(&per_renderer_taichi_context);
+}
+```
+
+### Specifying scalar arguments
+
+To specify scalar arguments for kernels:
+
+```cpp
+Ti_ctx.args[0].val_f64 = 3.14;  // first argument, float64
+Ti_ctx.args[1].val_i32 = 233;  // second argument, int32
+Tk_my_kernel_c8_0(&Ti_ctx);
+double ret = Ti_ctx.args[0].val_f64;  // return value, float64
+
+printf("my_kernel(3.14, 233) = %lf\n", ret);
+```
+
+### Passing external arrays
+
+To pass external arrays as arguments for kernels:
+
+```cpp
+float img[640 * 480 * 3];
+
+Ti_ctx.args[0].ptr_f32 = img;  // first argument, float32 pointer to array
+
+// specify the shape of that array:
+Ti_ctx.earg[0 * 8 + 0] = 640;  // img.shape[0]
+Ti_ctx.earg[0 * 8 + 1] = 480;  // img.shape[1]
+Ti_ctx.earg[0 * 8 + 2] = 3;    // img.shape[2]
+Tk_matrix_to_ext_arr_c12_0(&Ti_ctx);
+
+// note that the array used in Taichi is row-major:
+printf("img[3, 2, 1] = %f\n", img[(3 * 480 + 2) * 3 + 1]);
+```
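+
+The row-major offset used in the `printf` call above generalizes to any shape. Below is a minimal Python sketch of that index arithmetic; the helper `flat_index` is hypothetical and not part of the exported code:
+
+```python
+def flat_index(index, shape):
+    """Row-major (C-order) offset of a multi-dimensional index."""
+    offset = 0
+    for i, n in zip(index, shape):
+        if not 0 <= i < n:
+            raise IndexError(f"index {i} is out of range for extent {n}")
+        offset = offset * n + i
+    return offset
+
+
+# Same element as img[(3 * 480 + 2) * 3 + 1] in the C snippet
+print(flat_index((3, 2, 1), (640, 480, 3)))  # 4327
+```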
+
+## Taichi.js (WIP)
+
+Once you have the C source file generated, you can compile it into
+JavaScript or WASM via Emscripten.
+
+We provide [Taichi.js](https://github.com/taichi-dev/taichi.js) as an
+infrastructure for wrapping Taichi kernels for JavaScript. See [its
+README.md](https://github.com/taichi-dev/taichi.js/blob/master/README.md)
+for the complete workflow.
+
+Check out [this page](https://taichi-dev.github.io/taichi.js) for online
+demos.
+
+## Calling Taichi kernels from Julia (WIP)
+
+Once you have the C source generated, you can compile it into
+a shared object. It can then be called from other languages that
+provide a C interface, including but not limited to Julia, Matlab,
+Mathematica, Java, etc.
+
+TODO: WIP.
diff --git a/docs/lang/articles/misc/export_results.md b/docs/lang/articles/misc/export_results.md
new file mode 100644
index 0000000000000..9db2c468f7c7e
--- /dev/null
+++ b/docs/lang/articles/misc/export_results.md
@@ -0,0 +1,407 @@
+---
+sidebar_position: 5
+---
+
+# Export your results
+
+Taichi has functions that help you **export visual results to images or
+videos**. This tutorial demonstrates how to use them step by step.
+
+## Export images
+
+- There are two ways to export visual results of your program to
+  images.
+- The first and easier way is to make use of `ti.GUI`.
+- The second way is to call some Taichi functions such as
+  `ti.imwrite`.
+
+### Export images using `ti.GUI.show`
+
+- `ti.GUI.show(filename)` can not only display the GUI canvas on your
+  screen, but also save the image to your specified `filename`.
+- Note that the format of the image is fully determined by the suffix
+  of `filename`.
+- Taichi currently supports saving to `png`, `jpg`, and `bmp` formats.
+- We recommend using `png` format. For example:
+
+```python {23}
+import taichi as ti
+import os
+
+ti.init()
+
+pixels = ti.field(ti.u8, shape=(512, 512, 3))
+
+@ti.kernel
+def paint():
+    for i, j, k in pixels:
+        pixels[i, j, k] = ti.random() * 255
+
+iterations = 1000
+gui = ti.GUI("Random pixels", res=512)
+
+# mainloop
+for i in range(iterations):
+    paint()
+    gui.set_image(pixels)
+
+    filename = f'frame_{i:05d}.png'   # create filename with suffix png
+    print(f'Frame {i} is recorded in {filename}')
+    gui.show(filename)  # export and show in GUI
+```
+
+- After running the code above, you will get a series of images in the
+  current folder.
+- To compose these images into a single `mp4` or `gif` file, see
+  [Converting PNGs to video](./cli_utilities.md#converting-pngs-to-video).
+
+### Export images using `ti.imwrite`
+
+To save images without invoking `ti.GUI.show(filename)`, use
+`ti.imwrite(img, filename)`. For example:
+
+```python {14}
+import taichi as ti
+
+ti.init()
+
+pixels = ti.field(ti.u8, shape=(512, 512, 3))
+
+@ti.kernel
+def set_pixels():
+    for i, j, k in pixels:
+        pixels[i, j, k] = ti.random() * 255
+
+set_pixels()
+filename = 'imwrite_export.png'
+ti.imwrite(pixels.to_numpy(), filename)
+print(f'The image has been saved to {filename}')
+```
+
+- `ti.imwrite` can export Taichi fields (`ti.Matrix.field`,
+  `ti.Vector.field`, `ti.field`) and numpy arrays `np.ndarray`.
+- As with `ti.GUI.show(filename)` above, the image format (`png`,
+  `jpg` and `bmp`) is controlled by the suffix of `filename` in
+  `ti.imwrite(img, filename)`.
+- Meanwhile, the resulting image type (grayscale, RGB, or RGBA) is
+  determined by **the number of channels in the input field**, i.e.,
+  the length of the third dimension (`field.shape[2]`).
+- In other words, a field that has shape `(w, h)` or `(w, h, 1)` will
+  be exported as a grayscale image.
+- If you want to export `RGB` or `RGBA` images instead, the input
+  field should have a shape `(w, h, 3)` or `(w, h, 4)` respectively.
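+
+The shape rules above can be summarized in a small pure-Python sketch. The helper `image_mode` is hypothetical and not a Taichi API, and raising an error for any other channel count is our assumption, not documented behavior:
+
+```python
+def image_mode(shape):
+    """Guess the exported image type from a field shape (hypothetical helper)."""
+    if len(shape) == 2:
+        channels = 1
+    elif len(shape) == 3:
+        channels = shape[2]
+    else:
+        raise ValueError(f"expected a 2D or 3D shape, got {shape}")
+    modes = {1: "grayscale", 3: "RGB", 4: "RGBA"}
+    if channels not in modes:
+        # Assumption: other channel counts are treated as unsupported here
+        raise ValueError(f"unsupported number of channels: {channels}")
+    return modes[channels]
+
+
+print(image_mode((512, 512)))     # grayscale
+print(image_mode((512, 512, 3)))  # RGB
+print(image_mode((512, 512, 4)))  # RGBA
+```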
+
+:::note
+All Taichi fields have their own data types, such as `ti.u8` and
+`ti.f32`. Different data types can lead to different behaviors of
+`ti.imwrite`. Please check out [GUI system](./gui.md) for
+more details.
+:::
+
+- Taichi offers other helper functions that read and show images in
+  addition to `ti.imwrite`. They are also demonstrated in
+  [GUI system](./gui.md).
+
+## Export videos
+
+:::note
+The video export utilities of Taichi depend on `ffmpeg`. If `ffmpeg` is
+not installed on your machine, please follow the installation
+instructions of `ffmpeg` at the end of this page.
+:::
+
+- `ti.VideoManager` can help you export results in `mp4` or `gif`
+  format. For example,
+
+```python {13,24}
+import taichi as ti
+
+ti.init()
+
+pixels = ti.field(ti.u8, shape=(512, 512, 3))
+
+@ti.kernel
+def paint():
+    for i, j, k in pixels:
+        pixels[i, j, k] = ti.random() * 255
+
+result_dir = "./results"
+video_manager = ti.VideoManager(output_dir=result_dir, framerate=24, automatic_build=False)
+
+for i in range(50):
+    paint()
+
+    pixels_img = pixels.to_numpy()
+    video_manager.write_frame(pixels_img)
+    print(f'\rFrame {i+1}/50 is recorded', end='')
+
+print()
+print('Exporting .mp4 and .gif videos...')
+video_manager.make_video(gif=True, mp4=True)
+print(f'MP4 video is saved to {video_manager.get_output_filename(".mp4")}')
+print(f'GIF video is saved to {video_manager.get_output_filename(".gif")}')
+```
+
+After running the code above, you will find the output videos in the
+`./results/` folder.
+
+## Install ffmpeg
+
+### Install `ffmpeg` on Windows
+
+- Download the `ffmpeg` archive (named `ffmpeg-2020xxx.zip`) from
+  [ffmpeg](https://ffmpeg.org/download.html).
+- Unzip this archive to a folder, such as `D:/YOUR_FFMPEG_FOLDER`.
+- **Important:** add `D:/YOUR_FFMPEG_FOLDER/bin` to the `PATH`
+  environment variable.
+- Open the Windows `cmd` or `PowerShell` and type the line of code
+  below to test your installation. If `ffmpeg` is set up properly, the
+  version information will be printed.
+
+```bash
+ffmpeg -version
+```
+
+### Install `ffmpeg` on Linux
+
+- Most Linux distributions come with `ffmpeg` preinstalled, so you can
+  skip this part if the `ffmpeg` command is already available on
+  your machine.
+- Install `ffmpeg` on Ubuntu:
+
+```bash
+sudo apt-get update
+sudo apt-get install ffmpeg
+```
+
+- Install `ffmpeg` on CentOS and RHEL:
+
+```bash
+sudo yum install ffmpeg ffmpeg-devel
+```
+
+- Install `ffmpeg` on Arch Linux:
+
+```bash
+pacman -S ffmpeg
+```
+
+- Test your installation using:
+
+```bash
+ffmpeg -h
+```
+
+### Install `ffmpeg` on macOS
+
+- `ffmpeg` can be installed on macOS using `homebrew`:
+
+```bash
+brew install ffmpeg
+```
+
+## Export PLY files
+
+- `ti.PLYWriter` can help you export results in the `ply` format.
+  Below is a short example of exporting 10 frames of a moving cube
+  with randomly colored vertices:
+
+```python
+import taichi as ti
+import numpy as np
+
+ti.init(arch=ti.cpu)
+
+num_vertices = 1000
+pos = ti.Vector.field(3, dtype=ti.f32, shape=(10, 10, 10))
+rgba = ti.Vector.field(4, dtype=ti.f32, shape=(10, 10, 10))
+
+
+@ti.kernel
+def place_pos():
+    for i, j, k in pos:
+        pos[i, j, k] = 0.1 * ti.Vector([i, j, k])
+
+
+@ti.kernel
+def move_particles():
+    for i, j, k in pos:
+        pos[i, j, k] += ti.Vector([0.1, 0.1, 0.1])
+
+
+@ti.kernel
+def fill_rgba():
+    for i, j, k in rgba:
+        rgba[i, j, k] = ti.Vector(
+            [ti.random(), ti.random(), ti.random(), ti.random()])
+
+
+place_pos()
+series_prefix = "example.ply"
+for frame in range(10):
+    move_particles()
+    fill_rgba()
+    # each channel must currently be passed as an individual np.array,
+    # so convert the fields to np.ndarray and reshape them;
+    # keep the results in temporary variables so you don't have to convert back
+    np_pos = np.reshape(pos.to_numpy(), (num_vertices, 3))
+    np_rgba = np.reshape(rgba.to_numpy(), (num_vertices, 4))
+    # create a PLYWriter
+    writer = ti.PLYWriter(num_vertices=num_vertices)
+    writer.add_vertex_pos(np_pos[:, 0], np_pos[:, 1], np_pos[:, 2])
+    writer.add_vertex_rgba(
+        np_rgba[:, 0], np_rgba[:, 1], np_rgba[:, 2], np_rgba[:, 3])
+    writer.export_frame_ascii(frame, series_prefix)
+```
+
+After running the code above, you will find the output sequence of `ply`
+files in the current working directory. Next, we will break down the
+usage of `ti.PLYWriter` into 4 steps and show some examples.
+
+- Set up `ti.PLYWriter`
+
+```python
+# num_vertices must be a positive int
+# num_faces is optional, default to 0
+# face_type can be either "tri" or "quad", default to "tri"
+
+# in our previous example, a writer with 1000 vertices and 0 triangle faces is created
+num_vertices = 1000
+writer = ti.PLYWriter(num_vertices=num_vertices)
+
+# in the below example, a writer with 20 vertices and 5 quadrangle faces is created
+writer2 = ti.PLYWriter(num_vertices=20, num_faces=5, face_type="quad")
+```
+
+- Add required channels
+
+```python
+# A 2D grid with quad faces
+#     y
+#     |
+# z---/
+#    x
+#         19---15---11---07---03
+#         |    |    |    |    |
+#         18---14---10---06---02
+#         |    |    |    |    |
+#         17---13---09---05---01
+#         |    |    |    |    |
+#         16---12---08---04---00
+
+writer = ti.PLYWriter(num_vertices=20, num_faces=12, face_type="quad")
+
+# For the vertices, the only required channel is the position,
+# which can be added by passing 3 np.array x, y, z into the following function.
+
+x = np.zeros(20)
+y = np.array(list(np.arange(0, 4))*5)
+z = np.repeat(np.arange(5), 4)
+writer.add_vertex_pos(x, y, z)
+
+# For faces (if any), the only required channel is the list of vertex indices that each face contains.
+indices = np.array([0, 1, 5, 4]*12)+np.repeat(
+    np.array(list(np.arange(0, 3))*4)+4*np.repeat(np.arange(4), 3), 4)
+writer.add_faces(indices)
+```
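+
+The one-line NumPy expression for `indices` is compact but hard to read. The pure-Python sketch below builds the same list, assuming the vertex numbering from the diagram (vertex id = `4 * column + row`); the function `quad_indices` is our own illustration, not part of `ti.PLYWriter`:
+
+```python
+ROWS = 4  # vertices along y in each column
+COLS = 5  # number of columns along z
+
+
+def quad_indices(rows=ROWS, cols=COLS):
+    """List the four corner vertices of every quad, column by column."""
+    indices = []
+    for col in range(cols - 1):
+        for row in range(rows - 1):
+            v = col * rows + row  # first corner of this quad
+            indices += [v, v + 1, v + rows + 1, v + rows]
+    return indices
+
+
+faces = quad_indices()
+print(len(faces) // 4)  # 12 quads
+print(faces[:4])        # [0, 1, 5, 4]
+print(faces[-4:])       # [14, 15, 19, 18]
+```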
+
+- Add optional channels
+
+```python
+# Add custome vertex channel, the input should include a key, a supported datatype and, the data np.array
+vdata = np.random.rand(20)
+writer.add_vertex_channel("vdata1", "double", vdata)
+
+# Add a custom face channel
+foo_data = np.zeros(12)
+writer.add_face_channel("foo_key", "foo_data_type", foo_data)
+# error! because "foo_data_type" is not a supported datatype. Supported ones are
+# ['char', 'uchar', 'short', 'ushort', 'int', 'uint', 'float', 'double']
+
+# PLYWriter already defines several useful helper functions for common channels
+# Add vertex color, alpha, and rgba
+# using float/double r g b alpha to represent color, the range should be 0 to 1
+r = np.random.rand(20)
+g = np.random.rand(20)
+b = np.random.rand(20)
+alpha = np.random.rand(20)
+writer.add_vertex_color(r, g, b)
+writer.add_vertex_alpha(alpha)
+# equivalently
+# writer.add_vertex_rgba(r, g, b, alpha)
+
+# vertex normal
+writer.add_vertex_normal(np.ones(20), np.zeros(20), np.zeros(20))
+
+# vertex index, and piece (group id)
+writer.add_vertex_id()
+writer.add_vertex_piece(np.ones(20))
+
+# Add face index, and piece (group id)
+# Index the existing faces in the writer and add this channel to the face channels
+writer.add_face_id()
+# Put all the faces in group 1
+writer.add_face_piece(np.ones(12))
+```
+
+- Export files
+
+```python
+series_prefix = "example.ply"
+series_prefix_ascii = "example_ascii.ply"
+# Export a single file
+# use ascii so you can read the content
+writer.export_ascii(series_prefix_ascii)
+
+# alternatively, use binary for slightly better performance
+# writer.export(series_prefix)
+
+# Export a sequence of files, e.g., 10 frames
+for frame in range(10):
+    # write each frame as e.g. "example_ascii_000000.ply" in your current working directory
+    writer.export_frame_ascii(frame, series_prefix_ascii)
+    # alternatively, use binary
+    # writer.export_frame(frame, series_prefix)
+
+    # update location/color
+    x = x + 0.1*np.random.rand(20)
+    y = y + 0.1*np.random.rand(20)
+    z = z + 0.1*np.random.rand(20)
+    r = np.random.rand(20)
+    g = np.random.rand(20)
+    b = np.random.rand(20)
+    alpha = np.random.rand(20)
+    # re-fill
+    writer = ti.PLYWriter(num_vertices=20, num_faces=12, face_type="quad")
+    writer.add_vertex_pos(x, y, z)
+    writer.add_faces(indices)
+    writer.add_vertex_channel("vdata1", "double", vdata)
+    writer.add_vertex_color(r, g, b)
+    writer.add_vertex_alpha(alpha)
+    writer.add_vertex_normal(np.ones(20), np.zeros(20), np.zeros(20))
+    writer.add_vertex_id()
+    writer.add_vertex_piece(np.ones(20))
+    writer.add_face_id()
+    writer.add_face_piece(np.ones(12))
+```
+
+### Import `ply` files into Houdini and Blender
+
+Houdini supports importing a series of `ply` files sharing the same
+prefix/postfix. The files written by `export_frame` meet this requirement.
+
+In Houdini, click `File->Import->Geometry` and navigate to the folder
+containing your frame results, which should be collapsed into one single
+entry like `example_$F6.ply (0-9)`. Double-click this entry to finish
+the importing process.
+
+Blender requires an add-on called
+[Stop-motion-OBJ](https://github.com/neverhood311/Stop-motion-OBJ) to
+load the result sequences. [Detailed
+documentation](https://github.com/neverhood311/Stop-motion-OBJ/wiki) is
+provided by the author on how to install and use the add-on. If you're
+using the latest version of Blender (2.80+), download and install the
+[latest
+release](https://github.com/neverhood311/Stop-motion-OBJ/releases/latest)
+of Stop-motion-OBJ. For Blender 2.79 and older, use version `v1.1.1` of
+the add-on.
diff --git a/docs/lang/articles/misc/extension_libraries.md b/docs/lang/articles/misc/extension_libraries.md
new file mode 100644
index 0000000000000..e8acac9d2b1ab
--- /dev/null
+++ b/docs/lang/articles/misc/extension_libraries.md
@@ -0,0 +1,48 @@
+---
+sidebar_position: 9
+---
+
+# Extension libraries
+
+The Taichi programming language offers a minimal and generic built-in
+standard library. Extra domain-specific functionalities are provided via
+**extension libraries**:
+
+## Taichi GLSL
+
+[Taichi GLSL](https://github.com/taichi-dev/taichi_glsl) is an extension
+library of Taichi, aiming at providing useful helper functions
+including:
+
+1.  Handy scalar functions like `clamp`, `smoothstep`, `mix`, `round`.
+2.  GLSL-like vector functions like `normalize`, `distance`, `reflect`.
+3.  Well-behaved random generators including `randUnit2D`,
+    `randNDRange`.
+4.  Handy vector and matrix initializers: `vec` and `mat`.
+5.  Handy vector component shuffle accessors like `v.xy`.
+
+Click here for [Taichi GLSL
+Documentation](https://taichi-glsl.readthedocs.io).
+
+```bash
+python3 -m pip install taichi_glsl
+```
+
+## Taichi THREE
+
+[Taichi THREE](https://github.com/taichi-dev/taichi_three) is an
+extension library of Taichi to render 3D scenes into nice-looking 2D
+images in real-time (work in progress).
+
+<center>
+
+![image](https://raw.githubusercontent.com/taichi-dev/taichi_three/16d98cb1c1f2ab7a37c9e42260878c047209fafc/assets/monkey.png)
+
+</center>
+
+Click here for [Taichi THREE
+Tutorial](https://github.com/taichi-dev/taichi_three#how-to-play).
+
+```bash
+python3 -m pip install taichi_three
+```
diff --git a/docs/lang/articles/misc/global_settings.md b/docs/lang/articles/misc/global_settings.md
new file mode 100644
index 0000000000000..8decfdaf3edc8
--- /dev/null
+++ b/docs/lang/articles/misc/global_settings.md
@@ -0,0 +1,86 @@
+---
+sidebar_position: 7
+---
+
+# Global settings
+
+## Backends
+
+- To specify which kind of architecture (Arch) to use: `ti.init(arch=ti.cuda)`.
+- To specify the pre-allocated memory size for CUDA:
+  `ti.init(device_memory_GB=0.5)`.
+- To disable the unified memory usage on CUDA:
+  `ti.init(use_unified_memory=False)`.
+- To specify which GPU to use for CUDA:
+  `export CUDA_VISIBLE_DEVICES=[gpuid]`.
+- To disable a backend (`CUDA`, `METAL`, `OPENGL`) on start up, e.g. CUDA:
+  `export TI_ENABLE_CUDA=0`.
+
+## Compilation
+
+- Disable advanced optimization to reduce compile time and avoid possible
+  errors: `ti.init(advanced_optimization=False)`.
+- Disable fast math to prevent possible undefined math behavior:
+  `ti.init(fast_math=False)`.
+- To print preprocessed Python code:
+  `ti.init(print_preprocessed=True)`.
+- To show pretty Taichi-scope stack traceback:
+  `ti.init(excepthook=True)`.
+- To print intermediate IR generated: `ti.init(print_ir=True)`.
+
+## Runtime
+
+- Restart the entire Taichi system (destroy all fields and kernels):
+  `ti.reset()`.
+- To start a program in debug mode: `ti.init(debug=True)` or
+  `ti debug your_script.py`.
+- To disable importing torch on start up: `export TI_ENABLE_TORCH=0`.
+
+## Logging
+
+- Show more detailed logs at the TRACE level: `ti.init(log_level=ti.TRACE)`
+  or `ti.set_logging_level(ti.TRACE)`.
+- Eliminate verbose outputs: `ti.init(verbose=False)`.
+
+## Develop
+
+- To trigger GDB when Taichi crashes: `ti.init(gdb_trigger=True)`.
+- Cache compiled runtime bitcode in **dev mode** to save start up
+  time: `export TI_CACHE_RUNTIME_BITCODE=1`.
+- To specify how many threads to use when running tests: `export TI_TEST_THREADS=4`
+  or `ti test -t4`.
+
+## Specifying `ti.init` arguments from environment variables
+
+Arguments for `ti.init` may also be specified from environment
+variables. For example:
+
+- `ti.init(arch=ti.cuda)` is equivalent to `export TI_ARCH=cuda`.
+- `ti.init(log_level=ti.TRACE)` is equivalent to
+  `export TI_LOG_LEVEL=trace`.
+- `ti.init(debug=True)` is equivalent to `export TI_DEBUG=1`.
+- `ti.init(use_unified_memory=False)` is equivalent to
+  `export TI_USE_UNIFIED_MEMORY=0`.
+
+If both a `ti.init` argument and the corresponding environment variable
+are specified, then the one in the environment variable will
+**override** the one in the argument, e.g.:
+
+- if `ti.init(arch=ti.cuda)` and `export TI_ARCH=opengl` are specified
+  at the same time, then Taichi will choose `ti.opengl` as backend.
+- if `ti.init(debug=True)` and `export TI_DEBUG=0` are specified at
+  the same time, then Taichi will disable debug mode.
+
+:::note
+
+If `ti.init` is called twice, then the configuration from the first invocation
+will be completely discarded, e.g.:
+
+```python {1,3}
+ti.init(debug=True)
+print(ti.cfg.debug)  # True
+ti.init()
+print(ti.cfg.debug)  # False
+```
+
+:::
diff --git a/docs/lang/articles/misc/gui.md b/docs/lang/articles/misc/gui.md
new file mode 100644
index 0000000000000..9dc6a1471897f
--- /dev/null
+++ b/docs/lang/articles/misc/gui.md
@@ -0,0 +1,273 @@
+---
+sidebar_position: 1
+
+---
+
+# GUI system
+
+Taichi has a built-in GUI system to help users visualize results.
+
+## Create a window
+
+`ti.GUI(name, res)` creates a window. If `res` is a scalar, the width will be equal to the height.
+
+The following code shows how to create a window of resolution `640x360`:
+
+```python
+gui = ti.GUI('Window Title', (640, 360))
+```
+
+:::note
+
+If you are running Taichi on a machine without a GUI environment, consider setting `show_gui` to `False`:
+
+```python
+gui = ti.GUI('Window Title', (640, 360), show_gui=False)
+
+while gui.running:
+    ...
+    gui.show(f'{gui.frame:06d}.png')  # save a series of screenshots
+```
+
+:::
+
+## Display a window
+
+`gui.show(filename)` displays the window. If `filename` is specified, a screenshot is saved to that file. For example, the following saves frames of the window as `.png` files:
+
+    for frame in range(10000):
+        render(img)
+        gui.set_image(img)
+        gui.show(f'{frame:06d}.png')
+
+
+
+## Paint on a window
+
+`gui.set_image(pixels)` sets an image to display on the window.
+
+The image pixels are set from the values of `img[i, j]`, where `i` indicates the horizontal coordinates (from left to right) and `j` the vertical coordinates (from bottom to top).
+
+If the window size is `(x, y)`, then `img` must be one of:
+
+- `ti.field(shape=(x, y))`, a gray-scale image
+
+- `ti.field(shape=(x, y, 3))`, where `3` is for `(r, g, b)` channels
+
+- `ti.field(shape=(x, y, 2))`, where `2` is for `(r, g)` channels
+
+- `ti.Vector.field(3, shape=(x, y))`, with `(r, g, b)` channels on each
+  component (see [vector](../../api/vector.md#vector-fields) for details)
+
+- `ti.Vector.field(2, shape=(x, y))`, with `(r, g)` channels on each component
+
+- `np.ndarray(shape=(x, y))`
+
+- `np.ndarray(shape=(x, y, 3))`
+
+- `np.ndarray(shape=(x, y, 2))`
+
+The data type of `img` must be one of:
+
+- `uint8`, range `[0, 255]`
+
+- `uint16`, range `[0, 65535]`
+
+- `uint32`, range `[0, 4294967295]`
+
+- `float32`, range `[0, 1]`
+
+- `float64`, range `[0, 1]`
+
+
+
+## Convert RGB to Hex
+
+`ti.rgb_to_hex(rgb)` converts an (R, G, B) tuple of floats into a single integer value, e.g.,
+
+```python
+rgb = (0.4, 0.8, 1.0)
+hex = ti.rgb_to_hex(rgb)  # 0x66ccff
+
+rgb = np.array([[0.4, 0.8, 1.0], [0.0, 0.5, 1.0]])
+hex = ti.rgb_to_hex(rgb)  # np.array([0x66ccff, 0x007fff])
+```
+
+The return values can be used in GUI drawing APIs.
+
+
+
+## Event processing
+
+Every event has a key and a type.
+
+_Event type_ is the type of the event. For now, there are just three types of events:
+
+    ti.GUI.RELEASE  # key up or mouse button up
+    ti.GUI.PRESS    # key down or mouse button down
+    ti.GUI.MOTION   # mouse motion or mouse wheel
+
+_Event key_ is the key or button that you pressed on the keyboard or mouse. It can be one of:
+
+    # for ti.GUI.PRESS and ti.GUI.RELEASE event:
+    ti.GUI.ESCAPE  # Esc
+    ti.GUI.SHIFT   # Shift
+    ti.GUI.LEFT    # Left Arrow
+    'a'            # we use lowercase for letters
+    'b'
+    ...
+    ti.GUI.LMB     # Left Mouse Button
+    ti.GUI.RMB     # Right Mouse Button
+
+    # for ti.GUI.MOTION event:
+    ti.GUI.MOVE    # Mouse Moved
+    ti.GUI.WHEEL   # Mouse Wheel Scrolling
+
+An _event filter_ is a list combining _key_, _type_, and _(type, key)_ tuples, e.g.:
+
+```python
+# if ESC pressed or released:
+gui.get_event(ti.GUI.ESCAPE)
+
+# if any key is pressed:
+gui.get_event(ti.GUI.PRESS)
+
+# if ESC pressed or SPACE released:
+gui.get_event((ti.GUI.PRESS, ti.GUI.ESCAPE), (ti.GUI.RELEASE, ti.GUI.SPACE))
+```
+
+`gui.running` reports the state of the window. A `ti.GUI.EXIT` event occurs when you click the close (X) button of a window,
+and `gui.running` becomes `False` when the GUI is being closed.
+
+For example, loop until the close button is clicked:
+
+    while gui.running:
+        render()
+        gui.set_image(pixels)
+        gui.show()
+
+You can also close the window by manually setting `gui.running` to `False`:
+
+    while gui.running:
+        if gui.get_event(ti.GUI.ESCAPE):
+            gui.running = False
+
+        render()
+        gui.set_image(pixels)
+        gui.show()
+
+`gui.get_event(a, ...)` tries to pop an event from the queue, and stores it into `gui.event`.
+
+For example:
+
+    if gui.get_event():
+        print('Got event, key =', gui.event.key)
+
+For example, loop until ESC is pressed:
+
+    gui = ti.GUI('Title', (640, 480))
+    while not gui.get_event(ti.GUI.ESCAPE):
+        gui.set_image(img)
+        gui.show()
+
+`gui.get_events(a, ...)` is basically the same as `gui.get_event`, except that it returns a generator of events instead of storing into `gui.event`:
+
+    for e in gui.get_events():
+        if e.key == ti.GUI.ESCAPE:
+            exit()
+        elif e.key == ti.GUI.SPACE:
+            do_something()
+        elif e.key in ['a', ti.GUI.LEFT]:
+            ...
+
+`gui.is_pressed(key, ...)` detects the keys that are currently pressed. It must be used together with `gui.get_event`, or its state won't be updated. For
+example:
+
+    while True:
+        gui.get_event()  # must be called before is_pressed
+        if gui.is_pressed('a', ti.GUI.LEFT):
+            print('Go left!')
+        elif gui.is_pressed('d', ti.GUI.RIGHT):
+            print('Go right!')
+
+`gui.get_cursor_pos()` returns the current cursor position within the window. For example:
+
+    mouse_x, mouse_y = gui.get_cursor_pos()
+
+`gui.fps_limit` sets the FPS limit for a window. For example, to cap FPS at 24, simply use `gui.fps_limit = 24`. This helps reduce the load on your hardware, especially when you're using OpenGL on an integrated GPU, which could otherwise make the desktop slow to respond.
+
+
+
+## GUI Widgets
+
+Sometimes it's more intuitive to use widgets like sliders or buttons to control program variables instead of using chaotic keyboard bindings. Taichi GUI provides a set of widgets for that purpose.
+
+For example:
+
+    radius = gui.slider('Radius', 1, 50)
+
+    while gui.running:
+        print('The radius now is', radius.value)
+        ...
+        radius.value += 0.01
+        ...
+        gui.show()
+
+
+
+## Image I/O
+
+`ti.imwrite(img, filename)` exports a `np.ndarray` or Taichi field (`ti.Matrix.field`, `ti.Vector.field`, or `ti.field`) to the location specified by `filename`.
+
+As with `ti.GUI.show(filename)`, the format of the exported image is determined by **the suffix of** `filename`. Currently `ti.imwrite` supports exporting images to `png`, `img` and `jpg`, and we recommend using `png`.
+
+Please make sure that the input image has **a valid shape**. If you want to export a grayscale image, the input shape of the field should be `(height, width)` or `(height, width, 1)`. For example:
+
+```python
+import taichi as ti
+
+ti.init()
+
+shape = (512, 512)
+type = ti.u8
+pixels = ti.field(dtype=type, shape=shape)
+
+@ti.kernel
+def draw():
+    for i, j in pixels:
+        pixels[i, j] = ti.random() * 255    # integers between [0, 255] for ti.u8
+
+draw()
+
+ti.imwrite(pixels, "export_u8.png")
+```
+
+Besides, for RGB or RGBA images, `ti.imwrite` needs to receive a field which has shape `(height, width, 3)` or `(height, width, 4)`, respectively.
+
+Generally the value of the pixels on each channel of a `png` image is an integer in \[0, 255\]. For this reason, `ti.imwrite` will **cast fields** which have different data types all **into integers between \[0, 255\]**. As a result, `ti.imwrite` has the following requirements for different data types of input fields:
+
+- For float-type (`ti.f16`, `ti.f32`, etc) input fields, **the value of each pixel should be a float between \[0.0, 1.0\]**. Otherwise `ti.imwrite` will first clip them into \[0.0, 1.0\]. Then they are multiplied by 256 and cast to integers in \[0, 255\].
+- For int-type (`ti.u8`, `ti.u16`, etc) input fields, **the value of each pixel can be any valid integer in its own bounds**. The integers in the field are scaled to \[0, 255\] by dividing them by the upper bound of their basic type.
+
+Here is another example:
+
+```python
+import taichi as ti
+
+ti.init()
+
+shape = (512, 512)
+channels = 3
+type = ti.f32
+pixels = ti.Matrix.field(channels, dtype=type, shape=shape)
+
+@ti.kernel
+def draw():
+    for i, j in pixels:
+        for k in ti.static(range(channels)):
+            pixels[i, j][k] = ti.random()   # floats between [0, 1] for ti.f32
+
+draw()
+
+ti.imwrite(pixels, "export_f32.png")
+```
diff --git a/docs/lang/articles/misc/install.md b/docs/lang/articles/misc/install.md
new file mode 100644
index 0000000000000..45f0aefd3aedc
--- /dev/null
+++ b/docs/lang/articles/misc/install.md
@@ -0,0 +1,129 @@
+---
+sidebar_position: 0
+---
+
+# Installation Troubleshooting
+
+### Linux issues
+
+- If Taichi crashes and reports `libtinfo.so.5 not found`:
+
+  - On Ubuntu, execute `sudo apt install libtinfo-dev`.
+
+  - On Arch Linux, first edit `/etc/pacman.conf`, and append these
+    lines:
+
+    ```
+    [archlinuxcn]
+    Server = https://mirrors.tuna.tsinghua.edu.cn/archlinuxcn/$arch
+    ```
+
+    Then execute `sudo pacman -Syy ncurses5-compat-libs`.
+
+- If Taichi crashes and reports
+  `` /usr/lib/libstdc++.so.6: version `CXXABI_1.3.11' not found ``:
+
+  You might be using Ubuntu 16.04. Please try the solution in [this
+  thread](https://github.com/tensorflow/serving/issues/819#issuecomment-377776784):
+
+  ```bash
+  sudo add-apt-repository ppa:ubuntu-toolchain-r/test -y
+  sudo apt-get update
+  sudo apt-get install libstdc++6
+  ```
+
+
+### Windows issues
+
+- If Taichi crashes and reports `ImportError` on Windows, please
+  consider installing [Microsoft Visual C++
+  Redistributable](https://aka.ms/vs/16/release/vc_redist.x64.exe).
+
+### Python issues
+
+- If `pip` cannot find a matching package,
+  e.g.,
+
+  ```
+  ERROR: Could not find a version that satisfies the requirement taichi (from versions: none)
+  ERROR: No matching distribution found for taichi
+  ```
+
+  - Make sure you're using Python version 3.6/3.7/3.8:
+
+    ```bash
+    python3 -c "print(__import__('sys').version[:3])"
+    # 3.6, 3.7 or 3.8
+    ```
+
+  - Make sure your Python executable is 64-bit:
+
+    ```bash
+    python3 -c "print(__import__('platform').architecture()[0])"
+    # 64bit
+    ```
+
+### CUDA issues
+
+- If Taichi crashes with the following errors:
+
+  ```
+  [Taichi] mode=release
+  [Taichi] version 0.6.0, supported archs: [cpu, cuda, opengl], commit 14094f25, python 3.8.2
+  [W 05/14/20 10:46:49.549] [cuda_driver.h:call_with_warning@60] CUDA Error CUDA_ERROR_INVALID_DEVICE: invalid device ordinal while calling mem_advise (cuMemAdvise)
+  [E 05/14/20 10:46:49.911] Received signal 7 (Bus error)
+  ```
+
+  This might be because your NVIDIA GPU is pre-Pascal and
+  has limited support for [Unified
+  Memory](https://www.nextplatform.com/2019/01/24/unified-memory-the-final-piece-of-the-gpu-programming-puzzle/).
+
+  - **Possible solution**: add `export TI_USE_UNIFIED_MEMORY=0` to
+    your `~/.bashrc`. This disables unified memory usage in the CUDA
+    backend.
+
+- If you find other CUDA problems:
+
+  - **Possible solution**: add `export TI_ENABLE_CUDA=0` to your
+    `~/.bashrc`. This disables the CUDA backend completely and
+    Taichi will fall back on other GPU backends such as OpenGL.
+
+### OpenGL issues
+
+- If Taichi crashes with a stack backtrace containing a line of
+  `glfwCreateWindow` (see
+  [\#958](https://github.com/taichi-dev/taichi/issues/958)):
+
+  ```{9-11}
+  [Taichi] mode=release
+  [E 05/12/20 18.25:00.129] Received signal 11 (Segmentation Fault)
+  ***********************************
+  * Taichi Compiler Stack Traceback *
+  ***********************************
+
+  ... (many lines, omitted)
+
+  /lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: _glfwPlatformCreateWindow
+  /lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: glfwCreateWindow
+  /lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::opengl::initialize_opengl(bool)
+
+  ... (many lines, omitted)
+  ```
+
+  it is likely because you are running Taichi on a (virtual) machine
+  with an old OpenGL API. Taichi requires OpenGL 4.3+ to work.
+
+  - **Possible solution**: add `export TI_ENABLE_OPENGL=0` to your
+    `~/.bashrc` even if you initialize Taichi with other backends
+    than OpenGL. This disables the OpenGL backend detection to avoid
+    incompatibilities.
+
+
+
+### Other issues
+
+- If none of the above addresses your problem, please report it by
+  [opening an
+  issue](https://github.com/taichi-dev/taichi/issues/new?labels=potential+bug&template=bug_report.md)
+  on GitHub. This helps us improve the user experience and
+  compatibility. Many thanks!
diff --git a/docs/lang/articles/misc/internal.md b/docs/lang/articles/misc/internal.md
new file mode 100644
index 0000000000000..610f93292347a
--- /dev/null
+++ b/docs/lang/articles/misc/internal.md
@@ -0,0 +1,347 @@
+---
+sidebar_position: 3
+---
+
+# Internal designs
+
+## Intermediate representation (IR)
+
+Taichi's computation IR is designed to be
+- In static single assignment (SSA) form;
+- Hierarchical, instead of LLVM-style control-flow graph + basic blocks;
+- Differentiable;
+- Statically and strongly typed.
+
+For example, a simple Taichi kernel
+```python {4-8} title=show_ir.py
+import taichi as ti
+ti.init(print_ir=True)
+
+@ti.kernel
+def foo():
+    for i in range(10):
+        if i < 4:
+            print(i)
+
+foo()
+```
+
+may be compiled into
+
+```
+kernel {
+  $0 = offloaded range_for(0, 10) grid_dim=0 block_dim=32
+  body {
+    <i32> $1 = loop $0 index 0
+    <i32> $2 = const [4]
+    <i32> $3 = cmp_lt $1 $2
+    <i32> $4 = const [1]
+    <i32> $5 = bit_and $3 $4
+    $6 : if $5 {
+      print $1, "\n"
+    }
+  }
+}
+
+```
+
+:::note
+Use `ti.init(print_ir=True)` to print IR of all instantiated kernels.
+:::
+
+:::note
+See [Life of a Taichi kernel](../contribution/compilation.md) for more details about
+the JIT compilation system of Taichi.
+:::
+
+## Data structure organization
+
+The internal organization of Taichi's data structure is defined using the **Structural Node**
+("SNode", /snōd/) tree system. The SNode system might be confusing for new developers:
+it is important to distinguish three concepts: SNode **containers**,
+SNode **cells**, and SNode **components**.
+
+- A SNode **container** can have multiple SNode **cells**. The number of
+  **cells** is recommended to be a power of two.
+
+  - For example, `S = ti.root.dense(ti.i, 128)` creates an SNode `S`, and each `S` container has `128` `S` cells.
+- A SNode **cell** can have multiple SNode **components**.
+
+  - For example, `P = S.dense(ti.i, 4); Q = S.dense(ti.i, 4)` inserts two components (one `P` container and one `Q` container) into each `S` cell.
+- Note that each SNode **component** is a SNode **container** of a lower-level SNode.
+
+A hierarchical data structure in Taichi, dense or sparse, is essentially a tree with interleaved container and cell levels.
+Note that containers of `place` SNodes do not have cells. Instead, they
+directly contain numerical values.
+
+Consider the following example:
+
+```python
+# misc/listgen_demo.py
+
+x = ti.field(ti.i32)
+y = ti.field(ti.i32)
+z = ti.field(ti.i32)
+
+S0 = ti.root
+S1 = S0.pointer(ti.i, 4)
+
+S2 = S1.dense(ti.i, 2)
+S2.place(x, y) # S3: x; S4: y
+
+S5 = S1.dense(ti.i, 2)
+S5.place(z) # S6: z
+```
+
+- The whole data structure is an `S0root` **container**, containing
+  - 1x `S0root` **cell**, which has only one **component**, which
+    is
+    - An `S1pointer` **container**, containing
+      - 4x `S1pointer` **cells**, each with two **components**,
+        which are
+        - An `S2dense` **container**, containing
+          - 2x `S2dense` **cells**, each with two
+            **components**, which are
+            - An `S3place_x` container which directly
+              contains a `x: ti.i32` value
+            - An `S4place_y` container which directly
+              contains a `y: ti.i32` value
+        - An `S5dense` **container**, containing
+          - 2x `S5dense` **cells**, each with one
+            **component**, which is
+            - An `S6place_z` container which directly
+              contains a `z: ti.i32` value
+
+The following figure shows the hierarchy of the data structure. The
+numbers are `indices` of the containers and cells.
+
+![image](https://raw.githubusercontent.com/taichi-dev/public_files/fa03e63ca4e161318c8aa9a5db7f4a825604df88/taichi/data_structure_organization.png)
+
+Note that the `S0root` container and cell do not have an `index`.
+
+In summary, we will have the following containers:
+
+- 1x `S0root` container
+- 1x `S1pointer` container
+- 4x `S2dense` containers
+- 4x `S5dense` containers
+- 8x `S3place_x` containers, each directly containing an `i32` value
+- 8x `S4place_y` containers, each directly containing an `i32` value
+- 8x `S6place_z` containers, each directly containing an `i32` value
+
+... and the following cells:
+
+- 1x `S0root` cell
+- 4x `S1pointer` cells
+- 8x `S2dense` cells
+- 8x `S5dense` cells
+
+Again, note that `S3place_x`, `S4place_y` and `S6place_z` containers do **not**
+have corresponding cells.
+
+In struct compilers of supported backends, each SNode has two types: `container` type and
+`cell` type. Again, **components** of a higher level SNode **cell** are
+**containers** of a lower level SNode.
+
+Note that **cells** are never exposed to end-users.
+
+**List generation** generates lists of SNode **containers** (instead of
+SNode **cells**).
+
+:::note
+We are in the process of removing usages of **children**, **instances**, and
+**elements** in Taichi. These terms are ambiguous and should be replaced with the standardized terms **container**, **cell**, and **component**.
+:::
+
+## List generation
+
+Struct-fors in Taichi loop over all active elements of a (sparse) data
+structure **in parallel**. Evenly distributing work onto processor cores
+is challenging on sparse data structures: naively splitting an irregular
+tree into pieces can easily lead to partitions with drastically
+different numbers of leaf elements.
+
+Our strategy is to generate lists of active **SNode containers**, layer by
+layer. The list generation computation happens on the same device as
+normal computation kernels, depending on the `arch` argument when the
+user calls `ti.init()`.
+
+List generation flattens the data structure leaf elements into a 1D
+list, circumventing the irregularity of incomplete trees. Then we
+can simply invoke a regular **parallel for** over the 1D list.
+
+For example,
+
+```python {14-17}
+# misc/listgen_demo.py
+
+import taichi as ti
+
+ti.init(print_ir=True)
+
+x = ti.field(ti.i32)
+
+S0 = ti.root
+S1 = S0.dense(ti.i, 4)
+S2 = S1.bitmasked(ti.i, 4)
+S2.place(x)
+
+@ti.kernel
+def func():
+    for i in x:
+        print(i)
+
+func()
+```
+
+gives you the following IR:
+
+```
+$0 = offloaded clear_list S1dense
+$1 = offloaded listgen S0root->S1dense
+$2 = offloaded clear_list S2bitmasked
+$3 = offloaded listgen S1dense->S2bitmasked
+$4 = offloaded struct_for(S2bitmasked) block_dim=0 {
+  <i32 x1> $5 = loop index 0
+  print i, $5
+}
+```
+
+Note that `func` leads to two list generations:
+
+- (Tasks `$0` and `$1`) based on the list of the (only) `S0root` container,
+  generate the list of the (only) `S1dense` container;
+- (Tasks `$2` and `$3`) based on the list of `S1dense` containers,
+  generate the list of `S2bitmasked` containers.
+
+The list of `S0root` SNode always has exactly one container, so we
+never clear or re-generate this list. Although the list of `S1dense` always
+has only one container, we still regenerate the list for uniformity.
+The list of `S2bitmasked` has 4 containers.
+
+:::note
+The list of `place` (leaf) nodes (e.g., `S3` in this example) is never
+generated. Instead, we simply loop over the list of their parent nodes,
+and for each parent node we enumerate the `place` nodes on-the-fly
+(without actually generating a list).
+
+The motivation for this design is to amortize list generation overhead.
+Generating one list element per leaf node (`place` SNode) element is too
+expensive, likely much more expensive than the essential computation
+happening on the leaf element. Therefore we only generate their parent
+element list, so that the list generation cost is amortized over
+multiple child elements of a second-to-last-level SNode element.
+
+In the example above, although we have 16 instances of `x`, we only
+generate a list of 4 x `S2bitmasked` nodes (and 1 x `S1dense` node).
+:::
+
+## Statistics
+
+In some cases, it is helpful to gather certain quantitative information
+about internal events during Taichi program execution. The `Statistics`
+class is designed for this purpose.
+
+Usage:
+
+```cpp
+#include "taichi/util/statistics.h"
+
+// add 1.0 to counter "codegen_offloaded_tasks"
+taichi::stat.add("codegen_offloaded_tasks");
+
+// add the number of statements in "ir" to counter "codegen_statements"
+taichi::stat.add("codegen_statements", irpass::analysis::count_statements(this->ir));
+```
+
+Note the keys are `std::string` and values are `double`.
+
+To print out all statistics in Python:
+
+```python
+ti.core.print_stat()
+```
+
+## Why Python frontend
+
+Embedding Taichi in `python` has the following advantages:
+
+- Easy to learn. Taichi has a very similar syntax to Python.
+- Easy to run. No ahead-of-time compilation is needed.
+- This design allows people to reuse existing python infrastructure:
+  - IDEs. A python IDE mostly works for Taichi with syntax
+    highlighting, syntax checking, and autocomplete.
+  - Package manager (pip). A developed Taichi application can be
+    easily submitted to `PyPI` and others can easily set it up with
+    `pip`.
+  - Existing packages. Interacting with other python components
+    (e.g. `matplotlib` and `numpy`) is trivial.
+- The built-in AST manipulation tools in `python` allow us to flexibly
+  manipulate and analyze Python ASTs,
+  as long as the kernel body function is parse-able by the Python parser.
+
+However, this design has drawbacks too:
+
+- Taichi kernels must be parse-able by Python parsers. This means Taichi
+  syntax cannot go beyond Python syntax.
+  - For example, indexing is always needed when accessing elements
+    in Taichi fields, even if the field is 0D. Use `x[None] = 123`
+    to set the value in `x` if `x` is 0D. This is because `x = 123`
+    will set `x` itself (instead of its containing value) to be the
+    constant `123` in Python syntax. For code consistency in Python-
+    and Taichi-scope, we have to use the more verbose `x[None] = 123` syntax.
+- Python has relatively low performance. This can cause a performance
+  issue when initializing large Taichi fields with pure python
+  scripts. A Taichi kernel should be used to initialize huge fields.
+
+## Virtual indices vs. physical indices
+
+In Taichi, _virtual indices_ are used to locate elements in fields, and
+_physical indices_ are used to specify data layouts in memory.
+
+For example,
+
+- In `a[i, j, k]`, `i`, `j`, and `k` are **virtual** indices.
+- In `for i, j in x:`, `i` and `j` are **virtual** indices.
+- `ti.i, ti.j, ti.k, ti.l, ...` are **physical** indices.
+- In struct-for statements, `LoopIndexStmt::index` is a **physical**
+  index.
+
+The mapping between virtual indices and physical indices for each
+`SNode` is stored in `SNode::physical_index_position`. That is,
+`physical_index_position[i]` answers the question: **which physical
+index does the `i`-th virtual index correspond to?**
+
+Each `SNode` can have a different virtual-to-physical mapping.
+`physical_index_position[i] == -1` means the `i`-th virtual index does
+not correspond to any physical index in this `SNode`.
+
+`SNode`s in simple dense fields (e.g.,
+`a = ti.field(ti.i32, shape=(128, 256, 512))`) have a **trivial**
+virtual-to-physical mapping, i.e., `physical_index_position[i] = i`.
+
+However, more complex data layouts, such as column-major 2D fields, can
+lead to `SNode`s with `physical_index_position[0] = 1` and
+`physical_index_position[1] = 0`.
+
+```python
+import taichi as ti
+
+ti.init(arch=ti.cpu)
+
+a = ti.field(ti.f32, shape=(128, 32, 8))
+
+b = ti.field(ti.f32)
+ti.root.dense(ti.j, 32).dense(ti.i, 16).place(b)
+
+ti.get_runtime().materialize()
+
+mapping_a = a.snode().physical_index_position()
+
+assert mapping_a == {0: 0, 1: 1, 2: 2}
+
+mapping_b = b.snode().physical_index_position()
+
+assert mapping_b == {0: 1, 1: 0}
+# Note that b is column-major:
+# the virtual first index exposed to the user comes second in memory layout.
+```
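+
+As a rough mental model, the mapping can be thought of as recording
+physical axes in the order in which they first appear on the path from
+the root to the `SNode`. The pure-Python sketch below is an illustration
+only: `build_mapping` and `MAX_NUM_INDICES` are hypothetical names, and
+the first-appearance numbering is an assumption that agrees with the `a`
+and `b` cases above rather than a description of Taichi's implementation.
+
+```python
+MAX_NUM_INDICES = 8  # mirrors the limit of 8 indices stated in this section
+
+
+def build_mapping(levels):
+    """Return a list where entry i is the physical axis of virtual index i.
+
+    `levels` lists, from the root downwards, the physical axes that each
+    layer introduces, e.g. [[1], [0]] for dense(ti.j, ...).dense(ti.i, ...).
+    Unused entries stay at -1.
+    """
+    mapping = [-1] * MAX_NUM_INDICES
+    num_virtual = 0
+    for axes in levels:
+        for axis in axes:
+            # Only axes that have not been seen yet get a new virtual index.
+            if axis not in mapping[:num_virtual]:
+                mapping[num_virtual] = axis
+                num_virtual += 1
+    return mapping
+
+
+row_major = build_mapping([[0, 1, 2]])  # all three axes in one layer
+col_major = build_mapping([[1], [0]])   # axis j first, then axis i
+```
+
+Under these assumptions, `row_major` starts with `0, 1, 2` and
+`col_major` starts with `1, 0`; every remaining entry is `-1`.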
+
+Taichi supports up to 8 (`constexpr int taichi_max_num_indices = 8`)
+virtual indices and physical indices.
diff --git a/docs/lang/articles/misc/profiler.md b/docs/lang/articles/misc/profiler.md
new file mode 100644
index 0000000000000..e8a47cbae08f6
--- /dev/null
+++ b/docs/lang/articles/misc/profiler.md
@@ -0,0 +1,81 @@
+---
+sidebar_position: 4
+---
+
+# Profiler
+
+Taichi's profiler can help you analyze the run-time cost of your
+program. There are two profiling systems in Taichi: `KernelProfiler` and
+`ScopedProfiler`.
+
+`KernelProfiler` is used to analyze the performance of user kernels,
+while `ScopedProfiler` is used by Taichi developers to analyze the
+performance of the compiler itself.
+
+## KernelProfiler
+
+1.  `KernelProfiler` records the costs of Taichi kernels on devices. To
+    enable this profiler, set `kernel_profiler=True` in `ti.init`.
+2.  Call `ti.kernel_profiler_print()` to show the kernel profiling
+    result. For example:
+
+```python {3,13}
+import taichi as ti
+
+ti.init(arch=ti.cpu, kernel_profiler=True)
+var = ti.field(ti.f32, shape=1)
+
+
+@ti.kernel
+def compute():
+    var[0] = 1.0
+
+
+compute()
+ti.kernel_profiler_print()
+```
+
+The output looks like this:
+
+```
+[ 22.73%] jit_evaluator_0_kernel_0_serial             min   0.001 ms   avg   0.001 ms   max   0.001 ms   total   0.000 s [      1x]
+[  0.00%] jit_evaluator_1_kernel_1_serial             min   0.000 ms   avg   0.000 ms   max   0.000 ms   total   0.000 s [      1x]
+[ 77.27%] compute_c4_0_kernel_2_serial                min   0.004 ms   avg   0.004 ms   max   0.004 ms   total   0.000 s [      1x]
+```
+
+:::note
+Currently, the results of `KernelProfiler` may be inaccurate on the
+OpenGL backend due to its lack of support for `ti.sync()`.
+:::
+
+## ScopedProfiler
+
+1.  `ScopedProfiler` hierarchically measures the time spent on
+    **host tasks**.
+2.  This profiler is always enabled. To show its results, call
+    `ti.print_profile_info()`. For example:
+
+```python
+import taichi as ti
+
+ti.init(arch=ti.cpu)
+var = ti.field(ti.f32, shape=1)
+
+
+@ti.kernel
+def compute():
+    var[0] = 1.0
+    print("Setting var[0] =", var[0])
+
+
+compute()
+ti.print_profile_info()
+```
+
+`ti.print_profile_info()` prints profiling results in a hierarchical format.
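+
+To make "hierarchical" concrete, the following pure-Python sketch times
+nested scopes and reports them as an indented tree. It is an
+illustration of the general idea only: `ScopedTimer` and the scope names
+`compile`, `parse`, and `codegen` are hypothetical and do not reflect the
+C++ `ScopedProfiler` or its output format.
+
+```python
+import time
+from contextlib import contextmanager
+
+
+class ScopedTimer:
+    """Hypothetical hierarchical timer; not Taichi's ScopedProfiler."""
+
+    def __init__(self):
+        self.root = {"name": "root", "elapsed": 0.0, "children": []}
+        self._stack = [self.root]
+
+    @contextmanager
+    def scope(self, name):
+        node = {"name": name, "elapsed": 0.0, "children": []}
+        self._stack[-1]["children"].append(node)  # attach to enclosing scope
+        self._stack.append(node)
+        start = time.perf_counter()
+        try:
+            yield node
+        finally:
+            node["elapsed"] = time.perf_counter() - start
+            self._stack.pop()
+
+    def report(self, node=None, depth=0):
+        """Return one indented line per scope, parents before children."""
+        node = node or self.root
+        lines = []
+        for child in node["children"]:
+            ms = child["elapsed"] * 1e3
+            lines.append(f"{'  ' * depth}{child['name']}: {ms:.3f} ms")
+            lines.extend(self.report(child, depth + 1))
+        return lines
+
+
+timer = ScopedTimer()
+with timer.scope("compile"):
+    with timer.scope("parse"):
+        sum(range(1000))
+    with timer.scope("codegen"):
+        sum(range(1000))
+
+report_lines = timer.report()
+print("\n".join(report_lines))
+```
+
+Each nested `with` block becomes a child of the enclosing scope, so the
+report lists `parse` and `codegen` indented under `compile`.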
+
+:::note
+`ScopedProfiler` is a C++ class in the core of Taichi. It is not exposed
+to Python users.
+:::