atomic_add return value #332

Closed
yuanming-hu opened this issue Dec 31, 2019 · 7 comments
@yuanming-hu

No description provided.

@k-ye commented Jan 12, 2020

Hey, I'd like to contribute to this if possible. I've got a prototype in this commit, and am now adding new tests / fixing broken tests...

@yuanming-hu

Nice! Your implementation generally makes sense to me. I'm at the airport now and will take a close look after I settle down. Thank you!

@k-ye commented Jan 12, 2020

Sure, no problem! I've listed some of my decisions and questions below for you to take a look at. The code is not ready yet, so don't worry about that for now. I'll send a PR later :)

  • Need to set AtomicOpStmt::value in codegen_llvm/codegen_llvm_ptx to propagate the return value to subsequent stmts. This leads to an issue: once stmt->width is greater than 1, everything breaks. I noticed that type_check currently has a check for this, so I assume it's a known issue?

  • Need to set AtomicOpStmt::ret_type, otherwise Taichi inserts a cast of type unknown, which will crash due to this line.

    BTW, it seems that the current setting of SNodeOpStmt::element_type could be wrong? It is set to the type of the SNode. But given that this stmt implements append and length, shouldn't the type always be i32?

  • In order to make atomic_add() return a value, we need to convert it from a statement to an expression. As a result, I added a FrontendAtomicExpression, which flattens to a FrontendAtomicStmt. I wonder if this is the right approach, since currently there are no Frontend* exprs -- they are all stmts.

  • Following the above point: it turned out that rvalue exprs are eliminated if they are not assigned. This is undesirable since atomic_add() has a side effect. I followed how ti.append() was implemented and wrapped taichi_lang_core.expr_atomic_add() inside ti.expr_init(), which solved the problem. I still don't quite understand this; maybe it is because ti.expr_init() creates a Python-native Expr, and Python forces this to be evaluated? The printed IR showed that a local variable is created to hold the result of the atomic add, something like this:

    $a = alloca
    $b = atomic add(...)
    $c : store [$a <- $b]
    
  • demote_atomic can transform

$d = atomic add($a, $b)

to

$c = load $a
$d = add $c $b
$e : store [$a <- $d]

Combined with the above point, this actually caused a problem:

$c = alloca
$d = atomic add($a, $b)
$e : local store [$c <- $d]

gets demoted to

$c = alloca
$d' = load $a          <-- added by demote_atomic
$e' = add $d' $b
$f : store [$a <- $e']  <-- added by demote_atomic
$g : store [$c <- $f]   <-- the original stmt that stores the atomic_add result to tmpvar $c. This will crash.

The program crashed at stmt $g, because $f is a store stmt whose value is nullptr. Obviously we could set the store stmt's value, which is similar to C++'s operator= and effectively makes store an expr with a side effect. One subtle thing is that if I wanted to follow C++, the store's value should be $a, since operator= returns a reference to the lvalue. But LLVM doesn't like me doing that (i.e., setting the store stmt's value to an AllocaInst eventually led to LLVM assert errors...).

I ended up replacing all uses of $d = atomic add($a, $b) with $e' (the added value). This requires no change to the store stmt, so I think it's a bit cleaner.

  • Finally, when I made __iadd__() an expression, it broke almost all the examples plus test_mpm88.py. As a workaround, I did the following:

    • ti.atomic_add(a, b) and a.atomic_add(b) are expressions.
    • a += b is a stmt. (It seems like having a.atomic_add(b) as a stmt would be more consistent?)

It would be great to know what's wrong, but I need to stop here...

@yuanming-hu

> Need to set AtomicOpStmt::value in codegen_llvm/codegen_llvm_ptx to propagate the return value to subsequent stmts. This leads to an issue: once stmt->width is greater than 1, everything breaks. I noticed that type_check currently has a check for this, so I assume it's a known issue?

Taichi used to support loop vectorization; however, after switching to LLVM, the auto-vectorization functionality is broken. I would suggest that we implement the scalar case first, since

  • Making scalar instructions work is already a lot of work. We'd better focus on that right now.
  • Most performance-aware people use GPUs, so we can postpone auto-vectorization on CPUs.

> Need to set AtomicOpStmt::ret_type, otherwise Taichi inserts a cast of type unknown, which will crash due to this line.

Yes, this needs to be set to the same type as stmt->dest->ret_type or stmt->val->ret_type.

> BTW, it seems that the current setting of SNodeOpStmt::element_type could be wrong? It is set to the type of the SNode. But given that this stmt implements append and length, shouldn't the type always be i32?

Nice catch. I believe this is a bug. My bad :-(

> In order to make atomic_add() return a value, we need to convert it from a statement to an expression. As a result, I added a FrontendAtomicExpression, which flattens to a FrontendAtomicStmt. I wonder if this is the right approach, since currently there are no Frontend* exprs -- they are all stmts.

This is mostly the right approach! I would simply use the name AtomicExpression instead of FrontendAtomicExpression, since expressions only live at the frontend, before AST lowering. FrontendAtomicStmt can now be removed, since lowering AtomicExpression directly gives you an AtomicOpStmt. Then you might run into the issue that expressions cannot exist on their own, which can be solved following the scheme in ti.append (create a local var and then assign). Let me think about this for a few more minutes...

@yuanming-hu

> Following the above point: it turned out that rvalue exprs are eliminated if they are not assigned. This is undesirable since atomic_add() has a side effect. I followed how ti.append() was implemented and wrapped taichi_lang_core.expr_atomic_add() inside ti.expr_init(), which solved the problem. I still don't quite understand this; maybe it is because ti.expr_init() creates a Python-native Expr, and Python forces this to be evaluated? The printed IR showed that a local variable is created to hold the result of the atomic add, something like this:
>
> $a = alloca
> $b = atomic add(...)
> $c : store [$a <- $b]

Yeah this is slightly tricky. ti.expr_init() here will go to this code path and call taichi_lang_core.expr_var. This gets resolved by pybind11 into Var, where a FrontendAllocaStmt is created, followed by an assignment.
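
(For reference, a rough sketch of the wrapping described here, not the actual Taichi source: the call names ti.expr_init, taichi_lang_core.expr_atomic_add, and Expr come from this thread, while the import paths and exact signatures are assumptions and may differ from the real codebase.)

```python
import taichi as ti
from taichi.lang.expr import Expr          # assumed import path
from taichi.core import taichi_lang_core   # assumed import path


def atomic_add(a, b):
    # expr_atomic_add builds the frontend atomic expression; ti.expr_init
    # allocates a local variable (a FrontendAllocaStmt created via
    # expr_var / Var) and assigns the expression's result to it, which is
    # why the printed IR shows:
    #   $a = alloca
    #   $b = atomic add(...)
    #   $c : store [$a <- $b]
    return ti.expr_init(
        taichi_lang_core.expr_atomic_add(Expr(a).ptr, Expr(b).ptr))
```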

> demote_atomic can transform
>
> $d = atomic add($a, $b)
>
> to
>
> $c = load $a
> $d = add $c $b
> $e : store [$a <- $d]

> Combined with the above point, this actually caused a problem:
>
> $c = alloca
> $d = atomic add($a, $b)
> $e : local store [$c <- $d]
>
> gets demoted to
>
> $c = alloca
> $d' = load $a          <-- added by demote_atomic
> $e' = add $d' $b
> $f : store [$a <- $e']  <-- added by demote_atomic
> $g : store [$c <- $f]   <-- the original stmt that stores the atomic_add result to tmpvar $c. This will crash.

> The program crashed at stmt $g, because $f is a store stmt whose value is nullptr. Obviously we could set the store stmt's value, which is similar to C++'s operator= and effectively makes store an expr with a side effect. One subtle thing is that if I wanted to follow C++, the store's value should be $a, since operator= returns a reference to the lvalue. But LLVM doesn't like me doing that (i.e., setting the store stmt's value to an AllocaInst eventually led to LLVM assert errors...).
> I ended up replacing all uses of $d = atomic add($a, $b) with $e' (the added value). This requires no change to the store stmt, so I think it's a bit cleaner.

I believe a store inst in LLVM does not produce a value, so setting the store stmt's value to an AllocaInst is not a good idea :-(

The only patch I would make here: ti.atomic_add should return the old value instead of the new one, following the convention in CUDA and C++; therefore, instead of using $e' we'd better use $d'. Everything else is really well done.
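
(A minimal usage sketch of the agreed-upon semantics, written against a recent Taichi API; ti.field and ti.init are assumptions relative to the version under discussion. ti.atomic_add returns the value of the destination before the add, matching CUDA's atomicAdd and C++'s std::atomic fetch_add.)

```python
import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(ti.i32, shape=())
old = ti.field(ti.i32, shape=())


@ti.kernel
def add_five():
    # Corresponds to $d' above: the value loaded before the addition.
    old[None] = ti.atomic_add(x[None], 5)


x[None] = 3
add_five()
assert old[None] == 3  # the old value is returned
assert x[None] == 8    # the destination is updated atomically
```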

> Finally, when I made __iadd__() an expression, it broke almost all the examples plus test_mpm88.py. As a workaround, I did the following:
>
> • ti.atomic_add(a, b) and a.atomic_add(b) are expressions.
> • a += b is a stmt. (It seems like having a.atomic_add(b) as a stmt would be more consistent?)

Great job! This makes perfect sense. I don't think the Python parser even allows you to parse a += b as an expression :-)
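
(A hedged sketch of the resulting user-facing split, again written against a recent Taichi API with ti.field as an assumption: the ti.atomic_add expression form can appear on the right-hand side of an assignment, while the augmented-assignment form stays a statement whose return value is simply discarded.)

```python
import taichi as ti

ti.init(arch=ti.cpu)

a = ti.field(ti.f32, shape=4)
prev = ti.field(ti.f32, shape=4)


@ti.kernel
def accumulate(b: ti.f32):
    for i in a:
        prev[i] = ti.atomic_add(a[i], b)  # expression form: yields the old value
        a[i] += b                         # statement form: the old value is discarded
```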

Thank you so much for helping with this!

@k-ye commented Jan 13, 2020

Thank you for the detailed feedback! I'll try to finish the impl today or so. XD

@yuanming-hu

Thanks! Please have fun and no need to rush. Here are some useful tips for debugging the Taichi compiler: https://taichi.readthedocs.io/en/latest/contributor_guide.html#tips-on-taichi-compiler-development (although I think you have figured out most of these on your own :)
