atomic_add return value #332

Closed
yuanming-hu opened this issue Dec 31, 2019 · 7 comments
@yuanming-hu

No description provided.

@k-ye commented Jan 12, 2020

Hey, I'd like to contribute to this if possible. I've got a prototype in this commit, and am now adding new tests / fixing broken tests...

@yuanming-hu

Nice! Your implementation generally makes sense to me. I'm at the airport now and will take a close look after I settle down. Thank you!

@k-ye commented Jan 12, 2020

Sure, no problem! I've listed some of my decisions and questions below for you to take a look at. The code is not ready yet, so don't worry about that for now. I'll send a PR later :)

  • Need to set AtomicOpStmt::value in codegen_llvm/codegen_llvm_ptx to propagate the return value to subsequent stmts. This leads to an issue: once stmt->width is greater than 1, everything breaks. I noticed that type_check currently has a check for this, so I assume it's a known issue?

  • Need to set AtomicOpStmt::ret_type, otherwise Taichi inserts a cast of type unknown, which will crash due to this line.

    BTW, it seems that the current setting of SNodeOpStmt::element_type could be wrong? It is set to the type of the SNode. But given that this stmt implements append and length, shouldn't the type always be i32?

  • In order to make atomic_add() return a value, we need to convert it from a statement to an expression. As a result, I added a FrontendAtomicExpression, which flattens to a FrontendAtomicStmt. I wonder if this is the right approach, since currently there are no Frontend* exprs -- they are all stmts.

  • Following the above point: it turned out that rvalue exprs are eliminated if they are not assigned. This is undesirable since atomic_add() has a side effect. I followed how ti.append() was implemented and wrapped taichi_lang_core.expr_atomic_add() inside ti.expr_init(), which solved the problem. I still don't quite understand this; maybe it is because ti.expr_init() creates a Python-native Expr, and Python forces this to be evaluated? The printed IR showed that a local variable is created to hold the result of the atomic add, something like this:

    $a = alloca
    $b = atomic add(...)
    $c : store [$a <- $b]
    
  • demote_atomic can transform

$d = atomic add($a, $b)

to

$c = load $a
$d = add $c $b
$e : store [$a <- $d]

Combined with the above point, this actually caused a problem:

$c = alloca
$d = atomic add($a, $b)
$e : local store [$c <- $d]

gets demoted to

$c = alloca
$d' = load $a          <-- added by demote_atomic
$e' = add $d' $b
$f : store [$a <- $e']  <-- added by demote_atomic
$g : store [$c <- $f]   <-- the original stmt that stores the atomic_add result to tmpvar $c. This will crash.

The program crashed at stmt $g, because $f is a store stmt whose value is nullptr. Obviously we could set the store stmt's value, which is similar to C++'s operator= and effectively makes store an expr with a side effect. One subtle thing is that if I wanted to follow C++, the store's value should be $a, since operator= returns a reference to the lvalue. But LLVM doesn't like me doing that (i.e., setting the store stmt's value to an AllocaInst eventually led to LLVM assert errors...).

I ended up replacing all uses of $d = atomic add($a, $b) with $e' (the added value). This requires no change to the store stmt, so I think it's a bit cleaner.

  • Finally, when I made __iadd__() an expression, it broke almost all the examples plus test_mpm88.py. As a workaround, I did the following:

    • ti.atomic_add(a, b) and a.atomic_add(b) are expressions.
    • a += b is a stmt. (It seems like having a.atomic_add(b) as a stmt would be more consistent?)

It would be great to know what's wrong, but I need to stop here...

@yuanming-hu

> Need to set AtomicOpStmt::value in codegen_llvm/codegen_llvm_ptx to propagate the return value to subsequent stmts. This leads to an issue: once stmt->width is greater than 1, everything breaks. I noticed that type_check currently has a check for this, so I assume it's a known issue?

Taichi used to support loop vectorization; however, after switching to LLVM, the auto-vectorization functionality is broken. I would suggest that we implement the scalar case first, since

  • Making scalar instructions work is already a lot of work. We'd better focus on that right now.
  • Most performance-aware people use GPUs, so we can postpone auto-vectorization on CPUs.

> Need to set AtomicOpStmt::ret_type, otherwise Taichi inserts a cast of type unknown, which will crash due to this line.

Yes, this needs to be set to the same type as stmt->dest->ret_type or stmt->val->ret_type.

> BTW, it seems that the current setting of SNodeOpStmt::element_type could be wrong? It is set to the type of the SNode. But given that this stmt implements append and length, shouldn't the type always be i32?

Nice catch. I believe this is a bug. My bad :-(

> In order to make atomic_add() return a value, we need to convert it from a statement to an expression. As a result, I added a FrontendAtomicExpression, which flattens to a FrontendAtomicStmt. I wonder if this is the right approach, since currently there are no Frontend* exprs -- they are all stmts.

This is mostly the right approach! I would simply use the name AtomicExpression instead of FrontendAtomicExpression, since expressions only live at the frontend, before AST lowering. FrontendAtomicStmt can now be removed, since lowering AtomicExpression directly gives you an AtomicOpStmt. Then you might run into the issue that expressions cannot exist on their own, which can be solved following the scheme in ti.append (create a local var and then assign). Let me think about this for a few more minutes...

@yuanming-hu

> Following the above point: it turned out that rvalue exprs are eliminated if they are not assigned. This is undesirable since atomic_add() has a side effect. I followed how ti.append() was implemented and wrapped taichi_lang_core.expr_atomic_add() inside ti.expr_init(), which solved the problem. I still don't quite understand this; maybe it is because ti.expr_init() creates a Python-native Expr, and Python forces this to be evaluated? The printed IR showed that a local variable is created to hold the result of the atomic add, something like this:
>
> $a = alloca
> $b = atomic add(...)
> $c : store [$a <- $b]

Yeah this is slightly tricky. ti.expr_init() here will go to this code path and call taichi_lang_core.expr_var. This gets resolved by pybind11 into Var, where a FrontendAllocaStmt is created, followed by an assignment.
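
(For reference, a rough sketch of the wrapping described here, not the actual Taichi source: the call names ti.expr_init, taichi_lang_core.expr_atomic_add, and Expr come from this thread, while the import paths and exact signatures are assumptions and may differ from the real codebase.)

```python
import taichi as ti
from taichi.lang.expr import Expr          # assumed import path
from taichi.core import taichi_lang_core   # assumed import path


def atomic_add(a, b):
    # expr_atomic_add builds the frontend atomic expression; ti.expr_init
    # allocates a local variable (a FrontendAllocaStmt created via
    # expr_var / Var) and assigns the expression's result to it, which is
    # why the printed IR shows:
    #   $a = alloca
    #   $b = atomic add(...)
    #   $c : store [$a <- $b]
    return ti.expr_init(
        taichi_lang_core.expr_atomic_add(Expr(a).ptr, Expr(b).ptr))
```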

> demote_atomic can transform
>
> $d = atomic add($a, $b)
>
> to
>
> $c = load $a
> $d = add $c $b
> $e : store [$a <- $d]

> Combined with the above point, this actually caused a problem:
>
> $c = alloca
> $d = atomic add($a, $b)
> $e : local store [$c <- $d]
>
> gets demoted to
>
> $c = alloca
> $d' = load $a          <-- added by demote_atomic
> $e' = add $d' $b
> $f : store [$a <- $e']  <-- added by demote_atomic
> $g : store [$c <- $f]   <-- the original stmt that stores the atomic_add result to tmpvar $c. This will crash.

> The program crashed at stmt $g, because $f is a store stmt whose value is nullptr. Obviously we could set the store stmt's value, which is similar to C++'s operator= and effectively makes store an expr with a side effect. One subtle thing is that if I wanted to follow C++, the store's value should be $a, since operator= returns a reference to the lvalue. But LLVM doesn't like me doing that (i.e., setting the store stmt's value to an AllocaInst eventually led to LLVM assert errors...).
> I ended up replacing all uses of $d = atomic add($a, $b) with $e' (the added value). This requires no change to the store stmt, so I think it's a bit cleaner.

I believe a store inst in LLVM does not produce a value, so setting the store stmt's value to an AllocaInst is not a good idea :-(

The only patch I would make here: ti.atomic_add should return the old value instead of the new one, following the convention in CUDA and C++; therefore, instead of using $e' we'd better use $d'. Everything else is really well done.
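
(A minimal usage sketch of the agreed-upon semantics, written against a recent Taichi API; ti.field and ti.init are assumptions relative to the version under discussion. ti.atomic_add returns the value of the destination before the add, matching CUDA's atomicAdd and C++'s std::atomic fetch_add.)

```python
import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(ti.i32, shape=())
old = ti.field(ti.i32, shape=())


@ti.kernel
def add_five():
    # Corresponds to $d' above: the value loaded before the addition.
    old[None] = ti.atomic_add(x[None], 5)


x[None] = 3
add_five()
assert old[None] == 3  # the old value is returned
assert x[None] == 8    # the destination is updated atomically
```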

> Finally, when I made __iadd__() an expression, it broke almost all the examples plus test_mpm88.py. As a workaround, I did the following:
>
> • ti.atomic_add(a, b) and a.atomic_add(b) are expressions.
> • a += b is a stmt. (It seems like having a.atomic_add(b) as a stmt would be more consistent?)

Great job! This makes perfect sense. I don't think the Python parser even allows you to parse a += b as an expression :-)
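
(A hedged sketch of the resulting user-facing split, again written against a recent Taichi API with ti.field as an assumption: the ti.atomic_add expression form can appear on the right-hand side of an assignment, while the augmented-assignment form stays a statement whose return value is simply discarded.)

```python
import taichi as ti

ti.init(arch=ti.cpu)

a = ti.field(ti.f32, shape=4)
prev = ti.field(ti.f32, shape=4)


@ti.kernel
def accumulate(b: ti.f32):
    for i in a:
        prev[i] = ti.atomic_add(a[i], b)  # expression form: yields the old value
        a[i] += b                         # statement form: the old value is discarded
```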

Thank you so much for helping with this!

@k-ye commented Jan 13, 2020

Thank you for the detailed feedback! I'll try to finish the impl today or so. XD

@yuanming-hu

Thanks! Please have fun and no need to rush. Here are some useful tips for debugging the Taichi compiler: https://taichi.readthedocs.io/en/latest/contributor_guide.html#tips-on-taichi-compiler-development (although I think you have figured out most of these on your own :)
