
[CC] [autodiff] Support AdStack on C backend #1752

Merged: 6 commits into taichi-dev:master on Aug 29, 2020

Conversation


@archibate requested a review from k-ye on August 23, 2020 05:26
static inline Ti_AdStackPtr Ti_ad_stack_top_primal(Ti_AdStackPtr stack,
                                                   Ti_u32 element_size) {
  Ti_u32 *n = Ti_ad_stack_n(stack);
  return Ti_ad_stack_data(stack) + (*n - 1) * 2 * element_size;
@archibate (Collaborator, Author) commented Aug 23, 2020

I copied this from ad_stack.metal.h; can you tell me why it's n - 1 here? @k-ye
Sometimes n can be 0, and the unsigned subtraction underflows, resulting in a serious segfault when the pointer is 64-bit (-1 wraps to 0xffffffff). But it somehow passes silently on Metal, whose pointers are 32-bit?
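
For illustration, a minimal standalone C sketch of the underflow described here; the element size and printout are illustrative assumptions, not code from this PR:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
  uint32_t n = 0;             /* empty stack: nothing has been pushed */
  uint32_t element_size = 4;  /* e.g. a 4-byte f32 element */
  /* (n - 1) underflows in unsigned arithmetic: 0 - 1 == 0xFFFFFFFF,
     so the byte offset becomes 0xFFFFFFF8 instead of a small number. */
  uint32_t offset = (n - 1) * 2 * element_size;
  printf("offset = 0x%08" PRIX32 "\n", offset);  /* 0xFFFFFFF8 */
  /* On a 64-bit target this u32 offset is zero-extended before the
     pointer addition, so data + offset points ~4 GiB past the stack
     and segfaults. In a 32-bit address space (as on Metal) the pointer
     addition itself wraps modulo 2^32 and can silently land back in
     mapped memory, which would be consistent with the Metal tests
     passing silently. */
  return 0;
}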

@xumingkuan (Contributor)

I think when n is 0, it's pretty OK to have a segfault here; it's just like this in C++:

std::stack<int> s;
s.top();  // undefined behavior: top() on an empty stack

@k-ye (Member)

Exactly, this is just l[len(l) - 1]. As mentioned, accessing the top without a push sounds like a bug.

Ti_i32 *data = (Ti_i32 *)Ti_ad_stack_data(stack);
data[0] = 0;
data[1] = 0;
*n = 1;
@archibate (Collaborator, Author)

I have to add this mock for L108 to prevent the overflow when Ti_ad_stack_top_primal is called without a prior Ti_ad_stack_push. Do you have the same issue on Metal?
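
For context, a self-contained sketch of what this mock amounts to, assuming the [count | (primal, adjoint) pairs] layout implied by the snippets in this thread; Ti_ad_stack_init is a hypothetical name, not the merged code:

#include <stdint.h>

typedef uint8_t   Ti_u8;
typedef uint32_t  Ti_u32;
typedef int32_t   Ti_i32;
typedef Ti_u8    *Ti_AdStackPtr;

/* Assumed layout: a u32 element count, then (primal, adjoint) pairs. */
static inline Ti_u32 *Ti_ad_stack_n(Ti_AdStackPtr stack) {
  return (Ti_u32 *)stack;
}

static inline Ti_AdStackPtr Ti_ad_stack_data(Ti_AdStackPtr stack) {
  return stack + sizeof(Ti_u32);
}

/* Hypothetical init applying the mock above: pre-push one dummy
   (primal, adjoint) pair so that a stray top-of-stack read before any
   push hits the dummy slot instead of an underflowed offset. */
static inline void Ti_ad_stack_init(Ti_AdStackPtr stack) {
  Ti_u32 *n = Ti_ad_stack_n(stack);
  Ti_i32 *data = (Ti_i32 *)Ti_ad_stack_data(stack);
  data[0] = 0; /* dummy primal */
  data[1] = 0; /* dummy adjoint */
  *n = 1;      /* pretend one entry is already pushed */
}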

@k-ye (Member)

do you have the same issue on Metal?

No, I don't remember seeing such an issue. It sounds like a bug if top_primal is called before push, and it probably isn't limited to Metal. Could you provide a test to repro this?

@archibate (Collaborator, Author)

Sorry, I have no knowledge of the autodiff system. I just know that test_ad_for.py fails while test_ad_if doesn't, and it only occurs on CC, not x64. I'll try to find a minimal repro based on the test later.

@archibate requested a review from xumingkuan on August 23, 2020 05:31
codecov bot commented Aug 23, 2020

Codecov Report

Merging #1752 into master will decrease coverage by 0.05%.
The diff coverage is 0.00%.


@@            Coverage Diff             @@
##           master    #1752      +/-   ##
==========================================
- Coverage   42.50%   42.45%   -0.06%     
==========================================
  Files          44       44              
  Lines        6194     6202       +8     
  Branches     1073     1073              
==========================================
  Hits         2633     2633              
- Misses       3406     3414       +8     
  Partials      155      155              
Impacted Files               Coverage         Δ
python/taichi/core/util.py   0.37% <0.00%>    (-0.02%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7fa0d9d...bdc1559.

@xumingkuan (Contributor) left a comment

LGTM!

// Copied from Metal:
typedef Ti_u8 *Ti_AdStackPtr;

static inline Ti_u32 *Ti_ad_stack_n(Ti_AdStackPtr stack) {
@xumingkuan (Contributor)

Well... I would suggest not starting a function name with a capital letter. I haven't looked at the C backend before, so is there a reason the Ti_ prefix is used? I would suggest a cc_ prefix to show that it's the C backend (and probably Cc for classes).

@archibate (Collaborator, Author)

I think when n is 0, it's pretty OK to have a segfault here; it's just like this in C++:

But it makes test_ad_for.py fail. It seems no push is executed before the top.

so is there any reason that the Ti_ prefix is used?

It's a namespace prefix; it prevents possible name conflicts when users export the source code into their own projects.

I would suggest cc_ prefix to show that it's the C backend.

Yes, cc_ is clear within the Taichi repo codebase, but not so clear when exported to a third-party shared object; Ti_ immediately hints that this is a Taichi runtime function.
What's more, we may further support exporting kernels from the LLVM backend, which could use the same naming rule for portability.

@xumingkuan (Contributor)

I think when n is 0, it's pretty OK to have a segfault here; it's just like this in C++:

But it makes test_ad_for.py fail. It seems no push is executed before the top.

Interesting... Looks like a bug in autodiff or the optimization passes.

@archibate (Collaborator, Author)

I think when n is 0, it's pretty OK to have a segfault here; it's just like this in C++:

But it makes test_ad_for.py fail. It seems no push is executed before the top.

Interesting... Looks like a bug in autodiff or the optimization passes.

What's more, it silently passes the x64 and Metal tests. Any idea?

@archibate (Collaborator, Author)

Can we confirm whether this is an issue in the autodiff system or in the C backend?

@xumingkuan (Contributor)

Can we confirm whether this is an issue in the autodiff system or in the C backend?

I'll take a look tomorrow.

@xumingkuan (Contributor)

It's an issue in autodiff.

Test case: test_ad_fibonacci_index() in test_ad_for.py
Log:

[I 08/26/20 19:11:30.900] [compile_to_offloads.cpp:taichi::lang::irpass::`anonymous-namespace'::make_pass_printer::<lambda_fe1d620add3df83d4ee306f9c0ab10ca>::operator ()@18] [fib_c5_0_grad_grad] Simplified I:
kernel {
  <i32 x1> $0 = const [5]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [1]
  <i32 x1> $3 = const [10]
  $4 : for in range($1, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    <f32*x1> $6 = global ptr [S6place_f32], index [$5] activate=true
    <f32 x1> $7 = global load $6
    <f32*x1> $8 = global ptr [S10place_f32], index [] activate=true
    <f32 x1> $9 = atomic add($8, $7)
  }
  $10 : for in range($1, $0) (vectorize 1) block_dim=adaptive {
    <i32 x1> $11 = alloca
    <i32 x1> $12 = alloca
    <i32 x1> $13 : local store [$12 <- $2]
    $14 : for in range($1, $0) (vectorize 1) block_dim=adaptive {
      <i32 x1> $15 = local load [ [$12[0]]]
      <i32 x1> $16 = local load [ [$11[0]]]
      <i32 x1> $17 = add $16 $15
      <i32 x1> $18 : local store [$11 <- $15]
      <i32 x1> $19 : local store [$12 <- $17]
      <f32*x1> $20 = global ptr [S2place_f32], index [$17] activate=true
      <f32 x1> $21 = global load $20
      <f32*x1> $22 = global ptr [S6place_f32], index [$17] activate=true
      <f32 x1> $23 = atomic add($22, $21)
    }
  }
}
[I 08/26/20 19:11:30.903] [compile_to_offloads.cpp:taichi::lang::irpass::`anonymous-namespace'::make_pass_printer::<lambda_fe1d620add3df83d4ee306f9c0ab10ca>::operator ()@18] [fib_c5_0_grad_grad] Gradient:
kernel {
  <i32 x1> $0 = const [5]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [1]
  <i32 x1> $3 = const [10]
  $4 : for in range($1, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    <f32*x1> $6 = global ptr [S6place_f32], index [$5] activate=true
    <f32*x1> $7 = global ptr [S10place_f32], index [] activate=true
    <f32*x1> $8 = global ptr [S12place_f32], index [] activate=true
    <f32 x1> $9 = global load $8
    <f32*x1> $10 = global ptr [S8place_f32], index [$5] activate=true
    <f32 x1> $11 = atomic add($10, $9)
  }
  $12 : for in range($1, $0) (vectorize 1) block_dim=adaptive {
    <f32 x1> $13 = stack alloc (max_size=16)
    <i32 x1> $14 = stack alloc (max_size=16)
    <i32 x1> $15 = stack alloc (max_size=16)
    <i32 x1> $16 = stack alloc (max_size=16)
    <i32 x1> $17 : stack push $16, val = $2
    $18 : for in range($1, $0) (vectorize 1) block_dim=adaptive {
      <i32 x1> $19 = stack load top $16
      <i32 x1> $20 = stack load top $15 // <------------------------------ empty stack!
      <i32 x1> $21 = add $20 $19
      <i32 x1> $22 : stack push $14, val = $21
      <i32 x1> $23 = stack load top $14
      <i32 x1> $24 : stack push $15, val = $19
      <i32 x1> $25 : stack push $16, val = $23
      <f32*x1> $26 = global ptr [S2place_f32], index [$23] activate=true
      <f32 x1> $27 = global load $26
      <f32 x1> $28 : stack push $13, val = $27
      <f32*x1> $29 = global ptr [S6place_f32], index [$23] activate=true
    }
    $30 : reversed for in range($1, $0) (vectorize 1) block_dim=adaptive {
      <i32 x1> $31 = stack load top $14
      <f32*x1> $32 = global ptr [S8place_f32], index [$31] activate=true
      <f32 x1> $33 = global load $32
      <f32 x1> $34 : stack acc adj $13, val = $33
      <f32 x1> $35 = stack load top adj $13
      <f32 x1> $36 : stack pop $13
      <f32*x1> $37 = global ptr [S4place_f32], index [$31] activate=true
      <f32 x1> $38 = atomic add($37, $35)
      <i32 x1> $39 : stack pop $16
      <i32 x1> $40 : stack pop $15
      <i32 x1> $41 : stack pop $14
    }
    <i32 x1> $42 : stack pop $16
  }
}
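
For reference, a debug-build guard along the following lines would trap the empty-stack load marked at $20 instead of returning a pointer ~4 GiB out of bounds. This is a sketch assuming the [count | (primal, adjoint) pairs] layout implied by the snippets above, not code from this PR:

#include <assert.h>
#include <stdint.h>

typedef uint8_t   Ti_u8;
typedef uint32_t  Ti_u32;
typedef Ti_u8    *Ti_AdStackPtr;

static inline Ti_u32 *Ti_ad_stack_n(Ti_AdStackPtr stack) {
  return (Ti_u32 *)stack;
}

static inline Ti_AdStackPtr Ti_ad_stack_data(Ti_AdStackPtr stack) {
  return stack + sizeof(Ti_u32);
}

/* Checked variant of Ti_ad_stack_top_primal: fail loudly (in debug
   builds) when the stack is empty, before the offset can underflow. */
static inline Ti_AdStackPtr Ti_ad_stack_top_primal(Ti_AdStackPtr stack,
                                                   Ti_u32 element_size) {
  Ti_u32 *n = Ti_ad_stack_n(stack);
  assert(*n > 0 && "stack load top on an empty AdStack");
  return Ti_ad_stack_data(stack) + (*n - 1) * 2 * element_size;
}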

@archibate (Collaborator, Author)

Great, so will we merge this PR before or after that issue is resolved?

@xumingkuan (Contributor)

*n = 1; looks too hacky to me... I'd prefer to merge this after that issue is resolved, if that won't take too much time.

@archibate requested a review from xumingkuan on August 28, 2020 09:35
@xumingkuan (Contributor) left a comment

Cool! LGTM.

@archibate added the LGTM label on Aug 28, 2020
@archibate merged commit d94c2dd into taichi-dev:master on Aug 29, 2020
@yuanming-hu mentioned this pull request on Sep 1, 2020
@archibate mentioned this pull request on Sep 6, 2020
Successfully merging this pull request may close these issues.

[AutoDiff] [CC] Add ti.extension.adstack support to C backend