-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CC] [autodiff] Support AdStack on C backend #1752
Conversation
static inline Ti_AdStackPtr Ti_ad_stack_top_primal(Ti_AdStackPtr stack, | ||
Ti_u32 element_size) { | ||
Ti_u32 *n = Ti_ad_stack_n(stack); | ||
return Ti_ad_stack_data(stack) + (*n - 1) * 2 * element_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I copied from ad_stack.metal.h
, can you tell me why n - 1
here? @k-ye
Sometimes n
can be 0 and it gets overflowed, resulting in a serious segfault when the lhs pointer is 64-bit (-1 = 0xffffffff). But it somehow silently passed on Metal whose pointer is 32-bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when n
is 0, it's pretty OK to have a segfault here -- just like this in C++:
std::stack<int> s;
s.top(); // runtime error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, this is just l[len(l) - 1]
. As mentioned, accessing top
without push
sounds like a bug.
taichi/backends/cc/runtime/base.h
Outdated
Ti_i32 *data = (Ti_i32 *)Ti_ad_stack_data(stack); | ||
data[0] = 0; | ||
data[1] = 0; | ||
*n = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have to do this mock for L108 to prevent overflow when Ti_ad_stack_top_primal
called without Ti_ad_stack_push
, do you have the same issue on Metal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you have the same issue on Metal?
No I don't remember seeing such an issue. It sounds like a bug if top_primal
is called before push
, which probably won't be limited to Metal only. Could you provide a test to repro this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I've no knowledge about the autodiff system, I just know that test_ad_for.py
fails but test_ad_if
didn't. And it only occur on CC, not x64. But I'll try to find the min-repro based on the test later.
Codecov Report
@@ Coverage Diff @@
## master #1752 +/- ##
==========================================
- Coverage 42.50% 42.45% -0.06%
==========================================
Files 44 44
Lines 6194 6202 +8
Branches 1073 1073
==========================================
Hits 2633 2633
- Misses 3406 3414 +8
Partials 155 155
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTMig!
// Copied from Metal: | ||
typedef Ti_u8 *Ti_AdStackPtr; | ||
|
||
static inline Ti_u32 *Ti_ad_stack_n(Ti_AdStackPtr stack) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... I would suggest not using a capital letter as the beginning of a function's name. I haven't taken a look at the C backend before, so is there any reason that the Ti_
prefix is used? I would suggest cc_
prefix to show that it's the C backend (and probably Cc
for classes).
static inline Ti_AdStackPtr Ti_ad_stack_top_primal(Ti_AdStackPtr stack, | ||
Ti_u32 element_size) { | ||
Ti_u32 *n = Ti_ad_stack_n(stack); | ||
return Ti_ad_stack_data(stack) + (*n - 1) * 2 * element_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think when n
is 0, it's pretty OK to have a segfault here -- just like this in C++:
std::stack<int> s;
s.top(); // runtime error
But it makes
It's a namespace, prevent possible name conflict when users exporting the source code to their projects.
Yes, |
Interesting... Looks like a bug in autodiff or optimization passes. |
What's more, it silently passed on x64 and metal test.. Any idea? |
Can we confirm that this is an issue in autodiff system or C backend? |
I'll take a look tomorrow. |
It's an issue in autodiff. test case:
|
Great, so will we merge this PR before or after that issue is resolved? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! LGTM.
Related issue = close #1748
[Click here for the format server]