Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] test_ad_atomic.py::test_ad_reduce sometimes fails #828

Closed
xumingkuan opened this issue Apr 20, 2020 · 12 comments
Closed

[test] test_ad_atomic.py::test_ad_reduce sometimes fails #828

xumingkuan opened this issue Apr 20, 2020 · 12 comments
Labels
potential bug Something that looks like a bug but not yet confirmed welcome contribution

Comments

@xumingkuan
Copy link
Contributor

Describe the bug
test_ad_atomic.py::test_ad_reduce sometimes fails on AppVeyor, and it often passes when rerun.

Log/Screenshots
I find it weird here : $6 = get child [S0root->S5dense] $5 and $11 = get child [S0root->S3dense] $5. Can one statement link to two SNodes?

kernel {
  $0 = offloaded range_for(0, 16) block_dim=adaptive {
    <i32 x1> $1 = loop index 0
    <i32 x1> $2 = bit_extract($1 + 0, 0~4)
    <gen*x1> $3 = get root
    <i32 x1> $4 = const [0]
    <gen*x1> $5 = [S0root][root]::lookup($3, $4) activate = false
    <gen*x1> $6 = get child [S0root->S5dense] $5
    <i32 x1> $7 = bit_extract($2 + 0, 0~4)
    <gen*x1> $8 = [S5dense][dense]::lookup($6, $7) activate = false
    <f32*x1> $9 = get child [S5dense->S6place_f32] $8
    <f32 x1> $10 = global load $9
    <gen*x1> $11 = get child [S0root->S3dense] $5
    <gen*x1> $12 = [S3dense][dense]::lookup($11, $4) activate = false
    <f32*x1> $13 = get child [S3dense->S4place_f32] $12
    <f32 x1> $14 = global load $13
    <f32 x1> $15 = mul $14 $10
    <f32 x1> $16 = add $15 $15
    <f32*x1> $17 = get child [S5dense->S7place_f32] $8
    <f32 x1> $18 = atomic add($17, $16)
  }
}

Additional comments

ti.root.place(loss, loss.grad).dense(ti.i, N).place(x, x.grad)

I'm not sure if it's allowed to call place two times in one statement.

@xumingkuan xumingkuan added the potential bug Something that looks like a bug but not yet confirmed label Apr 20, 2020
@xumingkuan xumingkuan changed the title test_ad_atomic.py::test_ad_reduce sometimes fails [test] test_ad_atomic.py::test_ad_reduce sometimes fails Apr 20, 2020
@xumingkuan
Copy link
Contributor Author

@yuanming-hu What do you think?

@yuanming-hu
Copy link
Member

Thanks for looking into this. This has been a nasty issue for a while :-)

We should remove

to avoid this confusing syntax here. Not sure if this is the cause though.

To debug, take a look at ti.get_runtime().prog.print_snode_tree()

@xumingkuan
Copy link
Contributor Author

ti.get_runtime().prog.print_snode_tree():

S0root
  S1dense
    S2place_f32
  S3dense
    S4place_f32
  S5dense
    S6place_f32
    S7place_f32

@xumingkuan
Copy link
Contributor Author

After I changed

ti.root.place(loss, loss.grad).dense(ti.i, N).place(x, x.grad)

to

ti.root.place(loss, loss.grad)
ti.root.dense(ti.i, N).place(x, x.grad)

, both the snode tree and the IR didn't change.

@yuanming-hu
Copy link
Member

Thanks for the info. I took another look at the script and had no idea why it fails. We should also try to stably reproduce this issue: if I remember correctly this test will only fail with, a small probability, if I run all the tests together (which takes a couple of minutes), and it will pass if I run this one alone.

Also, it only fails on Windows.

@xumingkuan
Copy link
Contributor Author

xumingkuan commented Apr 20, 2020

Do you also think that $6 = get child [S0root->S5dense] $5 with $11 = get child [S0root->S3dense] $5 is probably the cause of failure, or it looks good to you?

I found it still exists if I set advanced_optimization to false:

kernel {
  $0 = offloaded range_for(0, 16) block_dim=adaptive {
    <i32 x1> $1 = const [0]
    <i32 x1> $2 = loop index 0
    <i32 x1> $3 = bit_extract($2 + 0, 0~4)
    <i32 x1> $4 = const [1]
    <i32 x1> $5 = mul $3 $4
    <i32 x1> $6 = add $1 $5
    <f32 x1> $7 = alloca
    <f32 x1> $8 = alloca
    <gen*x1> $9 = get root
    <i32 x1> $10 = linearized(ind {}, stride {})
    <gen*x1> $11 = [S0root][root]::lookup($9, $10) activate = false
    <gen*x1> $12 = get child [S0root->S5dense] $11
    <i32 x1> $13 = bit_extract($6 + 0, 0~4)
    <i32 x1> $14 = linearized(ind {$13}, stride {16})
    <gen*x1> $15 = [S5dense][dense]::lookup($12, $14) activate = false
    <f32*x1> $16 = get child [S5dense->S6place_f32] $15
    <f32 x1> $17 = global load $16
    <gen*x1> $18 = get child [S0root->S3dense] $11
    <gen*x1> $19 = [S3dense][dense]::lookup($18, $10) activate = false
    <f32*x1> $20 = get child [S3dense->S4place_f32] $19
    <f32 x1> $21 = global load $20
    <f32 x1> $22 = local load [ [$8[0]]]
    <f32 x1> $23 = add $22 $21
    <f32 x1> $24 = mul $23 $17
    <f32 x1> $25 = local load [ [$7[0]]]
    <f32 x1> $26 = add $25 $24
    <f32 x1> $27 = add $26 $24
    <f32*x1> $28 = get child [S5dense->S7place_f32] $15
    <f32 x1> $29 = atomic add($28, $27)
  }
}

($12 = get child [S0root->S5dense] $11 and $18 = get child [S0root->S3dense] $11 here)

@yuanming-hu
Copy link
Member

Do you also think that $6 = get child [S0root->S5dense] $5 with $11 = get child [S0root->S3dense] $5 is probably the cause of failure

Can one statement link to two SNodes?

Could you provide greater detail? I'm not sure if I understand what link means here. Thanks.

@xumingkuan
Copy link
Contributor Author

I think $5 here means a pointer acquired from the offset 0 from the root:

    <gen*x1> $3 = get root
    <i32 x1> $4 = const [0]
    <gen*x1> $5 = [S0root][root]::lookup($3, $4) activate = false

And I don't understand why $5 can be used in two get child statements with different snodes here. I think S5dense and S3dense cannot have the same address.

    <gen*x1> $6 = get child [S0root->S5dense] $5
    <gen*x1> $11 = get child [S0root->S3dense] $5

@yuanming-hu
Copy link
Member

I think $5 here means a pointer acquired from the offset 0 from the root:

    <gen*x1> $3 = get root
    <i32 x1> $4 = const [0]
    <gen*x1> $5 = [S0root][root]::lookup($3, $4) activate = false

Correct.

And I don't understand why $5 can be used in two get child statements with different snodes here. I think S5dense and S3dense cannot have the same address.

    <gen*x1> $6 = get child [S0root->S5dense] $5
    <gen*x1> $11 = get child [S0root->S3dense] $5

$6 and $11 do not have the same address. Note that each element of S0root has an instance of S1, S3 and S5.

@yuanming-hu
Copy link
Member

Another mysterious crash: https://ci.appveyor.com/project/yuanming-hu/taichi/builds/32373295/job/jbdelwb6ch2yj2g6#L9430

@archibate
Copy link
Collaborator

archibate commented Apr 23, 2020

This may related to variable-not-initialized? I found #633 caused by num_groups not reset to 1 after each kernel invocation, and cause a random crash.
Also I heard win has a different memory initialize mechanism from linux: they initialize stack as 0xcc, and heap as 0xcd, causing the famous 烫烫烫 and 屯屯屯 codes, see https://blog.csdn.net/mig_davidli/article/details/37507731.

@ailzhang
Copy link
Contributor

Closing as we no longer use appveyor and this test seem to be stable now. Please feel free to open a new issue if it happens again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
potential bug Something that looks like a bug but not yet confirmed welcome contribution
Projects
None yet
Development

No branches or pull requests

4 participants