Code breaks on vulkan backend with v1.1.2 #6684

Closed
neozhaoliang opened this issue Nov 21, 2022 · 4 comments
Labels: potential bug (something that looks like a bug but not yet confirmed)

neozhaoliang (Contributor) commented Nov 21, 2022

This code crashes on an Intel GT2 card at commit e2fb83bdc4f3b4cc6cc86cca4ea35433883a49e6:

import taichi as ti
ti.init(arch=ti.vulkan)

d = 3
num_round = 1000000
max_steps = 1000000

ivec = ti.types.vector(d, int)
origin = ivec([0, 0, 0]) # ivec([0] * d)

dirs = ti.Vector.field(d, int, shape=2*d)
for k in range(d):
    dirs[2 * k][k] = 1
    dirs[2 * k + 1][k] = -1

print(dirs)

@ti.func
def choose_random_direction():
    ind = int(ti.random() * 2 * d)
    return dirs[ind]

@ti.kernel
def walk() -> float:
    success = 0
    for _ in range(num_round):
        pos = origin
        for step in range(max_steps):
            pos += choose_random_direction()
            if all(pos == origin):
                success += 1
                break

    return success / num_round

print(walk())

Error message:

[E 11/21/22 12:57:53.167 1630179] [vulkan_device.cpp:submit@1708] Vulkan Error : -4 : failed to submit command buffer


Traceback (most recent call last):
  File "/home/zhao/China_R_Meeting/return_probability_Z^d.py", line 36, in <module>
    print(do_experiments())
  File "/home/zhao/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 942, in wrapped
    return primal(*args, **kwargs)
  File "/home/zhao/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 869, in __call__
    return self.runtime.compiled_functions[key](*args)
  File "/home/zhao/.local/lib/python3.10/site-packages/taichi/lang/kernel_impl.py", line 792, in func__
    runtime_ops.sync()
  File "/home/zhao/.local/lib/python3.10/site-packages/taichi/lang/runtime_ops.py", line 8, in sync
    impl.get_runtime().sync()
  File "/home/zhao/.local/lib/python3.10/site-packages/taichi/lang/impl.py", line 453, in sync
    self.prog.synchronize()
RuntimeError: [vulkan_device.cpp:submit@1708] Vulkan Error : -4 : failed to submit command buffer
[E 11/21/22 12:57:53.209 1630179] [vulkan_device.cpp:submit@1708] Vulkan Error : -4 : failed to submit command buffer


[E 11/21/22 12:57:53.210 1630179] [vulkan_device.cpp:submit@1708] Vulkan Error : -4 : failed to submit command buffer


terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
Aborted
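For reference, -4 is the value of VK_ERROR_DEVICE_LOST in the Vulkan VkResult enum, so the log says the device was lost when the command buffer was submitted. A minimal sketch for decoding such log lines follows; the helper name `decode_vulkan_error` and the regular expression are illustrative assumptions, not part of Taichi, and the table covers only a subset of the core result codes.

```python
import re

# Subset of the core VkResult codes from the Vulkan specification.
VK_RESULT_NAMES = {
    0: "VK_SUCCESS",
    -1: "VK_ERROR_OUT_OF_HOST_MEMORY",
    -2: "VK_ERROR_OUT_OF_DEVICE_MEMORY",
    -3: "VK_ERROR_INITIALIZATION_FAILED",
    -4: "VK_ERROR_DEVICE_LOST",
    -5: "VK_ERROR_MEMORY_MAP_FAILED",
}

# Matches the "Vulkan Error : <code> :" fragment seen in the log above.
_LOG_PATTERN = re.compile(r"Vulkan Error : (-?\d+) :")


def decode_vulkan_error(log_line):
    """Hypothetical helper: return the VkResult name found in a log line, or None."""
    match = _LOG_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return VK_RESULT_NAMES.get(code, f"unknown VkResult {code}")


line = ("[vulkan_device.cpp:submit@1708] Vulkan Error : -4 : "
        "failed to submit command buffer")
print(decode_vulkan_error(line))
```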

The parent commits run without raising errors, but they all return the wrong result 0.0 on both the Vulkan and OpenGL backends. The code works fine on CUDA + v1.0.

Info: the program works for num_round = 100K but fails for num_round = 1000K.
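Since the failure depends on num_round, one thing worth trying is to split the work into several smaller kernel launches. Whether this avoids the device loss on the Intel GT2 is untested; it only rests on the assumption that a very long single dispatch is what triggers it. The sketch below is plain Python: `run_in_batches`, `run_batch`, and `batch_size` are hypothetical names, and `run_batch(n)` stands in for a Taichi kernel that runs `n` rounds and returns the number of successful returns.

```python
def run_in_batches(run_batch, total_rounds, batch_size):
    """Cover total_rounds with calls to run_batch(n) and return the success rate.

    run_batch is a hypothetical callable that performs n rounds and
    returns how many of them succeeded.
    """
    if total_rounds <= 0 or batch_size <= 0:
        raise ValueError("total_rounds and batch_size must be positive")
    successes = 0
    remaining = total_rounds
    while remaining > 0:
        n = min(batch_size, remaining)  # the last batch may be smaller
        successes += run_batch(n)
        remaining -= n
    return successes / total_rounds


# Stand-in for a real kernel: records each batch size and
# pretends that a quarter of the rounds succeed.
calls = []

def fake_batch(n):
    calls.append(n)
    return n // 4

rate = run_in_batches(fake_batch, 1_000_000, 100_000)
print(len(calls), rate)
```

With a real kernel, each call to `run_batch` would be a separate launch of 100K rounds, the size that is reported to work above.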

neozhaoliang added the potential bug label Nov 21, 2022
taichi-gardener moved this to Untriaged in Taichi Lang Nov 21, 2022
ailzhang (Contributor) commented:

@neozhaoliang hmmm I've run the script and it works fine on master branch. Does this repro consistently on your end?

ailzhang moved this from Untriaged to In Progress in Taichi Lang Nov 25, 2022
PENGUINLIONG self-assigned this Nov 25, 2022
strongoier added this to the v1.4.0 milestone Dec 2, 2022
PENGUINLIONG changed the milestone from v1.4.0 to v1.5.0 Jan 4, 2023
ailzhang (Contributor) commented:

@neozhaoliang can you still repro this issue?

ailzhang changed the milestone from v1.5.0 to v1.6.0 Feb 17, 2023
neozhaoliang (Contributor, Author) commented:

Using the current master branch: [Taichi] version 1.5.0, llvm 14.0.0, commit 4af54d49, linux, python 3.10.6

The program runs but gives the wrong result 0.0

bobcao3 (Collaborator) commented Feb 18, 2023

RTX 4090 on Windows produced this result:

[[ 1  0  0]
 [-1  0  0]
 [ 0  1  0]
 [ 0 -1  0]
 [ 0  0  1]
 [ 0  0 -1]]
0.34026598930358887
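A value near 0.34 is what one would expect here: the return probability of a simple random walk on Z^3 is roughly 0.34 (Pólya's constant), and the walk already returns after two steps with probability 1/6, so 0.0 cannot be correct. As a backend-independent sanity check, here is a small pure-Python version of the same walk. The function name `return_probability`, the seed, and the much smaller default `num_round` and `max_steps` are assumptions chosen to keep it fast, so the estimate will come out somewhat below the long-run value.

```python
import random


def return_probability(d=3, num_round=2000, max_steps=200, seed=0):
    """Estimate how often a simple random walk on Z^d returns to the origin.

    Mirrors the kernel in the report, but on the CPU and with far fewer
    rounds and steps (illustrative defaults, not the original settings).
    """
    rng = random.Random(seed)
    success = 0
    for _ in range(num_round):
        pos = [0] * d
        for _ in range(max_steps):
            ind = rng.randrange(2 * d)
            # Same layout as dirs: index 2k is +1 on axis k, 2k+1 is -1.
            axis, odd = divmod(ind, 2)
            pos[axis] += -1 if odd else 1
            if not any(pos):  # back at the origin
                success += 1
                break
    return success / num_round


estimate = return_probability()
print(estimate)
```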

feisuzhu changed the milestone from v1.6.0 to v1.7.0 May 11, 2023
github-project-automation (bot) moved this from In Progress to Done in Taichi Lang Oct 25, 2023
jim19930609 removed this from the v1.7.0 milestone Oct 25, 2023