
[Bug] [opt] [ir] GlobalPtrStmt being optimized out cross offload #1390

Closed
archibate opened this issue Jul 3, 2020 · 0 comments · Fixed by #1392
Labels: ir (IR related issues), potential bug (Something that looks like a bug but not yet confirmed)

Comments

@archibate
Collaborator

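Describe the bug
A GlobalPtrStmt is optimized out across offloaded tasks. After Simplified I merges the two identical global ptr [S2place_i32] statements, the final global store refers to $1, which is defined before the top-level range-for and presumably ends up in a different offloaded task than the store. The offload pass then fails with !op->has_global_side_effect().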
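The failing check is `!op->has_global_side_effect()` in `offload.cpp:visit_operand`. Judging only from that message, the offload pass rejects an operand that is defined in a different offloaded task when that operand has a global side effect, and a `global ptr ... activate=true` appears to count as one. The toy model below replays that rule on the kernel's top-level statements. The `Stmt` class and `illegal_cross_task_operands` are hypothetical, not Taichi code, and the sketch assumes the top-level range-for becomes its own task, with the statements before and after it placed in serial tasks.

```python
from dataclasses import dataclass, field

@dataclass
class Stmt:
    id: int                      # the $N printed in the IR dump
    op: str                      # "const", "global_ptr", "global_store", "range_for"
    operands: list = field(default_factory=list)  # ids of statements this one uses
    global_side_effect: bool = False  # assumed True for a global ptr with activate=true

def illegal_cross_task_operands(tasks):
    """Return (task_index, user_id, operand_id) for every operand that is
    defined in a different task and has a global side effect."""
    defined_in = {s.id: t for t, task in enumerate(tasks) for s in task}
    by_id = {s.id: s for task in tasks for s in task}
    return [
        (t, s.id, o)
        for t, task in enumerate(tasks)
        for s in task
        for o in s.operands
        if defined_in[o] != t and by_id[o].global_side_effect
    ]

# Top-level statements split at the range-for (casts omitted):
# serial task, range-for task, serial task.
head = [
    Stmt(0, "const"),
    Stmt(1, "global_ptr", global_side_effect=True),
    Stmt(2, "global_store", [1, 0], global_side_effect=True),
    Stmt(3, "const"),
]
loop = [Stmt(4, "range_for", [0, 3])]

# Layout after "Simplified I": the last store reuses $1 from the first task.
after_simplify = [head, loop, [
    Stmt(7, "const"),
    Stmt(8, "global_store", [1, 7], global_side_effect=True),
]]
# Layout before it: the second global ptr still lives next to its store.
before_simplify = [head, loop, [
    Stmt(10, "const"),
    Stmt(11, "global_ptr", global_side_effect=True),
    Stmt(12, "global_store", [11, 10], global_side_effect=True),
]]

print(illegal_cross_task_operands(after_simplify))   # [(2, 8, 1)]
print(illegal_cross_task_operands(before_simplify))  # []
```

The constants used as loop bounds also cross a task boundary in this model, but they carry no global side effect, so only the reused pointer is flagged.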

To Reproduce

import taichi as ti
ti.init(print_ir=True)  # print the IR after each compilation pass
ti.core.toggle_advanced_optimization(False)  # turn off advanced optimization passes

x = ti.var(ti.i32, ())

@ti.kernel
def hello():
	x[None] = 0
	for i in range(10):
		print(i)
	x[None] += 1

hello()

Log/Screenshots

[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-mo14utn3
[Taichi] <dev mode>, llvm 10.0.0, commit 28e005cc, python 3.8.3
[Taichi] Starting on arch=x64
[I 07/04/20 01:29:35.034] [compile_to_offloads.cpp:operator()@23] Initial IR:
kernel {
  #@tmp0[] = 0
  $1 : for @tmp1 in range((cast_value<int32> 0), (cast_value<int32> 10)) block_dim=adaptive {
    $2 = eval @tmp1
    print %2, "
"
  }
  #@tmp0[] = 1
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Lowered:
kernel {
  <i32 x1> $0 = const [0]
  <i32 x1> $1 = global ptr [S2place_i32], index [] activate=true
  $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32 x1> $11 = global ptr [S2place_i32], index [] activate=true
  $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Typechecked:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Vectorized:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Split:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Dense struct-for demoted:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Optimized by CFG I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Access flagged I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified II:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[E 07/04/20 01:29:35.036] [offload.cpp:visit_operand@445] !op->has_global_side_effect()


***********************************
* Taichi Compiler Stack Traceback *
***********************************
/tmp/taichi-mo14utn3/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/tmp/taichi-mo14utn3/taichi_core.so(+0x995377) [0x7f30949eb377]
/tmp/taichi-mo14utn3/taichi_core.so(+0x99319d) [0x7f30949e919d]
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::IRVisitor::visit(taichi::lang::GlobalStoreStmt*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::Block*)
/tmp/taichi-mo14utn3/taichi_core.so(+0x994565) [0x7f30949ea565]
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::Block*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::irpass::offload(taichi::lang::IRNode*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::irpass::compile_to_offloads(taichi::lang::IRNode*, taichi::lang::CompileConfig const&, bool, bool, bool, bool, bool, bool)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::lower(bool)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Program::compile(taichi::lang::Kernel&)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::compile()
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::operator()()
/tmp/taichi-mo14utn3/taichi_core.so(+0x870477) [0x7f30948c6477]
/tmp/taichi-mo14utn3/taichi_core.so(+0x7b9148) [0x7f309480f148]
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0(+0x13e8c9) [0x7f30a4fc48c9]
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0(+0x98f6d) [0x7f30a4f1ef6d]
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend
/usr/lib/libpython3.8.so.1.0(+0x1f5e09) [0x7f30a507be09]
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode
/usr/lib/libpython3.8.so.1.0(+0x1d8248) [0x7f30a505e248]
/usr/lib/libpython3.8.so.1.0(+0x1d2483) [0x7f30a5058483]
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
python(_start+0x2e) [0x55a547a7704e]

Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    hello()
  File "/root/taichi/python/taichi/lang/kernel.py", line 548, in wrapped
    return primal(*args, **kwargs)
  File "/root/taichi/python/taichi/lang/kernel.py", line 480, in __call__
    return self.compiled_functions[key](*args)
  File "/root/taichi/python/taichi/lang/kernel.py", line 444, in func__
    t_kernel()
RuntimeError: [offload.cpp:visit_operand@445] !op->has_global_side_effect()
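Comparing the Typechecked and Simplified I dumps, $11 disappears and the last store becomes global store [$1 <- $7]. The sketch below models that step as a plain deduplication of identical global ptrs over the top-level statement list. It is a toy stand-in, not Taichi's simplify pass; `Stmt` and `dedup_global_ptrs` are hypothetical names. Because the range-for is treated as one opaque statement, nothing stops the merge from reaching across it, which is what later turns into a cross-offload reference.

```python
from dataclasses import dataclass, field

@dataclass
class Stmt:
    id: int
    op: str
    operands: list = field(default_factory=list)
    key: tuple = ()  # for a global ptr: (snode, indices); equal keys = same address

def dedup_global_ptrs(stmts):
    """Drop every global ptr whose (snode, indices) matches an earlier one
    and redirect its users to the surviving statement."""
    first_seen = {}  # key -> id of the first global ptr with that key
    renamed = {}     # id of a dropped global ptr -> id of the survivor
    out = []
    for s in stmts:
        operands = [renamed.get(o, o) for o in s.operands]
        if s.op == "global_ptr":
            if s.key in first_seen:
                renamed[s.id] = first_seen[s.key]
                continue  # duplicate pointer: do not emit it
            first_seen[s.key] = s.id
        out.append(Stmt(s.id, s.op, operands, s.key))
    return out

# Top level of the "Typechecked" IR (casts and the second const [0] omitted);
# the loop body is opaque here.
x_ptr = ("S2place_i32", ())
typechecked = [
    Stmt(0, "const"),
    Stmt(1, "global_ptr", key=x_ptr),
    Stmt(2, "global_store", [1, 0]),
    Stmt(5, "const"),
    Stmt(7, "range_for", [0, 5]),
    Stmt(10, "const"),
    Stmt(11, "global_ptr", key=x_ptr),
    Stmt(12, "global_store", [11, 10]),
]

for s in dedup_global_ptrs(typechecked):
    print(s.id, s.op, s.operands)  # last line: 12 global_store [1, 10]
```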


[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Split:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}

It seems Simplified I now removes the duplicate global ptr ($11 is merged into $1). Can we just skip these optimizations until after offloading?
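One alternative to disabling optimization before offloading, sketched here on a toy statement model and not necessarily what #1392 does: let the earlier passes merge pointers freely, and have the offload step re-create (rematerialize) any global ptr that a task uses but another task defines. `Stmt`, `rematerialize_global_ptrs`, and its `next_id` parameter are hypothetical, not Taichi APIs. The sketch assumes the pointer's index operands are already available in the consuming task, which holds here because the index list is empty.

```python
import itertools
from dataclasses import dataclass, field, replace

@dataclass
class Stmt:
    id: int
    op: str
    operands: list = field(default_factory=list)
    key: tuple = ()                   # (snode, indices) for a global ptr
    global_side_effect: bool = False

def rematerialize_global_ptrs(tasks, next_id):
    """Give every task its own copy of each global ptr it uses from another task."""
    defined_in = {s.id: t for t, task in enumerate(tasks) for s in task}
    by_id = {s.id: s for task in tasks for s in task}
    fresh_ids = itertools.count(next_id)
    new_tasks = []
    for t, task in enumerate(tasks):
        clones = {}    # original ptr id -> id of its copy inside task t
        prologue = []  # copies are placed at the start of the task
        body = []
        for s in task:
            operands = []
            for o in s.operands:
                if defined_in[o] != t and by_id[o].op == "global_ptr":
                    if o not in clones:
                        copy = replace(by_id[o], id=next(fresh_ids))
                        clones[o] = copy.id
                        prologue.append(copy)
                    o = clones[o]
                operands.append(o)
            body.append(replace(s, operands=operands))
        new_tasks.append(prologue + body)
    return new_tasks

# The three tasks the "Simplified I" IR would be split into (casts omitted).
x_ptr = ("S2place_i32", ())
tasks = [
    [Stmt(0, "const"),
     Stmt(1, "global_ptr", key=x_ptr, global_side_effect=True),
     Stmt(2, "global_store", [1, 0], global_side_effect=True),
     Stmt(3, "const")],
    [Stmt(4, "range_for", [0, 3])],
    [Stmt(7, "const"),
     Stmt(8, "global_store", [1, 7], global_side_effect=True)],
]

for s in rematerialize_global_ptrs(tasks, next_id=100)[2]:
    print(s.id, s.op, s.operands)
```

The trade-off is that the pointer is computed once per task instead of once per kernel, but each task stays self-contained, so the pre-offload passes do not need to know where offload boundaries will fall.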
