[Bug] [opt] [ir] GlobalPtrStmt being optimized out cross offload #1390

archibate · 2020-07-03T17:32:01Z

Describe the bug
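The duplicate GlobalPtrStmt after the loop is optimized out, so the final global store reuses a pointer defined before the loop, across an offload boundary. Compilation then fails in the offload pass with !op->has_global_side_effect().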

To Reproduce

import taichi as ti
ti.init(print_ir=True)
ti.core.toggle_advanced_optimization(False)

x = ti.var(ti.i32, ())

@ti.kernel
def hello():
	x[None] = 0
	for i in range(10):
		print(i)
	x[None] += 1

hello()

Log/Screenshots

[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-mo14utn3
[Taichi] <dev mode>, llvm 10.0.0, commit 28e005cc, python 3.8.3
[Taichi] Starting on arch=x64
[I 07/04/20 01:29:35.034] [compile_to_offloads.cpp:operator()@23] Initial IR:
kernel {
  #@tmp0[] = 0
  $1 : for @tmp1 in range((cast_value<int32> 0), (cast_value<int32> 10)) block_dim=adaptive {
    $2 = eval @tmp1
    print %2, "
"
  }
  #@tmp0[] = 1
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Lowered:
kernel {
  <i32 x1> $0 = const [0]
  <i32 x1> $1 = global ptr [S2place_i32], index [] activate=true
  $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32 x1> $11 = global ptr [S2place_i32], index [] activate=true
  $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Typechecked:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Vectorized:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Split:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Dense struct-for demoted:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Optimized by CFG I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Access flagged I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified II:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32 x1> $2 = const [10]
  $3 : for in range($0, $2) (vectorize 1) block_dim=adaptive {
    <i32 x1> $4 = loop $3 index 0
    print $4, "
"
  }
  <i32 x1> $6 = const [1]
  <i32*x1> $7 : global store [$1 <- $6]
}
[E 07/04/20 01:29:35.036] [offload.cpp:visit_operand@445] !op->has_global_side_effect()


***********************************
* Taichi Compiler Stack Traceback *
***********************************
/tmp/taichi-mo14utn3/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)
/tmp/taichi-mo14utn3/taichi_core.so(+0x995377) [0x7f30949eb377]
/tmp/taichi-mo14utn3/taichi_core.so(+0x99319d) [0x7f30949e919d]
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::IRVisitor::visit(taichi::lang::GlobalStoreStmt*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::Block*)
/tmp/taichi-mo14utn3/taichi_core.so(+0x994565) [0x7f30949ea565]
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::Block*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::irpass::offload(taichi::lang::IRNode*)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::irpass::compile_to_offloads(taichi::lang::IRNode*, taichi::lang::CompileConfig const&, bool, bool, bool, bool, bool, bool)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::lower(bool)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Program::compile(taichi::lang::Kernel&)
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::compile()
/tmp/taichi-mo14utn3/taichi_core.so: taichi::lang::Kernel::operator()()
/tmp/taichi-mo14utn3/taichi_core.so(+0x870477) [0x7f30948c6477]
/tmp/taichi-mo14utn3/taichi_core.so(+0x7b9148) [0x7f309480f148]
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0(+0x13e8c9) [0x7f30a4fc48c9]
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0(+0x98f6d) [0x7f30a4f1ef6d]
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend
/usr/lib/libpython3.8.so.1.0(+0x1f5e09) [0x7f30a507be09]
/usr/lib/libpython3.8.so.1.0: PyObject_Call
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode
/usr/lib/libpython3.8.so.1.0(+0x1d8248) [0x7f30a505e248]
/usr/lib/libpython3.8.so.1.0(+0x1d2483) [0x7f30a5058483]
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
python(_start+0x2e) [0x55a547a7704e]

Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    hello()
  File "/root/taichi/python/taichi/lang/kernel.py", line 548, in wrapped
    return primal(*args, **kwargs)
  File "/root/taichi/python/taichi/lang/kernel.py", line 480, in __call__
    return self.compiled_functions[key](*args)
  File "/root/taichi/python/taichi/lang/kernel.py", line 444, in func__
    t_kernel()
RuntimeError: [offload.cpp:visit_operand@445] !op->has_global_side_effect()


[I 07/04/20 01:29:35.035] [compile_to_offloads.cpp:operator()@23] Loop Split:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [0]
  <i32 x1> $4 = cast_value<i32> $3
  <i32 x1> $5 = const [10]
  <i32 x1> $6 = cast_value<i32> $5
  $7 : for in range($4, $6) (vectorize 1) block_dim=adaptive {
    <i32 x1> $8 = loop $7 index 0
    print $8, "
"
  }
  <i32 x1> $10 = const [1]
  <i32*x1> $11 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $12 : global store [$11 <- $10]
}
[I 07/04/20 01:29:35.036] [compile_to_offloads.cpp:operator()@23] Simplified I:
kernel {
  <i32 x1> $0 = const [0]
  <i32*x1> $1 = global ptr [S2place_i32], index [] activate=true
  <i32*x1> $2 : global store [$1 <- $0]
  <i32 x1> $3 = const [10]
  $4 : for in range($0, $3) (vectorize 1) block_dim=adaptive {
    <i32 x1> $5 = loop $4 index 0
    print $5, "
"
  }
  <i32 x1> $7 = const [1]
  <i32*x1> $8 : global store [$1 <- $7]
}

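It seems Simplified I now removes the duplicate global ptr: the store after the loop uses $1 instead of its own $11, so once the kernel is split into offloads it refers to a statement defined in a different offload. Could we skip these optimizations until after offloading?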
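
To make the failure mode concrete, here is a minimal toy model in plain Python. It is not Taichi code: `Stmt`, `merge_duplicate_global_ptrs`, `split_into_offloads`, `offending_uses` and the `stop_at_loops` flag are all hypothetical names invented for illustration, and treating a global ptr as side-effecting is an assumption of the sketch. It mirrors the Loop Split IR above, merges identical global ptrs the way Simplified I appears to, and then looks for operands that cross an offload boundary.

```python
from dataclasses import dataclass

# Assumption of this toy model: these ops count as having a global side effect
# (a global ptr with activate=true is treated as side-effecting here).
SIDE_EFFECT_OPS = {"global_ptr", "global_store"}


@dataclass(frozen=True)
class Stmt:
    name: str             # SSA name, e.g. "$1"
    op: str               # "const", "global_ptr", "global_store", "range_for"
    operands: tuple = ()  # names of the statements this one uses
    attr: str = ""        # constant value or SNode name


def merge_duplicate_global_ptrs(stmts, stop_at_loops=False):
    """Toy dedup: later identical global_ptr statements reuse the first one."""
    first_seen, rename, out = {}, {}, []
    for s in stmts:
        if s.op == "range_for" and stop_at_loops:
            first_seen.clear()  # do not merge across the future offload boundary
        if s.op == "global_ptr" and s.attr in first_seen:
            rename[s.name] = first_seen[s.attr]
            continue
        if s.op == "global_ptr":
            first_seen[s.attr] = s.name
        operands = tuple(rename.get(o, o) for o in s.operands)
        out.append(Stmt(s.name, s.op, operands, s.attr))
    return out


def split_into_offloads(stmts):
    """Toy offload split: each top-level loop is its own task, and the serial
    code before and after it forms separate tasks."""
    tasks, current = [], []
    for s in stmts:
        if s.op == "range_for":
            if current:
                tasks.append(current)
                current = []
            tasks.append([s])
        else:
            current.append(s)
    if current:
        tasks.append(current)
    return tasks


def offending_uses(tasks):
    """Return (user, operand) pairs where a side-effecting operand is defined
    in a different task than its user."""
    defined_in = {s.name: (i, s.op) for i, task in enumerate(tasks) for s in task}
    bad = []
    for i, task in enumerate(tasks):
        for s in task:
            for o in s.operands:
                j, op = defined_in[o]
                if j != i and op in SIDE_EFFECT_OPS:
                    bad.append((s.name, o))
    return bad


# Simplified transcription of the "Loop Split" IR (casts and loop body omitted).
loop_split = [
    Stmt("$0", "const", attr="0"),
    Stmt("$1", "global_ptr", attr="S2place_i32"),
    Stmt("$2", "global_store", ("$1", "$0")),
    Stmt("$5", "const", attr="10"),
    Stmt("$7", "range_for", ("$0", "$5")),
    Stmt("$10", "const", attr="1"),
    Stmt("$11", "global_ptr", attr="S2place_i32"),
    Stmt("$12", "global_store", ("$11", "$10")),
]

before = offending_uses(split_into_offloads(loop_split))
after = offending_uses(split_into_offloads(merge_duplicate_global_ptrs(loop_split)))
guarded = offending_uses(
    split_into_offloads(merge_duplicate_global_ptrs(loop_split, stop_at_loops=True))
)
print(before)   # []
print(after)    # [('$12', '$1')]
print(guarded)  # []
```

In this model the merged IR leaves the last store pointing at a global ptr from the first serial task, which is the kind of cross-offload operand the assertion rejects. The `stop_at_loops=True` variant is one narrower alternative to disabling optimization entirely; it is only a sketch and not a description of what #1392 actually changed.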

archibate added the "potential bug" and "ir" labels Jul 3, 2020

archibate assigned xumingkuan Jul 3, 2020

xumingkuan mentioned this issue Jul 3, 2020

[Bug] [ir] Fix compilation crash when there's a cross-offload global atomic operation #1392

Merged

xumingkuan closed this as completed in #1392 Jul 4, 2020
