compatability between mxnet, tvm and nnvm #518

kaishijeng · 2017-10-07T17:11:58Z

I have mxnet 0.11.0 installed and mxnet is running OK by itself
However, I got the following error of running python2 deploy_model_on_rasp.py from nnvm tutorial:

[10:06:53] /home/sky/2TB/src/tvm/dmlc-core/include/dmlc/logging.h:308: [10:06:53] /home/sky/2TB/src/mxnet/dmlc-core/include/dmlc/././any.h:289: Check failed: type_->ptype_info == &typeid(T) The stored type mismatch stored=N4nnvm5OpMapISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS6_EES9_EEEE requested=N4nnvm5OpMapISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS6_EES9_EEEE

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f3d0eb8bd3c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(ZZN4nnvm2Op8set_attrISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS7_EESA_EEEERS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_iENKUlPN4dmlc3anyEE_clESR+0x136) [0x7f3d213177d6]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4nnvm2Op13UpdateAttrMapERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvPN4dmlc3anyEEE+0xcc) [0x7f3d2243013c]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4nnvm2Op8set_attrISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS7_EESA_EEEERS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_i+0x190) [0x7f3d213075b0]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(+0x1f77c3) [0x7f3d212727c3]
[bt] (5) /lib64/ld-linux-x86-64.so.2(+0x106ba) [0x7f3d26c206ba]
[bt] (6) /lib64/ld-linux-x86-64.so.2(+0x107cb) [0x7f3d26c207cb]
[bt] (7) /lib64/ld-linux-x86-64.so.2(+0x158e2) [0x7f3d26c258e2]
[bt] (8) /lib64/ld-linux-x86-64.so.2(+0x10564) [0x7f3d26c20564]
[bt] (9) /lib64/ld-linux-x86-64.so.2(+0x14da9) [0x7f3d26c24da9]

terminate called after throwing an instance of 'dmlc::Error'
what(): [10:06:53] /home/fc/2TB/src/mxnet/dmlc-core/include/dmlc/././any.h:289: Check failed: type_->ptype_info == &typeid(T) The stored type mismatch stored=N4nnvm5OpMapISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS6_EES9_EEEE requested=N4nnvm5OpMapISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS6_EES9_EEEE

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageFatalD1Ev+0x3c) [0x7f3d0eb8bd3c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(ZZN4nnvm2Op8set_attrISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS7_EESA_EEEERS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_iENKUlPN4dmlc3anyEE_clESR+0x136) [0x7f3d213177d6]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4nnvm2Op13UpdateAttrMapERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvPN4dmlc3anyEEE+0xcc) [0x7f3d2243013c]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(_ZN4nnvm2Op8set_attrISt8functionIFbRKNS_9NodeAttrsEPSt6vectorINS_6TShapeESaIS7_EESA_EEEERS0_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERKT_i+0x190) [0x7f3d213075b0]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-0.11.0-py2.7.egg/mxnet/libmxnet.so(+0x1f77c3) [0x7f3d212727c3]
[bt] (5) /lib64/ld-linux-x86-64.so.2(+0x106ba) [0x7f3d26c206ba]
[bt] (6) /lib64/ld-linux-x86-64.so.2(+0x107cb) [0x7f3d26c207cb]
[bt] (7) /lib64/ld-linux-x86-64.so.2(+0x158e2) [0x7f3d26c258e2]
[bt] (8) /lib64/ld-linux-x86-64.so.2(+0x10564) [0x7f3d26c20564]
[bt] (9) /lib64/ld-linux-x86-64.so.2(+0x14da9) [0x7f3d26c24da9]

Aborted (core dumped)

Any idea why this happen?

tqchen · 2017-10-07T17:20:59Z

can you check if you can run unittest cases of nnvm compiler alone?

tqchen · 2017-10-07T17:22:10Z

if you have older version of tvm, it is likely that you will need to rebuild and install tvm with the same version as in nnvm repo.

kaishijeng · 2017-10-07T17:25:56Z

How do I run unittest cases of nnvm compiler?
My tvm is the latest one from github

tqchen · 2017-10-07T17:40:50Z

under nnvm, do

nosetests -v tests/python/unittest 
nosetests -v tests/python/compiler

kaishijeng · 2017-10-07T17:45:43Z

Below is what I got:

test_graph.test_json_pass ... ok
test_graph.test_json_pass_with_attr ... ok
test_graph.test_graph_json_attr ... ok
test_graph.test_list_args ... ok
test_graph.test_infer_shape ... ok
test_graph.test_infer_shape_known_partial ... ok
test_graph.test_infer_type ... ok
test_graph.test_plan_memory ... ok
test_graph.test_print_graph_ir ... ok
test_infer_shape.test_dense ... ok
test_infer_shape.test_concatenate ... ok
test_infer_shape.test_expand_dims ... ok
test_infer_shape.test_split ... ok
test_infer_shape.test_batchnorm ... ok
test_infer_shape.test_flatten ... ok
test_infer_shape.test_conv2d ... ok
test_infer_shape.test_conv2d_transpose ... ok
test_infer_shape.test_max_pool2d ... ok
test_infer_shape.test_global_pool2d ... ok
test_infer_shape.test_reshape ... ok
test_infer_shape.test_transpose ... ok
test_infer_shape.test_broadcast_to ... ok
test_infer_shape.test_broadcast_binary ... ok
test_infer_shape.test_reduce ... ok
test_symbol.test_dense ... ok
test_symbol.test_compose ... ok
test_symbol.test_default_input ... [10:43:26] /home/sky/2TB/src/tvm/dmlc-core/include/dmlc/logging.h:308: [10:43:26] src/core/symbolic.cc:309: Not enough argument to call operator elemwise_add

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageF atalD1Ev+0x3c) [0x7fab29378d3c]
[bt] (1) /home/sky/2TB/src/nnvm/python/nnvm/../../lib/libnnvm_compiler.so(ZN4nnvm6Symbol7ComposeERKN4dmlc10array_vie wIPKS0_EERKSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES4_St4hashISE_ESt8equal_toISE_ESaIS t4pairIKSE_S4_EEERSK+0x10ef) [0x7fab2873eccf]
[bt] (2) /home/sky/2TB/src/nnvm/python/nnvm/../../lib/libnnvm_compiler.so(NNSymbolCompose+0x2e0) [0x7fab28729470]
[bt] (3) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fab3f0a1e40]
[bt] (4) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7fab3f0a18ab]
[bt] (5) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(_ctypes_callproc+0x48f) [0x7fab3f2b13df]
[bt] (6) /usr/lib/python2.7/lib-dynload/_ctypes.x86_64-linux-gnu.so(+0x11d82) [0x7fab3f2b5d82]
[bt] (7) /usr/bin/python(PyObject_Call+0x43) [0x4b0cb3]
[bt] (8) /usr/bin/python(PyEval_EvalFrameEx+0x5faf) [0x4c9faf]
[bt] (9) /usr/bin/python(PyEval_EvalCodeEx+0x255) [0x4c2765]

ok
test_symbol.test_copy ... ok
test_symbol.test_op_name ... ok
test_top_level1.test_dense ... ok
test_top_level1.test_concatenate_split ... ok
test_top_level1.test_expand_dims ... ok
test_top_level1.test_unary ... ok
test_top_level1.test_batchnorm ... ok
test_top_level2.test_conv2d ... ok
test_top_level2.test_max_pool2d ... ok
test_top_level3.test_reshape ... ok
test_top_level3.test_scalar_op ... ok
test_top_level3.test_leaky_relu ... ok
test_top_level4.test_binary_broadcast ... ok
test_top_level4.test_broadcast_to ... ok

Ran 41 tests in 0.098s

OK

tqchen · 2017-10-07T17:51:45Z

I do not have a quite clear clue on what is happening in mxnet side. You can try to upgrade mxnet using the latest pip version, that is how we build docker image to test mxnet frontend https://github.com/dmlc/nnvm/blob/master/tests/ci_build/Dockerfile.gpu

kaishijeng · 2017-10-07T18:22:27Z

If I import mxnet before nnvm, the issue seems to disappear.

However, I got a different error below which says llvm is not enabled:

TVM: Initializing cython mode...
[11:16:58] /home/sky/2TB/src/tvm/dmlc-core/include/dmlc/logging.h:308: [11:16:58] /home/sky/2TB/src/tvm/src/codegen/codegen.cc:27: Check failed: bf != nullptr Tar get llvm is not enabled

Stack trace returned 10 entries:

tqchen · 2017-10-07T18:35:02Z

this is mainly because tvm need to build with llvm support, modify config.mk to do that

kaishijeng · 2017-10-07T22:29:03Z

I installed llvm 5.0/clang 5.0 and is able to pass this error now.
However, another error occurs:

Intrinsic name not mangled correctly for type arguments! Should be: llvm.fmuladd.f32
float (float, float, float)* @llvm.fmuladd.f32.f32.f32
LLVM ERROR: Broken function found, compilation aborted!

tqchen · 2017-10-07T22:31:50Z

This seems to be an error that was fixed in later version of LLVM5, try llvm from here http://apt.llvm.org/

tqchen · 2017-10-07T22:47:04Z

The problem of old llvm should also be fixed by #519

kaishijeng · 2017-10-08T00:31:36Z

It works OK with PR#519
By the way, how do I measure execution time on Raspberry PI and how to compare with a regular mxnet mobilenet on PI?

Thanks,

tqchen · 2017-10-08T00:34:50Z

You can use the RPC module which comes with remote executor, see https://github.com/dmlc/nnvm/blob/master/examples/benchmark/rasp_imagenet_bench.py and
http://nnvm.tvmlang.org/tutorials/deploy_model_on_rasp.html#sphx-glr-tutorials-deploy-model-on-rasp-py

For MXNet reference, we do need to do it manually by scripting a python script

tqchen · 2017-10-08T17:20:28Z

I am going to close this issue as the original problem has been resolved. Feel free to open another one if you have other questions

kaishijeng · 2017-10-09T03:23:29Z

I have firefly 3399 system which is 2xA72 ad 4xA53 running on ubuntu 64 bit.
It has OpenCL 1.2 driver/library.

Can I run the same code below for this platform instead PI:
http://nnvm.tvmlang.org/tutorials/deploy_model_on_rasp.html#sphx-glr-tutorials-deploy-model-on-rasp-py

Also I notice the abve code uses Neon only, but not GPU with OpenCL. Can it be modified to take advantage of GPU in firefly3399?

Thanks,

tqchen · 2017-10-09T03:33:02Z

You might need to update the target triple to match the A53/A72 arch, i.e. change line of -target=armv7l-none-linux-anueabihf to triple that matches the processor. You can do OpenCL deployment as long as your host server support opencl see https://github.com/dmlc/tvm/blob/master/tests/python/unittest/test_runtime_rpc.py#L92

kaishijeng · 2017-10-09T03:50:09Z

I am confused about the code in deploy_model_on_rasp.py (see below) which use target ="llvm" instead of "llvm -target=armv7l-none-linux-anueabihf -mcpu=cortex-a53 -mattr=+neo" because I use "remote" mode. So it should work with firefly3399 without any change.

use_rasp=False
if use_rasp:
target = "llvm -target=armv7l-none-linux-anueabihf -mcpu=cortex-a53 -mattr=+neon"
else:
target = "llvm"

tqchen · 2017-10-09T04:20:11Z

The tutorial needs to be able to compile on CI machine, which starts a local RPC server on x86.
So the code path use_rasp=False is set up to do that. In real practice, you need to change use_rasp =True.

If you are using arch64 architecture, the target triple needs to change a bit. (check out gcc --verbose on your board)

tqchen · 2017-10-09T04:20:59Z

Since the tutorial causes a bit of your confusion, you are more than welcomed to send in a PR to update the explanation, so that it is more clear to the future readers

kaishijeng · 2017-10-09T04:55:16Z

Are you saying that I need to use_rasp=True and do cross compiling on my local machine which is an x64 machine? If yes, how do I do the cross compile on my host machine? I don't think my local host has installation of gcc for arch64 Thanks,

…

On Sun, Oct 8, 2017 at 9:21 PM, Tianqi Chen ***@***.***> wrote: Since the tutorial causes a bit of your confusion, you are more than welcomed to send in a PR to update the explanation, so that it is more clear to the future readers — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3jSJ3egJqYxZXm5nRn-QyRcwZQofks5sqZ8ugaJpZM4PxbB3> .

kaishijeng · 2017-10-09T05:35:21Z

I change
target = "llvm -target=armv7l-none-linux-anueabihf -mcpu=cortex-a53 -mattr=+neon"
to
target = "llvm -target=aaarch64 -mcpu=cortex-a53 -mattr=+neon"

and test on it on firefly 3399 platform. Unfortunaley, it is crashed with error message on firefly platform and local host:

Error message on Firelfy3399:

INFO:root:RPCServer: connection from ('192.168.20.41', 60826)
INFO:root:Connection from ('192.168.20.41', 60826)
[22:28:21] src/runtime/rpc/rpc_server_env.cc:23: Upload /tmp/tmpdkLwvF/net.o... nbytes=230640
INFO:root:Create shared library based on /tmp/tmpdkLwvF/net.o
INFO:root:load_module /tmp/tmpdkLwvF/net.o.so
[22:28:25] /home/firefly/2TB/src/firefly/tvm/dmlc-core/include/dmlc/logging.h:308: [22:28:25] include/tvm/././runtime/./packed_func.h:601: Check failed: i < num_args (1 vs. 1) not enough argument passed, 1 passed but request arg[1].

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageFatalD1Ev+0x44) [0x7f920c551c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZNK3tvm7runtime7TVMArgsixEi+0x1f8) [0x7f921ac3a8]
[bt] (2) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime16RPCModuleGetFuncENS0_7TVMArgsEPNS0_11TVMRetValueE+0x68) [0x7f924653d0]
[bt] (3) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(ZN3tvm7runtime10RPCSession12EventHandler11CallHandlerIPFvNS0_7TVMArgsEPNS0_11TVMRetValueEEEEvT+0x84) [0x7f9246b27c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler16HandlePackedCallEv+0x660) [0x7f92465d78]
[bt] (5) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler13SwitchToStateENS2_5StateE+0x2d4) [0x7f9246b93c]
[bt] (6) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler22HandleRecvPackedSeqArgEv+0x5c8) [0x7f9246c5b0]
[bt] (7) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler15HandleNextEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x3a0) [0x7f9246cb88]
[bt] (8) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession22HandleUntilReturnEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x1f8) [0x7f92466d00]
[bt] (9) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession10ServerLoopEv+0x74) [0x7f924670f4]

[22:28:25] src/runtime/rpc/rpc_session.cc:751: Shutdown...
INFO:root:Finish serving ('192.168.20.41', 60826)

Error message on Local Host::

Traceback (most recent call last):
File "./deploy_model_on_rasp.py", line 218, in
module = runtime.create(graph, rlib, ctx)
File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 40, in create
fcreate = ctx._rpc_sess.get_function("tvm.graph_runtime.remote_create")
File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/contrib/rpc.py", line 212, in get_function
return self._sess.get_function(name)
File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/_ffi/function.py", line 106, in get_function
"Module has no function '%s'" % name)
AttributeError: Module has no function 'tvm.graph_runtime.remote_create'
terminate called after throwing an instance of 'dmlc::Error'
what(): Except caught from RPC call: [22:28:25] include/tvm/././runtime/./packed_func.h:601: Check failed: i < num_args (1 vs. 1) not enough argument passed, 1 passed but request arg[1].

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageFatalD1Ev+0x44) [0x7f920c551c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZNK3tvm7runtime7TVMArgsixEi+0x1f8) [0x7f921ac3a8]
[bt] (2) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime16RPCModuleGetFuncENS0_7TVMArgsEPNS0_11TVMRetValueE+0x68) [0x7f924653d0]
[bt] (3) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(ZN3tvm7runtime10RPCSession12EventHandler11CallHandlerIPFvNS0_7TVMArgsEPNS0_11TVMRetValueEEEEvT+0x84) [0x7f9246b27c]
[bt] (4) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler16HandlePackedCallEv+0x660) [0x7f92465d78]
[bt] (5) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler13SwitchToStateENS2_5StateE+0x2d4) [0x7f9246b93c]
[bt] (6) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler22HandleRecvPackedSeqArgEv+0x5c8) [0x7f9246c5b0]
[bt] (7) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler15HandleNextEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x3a0) [0x7f9246cb88]
[bt] (8) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession22HandleUntilReturnEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x1f8) [0x7f92466d00]
[bt] (9) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession10ServerLoopEv+0x74) [0x7f924670f4]

Aborted (core dumped)

austingg · 2017-10-09T13:07:30Z

@kaishijeng check your compile target option.

you may try target=aarch64-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon or target=aarch64-none-linux-gnueabihf -mcpu=cortex-a72 -mattr=+neon for RK3399, since RK3399 have 2 cortex-A72 and 4 cortex-A53s.

I have successfully run this tutorial on RK3288 with target=armv7l-none-linux-gnueabihf -mcpu=cortex-a17 -mattr=+neon

kaishijeng · 2017-10-09T16:45:00Z

I tried both targets, but the error is still the same. Critical errors on firefly are below: : [09:33:47] /home/firefly/2TB/src/firefly/tvm/dmlc-core/include/dmlc/logging.h:308: [09:33:47] include/tvm/././runtime/./packed_func.h:601: Check failed: i < num_args (1 vs. 1) not enough argument passed, 1 passed but request arg[1]. Stack trace returned 10 entries: [bt] (0) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN4dmlc15LogMessageFatalD1Ev+0x44) [0x7f85a0651c] [bt] (1) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZNK3tvm7runtime7TVMArgsixEi+0x1f8) [0x7f85aed3a8] [bt] (2) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime16RPCModuleGetFuncENS0_7TVMArgsEPNS0_11TVMRetValueE+0x68) [0x7f85da63d0] [bt] (3) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler11CallHandlerIPFvNS0_7TVMArgsEPNS0_11TVMRetValueEEEEvT_+0x84) [0x7f85dac27c] [bt] (4) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler16HandlePackedCallEv+0x660) [0x7f85da6d78] [bt] (5) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler13SwitchToStateENS2_5StateE+0x2d4) [0x7f85dac93c] [bt] (6) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler22HandleRecvPackedSeqArgEv+0x5c8) [0x7f85dad5b0] [bt] (7) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession12EventHandler15HandleNextEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x3a0) [0x7f85dadb88] [bt] (8) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession22HandleUntilReturnEventEPNS0_11TVMRetValueEbPKNS0_10PackedFuncE+0x1f8) [0x7f85da7d00] [bt] (9) /usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/libtvm.so(_ZN3tvm7runtime10RPCSession10ServerLoopEv+0x74) [0x7f85da80f4] Critical errors on the local host are below: Traceback (most recent call last): File "./test-firefly.py", line 218, in <module> module = runtime.create(graph, rlib, ctx) File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/contrib/graph_runtime.py", line 40, in create fcreate = ctx._rpc_sess.get_function("tvm.graph_runtime.remote_create") File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/contrib/rpc.py", line 212, in get_function return self._sess.get_function(name) File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-x86_64.egg/tvm/_ffi/function.py", line 106, in get_function "Module has no function '%s'" % name) AttributeError: Module has no function 'tvm.graph_runtime.remote_create' terminate called after throwing an instance of 'dmlc::Error' what(): Except caught from RPC call: [09:33:47] include/tvm/././runtime/./packed_func.h:601: Check failed: i < num_args (1 vs. 1) not enough argument passed, 1 passed but request arg[1].

…

On Mon, Oct 9, 2017 at 6:07 AM, Yubin Wang ***@***.***> wrote: @kaishijeng <https://github.com/kaishijeng> check your compile target option. you may try target=aarch64-none-linux-gnueabihf -mcpu=cortex-a53 -mattr=+neon or target=aarch64-none-linux-gnueabihf -mcpu=cortex-a72 -mattr=+neon for *RK3399*, since RK3399 have 2 cortext-A72 and 4 cortex-A53s. I have successfully run this tutorial on RK3288 with target=armv7l-none-linux-gnueabihf -mcpu=cortex-a17 -mattr=+neon — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3lKjIjcs9WZaoLmli9IMltj_mSJIks5sqhqYgaJpZM4PxbB3> .

tqchen · 2017-10-09T16:54:11Z

since there is an error Module has no function 'tvm.graph_runtime.remote_create', it could due to you forget to turn on USE_GRAPH_RUNTIME=1 in the board when you build runtime

kaishijeng · 2017-10-09T17:17:16Z

Check config.mk and it configures USE_GRAPH_RUNTIME=1 Thanks,

…

On Mon, Oct 9, 2017 at 9:54 AM, Tianqi Chen ***@***.***> wrote: since there is an error Module has no function 'tvm.graph_runtime.remote_create', it could due to you forget to turn on USE_GRAPH_RUNTIME=1 in the board when you build runtime — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3tvhvd2bod4Jd0ZSApDCPXKqKl_pks5sqk-3gaJpZM4PxbB3> .

kaishijeng · 2017-10-09T17:42:02Z

Any idea why firefly 3399 has the following error which the first error encounters: [09:33:47] /home/firefly/2TB/src/firefly/tvm/dmlc-core/include/dmlc/logging.h:308: [09:33:47] include/tvm/././runtime/./packed_func.h:601: Check failed: i < num_args (1 vs. 1) not enough argument passed, 1 passed but request arg[1].

…

On Mon, Oct 9, 2017 at 10:17 AM, kaishi Jeng ***@***.***> wrote: Check config.mk and it configures USE_GRAPH_RUNTIME=1 Thanks, On Mon, Oct 9, 2017 at 9:54 AM, Tianqi Chen ***@***.***> wrote: > since there is an error Module has no function > 'tvm.graph_runtime.remote_create', it could due to you forget to turn on > USE_GRAPH_RUNTIME=1 in the board when you build runtime > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#518 (comment)>, or mute > the thread > <https://github.com/notifications/unsubscribe-auth/AMGg3tvhvd2bod4Jd0ZSApDCPXKqKl_pks5sqk-3gaJpZM4PxbB3> > . >

tqchen · 2017-10-09T18:09:03Z

It means the function is called with only one argument but the function is expecting two arguments. Need to know which function it corresponds to though.

kaishijeng · 2017-10-09T18:39:47Z

I got it worked now after checkout the latest tvm and rebuild it. My next step is to figure out how to use opencl in Firefly3399 If you have a guidance, please let me know Thanks

…

On Mon, Oct 9, 2017 at 11:09 AM, Tianqi Chen ***@***.***> wrote: It means the function is called with only one argument but the function is expecting two arguments. Need to know which function it corresponds to though. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3uhbBNrgkJ-ntKJ7Panw7fB0yLmMks5sqmFEgaJpZM4PxbB3> .

tqchen · 2017-10-09T18:45:31Z

checkout https://github.com/dmlc/tvm/blob/master/tests/python/unittest/test_runtime_rpc.py#L92

kaishijeng · 2017-10-09T19:01:34Z

Should I run test_runtime_rpc.py on firefly platform? I tried and it gave me errors below. However, I can run est_runtime_rpc.py on the x86 host without any error TVM: Initializing cython mode... INFO:root:RPCServer: bind to localhost:9091 INFO:root:RPCServer: connection from ('127.0.0.1', 40138) INFO:root:Connection from ('127.0.0.1', 40138) Traceback (most recent call last): File "./test_runtime_rpc.py", line 153, in <module> test_rpc_remote_module() File "./test_runtime_rpc.py", line 67, in test_rpc_remote_module n = tvm.convert(1024) File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/api.py", line 65, in convert return _convert_to_node(value) File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/_ffi/node_generic.py", line 40, in convert_to_node return const(value) File "/usr/local/lib/python2.7/dist-packages/tvm-0.1.0-py2.7-linux-aarch64.egg/tvm/_ffi/node_generic.py", line 80, in const return _api_internal._const(value, dtype) AttributeError: 'module' object has no attribute '_const'

…

On Mon, Oct 9, 2017 at 11:45 AM, Tianqi Chen ***@***.***> wrote: checkout https://github.com/dmlc/tvm/blob/master/tests/python/ unittest/test_runtime_rpc.py#L92 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3hNgrfxM3-LQkxTmk2StI6w2AWqjks5sqmnPgaJpZM4PxbB3> .

kaishijeng · 2017-10-09T19:34:52Z

Ignore this error. I forgot to recompile after enable OpenCL flag.
However, I read test_runtime_rpc.py and see these 2 functions, check_remote() and check_remote_link_cl() are not called in test_runtime_rpc.py and they contain openCL unittest

Thanks,

kaishijeng · 2017-10-09T20:05:03Z

I add check_remote_link_cl() at end of test_rpc_remote_module() and it runs OK on firefly3399.
So OpenCL works properly on my firefly system. Now how can I run deploy_model_on_rasp.py with OpenCL enabled?

Thanks,

kaishijeng · 2017-10-09T23:57:38Z

Not sure where to start to enable OpenCL in deploy_model_on_rasp.py.

I can run deploy_model_on_rasp.py in local mode ie, use_rasp=False on firefly 3399. Is the code below which needs to change to generate opencl instead of llvm for cpu?

use_rasp = False

if use_rasp:
target = "llvm -target=armv7l-none-linux-anueabihf -mcpu=cortex-a53 -mattr=+neon"
else:
target = "llvm"

with tvm.target.rasp():
graph, lib, params = nnvm.compiler.build(
net, target, shape={"data": data_shape}, params=params)

Thanks

tqchen · 2017-10-10T00:19:27Z

For now, the NNVM compiler routine via topi is not optimized under ARM openCL. So it is not enabled

We can enable it by adding opencl schedule for ops under https://github.com/dmlc/nnvm/blob/master/python/nnvm/top/
which calls into topi, in theory we could reuse some of the schedules for CUDA, but they won't be optimal

tqchen · 2017-10-10T00:21:52Z

To get the full power of opencl we will need to start by tuning OpenCL perf of topi under these GPUs. After things are done with the single kernel perf. Enabling them in NNVM will be straight forward

kaishijeng · 2017-10-10T02:01:27Z

It looks like not a small effort to enable OpenCL. I will wait for OpenCL enabled demo example first and then do finetuning on firefly3399 by myself later Thanks,

…

On Mon, Oct 9, 2017 at 5:22 PM, Tianqi Chen ***@***.***> wrote: To get the full power of opencl we will need to start by tuning OpenCL perf of topi under these GPUs. After things are done with the single kernel perf. Enabling them in NNVM will be straight forward — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#518 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AMGg3o2EmjFg_bu64Ku5g5nmvIuckeaQks5sqrisgaJpZM4PxbB3> .

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]> Squashed commit [Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]> Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) * ... * update * update * print * more [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) * [Meta Schedule] Initiate experiments on CUDA * ... * fix boolean printing Auto Tensor Core (apache#524) Working on Evo Search (apache#542) Squashed commit [Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) * parallel vectorize unroll & random compute location * rebased [Meta Schedule] Per-Store-Feature (apache#521) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) * Squashed commit [Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) * annotate * annotate * lint * test * fix * fix * fix [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) Fix sttr func & schedule naming. Fix schedule -> sch. Add feature extractor. Fix init. Add cost model. Remove unused include. [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) * wip fix * revoke change to gallery * split postprocessors to separate files * rename attrs * minor * minor tweak on utils.h * refactor disallow-dynamic-loop * refactor verify_gpu_code * succesfully give up refactoring parallelize-vectorize-unroll * python structuring * unittests Co-authored-by: Junru Shao <[email protected]> Fix issues. Fix init. Finish random model part. Finish xgb model. Minor fix. Rebase. Add init. Await refactor of callback. Update a bit on the test case. Move impos. Minor fix. More fixes. Remove unused import. Fix per store feature test. Update model save / load. * Fix model save / load with tar. * Fix issues. * Remove dup. Co-authored-by: Junru Shao <[email protected]> User-Interface: Tune-TIR (apache#525) * User-Interface: Tune-TIR * fix fix fix User-Interface: Tune-TE (apache#527) * fix a lot of issues * Add tune-te Get CUDA tuning working (apache#529) [Meta Schedule] Evolutionary Search (apache#522) * Checkpoint. Fix cost model comment. Finish evolutionary seaarch. Remove extra code. Fix compile. Add comments. Add python part. Ad test. Update other files & comments. * Squashed commit [Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> * [TensorIR] GetProducer, GetConsumer (apache#506) * [MetaScheduleRefactor] Annotate&Unannotate (apache#505) * annotate * annotate * lint * test * fix * fix * fix * [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) * Blockize & Tensorize (apache#514) * Blockize & Tensorize * Update tensor intrin * Fix blockized & Recalculate affine flags * Cleanup utils.cc * Add test cases of blockize * Re-enable affine flag checking * Checkpoint. Fix cost model comment. Finish evolutionary seaarch. Remove extra code. Fix compile. Add comments. Add python part. Ad test. Update other files & comments. Fix random seed bug. Minor fix. Fix num-cores. Add docs. Check point. Add max_fail_cnt. Minor fix. Minor fix. Segfault. Fix pointers to trace. Test fix. Remove measure callbacks. Refactor a bit. Split function. Adjust variable name. Minor fixes. Add mutator probs to TuneContext. Add token. Fix loops. Remove include. Add has workload for database. Add check. Add concurrent bitmask. * Fix TuneContext. * Fix haash & stuff. * Modifyy shash. * Remove trace field. * Minor fix. * Fix cbmask. * Fix numbers. Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) Tune relay. Further add interface. Remove unused import Fix rebase. Add task name dispatch. Add task deduplication. Rename extract_task to extract_task_from_relay Remove duplicate function def. Minor fix.

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (#1) Hot fix for bound predicate (#3) [Meta Schedule] Update Tune Relay (#4) [Performance Align] fixing codegen problems (#5) [PerfAlign] NRM & SFM on Raspi Aligned (#6) [BugFix] Apply bound predicate directly to loops when possible (#12) [BugFix] Fix CrossThreadReduction on CUDA (#13) [MetaSchedule] Enable BertTuning with MetaScheduler (#11) [Minor][MemHammer] Minor tweaks in code review (#14) [Meta Schedule] Add customizable search space to PostOrderApply. (#16) Fix cooperative fetching (#17) Fixes for codegen (#18) [Hotfix] A unittest (#19) Fix for GRP sketch gen (#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (#22) [MemHammer][Refactor] Code Review (#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]> fix some fixes fix test

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (#1) Hot fix for bound predicate (#3) [Meta Schedule] Update Tune Relay (#4) [Performance Align] fixing codegen problems (#5) [PerfAlign] NRM & SFM on Raspi Aligned (#6) [BugFix] Apply bound predicate directly to loops when possible (#12) [BugFix] Fix CrossThreadReduction on CUDA (#13) [MetaSchedule] Enable BertTuning with MetaScheduler (#11) [Minor][MemHammer] Minor tweaks in code review (#14) [Meta Schedule] Add customizable search space to PostOrderApply. (#16) Fix cooperative fetching (#17) Fixes for codegen (#18) [Hotfix] A unittest (#19) Fix for GRP sketch gen (#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (#22) [MemHammer][Refactor] Code Review (#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (#24) Import & Cache Mechanism (#26) [BugFix] Fix Winograd Test Script (#25) Add task extraction & caching (#27) A few fixes for task extraction (#28) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (apache#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (apache#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Import & Cache Mechanism (apache#26) [BugFix] Fix Winograd Test Script (apache#25) Add task extraction & caching (apache#27) A few fixes for task extraction (apache#28) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (apache#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (apache#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Import & Cache Mechanism (apache#26) [BugFix] Fix Winograd Test Script (apache#25) Add task extraction & caching (apache#27) A few fixes for task extraction (apache#28) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

[Meta Schedule][M3c] Schedule Rules, Mutator & Postprocs (apache#485) [Meta Schedule][M3c] PostOrderApply (apache#486) Fix Post Order Apply (apache#490) [MetaSchedule] Relay Integration (apache#489) [M3c][Meta Schedule] Add Trace Correctness Test for PostOrderApply (apache#492) Fix replay trace. (apache#493) [M3c][Meta Schedule] Implement the Replay Func class. (apache#495) [PR] Test script for meta-schedule task extraction. Interface to load… (apache#494) [Meta Schedule Refactor] Get child blocks (apache#500) Read-at && Write-at (apache#497) [M3c][Meta Schedule] Measure Callbacks (apache#498) [Bug] Fix Infinite Loop Caused When Calling Methods Not Overrided In PyClass (apache#496) [MetaSchedule] Sample-Perfect-Tile (apache#501) [MetaSchedule] TE Workloads (apache#502) [TensorIR] GetProducer, GetConsumer (apache#506) [MetaScheduleRefactor] Annotate&Unannotate (apache#505) [MetaSchedule] Multi-Level-Tiling & Auto-Inline (apache#503) [Tests] Add unittests for auto-inline and multi-level-tiling (apache#508) [Meta Schedule] Minor Fixes (apache#507) [MetaSchedule] Rewrite Cooperative-Fetching / Unbound-Block / Reduction-Block (apache#509) [MetaSchedule] Rewrite Parallel-Vectorize-Unroll / Verify-GPU / Disallow-Dynamic-Loops (apache#499) [Meta Schedule] Add Helper Function & Minor Modification (apache#512) [MetaSchedule] Test for Rewrite Parallel-Vectorize-Unroll (apache#513) [Meta Schedule] Feature Extractor & Cost Model (apache#510) Blockize & Tensorize (apache#514) Layout Rewriting: Suggest-Index-Map (apache#520) [MetaSchedule] Parallel-Vectorize-Unroll & Random-Compute-Location (apache#516) [Meta Schedule] Per-Store-Feature (apache#521) Add traced schedule for blockize & tensorize (apache#526) [Meta Schedule] Add XGBoost Model & Random Model (apache#519) User-Interface: Tune-TIR (apache#525) User-Interface: Tune-TE (apache#527) [Minor] More logging on python (apache#528) Get CUDA tuning working (apache#529) [MetaSchedule] TensorRT BYOC (apache#518) [BugFix] LocalBuilder API (apache#531) [Meta Schedule] Add Cost Model Update Measure Callback (apache#530) [Bugfix] BuilderInput with default params (apache#532) [MetaSchedule] Mutator-Tile-Size, Mutate-Parallel, Mutate-Unroll (apache#534) [Meta Schedule] Evolutionary Search (apache#522) [BugFix] Remove duplicated definition of MakeMultinomialSampler (apache#535) [Meta Schedule] Fix some bugs (apache#537) Initiate Experiments for CPU Performance Alignment with Ansor (apache#538) [Meta Schedule] Tweak experiment scripts (apache#539) [Meta Schedule] Initiate experiments on CUDA (apache#540) [TIR][Schedule] Buffer transform (apache#523) Auto Tensor Core (apache#524) Working on Evo Search (apache#542) [Meta Schedule] Add Replay Tuning Interface (apache#543) Evolutionary Search on CPU (apache#544) Misc improvement over the error message (apache#545) [TIR][Schedule] Software pipelining (apache#533) [Meta Schedule Refactor] fixing unit tests (apache#547) [MetaSchedule] Mutator-Compute-Location (apache#548) Misc Improvement of Evolutionary Search (apache#549) Hotfix for software pipeline (apache#552) Misc Improvement (apache#550) [Cherry-Pick][TensorIR] Primitive "SetScope" (apache#9738) (apache#555) Rule RFactor (apache#551) [MemHammer] Rewrite Rules (apache#554) [MetaSchedule] Schedule Rule: Cross-Thread Reduction (apache#556) [MetaSchedule] Performance Alignment - NRM and SFM (CUDA) (apache#559) [MetaSchedule] Perf Alignment - NRM on CUDA (apache#560) [TIR] Reorder the block iters of the blocks generated by RFactor (apache#561) Removing 2 unit tests for software pipelining (apache#562) [MemHammer] Lower Pass + Unittests (apache#557) Perf Align: Remove Auto-inline before Multi-level-tiling (apache#564) Fix Sketch Generation Unittests (apache#565) speed up VerifyGpuCode (apache#568) [Performance Align] fixing codegen problems (apache#569) [Meta schedule] improve search space (apache#1) Hot fix for bound predicate (apache#3) [Meta Schedule] Update Tune Relay (apache#4) [Performance Align] fixing codegen problems (apache#5) [PerfAlign] NRM & SFM on Raspi Aligned (apache#6) [BugFix] Apply bound predicate directly to loops when possible (apache#12) [BugFix] Fix CrossThreadReduction on CUDA (apache#13) [MetaSchedule] Enable BertTuning with MetaScheduler (apache#11) [Minor][MemHammer] Minor tweaks in code review (apache#14) [Meta Schedule] Add customizable search space to PostOrderApply. (apache#16) Fix cooperative fetching (apache#17) Fixes for codegen (apache#18) [Hotfix] A unittest (apache#19) Fix for GRP sketch gen (apache#21) Add threadIdx filtering in Multi-Level-Tiling and Verify-GPU-Code (apache#20) [BugFix][TIR] Fix cross-thread reduction when single reduction loop with predicate (apache#10016) (apache#22) [MemHammer][Refactor] Code Review (apache#15) [Meta Schedule] Add Winograd Test for Customizable Search Space (apache#24) Co-authored-by: Siyuan Feng <[email protected]> Co-authored-by: Bohan Hou <[email protected]> Co-authored-by: Hongyi Jin <[email protected]> Co-authored-by: Ruihang Lai <[email protected]> Co-authored-by: Junru Shao <[email protected]> Co-authored-by: Wuwei Lin <[email protected]> Co-authored-by: Sunghyun Park <[email protected]> Co-authored-by: Xiyou Zhou <[email protected]>

tqchen closed this as completed Oct 8, 2017

compatability between mxnet, tvm and nnvm #518

compatability between mxnet, tvm and nnvm #518

Comments

kaishijeng commented Oct 7, 2017

tqchen commented Oct 7, 2017

tqchen commented Oct 7, 2017

kaishijeng commented Oct 7, 2017

tqchen commented Oct 7, 2017 • edited Loading

kaishijeng commented Oct 7, 2017

tqchen commented Oct 7, 2017

kaishijeng commented Oct 7, 2017

tqchen commented Oct 7, 2017

kaishijeng commented Oct 7, 2017

tqchen commented Oct 7, 2017

tqchen commented Oct 7, 2017

kaishijeng commented Oct 8, 2017

tqchen commented Oct 8, 2017

tqchen commented Oct 8, 2017

kaishijeng commented Oct 9, 2017

tqchen commented Oct 9, 2017 • edited Loading

kaishijeng commented Oct 9, 2017

tqchen commented Oct 9, 2017

tqchen commented Oct 9, 2017

kaishijeng commented Oct 9, 2017 via email

kaishijeng commented Oct 9, 2017

austingg commented Oct 9, 2017 • edited Loading

kaishijeng commented Oct 9, 2017 via email

tqchen commented Oct 9, 2017

kaishijeng commented Oct 9, 2017 via email

kaishijeng commented Oct 9, 2017 via email

tqchen commented Oct 9, 2017

kaishijeng commented Oct 9, 2017 via email

tqchen commented Oct 9, 2017

kaishijeng commented Oct 9, 2017 via email

kaishijeng commented Oct 9, 2017

kaishijeng commented Oct 9, 2017

kaishijeng commented Oct 9, 2017

tqchen commented Oct 10, 2017

tqchen commented Oct 10, 2017

kaishijeng commented Oct 10, 2017 via email

tqchen commented Oct 7, 2017 •

edited

Loading

tqchen commented Oct 9, 2017 •

edited

Loading

austingg commented Oct 9, 2017 •

edited

Loading