[MetaSchedule][Hexagon] conv2d produces different results after tuning #294
Thanks @psrivas2 for reporting the issue! Two questions that could help us know more about the context:
First, it is Hexagon-specific. On CPU, the tuned kernel output is the same as the untuned output.
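(For reference, a minimal sketch of how such a comparison can be quantified; `compare_outputs`, its tolerances, and the `out_untuned`/`out_tuned` arrays are my own illustrative assumptions, not from the issue.)

```python
import numpy as np

def compare_outputs(out_untuned: np.ndarray, out_tuned: np.ndarray) -> None:
    # Hypothetical helper: quantify the mismatch between the untuned and
    # tuned kernel outputs. Upcast to fp32 so the difference itself is not
    # rounded away by fp16 arithmetic.
    diff = np.abs(out_untuned.astype("float32") - out_tuned.astype("float32"))
    print("max abs diff: ", diff.max())   # e.g. 0.5 in the fp16 repro below
    print("mean abs diff:", diff.mean())  # e.g. 0.0708 in the fp16 repro below
    # Loose fp16 tolerances (illustrative values); bit-exact equality between
    # differently scheduled fp16 kernels is not expected in general.
    np.testing.assert_allclose(out_untuned, out_tuned, rtol=1e-2, atol=1e-2)
```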
I think I have narrowed it down to the reordering of loops. On Hexagon, the following two modules, which differ only in the order of loops i3 and i4, produce different numeric results: the max difference in values is 0.5 and the mean difference is 0.0708. This happens only for the fp16 dtype.

```python
@tvm.script.ir_module
class TuningBug:
    @T.prim_func
    def conv2d(lv1: T.Buffer[(1, 230, 230, 3), "float16"], param_0: T.Buffer[(7, 7, 3, 64), "float16"], conv2d_nhwc: T.Buffer[(1, 112, 112, 64), "float16"]):
        # function attr dict
        T.func_attr({"tir.noalias": True, "global_symbol": "conv2d"})
        # body
        # with T.block("root")
        for i0, i1, i2, i3, i4, i5, i6 in T.grid(1, 112, 112, 64, 7, 7, 3):
            with T.block("conv2d_nhwc"):
                nn, yy, xx, ff, ry, rx, rc = T.axis.remap("SSSSRRR", [i0, i1, i2, i3, i4, i5, i6])
                T.reads(lv1[nn, yy * 2 + ry, xx * 2 + rx, rc], param_0[ry, rx, rc, ff])
                T.writes(conv2d_nhwc[nn, yy, xx, ff])
                with T.init():
                    conv2d_nhwc[nn, yy, xx, ff] = T.float16(0)
                conv2d_nhwc[nn, yy, xx, ff] = (conv2d_nhwc[nn, yy, xx, ff] + lv1[nn, yy * 2 + ry, xx * 2 + rx, rc] * param_0[ry, rx, rc, ff])

    @R.function
    def main(lv1: R.Tensor[(1, 230, 230, 3), "float16"], param_0: R.Tensor[(T.int64(7), T.int64(7), T.int64(3), T.int64(64)), "float16"]):
        with R.dataflow():
            gv = R.call_tir(conv2d, (lv1, param_0), (1, 112, 112, 64), dtype="float16")
            R.output(gv)
        return gv
```

Reorder loops i3 and i4:

```python
sch = tvm.tir.Schedule(mod)
b0 = sch.get_block("conv2d_nhwc", func_name="conv2d")
i0, i1, i2, i3, i4, i5, i6 = sch.get_loops(b0)
sch.reorder(i4, i3)
```

Note that this only swaps the spatial loop over ff with the reduction loop over ry; for any fixed output element, the reduction axes (ry, rx, rc) are still visited in the same order, so the two loop orders should compute identical results. The modified module looks like below:

```python
@tvm.script.ir_module
class TuningBug:
    @T.prim_func
    def conv2d(lv1: T.Buffer[(1, 230, 230, 3), "float16"], param_0: T.Buffer[(7, 7, 3, 64), "float16"], conv2d_nhwc: T.Buffer[(1, 112, 112, 64), "float16"]):
        # function attr dict
        T.func_attr({"tir.noalias": True, "global_symbol": "conv2d"})
        # body
        # with T.block("root")
        for i0, i1, i2, i4, i3, i5, i6 in T.grid(1, 112, 112, 7, 64, 7, 3):
            with T.block("conv2d_nhwc"):
                nn, yy, xx, ff, ry, rx, rc = T.axis.remap("SSSSRRR", [i0, i1, i2, i3, i4, i5, i6])
                T.reads(lv1[nn, yy * 2 + ry, xx * 2 + rx, rc], param_0[ry, rx, rc, ff])
                T.writes(conv2d_nhwc[nn, yy, xx, ff])
                with T.init():
                    conv2d_nhwc[nn, yy, xx, ff] = T.float16(0)
                conv2d_nhwc[nn, yy, xx, ff] = (conv2d_nhwc[nn, yy, xx, ff] + lv1[nn, yy * 2 + ry, xx * 2 + rx, rc] * param_0[ry, rx, rc, ff])

    @R.function
    def main(lv1: R.Tensor[(1, 230, 230, 3), "float16"], param_0: R.Tensor[(T.int64(7), T.int64(7), T.int64(3), T.int64(64)), "float16"]):
        with R.dataflow():
            gv = R.call_tir(conv2d, (lv1, param_0), (1, 112, 112, 64), dtype="float16")
            R.output(gv)
        return gv
```
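As a sanity check, here is a minimal end-to-end repro sketch (my own, not from the issue): it applies the same reorder and compares the two builds on a CPU `llvm` target, where per the report the outputs match; running the analogous comparison through a Hexagon session is what exposes the mismatch. The input distribution and tolerances are illustrative assumptions.

```python
import numpy as np
import tvm
from tvm import tir

def build_conv2d(mod):
    # Build only the TIR PrimFunc; the Relax wrapper is not needed for the repro.
    return tvm.build(mod["conv2d"], target="llvm")

mod = TuningBug
sch = tir.Schedule(mod)
b0 = sch.get_block("conv2d_nhwc", func_name="conv2d")
i0, i1, i2, i3, i4, i5, i6 = sch.get_loops(b0)
sch.reorder(i4, i3)  # the reorder from the repro above

dev = tvm.cpu()
data = tvm.nd.array(np.random.uniform(size=(1, 230, 230, 3)).astype("float16"), dev)
weight = tvm.nd.array(np.random.uniform(size=(7, 7, 3, 64)).astype("float16"), dev)
out_a = tvm.nd.array(np.zeros((1, 112, 112, 64), "float16"), dev)
out_b = tvm.nd.array(np.zeros((1, 112, 112, 64), "float16"), dev)

build_conv2d(mod)(data, weight, out_a)      # original loop order
build_conv2d(sch.mod)(data, weight, out_b)  # reordered loops
# Expected to pass on CPU; the analogous check on Hexagon is what fails.
np.testing.assert_allclose(out_a.numpy(), out_b.numpy(), rtol=1e-3, atol=1e-3)
```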
The following PrimFunc produces different results after tuning on Hexagon.
Post-tuning, the PrimFunc is transformed to:
The two PrimFuncs produce different results on Hexagon hardware; this needs to be investigated.