Different value on a matrix when print is present and absent #6663

lin-hitonami · 2022-11-18T08:49:37Z

Describe the bug
A matrix element read inside a kernel has a different value depending on whether the kernel also prints the returned values.
Originally posted in https://forum.taichi-lang.cn/t/topic/3547/4

To Reproduce

import taichi as ti
ti.init(ti.cpu,dynamic_index=True)


@ti.func
def jacob_eigen_test(a:ti.template()):
    p = ti.math.eye(a.n)
    tol = 1.0e-7
    sig = ti.Vector.zero(ti.f32,a.n)
    aMax = 1.0
    print('p1',p[0,0])
    while aMax > tol:
        print('p2',p[0,0])
        aMax = 0

        for i in range(a.n):      # Update transformation matrix
            p[i,0] = -1

    for ii in range(a.n):
        sig[ii] = a[ii,ii]
    return sig, p

@ti.kernel
def test():
    test_S =ti.math.mat4(0)
    Sig, P = jacob_eigen_test(test_S)
    # print(Sig,P)

@ti.kernel
def test2():
    test_S =ti.math.mat4(0)
    Sig, P = jacob_eigen_test(test_S)
    print(Sig,P)


test()
print('test2')
test2()

Log/Screenshots

p1 1.000000
p2 1.000000
test2
p1 1.000000
p2 0.000000
[0.000000, 0.000000, 0.000000, 0.000000] [[-1.000000, 0.000000, 0.000000, 0.000000], [-1.000000, 1.000000, 0.000000, 0.000000], [-1.000000, 0.000000, 1.000000, 0.000000], [-1.000000, 0.000000, 0.000000, 1.000000]]

Additional comments
The IR of test:

kernel {
$0 = offloaded  
body {
  <f32> $1 = const -1.0
  <i32> $2 = const 4
  <i32> $3 = const 0
  <i32> $4 = const 1
  <i32> $5 = const 2
  <f32> $6 = const 0.0
  <f32> $7 = const 1.0
  <[Tensor (4, 4) f32]> $8 = alloca
  <*f32> $9 = shift ptr [$8 + $3]
  <f32> $10 : local store [$9 <- $7]
  <f32> $11 = const 1e-07
  <f32> $12 = alloca
  <f32> $13 : local store [$12 <- $7]
  print "p1 ", $7, "\n"
  $15 : while true {
    <f32> $16 = local load [$12]
    <i32> $17 = cmp_gt $16 $11
    <i32> $18 = bit_and $17 $4
    $19 : if $18 {
    } else {
      $20 : while control nullptr, $3
    }
    <f32> $21 = local load [$9]
    print "p2 ", $21, "\n"
    <f32> $23 : local store [$12 <- $6]
    $24 : for in range($3, $2) block_dim=adaptive {
      <i32> $25 = loop $24 index 0
      <i32> $26 = bit_shl $25 $5
      <*f32> $27 = shift ptr [$8 + $26]
      <f32> $28 : local store [$27 <- $1]
    }
  }
}
}

The IR of test2:

kernel {
$0 = offloaded  
body {
  <f32> $1 = const -1.0
  <i32> $2 = const 4
  <i32> $3 = const 0
  <i32> $4 = const 1
  <i32> $5 = const 2
  <i32> $6 = const 3
  <i32> $7 = const 5
  <i32> $8 = const 6
  <i32> $9 = const 7
  <i32> $10 = const 8
  <i32> $11 = const 9
  <i32> $12 = const 10
  <i32> $13 = const 11
  <i32> $14 = const 12
  <i32> $15 = const 13
  <i32> $16 = const 14
  <i32> $17 = const 15
  <f32> $18 = const 0.0
  <[Tensor (4, 4) f32]> $19 = global tmp var (offset = 0 B)
  <*f32> $20 = shift ptr [$19 + $3]
  $21 : global store [$20 <- $18]
  <*f32> $22 = shift ptr [$19 + $2]
  $23 : global store [$22 <- $18]
  <*f32> $24 = shift ptr [$19 + $10]
  $25 : global store [$24 <- $18]
  <*f32> $26 = shift ptr [$19 + $14]
  $27 : global store [$26 <- $18]
  <i32> $28 = const 16
  <*f32> $29 = shift ptr [$19 + $28]
  $30 : global store [$29 <- $18]
  <i32> $31 = const 20
  <*f32> $32 = shift ptr [$19 + $31]
  $33 : global store [$32 <- $18]
  <i32> $34 = const 24
  <*f32> $35 = shift ptr [$19 + $34]
  $36 : global store [$35 <- $18]
  <i32> $37 = const 28
  <*f32> $38 = shift ptr [$19 + $37]
  $39 : global store [$38 <- $18]
  <i32> $40 = const 32
  <*f32> $41 = shift ptr [$19 + $40]
  $42 : global store [$41 <- $18]
  <i32> $43 = const 36
  <*f32> $44 = shift ptr [$19 + $43]
  $45 : global store [$44 <- $18]
  <i32> $46 = const 40
  <*f32> $47 = shift ptr [$19 + $46]
  $48 : global store [$47 <- $18]
  <i32> $49 = const 44
  <*f32> $50 = shift ptr [$19 + $49]
  $51 : global store [$50 <- $18]
  <i32> $52 = const 48
  <*f32> $53 = shift ptr [$19 + $52]
  $54 : global store [$53 <- $18]
  <i32> $55 = const 52
  <*f32> $56 = shift ptr [$19 + $55]
  $57 : global store [$56 <- $18]
  <i32> $58 = const 56
  <*f32> $59 = shift ptr [$19 + $58]
  $60 : global store [$59 <- $18]
  <i32> $61 = const 60
  <*f32> $62 = shift ptr [$19 + $61]
  $63 : global store [$62 <- $18]
  <*f32> $64 = shift ptr [$19 + $4]
  $65 : global store [$64 <- $18]
  <*f32> $66 = shift ptr [$19 + $5]
  $67 : global store [$66 <- $18]
  <*f32> $68 = shift ptr [$19 + $6]
  $69 : global store [$68 <- $18]
  <*f32> $70 = shift ptr [$19 + $7]
  $71 : global store [$70 <- $18]
  <*f32> $72 = shift ptr [$19 + $8]
  $73 : global store [$72 <- $18]
  <*f32> $74 = shift ptr [$19 + $9]
  $75 : global store [$74 <- $18]
  <*f32> $76 = shift ptr [$19 + $11]
  $77 : global store [$76 <- $18]
  <*f32> $78 = shift ptr [$19 + $12]
  $79 : global store [$78 <- $18]
  <*f32> $80 = shift ptr [$19 + $13]
  $81 : global store [$80 <- $18]
  <*f32> $82 = shift ptr [$19 + $15]
  $83 : global store [$82 <- $18]
  <*f32> $84 = shift ptr [$19 + $16]
  $85 : global store [$84 <- $18]
  <*f32> $86 = shift ptr [$19 + $17]
  $87 : global store [$86 <- $18]
  <f32> $88 = const 1.0
  <[Tensor (4, 4) f32]> $89 = global tmp var (offset = 80 B)
  <*f32> $90 = shift ptr [$89 + $3]
  <*f32> $91 = shift ptr [$89 + $2]
  $92 : global store [$91 <- $18]
  <*f32> $93 = shift ptr [$89 + $10]
  $94 : global store [$93 <- $18]
  <*f32> $95 = shift ptr [$89 + $14]
  $96 : global store [$95 <- $18]
  <*f32> $97 = shift ptr [$89 + $28]
  $98 : global store [$97 <- $18]
  <*f32> $99 = shift ptr [$89 + $31]
  $100 : global store [$99 <- $18]
  <*f32> $101 = shift ptr [$89 + $34]
  $102 : global store [$101 <- $18]
  <*f32> $103 = shift ptr [$89 + $37]
  $104 : global store [$103 <- $18]
  <*f32> $105 = shift ptr [$89 + $40]
  $106 : global store [$105 <- $18]
  <*f32> $107 = shift ptr [$89 + $43]
  $108 : global store [$107 <- $18]
  <*f32> $109 = shift ptr [$89 + $46]
  $110 : global store [$109 <- $18]
  <*f32> $111 = shift ptr [$89 + $49]
  $112 : global store [$111 <- $18]
  <*f32> $113 = shift ptr [$89 + $52]
  $114 : global store [$113 <- $18]
  <*f32> $115 = shift ptr [$89 + $55]
  $116 : global store [$115 <- $18]
  <*f32> $117 = shift ptr [$89 + $58]
  $118 : global store [$117 <- $18]
  <*f32> $119 = shift ptr [$89 + $61]
  $120 : global store [$119 <- $18]
  $121 : global store [$90 <- $88]
  <*f32> $122 = shift ptr [$89 + $4]
  $123 : global store [$122 <- $18]
  <*f32> $124 = shift ptr [$89 + $5]
  $125 : global store [$124 <- $18]
  <*f32> $126 = shift ptr [$89 + $6]
  $127 : global store [$126 <- $18]
  <*f32> $128 = shift ptr [$89 + $7]
  $129 : global store [$128 <- $88]
  <*f32> $130 = shift ptr [$89 + $8]
  $131 : global store [$130 <- $18]
  <*f32> $132 = shift ptr [$89 + $9]
  $133 : global store [$132 <- $18]
  <*f32> $134 = shift ptr [$89 + $11]
  $135 : global store [$134 <- $18]
  <*f32> $136 = shift ptr [$89 + $12]
  $137 : global store [$136 <- $88]
  <*f32> $138 = shift ptr [$89 + $13]
  $139 : global store [$138 <- $18]
  <*f32> $140 = shift ptr [$89 + $15]
  $141 : global store [$140 <- $18]
  <*f32> $142 = shift ptr [$89 + $16]
  $143 : global store [$142 <- $18]
  <*f32> $144 = shift ptr [$89 + $17]
  $145 : global store [$144 <- $88]
  <f32> $146 = const 1e-07
  <[Tensor (4) f32]> $147 = global tmp var (offset = 64 B)
  <*f32> $148 = shift ptr [$147 + $3]
  $149 : global store [$148 <- $18]
  <*f32> $150 = shift ptr [$147 + $2]
  $151 : global store [$150 <- $18]
  <*f32> $152 = shift ptr [$147 + $10]
  $153 : global store [$152 <- $18]
  <*f32> $154 = shift ptr [$147 + $14]
  $155 : global store [$154 <- $18]
  <*f32> $156 = shift ptr [$147 + $4]
  $157 : global store [$156 <- $18]
  <*f32> $158 = shift ptr [$147 + $5]
  $159 : global store [$158 <- $18]
  <*f32> $160 = shift ptr [$147 + $6]
  $161 : global store [$160 <- $18]
  <f32> $162 = alloca
  <f32> $163 : local store [$162 <- $88]
  print "p1 ", $88, "\n"
  $165 : while true {
    <f32> $166 = local load [$162]
    <i32> $167 = cmp_gt $166 $146
    <i32> $168 = bit_and $167 $4
    $169 : if $168 {
    } else {
      $170 : while control nullptr, $3
    }
    <f32> $171 = global load $90
    print "p2 ", $171, "\n"
    <f32> $173 : local store [$162 <- $18]
    $174 : for in range($3, $2) block_dim=adaptive {
      <i32> $175 = loop $174 index 0
      <i32> $176 = bit_shl $175 $5
      <*f32> $177 = shift ptr [$89 + $176]
      $178 : global store [$177 <- $1]
    }
  }
}
}


lin-hitonami · 2022-11-18T08:51:46Z

Is this related to `dynamic_index=True`? @strongoier

lin-hitonami · 2022-11-21T06:57:59Z

This problem also exists in Taichi 0.9.2 (the oldest version available on PyPI).

strongoier · 2022-12-06T13:45:55Z

A simpler code snippet:

import taichi as ti
ti.init(ti.cpu, dynamic_index=True)

@ti.func
def jacob_eigen_test():
    p = ti.Matrix([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
    loop = 1
    sig = ti.Vector([0, 0, 0, 0])
    print('p1', p[0, 0])
    while loop == 1:
        print('p2', p[0, 0])
        loop = 0
        p[0, 0] = -1
    for i in range(1):
        sig[i] = 2
    return sig, p

@ti.kernel
def test():
    Sig, P = jacob_eigen_test()

@ti.kernel
def test2():
    Sig, P = jacob_eigen_test()
    print(Sig,P)


test()
print('test2')
test2()
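For comparison, here is a plain-Python rendering of the same control flow, with no Taichi involved. It is only a reference model of the values both kernels should observe under ordinary Python semantics: `jacob_eigen_reference` and the `seen` dictionary are names made up for this sketch, and nested lists stand in for `ti.Matrix` / `ti.Vector`.

```python
def jacob_eigen_reference():
    # 4x4 identity matrix as nested lists (stand-in for ti.Matrix).
    p = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
    sig = [0, 0, 0, 0]
    # Record what the two print statements in the snippet would show.
    seen = {"p1": p[0][0], "p2": []}
    loop = 1
    while loop == 1:
        seen["p2"].append(p[0][0])  # value read by the 'p2' print
        loop = 0
        p[0][0] = -1
    for i in range(1):
        sig[i] = 2
    return sig, p, seen


sig, p, seen = jacob_eigen_reference()
print("p1", seen["p1"])
print("p2", seen["p2"])
```

In this model the `p2` read sees the same value as `p1`, since nothing writes to `p[0][0]` between the two prints. The second kernel in the issue reports `p2 0.000000` instead.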

Issue: fix #6663

### Brief Summary

In `MatrixPtrStmt`, when `origin` is `GlobalTemporaryStmt`, the semantics of `offset` has changed from the number of bytes to the number of elements. This PR fixes the outdated usage which may overwrite the global tmp buffer.

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
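The bytes-versus-elements mix-up can be illustrated with a small pure-Python model of the global tmp buffer. This is a sketch built only from the `test2` IR dump in this issue, not from the Taichi source or the actual patch. The base offsets (64 B for `sig`, 80 B for `p`) and the list of store offsets for `sig` are copied from that dump; the helper names `element_index` and `p00_after_init` are hypothetical, and the assumption is that every `shift ptr` offset is consumed as an element count.

```python
F32_BYTES = 4

# Base offsets in bytes, as printed in the test2 IR ("global tmp var (offset = N B)").
SIG_BASE_BYTES = 64   # [Tensor (4) f32]
P_BASE_BYTES = 80     # [Tensor (4, 4) f32]

# Offsets of the stores that zero-initialise sig in the IR dump:
# a byte-style series (0, 4, 8, 12) plus the element indices (1, 2, 3).
SIG_OFFSETS_IN_IR = [0, 4, 8, 12, 1, 2, 3]
# What the offsets would be if only element counts were emitted.
SIG_OFFSETS_ELEMENTS_ONLY = [0, 1, 2, 3]


def element_index(base_bytes, offset):
    """Resolve a pointer whose offset is counted in f32 elements."""
    return base_bytes // F32_BYTES + offset


def p00_after_init(sig_offsets):
    """Return p[0, 0] after p[0, 0] = 1.0 and then zero-initialising sig."""
    buf = [0.0] * 64  # stand-in for the global tmp buffer
    buf[element_index(P_BASE_BYTES, 0)] = 1.0
    for off in sig_offsets:
        buf[element_index(SIG_BASE_BYTES, off)] = 0.0
    return buf[element_index(P_BASE_BYTES, 0)]


print(p00_after_init(SIG_OFFSETS_IN_IR))          # 0.0
print(p00_after_init(SIG_OFFSETS_ELEMENTS_ONLY))  # 1.0
```

In this model `sig` starts at element 16 and `p` at element 20, so the store at `sig + 4` lands on `p[0, 0]`. That would account for the log above: `p1` prints the constant `1.0`, while `p2` loads the overwritten slot and prints `0.0`.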


lin-hitonami added the "potential bug" label (Something that looks like a bug but not yet confirmed) Nov 18, 2022

taichi-gardener added this to Taichi Lang Nov 18, 2022

taichi-gardener moved this to Untriaged in Taichi Lang Nov 18, 2022

ailzhang assigned strongoier and lin-hitonami Nov 25, 2022

ailzhang moved this from Untriaged to Todo in Taichi Lang Nov 25, 2022

strongoier mentioned this issue Dec 6, 2022

[Bug] Avoid overwriting global tmp with dynamic_index=True #6820

Merged

strongoier closed this as completed in #6820 Dec 7, 2022

Repository owner moved this from Todo to Done in Taichi Lang Dec 7, 2022
