[opengl] Randomly breaking down mpm128.py #633

archibate · 2020-03-21T11:28:02Z

No description provided.

archibate · 2020-03-21T22:32:47Z

Surprisingly, I can't reproduce this bug anymore.

archibate · 2020-03-26T05:39:33Z

> Surprisingly, I can't reproduce this bug anymore.

To trigger the offload fault again, try moving group_size = xxx back into reset().

yuanming-hu · 2020-03-27T00:48:44Z

I built the OpenGL backend on my end and ran into this issue with mpm99.py. I also tested mpm128.py and got a similar error. Do you have any idea what's going on? :-)

python mpm99.py 
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-_zvu7wvc
[Taichi] sandbox prepared
[Taichi] version 0.5.8, cuda 10.0, commit a66eba07, python 3.6.9
[I 03/26/20 20:33:30.700] [program.cpp:materialize_layout@255] OpenGL root buffer size: 1114112 B
[W 03/26/20 20:33:30.700] [opengl_api.cpp:initialize_opengl@194] OpenGL backend currently WIP, MAY NOT WORK
[I 03/26/20 20:33:30.869] [opengl_api.cpp:initialize_opengl@223] [glsl] OpenGL 4.3.0 NVIDIA 430.26
[E 03/26/20 20:33:30.893] [opengl_api.cpp:compile@62] [glsl] error while compiling shader:
  1 #version 430 core
  2 precision highp float;
  3 #define S25 const int // place float
  4 #define S25_stride 4 // sizeof(float)
  5 #define S24_ch const int
  6 #define S24_get0(a_) (a_) // S25
  7 #define S24_ch_stride (S25_stride)
  8 #define S24 const int // dense
  9 #define S24_n 16384
 10 #define S24_stride (S24_ch_stride * S24_n)
 11 #define S24_children(a_, i) ((a_) + S24_ch_stride * (i))
 12 #define S23 const int // place float
 13 #define S23_stride 4 // sizeof(float)
 14 #define S22 const int // place float
 15 #define S22_stride 4 // sizeof(float)
 16 #define S21_ch const int
 17 #define S21_get0(a_) (a_) // S22
 18 #define S21_get1(a_) ((a_) + (S22_stride)) // S23
 19 #define S21_ch_stride (S22_stride + S23_stride)
 20 #define S21 const int // dense
 21 #define S21_n 16384
 22 #define S21_stride (S21_ch_stride * S21_n)
 23 #define S21_children(a_, i) ((a_) + S21_ch_stride * (i))
 24 #define S20 const int // place float
 25 #define S20_stride 4 // sizeof(float)
 26 #define S19_ch const int
 27 #define S19_get0(a_) (a_) // S20
 28 #define S19_ch_stride (S20_stride)
 29 #define S19 const int // dense
 30 #define S19_n 16384
 31 #define S19_stride (S19_ch_stride * S19_n)
 32 #define S19_children(a_, i) ((a_) + S19_ch_stride * (i))
 33 #define S18 const int // place int
 34 #define S18_stride 4 // sizeof(int)
 35 #define S17_ch const int
 36 #define S17_get0(a_) (a_) // S18
 37 #define S17_ch_stride (S18_stride)
 38 #define S17 const int // dense
 39 #define S17_n 16384
 40 #define S17_stride (S17_ch_stride * S17_n)
 41 #define S17_children(a_, i) ((a_) + S17_ch_stride * (i))
 42 #define S16 const int // place float
 43 #define S16_stride 4 // sizeof(float)
 44 #define S15 const int // place float
 45 #define S15_stride 4 // sizeof(float)
 46 #define S14 const int // place float
 47 #define S14_stride 4 // sizeof(float)
 48 #define S13 const int // place float
 49 #define S13_stride 4 // sizeof(float)
 50 #define S12_ch const int
 51 #define S12_get0(a_) (a_) // S13
 52 #define S12_get1(a_) ((a_) + (S13_stride)) // S14
 53 #define S12_get2(a_) ((a_) + (S13_stride + S14_stride)) // S15
 54 #define S12_get3(a_) ((a_) + (S13_stride + S14_stride + S15_stride)) // S16
 55 #define S12_ch_stride (S13_stride + S14_stride + S15_stride + S16_stride)
 56 #define S12 const int // dense
 57 #define S12_n 16384
 58 #define S12_stride (S12_ch_stride * S12_n)
 59 #define S12_children(a_, i) ((a_) + S12_ch_stride * (i))
 60 #define S11 const int // place float
 61 #define S11_stride 4 // sizeof(float)
 62 #define S10 const int // place float
 63 #define S10_stride 4 // sizeof(float)
 64 #define S9 const int // place float
 65 #define S9_stride 4 // sizeof(float)
 66 #define S8 const int // place float
 67 #define S8_stride 4 // sizeof(float)
 68 #define S7_ch const int
 69 #define S7_get0(a_) (a_) // S8
 70 #define S7_get1(a_) ((a_) + (S8_stride)) // S9
 71 #define S7_get2(a_) ((a_) + (S8_stride + S9_stride)) // S10
 72 #define S7_get3(a_) ((a_) + (S8_stride + S9_stride + S10_stride)) // S11
 73 #define S7_ch_stride (S8_stride + S9_stride + S10_stride + S11_stride)
 74 #define S7 const int // dense
 75 #define S7_n 16384
 76 #define S7_stride (S7_ch_stride * S7_n)
 77 #define S7_children(a_, i) ((a_) + S7_ch_stride * (i))
 78 #define S6 const int // place float
 79 #define S6_stride 4 // sizeof(float)
 80 #define S5 const int // place float
 81 #define S5_stride 4 // sizeof(float)
 82 #define S4_ch const int
 83 #define S4_get0(a_) (a_) // S5
 84 #define S4_get1(a_) ((a_) + (S5_stride)) // S6
 85 #define S4_ch_stride (S5_stride + S6_stride)
 86 #define S4 const int // dense
 87 #define S4_n 16384
 88 #define S4_stride (S4_ch_stride * S4_n)
 89 #define S4_children(a_, i) ((a_) + S4_ch_stride * (i))
 90 #define S3 const int // place float
 91 #define S3_stride 4 // sizeof(float)
 92 #define S2 const int // place float
 93 #define S2_stride 4 // sizeof(float)
 94 #define S1_ch const int
 95 #define S1_get0(a_) (a_) // S2
 96 #define S1_get1(a_) ((a_) + (S2_stride)) // S3
 97 #define S1_ch_stride (S2_stride + S3_stride)
 98 #define S1 const int // dense
 99 #define S1_n 16384
100 #define S1_stride (S1_ch_stride * S1_n)
101 #define S1_children(a_, i) ((a_) + S1_ch_stride * (i))
102 #define S0_ch const int
103 #define S0_get0(a_) (a_) // S1
104 #define S0_get1(a_) ((a_) + (S1_stride)) // S4
105 #define S0_get2(a_) ((a_) + (S1_stride + S4_stride)) // S7
106 #define S0_get3(a_) ((a_) + (S1_stride + S4_stride + S7_stride)) // S12
107 #define S0_get4(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride)) // S17
108 #define S0_get5(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride)) // S19
109 #define S0_get6(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride)) // S21
110 #define S0_get7(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride)) // S24
111 #define S0_ch_stride (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride + S24_stride)
112 #define S0 const int // root
113 #define S0_n 1
114 #define S0_stride (S0_ch_stride * S0_n)
115 #define S0_children(a_, i) ((a_) + S0_ch_stride * (i))
116 
117 layout(std430, binding = 0) buffer data_i32 { int _data_i32_[]; };
118 layout(std430, binding = 0) buffer data_f32 { float _data_f32_[]; };
119 layout(std430, binding = 0) buffer data_f64 { double _data_f64_[]; };
120 #define _mem_i32(x) _data_i32_[(x) >> 2]
121 #define _mem_f32(x) _data_f32_[(x) >> 2]
122 #define _mem_f64(x) _data_f64_[(x) >> 3]
123 #define _Ax_(x) x
124 #define _At_(x) _Ax_(_at_##x(x))
125 uvec4 _rand_;
126 
127 void _init_rand()
128 {
129   uint i = gl_GlobalInvocationID.x;
130   _rand_.x = 123456789 * i * 1000000007;
131   _rand_.y = 362436069;
132   _rand_.z = 521288629;
133   _rand_.w = 88675123;
134 }
135 
136 uint _rand_u32()
137 {
138   uint t = _rand_.x ^ (_rand_.x << 11);
139   _rand_.xyz = _rand_.yzw;
140   _rand_.x = _rand_.y;
141   _rand_.y = _rand_.z;
142   _rand_.z = _rand_.w;
143   _rand_.w = (_rand_.w ^ (_rand_.w >> 19)) ^ (t ^ (t >> 8));
144   return _rand_.w * 1000000007;
145 }
146 
147 float _rand_f32()
148 {
149   return float(_rand_u32()) * (1.0 / 4294967296.0);
150 }
151 
152 double _rand_f64()
153 {
154   return double(_rand_f32());
155 }
156 
157 int _rand_i32()
158 {
159   return int(_rand_u32());
160 }
161 
162 void initialize_c6_00()
163 { // range for
164   // range known at compile time
165   const int _thread_id_ = int(gl_GlobalInvocationID.x);
166   if (_thread_id_ >= 9000) return;
167   const int _it_value_ = 0 + _thread_id_ * 1;
168   const float tmp5 = _rand_f32();
169   const float tmp6 = 0.2;
170   const float tmp7 = float(tmp5 * tmp6);
171   const float tmp8 = 0.3;
172   const float tmp9 = float(tmp7 + tmp8);
173   const int tmp10 = _it_value_;
174   const int tmp11 = 3000;
175   const int tmp12 = int(tmp10 * tmp11 >= 0 ? abs(tmp10) / abs(tmp11) : sign(tmp10) * (abs(tmp10) + abs(tmp11) - 1) / tmp11);
176   const float tmp13 = 0.1;
177   const float tmp14 = float(tmp12);
178   const float tmp15 = float(tmp14 * tmp13);
179   const float tmp16 = float(tmp9 + tmp15);
180   S0 tmp19 = 0;
181   const int tmp199 = 0;
182   S0_ch tmp21 = S0_children(tmp19, tmp199);
183   S1 tmp22 = S0_get0(tmp21);
184   const int tmp23 = (((0 + tmp10) >> 0) & ((1 << 14) - 1));
185   const int tmp201 = 1;
186   const int tmp202 = int(tmp23 * tmp201);
187   const int tmp203 = int(tmp199 + tmp202);
188   S1_ch tmp25 = S1_children(tmp22, tmp203);
189   S2 tmp26 = S1_get0(tmp25);
190   #define _at_tmp26 _mem_f32
191   _At_(tmp26) = tmp16;
192   const float tmp29 = _rand_f32();
193   const float tmp30 = float(tmp29 * tmp6);
194   const float tmp31 = 0.05;
195   const float tmp32 = float(tmp30 + tmp31);
196   const float tmp33 = 0.32;
197   const float tmp34 = float(tmp14 * tmp33);
198   const float tmp35 = float(tmp32 + tmp34);
199   S3 tmp45 = S1_get1(tmp25);
200   #define _at_tmp45 _mem_f32
201   _At_(tmp45) = tmp35;
202   S17 tmp53 = S0_get4(tmp21);
203   S17_ch tmp56 = S17_children(tmp53, tmp203);
204   S18 tmp57 = S17_get0(tmp56);
205   #define _at_tmp57 _mem_i32
206   _At_(tmp57) = tmp12;
207   const float tmp61 = 0.0;
208   S4 tmp66 = S0_get1(tmp21);
209   S4_ch tmp69 = S4_children(tmp66, tmp203);
210   S5 tmp70 = S4_get0(tmp69);
211   #define _at_tmp70 _mem_f32
212   _At_(tmp70) = tmp61;
213   S6 tmp82 = S4_get1(tmp69);
214   #define _at_tmp82 _mem_f32
215   _At_(tmp82) = tmp61;
216   const float tmp86 = 1.0;
217   S12 tmp91 = S0_get3(tmp21);
218   S12_ch tmp94 = S12_children(tmp91, tmp203);
219   S13 tmp95 = S12_get0(tmp94);
220   #define _at_tmp95 _mem_f32
221   _At_(tmp95) = tmp86;
222   S14 tmp107 = S12_get1(tmp94);
223   #define _at_tmp107 _mem_f32
224   _At_(tmp107) = tmp61;
225   S15 tmp119 = S12_get2(tmp94);
226   #define _at_tmp119 _mem_f32
227   _At_(tmp119) = tmp61;
228   S16 tmp131 = S12_get3(tmp94);
229   #define _at_tmp131 _mem_f32
230   _At_(tmp131) = tmp86;
231   S19 tmp139 = S0_get5(tmp21);
232   S19_ch tmp142 = S19_children(tmp139, tmp203);
233   S20 tmp143 = S19_get0(tmp142);
234   #define _at_tmp143 _mem_f32
235   _At_(tmp143) = tmp86;
236 }
237 
238 void main()
239 {
240   _init_rand();
241   initialize_c6_00();
242 }
243 layout(local_size_x = 1792, local_size_y = 1, local_size_z = 1) in;

0(243) : error C7604: layout(layout_size_x = 1792) exceeds maximum value

archibate · 2020-03-27T04:16:22Z

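Not the same issue. This one is caused by a hardcoded magic number (the local_size_x = 1792 at the end of the shader). @archibate will look up the proper query for the limit, something like glGetInteger(GL_MAX_THREADS_PER_GROUP), later.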

Found: https://stackoverflow.com/questions/39004898/get-maximum-workgroup-size-for-compute-shaders
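
For illustration, here is a minimal sketch of replacing the hardcoded 1792 with a value derived from the driver limit. The helper names (pick_local_size_x, layout_line) are hypothetical and not taken from the Taichi code base. The limit is passed in as a plain integer; in a real backend it would come from querying GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS on a live OpenGL context. The value 1024 below is only an example (it is the minimum the OpenGL 4.3 spec guarantees), not what any particular driver reports.

```python
def pick_local_size_x(num_threads, max_invocations):
    """Return (local_size_x, num_groups) for a 1D dispatch.

    num_threads:     total number of shader invocations needed
                     (9000 for the range-for in the log above).
    max_invocations: driver limit, assumed to have been queried via
                     GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS beforehand.
    """
    if num_threads <= 0 or max_invocations <= 0:
        raise ValueError("num_threads and max_invocations must be positive")
    # Never request more invocations per group than the driver allows.
    local_size_x = min(num_threads, max_invocations)
    # Round up so every thread id below num_threads is covered; the shader
    # already returns early for ids >= num_threads.
    num_groups = (num_threads + local_size_x - 1) // local_size_x
    return local_size_x, num_groups


def layout_line(local_size_x):
    """Format the GLSL layout qualifier that ends the generated shader."""
    return (
        f"layout(local_size_x = {local_size_x}, "
        "local_size_y = 1, local_size_z = 1) in;"
    )


if __name__ == "__main__":
    size, groups = pick_local_size_x(9000, 1024)  # 1024 is an example limit
    print(layout_line(size))
    print("work groups to dispatch:", groups)
```

With an example limit of 1024, the 9000-thread kernel would be split into 9 work groups of 1024 invocations each instead of requesting 1792 per group.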

archibate · 2020-03-30T13:25:11Z

Fixed by 90055dd in #666.

@k-ye

* use GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS instead of 1792 for portability
* modify mpm128.py to reproduce bug #633
* Update opengl_api.cpp
* misc
* gather #define _at_{}
* [skip ci] use ptr_signat
* no #define _At_ [skip ci] fix typo [skip ci] fix again
* attempt to fix opengl on test_loops
* [skip ci] really fix test_loops
* [skip ci] enable _GLSL_DEBUG & try improve used.atomic_float for all
* [skip ci] gtmp test
* [skip ci] fix calloc null when gtmp_size uninited
* no atan(double, double)
* [skip ci] better inform TI_ARCH
* [skip ci] Update misc/make_changelog.py
* hardcoded _GLSL_NVIDIA for built-in atomic float ops
* [skip ci] share work about stride_map_ [skip ci] really did stride_map_ test passing
* [skip ci] save my power to sleep
* [skip ci] fix const mutable by no const qua struct_compiled_
* [skip ci] also class_children_map_
* [skip ci] no use struct_compiled->source_code
* [skip ci] no macro for _earg_i32
* [skip ci] no macro like _arg_{}({})
* also make data/gtmp/extr no macroed
* use fancier short_name() to make NV GLSL compiler ridiculously faster
* no extra float(...) bracing BinaryOpStmt
* [skip ci] remove useless TI_INFO some
* auto detect GL_NV_shader_atomic_float
* [skip ci] fix typo in atomic sim
* apply reviews (thanks to @k-ye!)
* [skip ci] fix mpm88/99 bug (do we have better solution?)
* [skip ci] disable _GLSL_DEBUG
* guard short_name.cpp with TI_NAMESPACE_BEGIN/END
* [skip ci] use STR macro by k-ye for shader code
* [skip ci] enforce code format
* [skip ci] add clang-format off/on guard for STR
* [skip ci] enforce code format
* platform/opengl -> backends/opengl (like metal does)
* [skip ci] use opengl/shaders/*.glsl.h for STR(..)
* [skip ci] minor shader code adjustments

Co-authored-by: Yuanming Hu <[email protected]>
Co-authored-by: Taichi Gardener <[email protected]>

archibate added the "potential bug" label (something that looks like a bug but not yet confirmed) Mar 21, 2020

archibate self-assigned this Mar 26, 2020

archibate changed the title from "Randomly breaking down mpm128.py in OpenGL" to "[opengl] Randomly breaking down mpm128.py" Mar 26, 2020

archibate mentioned this issue Mar 27, 2020

[OpenGL] Support NVIDIA GLSL compiler #666

Merged

archibate added a commit to archibate/taichi that referenced this issue Mar 27, 2020

modify mpm128.py to reproduce bug taichi-dev#633

849a175

archibate closed this as completed Mar 30, 2020

archibate mentioned this issue Apr 23, 2020

[test] test_ad_atomic.py::test_ad_reduce sometimes fails #828

Closed
