Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenGL] Support NVIDIA GLSL compiler #666

Merged
merged 44 commits into from
Apr 2, 2020

Conversation

archibate
Copy link
Collaborator

@archibate archibate commented Mar 27, 2020

@archibate
Copy link
Collaborator Author

Please checkout this branch and see if you get fixed this issue.

@archibate
Copy link
Collaborator Author

@yuanming-hu See if you can reproduce that now.

@yuanming-hu
Copy link
Member

Thanks. Now I get undefined variable "atomicAdd_at_tmp1347":

python3 mpm99.py  
[Taichi] mode=development
[Taichi] preparing sandbox at /tmp/taichi-atzv1_w_
[Taichi] sandbox prepared
[Taichi] version 0.5.8, cuda 10.0, commit 849a1759, python 3.6.9
[I 03/27/20 12:19:28.165] [program.cpp:materialize_layout@255] OpenGL root buffer size: 1114112 B
[W 03/27/20 12:19:28.165] [opengl_api.cpp:initialize_opengl@194] OpenGL backend currently WIP, MAY NOT WORK
[I 03/27/20 12:19:28.332] [opengl_api.cpp:initialize_opengl@223] [glsl] OpenGL 4.3.0 NVIDIA 430.26
[E 03/27/20 12:19:30.477] [opengl_api.cpp:compile@62] [glsl] error while compiling shader:
  1 #version 430 core
  2 precision highp float;
  3 #define S25 const int // place float
  4 #define S25_stride 4 // sizeof(float)
  5 #define S24_ch const int
  6 #define S24_get0(a_) (a_) // S25
  7 #define S24_ch_stride (S25_stride)
  8 #define S24 const int // dense
  9 #define S24_n 16384
 10 #define S24_stride (S24_ch_stride * S24_n)
 11 #define S24_children(a_, i) ((a_) + S24_ch_stride * (i))
 12 #define S23 const int // place float
 13 #define S23_stride 4 // sizeof(float)
 14 #define S22 const int // place float
 15 #define S22_stride 4 // sizeof(float)
 16 #define S21_ch const int
 17 #define S21_get0(a_) (a_) // S22
 18 #define S21_get1(a_) ((a_) + (S22_stride)) // S23
 19 #define S21_ch_stride (S22_stride + S23_stride)
 20 #define S21 const int // dense
 21 #define S21_n 16384
 22 #define S21_stride (S21_ch_stride * S21_n)
 23 #define S21_children(a_, i) ((a_) + S21_ch_stride * (i))
 24 #define S20 const int // place float
 25 #define S20_stride 4 // sizeof(float)
 26 #define S19_ch const int
 27 #define S19_get0(a_) (a_) // S20
 28 #define S19_ch_stride (S20_stride)
 29 #define S19 const int // dense
 30 #define S19_n 16384
 31 #define S19_stride (S19_ch_stride * S19_n)
 32 #define S19_children(a_, i) ((a_) + S19_ch_stride * (i))
 33 #define S18 const int // place int
 34 #define S18_stride 4 // sizeof(int)
 35 #define S17_ch const int
 36 #define S17_get0(a_) (a_) // S18
 37 #define S17_ch_stride (S18_stride)
 38 #define S17 const int // dense
 39 #define S17_n 16384
 40 #define S17_stride (S17_ch_stride * S17_n)
 41 #define S17_children(a_, i) ((a_) + S17_ch_stride * (i))
 42 #define S16 const int // place float
 43 #define S16_stride 4 // sizeof(float)
 44 #define S15 const int // place float
 45 #define S15_stride 4 // sizeof(float)
 46 #define S14 const int // place float
 47 #define S14_stride 4 // sizeof(float)
 48 #define S13 const int // place float
 49 #define S13_stride 4 // sizeof(float)
 50 #define S12_ch const int
 51 #define S12_get0(a_) (a_) // S13
 52 #define S12_get1(a_) ((a_) + (S13_stride)) // S14
 53 #define S12_get2(a_) ((a_) + (S13_stride + S14_stride)) // S15
 54 #define S12_get3(a_) ((a_) + (S13_stride + S14_stride + S15_stride)) // S16
 55 #define S12_ch_stride (S13_stride + S14_stride + S15_stride + S16_stride)
 56 #define S12 const int // dense
 57 #define S12_n 16384
 58 #define S12_stride (S12_ch_stride * S12_n)
 59 #define S12_children(a_, i) ((a_) + S12_ch_stride * (i))
 60 #define S11 const int // place float
 61 #define S11_stride 4 // sizeof(float)
 62 #define S10 const int // place float
 63 #define S10_stride 4 // sizeof(float)
 64 #define S9 const int // place float
 65 #define S9_stride 4 // sizeof(float)
 66 #define S8 const int // place float
 67 #define S8_stride 4 // sizeof(float)
 68 #define S7_ch const int
 69 #define S7_get0(a_) (a_) // S8
 70 #define S7_get1(a_) ((a_) + (S8_stride)) // S9
 71 #define S7_get2(a_) ((a_) + (S8_stride + S9_stride)) // S10
 72 #define S7_get3(a_) ((a_) + (S8_stride + S9_stride + S10_stride)) // S11
 73 #define S7_ch_stride (S8_stride + S9_stride + S10_stride + S11_stride)
 74 #define S7 const int // dense
 75 #define S7_n 16384
 76 #define S7_stride (S7_ch_stride * S7_n)
 77 #define S7_children(a_, i) ((a_) + S7_ch_stride * (i))
 78 #define S6 const int // place float
 79 #define S6_stride 4 // sizeof(float)
 80 #define S5 const int // place float
 81 #define S5_stride 4 // sizeof(float)
 82 #define S4_ch const int
 83 #define S4_get0(a_) (a_) // S5
 84 #define S4_get1(a_) ((a_) + (S5_stride)) // S6
 85 #define S4_ch_stride (S5_stride + S6_stride)
 86 #define S4 const int // dense
 87 #define S4_n 16384
 88 #define S4_stride (S4_ch_stride * S4_n)
 89 #define S4_children(a_, i) ((a_) + S4_ch_stride * (i))
 90 #define S3 const int // place float
 91 #define S3_stride 4 // sizeof(float)
 92 #define S2 const int // place float
 93 #define S2_stride 4 // sizeof(float)
 94 #define S1_ch const int
 95 #define S1_get0(a_) (a_) // S2
 96 #define S1_get1(a_) ((a_) + (S2_stride)) // S3
 97 #define S1_ch_stride (S2_stride + S3_stride)
 98 #define S1 const int // dense
 99 #define S1_n 16384
100 #define S1_stride (S1_ch_stride * S1_n)
101 #define S1_children(a_, i) ((a_) + S1_ch_stride * (i))
102 #define S0_ch const int
103 #define S0_get0(a_) (a_) // S1
104 #define S0_get1(a_) ((a_) + (S1_stride)) // S4
105 #define S0_get2(a_) ((a_) + (S1_stride + S4_stride)) // S7
106 #define S0_get3(a_) ((a_) + (S1_stride + S4_stride + S7_stride)) // S12
107 #define S0_get4(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride)) // S17
108 #define S0_get5(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride)) // S19
109 #define S0_get6(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride)) // S21
110 #define S0_get7(a_) ((a_) + (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride)) // S24
111 #define S0_ch_stride (S1_stride + S4_stride + S7_stride + S12_stride + S17_stride + S19_stride + S21_stride + S24_stride)
112 #define S0 const int // root
113 #define S0_n 1
114 #define S0_stride (S0_ch_stride * S0_n)
115 #define S0_children(a_, i) ((a_) + S0_ch_stride * (i))
116 
117 layout(std430, binding = 0) buffer data_i32 { int _data_i32_[]; };
118 layout(std430, binding = 0) buffer data_f32 { float _data_f32_[]; };
119 layout(std430, binding = 0) buffer data_f64 { double _data_f64_[]; };
120 #define _mem_i32(x) _data_i32_[(x) >> 2]
121 #define _mem_f32(x) _data_f32_[(x) >> 2]
122 #define _mem_f64(x) _data_f64_[(x) >> 3]
123 #define _Ax_(x) x
124 #define _At_(x) _Ax_(_at_##x(x))
125 #define _Atmf_Def(Add, _f_, _o_, mem, _32, float) float atomic##Add##_##mem##_f##_32(int addr, float rhs) {   int old, new, ret;   do {     old = _##mem##_i##_32(addr);     new = floatBitsToInt(_f_(intBitsToFloat(old) _o_ rhs));   } while (old != atomicCompSwap(_Ax_(_##mem##_i##_32(addr)), old, new));   return intBitsToFloat(old); }
126 #define _Acma_ ,
127 #define _Atm_(func, at, x, rhs) _Ax_(func##at(x, rhs))
128 _Atmf_Def(Add,, +, mem, 32, float)
129 _Atmf_Def(Sub,, -, mem, 32, float)
130 _Atmf_Def(Max, max, _Acma_, mem, 32, float)
131 _Atmf_Def(Min, min, _Acma_, mem, 32, float)
132 
133 
134 void substep_c4_01()
135 { // range for
136   // range known at compile time
137   const int _thread_id_ = int(gl_GlobalInvocationID.x);
138   if (_thread_id_ >= 16384) return;
139   const int _it_value_ = 0 + _thread_id_ * 1;
140   const int tmp61 = _it_value_;
141   const int tmp62 = -1;
142   const int tmp63 = (((0 + tmp61) >> 0) & ((1 << 14) - 1));
143   const int tmp67 = 9000;
144   const int tmp68 = int(tmp63 < tmp67);
145   const int tmp69 = int(tmp62 & tmp68);
146   int tmp70 = 0;
147   tmp70 = tmp63;
148   if (tmp69 != 0) {
149     float tmp73 = 0;
150     float tmp74 = 0;
151     const int tmp75 = tmp70;
152     S0 tmp78 = 0;
153     const int tmp8459 = 0;
154     S0_ch tmp80 = S0_children(tmp78, tmp8459);
155     S1 tmp81 = S0_get0(tmp80);
156     const int tmp82 = (((0 + tmp75) >> 0) & ((1 << 14) - 1));
157     const int tmp8461 = 1;
158     const int tmp8462 = int(tmp82 * tmp8461);
159     const int tmp8463 = int(tmp8459 + tmp8462);
160     S1_ch tmp84 = S1_children(tmp81, tmp8463);
161     S2 tmp85 = S1_get0(tmp84);
162     #define _at_tmp85 _mem_f32
163     const float tmp87 = _At_(tmp85);
164     const float tmp88 = 128.0;
165     const float tmp89 = float(tmp87 * tmp88);
166     tmp73 = tmp89;
167     S3 tmp100 = S1_get1(tmp84);
168     #define _at_tmp100 _mem_f32
169     const float tmp102 = _At_(tmp100);
170     const float tmp103 = float(tmp102 * tmp88);
171     tmp74 = tmp103;
172     float tmp105 = 0;
173     float tmp106 = 0;
174     const float tmp107 = 0.5;
175     const float tmp108 = float(tmp89 - tmp107);
176     tmp105 = tmp108;
177     const float tmp110 = float(tmp103 - tmp107);
178     tmp106 = tmp110;
179     int tmp112 = 0;
180     const int tmp113 = int(tmp108);
181     tmp112 = tmp113;
182     int tmp115 = 0;
183     const int tmp116 = int(tmp110);
184     tmp115 = tmp116;
185     float tmp118 = 0;
186     float tmp119 = 0;
187     tmp118 = tmp89;
188     tmp119 = tmp103;
189     float tmp122 = 0;
190     float tmp123 = 0;
191     const float tmp124 = float(tmp113);
192     const float tmp125 = float(tmp89 - tmp124);
193     tmp122 = tmp125;
194     const float tmp127 = float(tmp116);
195     const float tmp128 = float(tmp103 - tmp127);
196     tmp123 = tmp128;
197     float tmp130 = 0;
198     tmp130 = tmp125;
199     float tmp132 = 0;
200     tmp132 = tmp128;
201     float tmp134 = 0;
202     float tmp135 = 0;
203     const float tmp136 = 1.5;
204     const float tmp137 = float(tmp136 - tmp125);
205     tmp134 = tmp137;
206     const float tmp139 = float(tmp136 - tmp128);
207     tmp135 = tmp139;
208     float tmp141 = 0;
209     float tmp142 = 0;
210     const float tmp143 = float(tmp137 * tmp137);
211     tmp141 = tmp143;
212     const float tmp145 = float(tmp139 * tmp139);
213     tmp142 = tmp145;
214     float tmp147 = 0;
215     float tmp148 = 0;
216     const float tmp149 = float(tmp143 * tmp107);
217     tmp147 = tmp149;
218     const float tmp151 = float(tmp145 * tmp107);
219     tmp148 = tmp151;
220     float tmp153 = 0;
221     float tmp154 = 0;
222     const float tmp156 = 1.0;
223     const float tmp157 = float(tmp125 - tmp156);
224     tmp153 = tmp157;
225     const float tmp159 = float(tmp128 - tmp156);
226     tmp154 = tmp159;
227     float tmp161 = 0;
228     float tmp162 = 0;
229     const float tmp163 = float(tmp157 * tmp157);
230     tmp161 = tmp163;
231     const float tmp165 = float(tmp159 * tmp159);
232     tmp162 = tmp165;
233     float tmp167 = 0;
234     float tmp168 = 0;
235     const float tmp169 = 0.75;
236     const float tmp170 = float(tmp169 - tmp163);
237     tmp167 = tmp170;
238     const float tmp172 = float(tmp169 - tmp165);
239     tmp168 = tmp172;
240     float tmp174 = 0;
241     float tmp175 = 0;
242     const float tmp176 = float(tmp125 - tmp107);
243     tmp174 = tmp176;
244     const float tmp178 = float(tmp128 - tmp107);
245     tmp175 = tmp178;
246     float tmp180 = 0;
247     float tmp181 = 0;
248     const float tmp182 = float(tmp176 * tmp176);
249     tmp180 = tmp182;
250     const float tmp184 = float(tmp178 * tmp178);
251     tmp181 = tmp184;
252     float tmp186 = 0;
253     float tmp187 = 0;
254     const float tmp188 = float(tmp182 * tmp107);
255     tmp186 = tmp188;
256     const float tmp190 = float(tmp184 * tmp107);
257     tmp187 = tmp190;
258     float tmp192 = 0;
259     tmp192 = tmp149;
260     float tmp194 = 0;
261     tmp194 = tmp151;
262     float tmp196 = 0;
263     tmp196 = tmp170;
264     float tmp198 = 0;
265     tmp198 = tmp172;
266     float tmp200 = 0;
267     tmp200 = tmp188;
268     float tmp202 = 0;
269     tmp202 = tmp190;
270     float tmp204 = 0;
271     float tmp205 = 0;
272     float tmp206 = 0;
273     float tmp207 = 0;
274     S7 tmp213 = S0_get2(tmp80);
275     S7_ch tmp216 = S7_children(tmp213, tmp8463);
276     S8 tmp217 = S7_get0(tmp216);
277     #define _at_tmp217 _mem_f32
278     const float tmp219 = _At_(tmp217);
279     const float tmp220 = 0.0001;
280     const float tmp221 = float(tmp219 * tmp220);
281     tmp204 = tmp221;
282     S9 tmp232 = S7_get1(tmp216);
283     #define _at_tmp232 _mem_f32
284     const float tmp234 = _At_(tmp232);
285     const float tmp235 = float(tmp234 * tmp220);
286     tmp205 = tmp235;
287     S10 tmp246 = S7_get2(tmp216);
288     #define _at_tmp246 _mem_f32
289     const float tmp248 = _At_(tmp246);
290     const float tmp249 = float(tmp248 * tmp220);
291     tmp206 = tmp249;
292     S11 tmp260 = S7_get3(tmp216);
293     #define _at_tmp260 _mem_f32
294     const float tmp262 = _At_(tmp260);
295     const float tmp263 = float(tmp262 * tmp220);
296     tmp207 = tmp263;
297     float tmp265 = 0;
298     float tmp266 = 0;
299     float tmp267 = 0;
300     float tmp268 = 0;
301     const float tmp269 = float(tmp156 + tmp221);
302     tmp265 = tmp269;
303     const float tmp272 = 0.0;
304     tmp266 = tmp235;
305     tmp267 = tmp249;
306     const float tmp277 = float(tmp156 + tmp263);
307     tmp268 = tmp277;
308     float tmp279 = 0;
309     float tmp280 = 0;
310     float tmp281 = 0;
311     float tmp282 = 0;
312     S12 tmp288 = S0_get3(tmp80);
313     S12_ch tmp291 = S12_children(tmp288, tmp8463);
314     S13 tmp292 = S12_get0(tmp291);
315     #define _at_tmp292 _mem_f32
316     const float tmp294 = _At_(tmp292);
317     const float tmp295 = float(tmp269 * tmp294);
318     S15 tmp305 = S12_get2(tmp291);
319     #define _at_tmp305 _mem_f32
320     const float tmp307 = _At_(tmp305);
321     const float tmp308 = float(tmp235 * tmp307);
322     const float tmp309 = float(tmp295 + tmp308);
323     tmp279 = tmp309;
324     S14 tmp320 = S12_get1(tmp291);
325     #define _at_tmp320 _mem_f32
326     const float tmp322 = _At_(tmp320);
327     const float tmp323 = float(tmp269 * tmp322);
328     S16 tmp333 = S12_get3(tmp291);
329     #define _at_tmp333 _mem_f32
330     const float tmp335 = _At_(tmp333);
331     const float tmp336 = float(tmp235 * tmp335);
332     const float tmp337 = float(tmp323 + tmp336);
333     tmp280 = tmp337;
334     const float tmp339 = float(tmp249 * tmp294);
335     const float tmp340 = float(tmp277 * tmp307);
336     const float tmp341 = float(tmp339 + tmp340);
337     tmp281 = tmp341;
338     const float tmp343 = float(tmp249 * tmp322);
339     const float tmp344 = float(tmp277 * tmp335);
340     const float tmp345 = float(tmp343 + tmp344);
341     tmp282 = tmp345;
342     _At_(tmp292) = tmp309;
343     _At_(tmp320) = tmp337;
344     _At_(tmp305) = tmp341;
345     _At_(tmp333) = tmp345;
346     float tmp391 = 0;
347     S19 tmp397 = S0_get5(tmp80);
348     S19_ch tmp400 = S19_children(tmp397, tmp8463);
349     S20 tmp401 = S19_get0(tmp400);
350     #define _at_tmp401 _mem_f32
351     const float tmp403 = _At_(tmp401);
352     const float tmp404 = float(tmp156 - tmp403);
353     const float tmp405 = 10.0;
354     const float tmp406 = float(tmp404 * tmp405);
355     const float tmp407 = float(exp(tmp406));
356     int tmp408 = 0;
357     S17 tmp414 = S0_get4(tmp80);
358     S17_ch tmp417 = S17_children(tmp414, tmp8463);
359     S18 tmp418 = S17_get0(tmp417);
360     #define _at_tmp418 _mem_i32
361     const int tmp420 = _At_(tmp418);
362     tmp408 = tmp420;
363     int tmp422 = 0;
364     tmp422 = tmp8461;
365     int tmp424 = 0;
366     tmp424 = tmp8461;
367     const int tmp426 = int(tmp420 == tmp8461);
368     const int tmp427 = int(tmp8461 & tmp426);
369     int tmp428 = 0;
370     int tmp429 = 0;
371     tmp428 = tmp427;
372     const int tmp431 = int(tmp427 == 0);
373     tmp429 = tmp431;
374     const float tmp433 = 0.3;
375     const float tmp434 = (tmp427) != 0 ? (tmp433) : (tmp407);
376     tmp391 = tmp434;
377     float tmp436 = 0;
378     const float tmp437 = 416.66666;
379     const float tmp438 = float(tmp434 * tmp437);
380     float tmp439 = 0;
381     const float tmp440 = 277.77777;
382     const float tmp441 = float(tmp434 * tmp440);
383     tmp439 = tmp441;
384     int tmp443 = 0;
385     tmp443 = tmp420;
386     int tmp445 = 0;
387     tmp445 = tmp8459;
388     int tmp447 = 0;
389     tmp447 = tmp8461;
390     const int tmp449 = int(tmp420 == tmp8459);
391     const int tmp450 = int(tmp8461 & tmp449);
392     int tmp451 = 0;
393     int tmp452 = 0;
394     tmp451 = tmp450;
395     const int tmp454 = int(tmp450 == 0);
396     tmp452 = tmp454;
397     const float tmp456 = (tmp450) != 0 ? (tmp272) : (tmp438);
398     tmp436 = tmp456;
399     float tmp458 = 0;
400     const float tmp469 = _At_(tmp292);
401     tmp458 = tmp469;
402     float tmp471 = 0;
403     const float tmp482 = _At_(tmp320);
404     tmp471 = tmp482;
405     float tmp484 = 0;
406     const float tmp495 = _At_(tmp305);
407     tmp484 = tmp495;
408     float tmp497 = 0;
409     const float tmp508 = _At_(tmp333);
410     tmp497 = tmp508;
411     float tmp510 = 0;
412     tmp510 = tmp469;
413     float tmp512 = 0;
414     tmp512 = tmp482;
415     float tmp514 = 0;
416     tmp514 = tmp495;
417     float tmp516 = 0;
418     tmp516 = tmp508;
419     float tmp518 = 0;
420     tmp518 = tmp469;
421     float tmp520 = 0;
422     tmp520 = tmp482;
423     float tmp522 = 0;
424     tmp522 = tmp495;
425     float tmp524 = 0;
426     tmp524 = tmp508;
427     float tmp526 = 0;
428     const float tmp527 = float(tmp469 + tmp508);
429     tmp526 = tmp527;
430     float tmp529 = 0;
431     const float tmp530 = float(tmp495 - tmp482);
432     tmp529 = tmp530;
433     float tmp532 = 0;
434     const float tmp533 = float(tmp527 * tmp527);
435     const float tmp534 = float(tmp530 * tmp530);
436     const float tmp535 = float(tmp533 + tmp534);
437     const float tmp536 = float(sqrt(tmp535));
438     const float tmp537 = float(tmp156 / tmp536);
439     tmp532 = tmp537;
440     float tmp539 = 0;
441     const float tmp540 = float(tmp527 * tmp537);
442     tmp539 = tmp540;
443     float tmp542 = 0;
444     const float tmp543 = float(tmp530 * tmp537);
445     tmp542 = tmp543;
446     float tmp545 = 0;
447     tmp545 = tmp540;
448     float tmp547 = 0;
449     const float tmp548 = float(-tmp543);
450     tmp547 = tmp548;
451     float tmp550 = 0;
452     tmp550 = tmp543;
453     float tmp552 = 0;
454     tmp552 = tmp540;
455     float tmp554 = 0;
456     float tmp555 = 0;
457     float tmp556 = 0;
458     float tmp557 = 0;
459     const float tmp558 = float(tmp540 * tmp469);
460     const float tmp559 = float(tmp543 * tmp495);
461     const float tmp560 = float(tmp558 + tmp559);
462     tmp554 = tmp560;
463     const float tmp562 = float(tmp540 * tmp482);
464     const float tmp563 = float(tmp543 * tmp508);
465     const float tmp564 = float(tmp562 + tmp563);
466     tmp555 = tmp564;
467     const float tmp566 = float(tmp548 * tmp469);
468     const float tmp567 = float(tmp540 * tmp495);
469     const float tmp568 = float(tmp566 + tmp567);
470     tmp556 = tmp568;
471     const float tmp570 = float(tmp548 * tmp482);
472     const float tmp571 = float(tmp540 * tmp508);
473     const float tmp572 = float(tmp570 + tmp571);
474     tmp557 = tmp572;
475     float tmp574 = 0;
476     tmp574 = tmp540;
477     float tmp576 = 0;
478     tmp576 = tmp548;
479     float tmp578 = 0;
480     tmp578 = tmp543;
481     float tmp580 = 0;
482     tmp580 = tmp540;
483     float tmp582 = 0;
484     tmp582 = tmp560;
485     float tmp584 = 0;
486     tmp584 = tmp564;
487     float tmp586 = 0;
488     tmp586 = tmp568;
489     float tmp588 = 0;
490     tmp588 = tmp572;
491     float tmp590 = 0;
492     float tmp591 = 0;
493     float tmp592 = 0;
494     float tmp593 = 0;
495     float tmp594 = 0;
496     const float tmp595 = float(abs(tmp564));
497     tmp594 = tmp595;
498     float tmp597 = 0;
499     const float tmp598 = 1e-05;
500     tmp597 = tmp598;
501     int tmp600 = 0;
502     tmp600 = tmp8461;
503     const int tmp602 = int(tmp595 < tmp598);
504     const int tmp603 = int(tmp8461 & tmp602);
505     int tmp604 = 0;
506     int tmp605 = 0;
507     tmp604 = tmp603;
508     const int tmp607 = int(tmp603 == 0);
509     tmp605 = tmp607;
510     const float tmp609 = (tmp603) != 0 ? (tmp156) : (tmp272);
511     const float tmp610 = (tmp603) != 0 ? (tmp272) : (tmp272);
512     const float tmp611 = (tmp603) != 0 ? (tmp560) : (tmp272);
513     const float tmp612 = (tmp603) != 0 ? (tmp572) : (tmp272);
514     float tmp613 = 0;
515     const float tmp614 = float(tmp560 - tmp572);
516     const float tmp615 = float(tmp107 * tmp614);
517     const float tmp616 = tmp613;
518     const float tmp617 = (tmp603) != 0 ? (tmp616) : (tmp615);
519     tmp613 = tmp617;
520     float tmp619 = 0;
521     const float tmp620 = float(tmp615 * tmp615);
522     const float tmp621 = tmp619;
523     const float tmp622 = (tmp603) != 0 ? (tmp621) : (tmp620);
524     tmp619 = tmp622;
525     float tmp624 = 0;
526     const float tmp625 = float(tmp620 * tmp620);
527     const float tmp626 = tmp624;
528     const float tmp627 = (tmp603) != 0 ? (tmp626) : (tmp625);
529     tmp624 = tmp627;
530     float tmp629 = 0;
531     const float tmp630 = float(tmp564 * tmp564);
532     const float tmp631 = tmp629;
533     const float tmp632 = (tmp603) != 0 ? (tmp631) : (tmp630);
534     tmp629 = tmp632;
535     float tmp634 = 0;
536     const float tmp635 = float(tmp630 * tmp630);
537     const float tmp636 = tmp634;
538     const float tmp637 = (tmp603) != 0 ? (tmp636) : (tmp635);
539     tmp634 = tmp637;
540     float tmp639 = 0;
541     const float tmp640 = float(tmp620 + tmp630);
542     const float tmp641 = float(sqrt(tmp640));
543     const float tmp642 = tmp639;
544     const float tmp643 = (tmp603) != 0 ? (tmp642) : (tmp641);
545     tmp639 = tmp643;
546     float tmp645 = 0;
547     float tmp646 = 0;
548     const float tmp647 = tmp646;
549     const float tmp648 = (tmp603) != 0 ? (tmp647) : (tmp615);
550     tmp646 = tmp648;
551     int tmp650 = 0;
552     const int tmp651 = tmp650;
553     const int tmp652 = (tmp603) != 0 ? (tmp651) : (tmp8459);
554     tmp650 = tmp652;
555     int tmp654 = 0;
556     const int tmp655 = tmp654;
557     const int tmp656 = (tmp603) != 0 ? (tmp655) : (tmp8461);
558     tmp654 = tmp656;
559     const int tmp659 = int(tmp615 > tmp272);
560     const int tmp660 = int(tmp8461 & tmp659);
561     int tmp661 = 0;
562     int tmp662 = 0;
563     const int tmp663 = tmp661;
564     const int tmp664 = (tmp603) != 0 ? (tmp663) : (tmp660);
565     tmp661 = tmp664;
566     const int tmp666 = int(tmp660 == 0);
567     const int tmp667 = tmp662;
568     const int tmp668 = (tmp603) != 0 ? (tmp667) : (tmp666);
569     tmp662 = tmp668;
570     const float tmp670 = float(tmp615 + tmp641);
571     const float tmp671 = float(tmp564 / tmp670);
572     const float tmp672 = (tmp660) != 0 ? (tmp671) : (tmp272);
573     const float tmp673 = float(tmp615 - tmp641);
574     const float tmp674 = float(tmp564 / tmp673);
575     const float tmp675 = (tmp660) != 0 ? (tmp672) : (tmp674);
576     const float tmp676 = tmp645;
577     const float tmp677 = (tmp603) != 0 ? (tmp676) : (tmp675);
578     tmp645 = tmp677;
579     float tmp679 = 0;
580     const float tmp680 = float(tmp675 * tmp675);
581     const float tmp681 = tmp679;
582     const float tmp682 = (tmp603) != 0 ? (tmp681) : (tmp680);
583     tmp679 = tmp682;
584     const float tmp684 = float(tmp680 + tmp156);
585     const float tmp685 = float(sqrt(tmp684));
586     const float tmp686 = float(tmp156 / tmp685);
587     const float tmp687 = (tmp603) != 0 ? (tmp609) : (tmp686);
588     tmp590 = tmp687;
589     const float tmp689 = float(-tmp675);
590     const float tmp690 = float(tmp689 * tmp686);
591     const float tmp691 = (tmp603) != 0 ? (tmp610) : (tmp690);
592     tmp591 = tmp691;
593     float tmp693 = 0;
594     const float tmp694 = float(tmp686 * tmp686);
595     const float tmp695 = tmp693;
596     const float tmp696 = (tmp603) != 0 ? (tmp695) : (tmp694);
597     tmp693 = tmp696;
598     float tmp698 = 0;
599     const float tmp699 = float(tmp690 * tmp690);
600     const float tmp700 = tmp698;
601     const float tmp701 = (tmp603) != 0 ? (tmp700) : (tmp699);
602     tmp698 = tmp701;
603     const float tmp703 = float(tmp694 * tmp560);
604     const int tmp704 = 2;
605     const float tmp705 = 2.0;
606     const float tmp706 = float(tmp686 * tmp705);
607     const float tmp707 = float(tmp706 * tmp690);
608     const float tmp708 = float(tmp707 * tmp564);
609     const float tmp709 = float(tmp703 - tmp708);
610     const float tmp710 = float(tmp699 * tmp572);
611     const float tmp711 = float(tmp709 + tmp710);
612     const float tmp712 = (tmp603) != 0 ? (tmp611) : (tmp711);
613     float tmp713 = 0;
614     const float tmp714 = tmp713;
615     const float tmp715 = (tmp603) != 0 ? (tmp714) : (tmp699);
616     tmp713 = tmp715;
617     float tmp717 = 0;
618     const float tmp718 = tmp717;
619     const float tmp719 = (tmp603) != 0 ? (tmp718) : (tmp694);
620     tmp717 = tmp719;
621     const float tmp721 = float(tmp699 * tmp560);
622     const float tmp722 = float(tmp721 + tmp708);
623     const float tmp723 = float(tmp694 * tmp572);
624     const float tmp724 = float(tmp722 + tmp723);
625     const float tmp725 = (tmp603) != 0 ? (tmp612) : (tmp724);
626     float tmp726 = 0;
627     float tmp727 = 0;
628     float tmp728 = 0;
629     float tmp729 = 0;
630     float tmp730 = 0;
631     tmp730 = tmp712;
632     float tmp732 = 0;
633     tmp732 = tmp725;
634     int tmp734 = 0;
635     tmp734 = tmp8461;
636     const int tmp736 = int(tmp712 < tmp725);
637     const int tmp737 = int(tmp8461 & tmp736);
638     int tmp738 = 0;
639     int tmp739 = 0;
640     tmp738 = tmp737;
641     const int tmp741 = int(tmp737 == 0);
642     tmp739 = tmp741;
643     float tmp743 = 0;
644     const float tmp744 = tmp743;
645     const float tmp745 = (tmp737) != 0 ? (tmp712) : (tmp744);
646     tmp743 = tmp745;
647     const float tmp747 = (tmp737) != 0 ? (tmp725) : (tmp712);
648     tmp592 = tmp747;
649     const float tmp749 = (tmp737) != 0 ? (tmp745) : (tmp725);
650     tmp593 = tmp749;
651     const float tmp751 = float(-tmp691);
652     const float tmp752 = (tmp737) != 0 ? (tmp751) : (tmp272);
653     const float tmp753 = (tmp737) != 0 ? (tmp687) : (tmp272);
654     const float tmp754 = float(-tmp687);
655     const float tmp755 = (tmp737) != 0 ? (tmp754) : (tmp272);
656     const float tmp756 = (tmp737) != 0 ? (tmp752) : (tmp687);
657     tmp726 = tmp756;
658     const float tmp758 = (tmp737) != 0 ? (tmp753) : (tmp691);
659     tmp727 = tmp758;
660     const float tmp760 = (tmp737) != 0 ? (tmp755) : (tmp751);
661     tmp728 = tmp760;
662     tmp729 = tmp756;
663     float tmp763 = 0;
664     float tmp764 = 0;
665     float tmp765 = 0;
666     float tmp766 = 0;
667     const float tmp767 = float(tmp540 * tmp756);
668     const float tmp768 = float(tmp548 * tmp760);
669     const float tmp769 = float(tmp767 + tmp768);
670     tmp763 = tmp769;
671     const float tmp771 = float(tmp540 * tmp758);
672     const float tmp772 = float(tmp548 * tmp756);
673     const float tmp773 = float(tmp771 + tmp772);
674     tmp764 = tmp773;
675     const float tmp775 = float(tmp543 * tmp756);
676     const float tmp776 = float(tmp540 * tmp760);
677     const float tmp777 = float(tmp775 + tmp776);
678     tmp765 = tmp777;
679     const float tmp779 = float(tmp543 * tmp758);
680     const float tmp780 = float(tmp779 + tmp767);
681     tmp766 = tmp780;
682     float tmp782 = 0;
683     tmp782 = tmp769;
684     float tmp784 = 0;
685     tmp784 = tmp773;
686     float tmp786 = 0;
687     tmp786 = tmp777;
688     float tmp788 = 0;
689     tmp788 = tmp780;
690     float tmp790 = 0;
691     tmp790 = tmp769;
692     float tmp792 = 0;
693     tmp792 = tmp773;
694     float tmp794 = 0;
695     tmp794 = tmp777;
696     float tmp796 = 0;
697     tmp796 = tmp780;
698     float tmp798 = 0;
699     tmp798 = tmp747;
700     float tmp800 = 0;
701     tmp800 = tmp272;
702     float tmp802 = 0;
703     tmp802 = tmp272;
704     float tmp804 = 0;
705     tmp804 = tmp749;
706     float tmp806 = 0;
707     tmp806 = tmp756;
708     float tmp808 = 0;
709     tmp808 = tmp758;
710     float tmp810 = 0;
711     tmp810 = tmp760;
712     float tmp812 = 0;
713     tmp812 = tmp756;
714     float tmp814 = 0;
715     tmp814 = tmp769;
716     float tmp816 = 0;
717     tmp816 = tmp773;
718     float tmp818 = 0;
719     tmp818 = tmp777;
720     float tmp820 = 0;
721     tmp820 = tmp780;
722     float tmp822 = 0;
723     float tmp823 = 0;
724     tmp823 = tmp272;
725     float tmp825 = 0;
726     tmp825 = tmp272;
727     float tmp827 = 0;
728     float tmp828 = 0;
729     tmp828 = tmp756;
730     float tmp830 = 0;
731     tmp830 = tmp758;
732     float tmp832 = 0;
733     tmp832 = tmp760;
734     float tmp834 = 0;
735     tmp834 = tmp756;
736     float tmp836 = 0;
737     float tmp837 = 0;
738     int tmp838 = 0;
739     tmp838 = tmp420;
740     int tmp840 = 0;
741     tmp840 = tmp704;
742     int tmp842 = 0;
743     tmp842 = tmp8461;
744     const int tmp844 = int(tmp420 == tmp704);
745     const int tmp845 = int(tmp8461 & tmp844);
746     int tmp846 = 0;
747     int tmp847 = 0;
748     tmp846 = tmp845;
749     const int tmp849 = int(tmp845 == 0);
750     tmp847 = tmp849;
751     const float tmp851 = 0.975;
752     const float tmp852 = float(max(tmp747, tmp851));
753     const float tmp853 = 1.0045;
754     const float tmp854 = float(min(tmp852, tmp853));
755     const float tmp855 = (tmp845) != 0 ? (tmp854) : (tmp747);
756     tmp837 = tmp855;
757     const float tmp857 = float(tmp747 / tmp855);
758     const float tmp858 = float(tmp403 * tmp857);
759     _At_(tmp401) = tmp858;
760     tmp822 = tmp855;
761     float tmp872 = 0;
762     int tmp873 = 0;
763     const int tmp884 = _At_(tmp418);
764     tmp873 = tmp884;
765     int tmp886 = 0;
766     tmp886 = tmp704;
767     int tmp888 = 0;
768     tmp888 = tmp8461;
769     const int tmp890 = int(tmp884 == tmp704);
770     const int tmp891 = int(tmp8461 & tmp890);
771     int tmp892 = 0;
772     int tmp893 = 0;
773     tmp892 = tmp891;
774     const int tmp895 = int(tmp891 == 0);
775     tmp893 = tmp895;
776     const float tmp897 = float(max(tmp749, tmp851));
777     const float tmp898 = float(min(tmp897, tmp853));
778     const float tmp899 = (tmp891) != 0 ? (tmp898) : (tmp749);
779     tmp872 = tmp899;
780     const float tmp911 = _At_(tmp401);
781     const float tmp912 = float(tmp749 / tmp899);
782     const float tmp913 = float(tmp911 * tmp912);
783     _At_(tmp401) = tmp913;
784     tmp827 = tmp899;
785     const float tmp926 = float(tmp855 * tmp899);
786     tmp836 = tmp926;
787     int tmp928 = 0;
788     const int tmp939 = _At_(tmp418);
789     tmp928 = tmp939;
790     int tmp941 = 0;
791     tmp941 = tmp8459;
792     int tmp943 = 0;
793     tmp943 = tmp8461;
794     const int tmp945 = int(tmp939 == tmp8459);
795     const int tmp946 = int(tmp8461 & tmp945);
796     int tmp947 = 0;
797     int tmp948 = 0;
798     tmp947 = tmp946;
799     const int tmp950 = int(tmp946 == 0);
800     tmp948 = tmp950;
801     const float tmp955 = float(sqrt(tmp926));
802     if (tmp946 != 0) {
803       _At_(tmp292) = tmp955;
804       _At_(tmp320) = tmp272;
805       _At_(tmp305) = tmp272;
806       _At_(tmp333) = tmp955;
807     } else {
808       int tmp1008 = 0;
809       const int tmp1009 = tmp70;
810       S0 tmp1012 = 0;
811       const int tmp8870 = 0;
812       S0_ch tmp1014 = S0_children(tmp1012, tmp8870);
813       S17 tmp1015 = S0_get4(tmp1014);
814       const int tmp1016 = (((0 + tmp1009) >> 0) & ((1 << 14) - 1));
815       const int tmp8872 = 1;
816       const int tmp8873 = int(tmp1016 * tmp8872);
817       const int tmp8874 = int(tmp8870 + tmp8873);
818       S17_ch tmp1018 = S17_children(tmp1015, tmp8874);
819       S18 tmp1019 = S17_get0(tmp1018);
820       #define _at_tmp1019 _mem_i32
821       const int tmp1021 = _At_(tmp1019);
822       tmp1008 = tmp1021;
823       int tmp1023 = 0;
824       const int tmp1024 = 2;
825       tmp1023 = tmp1024;
826       int tmp1026 = 0;
827       tmp1026 = tmp8872;
828       const int tmp1029 = int(tmp1021 == tmp1024);
829       const int tmp1030 = int(tmp8872 & tmp1029);
830       int tmp1031 = 0;
831       int tmp1032 = 0;
832       tmp1031 = tmp1030;
833       const int tmp1034 = int(tmp1030 == 0);
834       tmp1032 = tmp1034;
835       const float tmp1037 = tmp814;
836       const float tmp1038 = tmp822;
837       const float tmp1039 = float(tmp1037 * tmp1038);
838       const float tmp1040 = tmp816;
839       const float tmp1041 = tmp825;
840       const float tmp1042 = float(tmp1040 * tmp1041);
841       const float tmp1043 = float(tmp1039 + tmp1042);
842       const float tmp1044 = tmp823;
843       const float tmp1045 = float(tmp1037 * tmp1044);
844       const float tmp1046 = tmp827;
845       const float tmp1047 = float(tmp1040 * tmp1046);
846       const float tmp1048 = float(tmp1045 + tmp1047);
847       const float tmp1049 = tmp818;
848       const float tmp1050 = float(tmp1049 * tmp1038);
849       const float tmp1051 = tmp820;
850       const float tmp1052 = float(tmp1051 * tmp1041);
851       const float tmp1053 = float(tmp1050 + tmp1052);
852       const float tmp1054 = float(tmp1049 * tmp1044);
853       const float tmp1055 = float(tmp1051 * tmp1046);
854       const float tmp1056 = float(tmp1054 + tmp1055);
855       const float tmp1057 = tmp828;
856       const float tmp1058 = float(tmp1043 * tmp1057);
857       const float tmp1059 = tmp830;
858       const float tmp1060 = float(tmp1048 * tmp1059);
859       const float tmp1061 = float(tmp1058 + tmp1060);
860       const float tmp1062 = tmp832;
861       const float tmp1063 = float(tmp1043 * tmp1062);
862       const float tmp1064 = tmp834;
863       const float tmp1065 = float(tmp1048 * tmp1064);
864       const float tmp1066 = float(tmp1063 + tmp1065);
865       const float tmp1067 = float(tmp1053 * tmp1057);
866       const float tmp1068 = float(tmp1056 * tmp1059);
867       const float tmp1069 = float(tmp1067 + tmp1068);
868       const float tmp1070 = float(tmp1053 * tmp1062);
869       const float tmp1071 = float(tmp1056 * tmp1064);
870       const float tmp1072 = float(tmp1070 + tmp1071);
871       S12 tmp1079 = S0_get3(tmp1014);
872       S12_ch tmp1082 = S12_children(tmp1079, tmp8874);
873       S13 tmp1083 = S12_get0(tmp1082);
874       #define _at_tmp1083 _mem_f32
875       S14 tmp1095 = S12_get1(tmp1082);
876       #define _at_tmp1095 _mem_f32
877       S15 tmp1107 = S12_get2(tmp1082);
878       #define _at_tmp1107 _mem_f32
879       S16 tmp1119 = S12_get3(tmp1082);
880       #define _at_tmp1119 _mem_f32
881       if (tmp1030 != 0) {
882         _At_(tmp1083) = tmp1061;
883         _At_(tmp1095) = tmp1066;
884         _At_(tmp1107) = tmp1069;
885         _At_(tmp1119) = tmp1072;
886       }
887     }
888     const float tmp1122 = tmp814;
889     const float tmp1123 = tmp828;
890     const float tmp1124 = float(tmp1122 * tmp1123);
891     const float tmp1125 = tmp816;
892     const float tmp1126 = tmp830;
893     const float tmp1127 = float(tmp1125 * tmp1126);
894     const float tmp1128 = float(tmp1124 + tmp1127);
895     const float tmp1129 = tmp832;
896     const float tmp1130 = float(tmp1122 * tmp1129);
897     const float tmp1131 = tmp834;
898     const float tmp1132 = float(tmp1125 * tmp1131);
899     const float tmp1133 = float(tmp1130 + tmp1132);
900     const float tmp1134 = tmp818;
901     const float tmp1135 = float(tmp1134 * tmp1123);
902     const float tmp1136 = tmp820;
903     const float tmp1137 = float(tmp1136 * tmp1126);
904     const float tmp1138 = float(tmp1135 + tmp1137);
905     const float tmp1139 = float(tmp1134 * tmp1129);
906     const float tmp1140 = float(tmp1136 * tmp1131);
907     const float tmp1141 = float(tmp1139 + tmp1140);
908     const int tmp1142 = tmp70;
909     const int tmp1149 = (((0 + tmp1142) >> 0) & ((1 << 14) - 1));
910     const int tmp8607 = int(tmp1149 * tmp8461);
911     const int tmp8608 = int(tmp8459 + tmp8607);
912     S12_ch tmp1151 = S12_children(tmp288, tmp8608);
913     S13 tmp1152 = S12_get0(tmp1151);
914     #define _at_tmp1152 _mem_f32
915     const float tmp1154 = _At_(tmp1152);
916     const float tmp1155 = float(tmp1154 - tmp1128);
917     S14 tmp1165 = S12_get1(tmp1151);
918     #define _at_tmp1165 _mem_f32
919     const float tmp1167 = _At_(tmp1165);
920     const float tmp1168 = float(tmp1167 - tmp1133);
921     S15 tmp1178 = S12_get2(tmp1151);
922     #define _at_tmp1178 _mem_f32
923     const float tmp1180 = _At_(tmp1178);
924     const float tmp1181 = float(tmp1180 - tmp1138);
925     S16 tmp1191 = S12_get3(tmp1151);
926     #define _at_tmp1191 _mem_f32
927     const float tmp1193 = _At_(tmp1191);
928     const float tmp1194 = float(tmp1193 - tmp1141);
929     const float tmp1195 = tmp436;
930     const float tmp1196 = float(tmp1195 * tmp705);
931     const float tmp1197 = float(tmp1155 * tmp1196);
932     const float tmp1198 = float(tmp1168 * tmp1196);
933     const float tmp1199 = float(tmp1181 * tmp1196);
934     const float tmp1200 = float(tmp1194 * tmp1196);
935     const float tmp1201 = float(tmp1197 * tmp1154);
936     const float tmp1202 = float(tmp1198 * tmp1167);
937     const float tmp1203 = float(tmp1201 + tmp1202);
938     const float tmp1204 = float(tmp1197 * tmp1180);
939     const float tmp1205 = float(tmp1198 * tmp1193);
940     const float tmp1206 = float(tmp1204 + tmp1205);
941     const float tmp1207 = float(tmp1199 * tmp1154);
942     const float tmp1208 = float(tmp1200 * tmp1167);
943     const float tmp1209 = float(tmp1207 + tmp1208);
944     const float tmp1210 = float(tmp1199 * tmp1180);
945     const float tmp1211 = float(tmp1200 * tmp1193);
946     const float tmp1212 = float(tmp1210 + tmp1211);
947     const float tmp1213 = tmp439;
948     const float tmp1216 = tmp836;
949     const float tmp1217 = float(tmp1213 * tmp1216);
950     const float tmp1219 = float(tmp1216 - tmp156);
951     const float tmp1220 = float(tmp1217 * tmp1219);
952     const float tmp1222 = float(tmp1203 + tmp1220);
953     const float tmp1225 = float(tmp1212 + tmp1220);
954     const float tmp1226 = -0.0001;
955     const float tmp1227 = float(tmp1222 * tmp1226);
956     const float tmp1228 = float(tmp1206 * tmp1226);
957     const float tmp1229 = float(tmp1209 * tmp1226);
958     const float tmp1230 = float(tmp1225 * tmp1226);
959     S7_ch tmp1239 = S7_children(tmp213, tmp8608);
960     S8 tmp1240 = S7_get0(tmp1239);
961     #define _at_tmp1240 _mem_f32
962     const float tmp1242 = _At_(tmp1240);
963     const float tmp1243 = 1.5258789e-05;
964     const float tmp1244 = float(tmp1242 * tmp1243);
965     S9 tmp1254 = S7_get1(tmp1239);
966     #define _at_tmp1254 _mem_f32
967     const float tmp1256 = _At_(tmp1254);
968     const float tmp1257 = float(tmp1256 * tmp1243);
969     S10 tmp1267 = S7_get2(tmp1239);
970     #define _at_tmp1267 _mem_f32
971     const float tmp1269 = _At_(tmp1267);
972     const float tmp1270 = float(tmp1269 * tmp1243);
973     S11 tmp1280 = S7_get3(tmp1239);
974     #define _at_tmp1280 _mem_f32
975     const float tmp1282 = _At_(tmp1280);
976     const float tmp1283 = float(tmp1282 * tmp1243);
977     const float tmp1284 = float(tmp1227 + tmp1244);
978     const float tmp1285 = float(tmp1228 + tmp1257);
979     const float tmp1286 = float(tmp1229 + tmp1270);
980     const float tmp1287 = float(tmp1230 + tmp1283);
981     const float tmp1288 = tmp130;
982     const float tmp1289 = float(tmp272 - tmp1288);
983     const float tmp1290 = tmp132;
984     const float tmp1291 = float(tmp272 - tmp1290);
985     const float tmp1292 = 0.0078125;
986     const float tmp1293 = float(tmp1289 * tmp1292);
987     const float tmp1294 = float(tmp1291 * tmp1292);
988     const float tmp1295 = tmp192;
989     const float tmp1296 = tmp194;
990     const float tmp1297 = float(tmp1295 * tmp1296);
991     const int tmp1298 = tmp112;
992     const int tmp1299 = tmp115;
993     S4 tmp1305 = S0_get1(tmp80);
994     S4_ch tmp1308 = S4_children(tmp1305, tmp8608);
995     S5 tmp1309 = S4_get0(tmp1308);
996     #define _at_tmp1309 _mem_f32
997     const float tmp1311 = _At_(tmp1309);
998     const float tmp1312 = float(tmp1311 * tmp1243);
999     S6 tmp1322 = S4_get1(tmp1308);
1000     #define _at_tmp1322 _mem_f32
1001     const float tmp1324 = _At_(tmp1322);
1002     const float tmp1325 = float(tmp1324 * tmp1243);
1003     const float tmp1326 = float(tmp1284 * tmp1293);
1004     const float tmp1327 = float(tmp1285 * tmp1294);
1005     const float tmp1328 = float(tmp1326 + tmp1327);
1006     const float tmp1329 = float(tmp1286 * tmp1293);
1007     const float tmp1330 = float(tmp1287 * tmp1294);
1008     const float tmp1331 = float(tmp1329 + tmp1330);
1009     const float tmp1332 = float(tmp1312 + tmp1328);
1010     const float tmp1333 = float(tmp1325 + tmp1331);
1011     const float tmp1334 = float(tmp1332 * tmp1297);
1012     const float tmp1335 = float(tmp1333 * tmp1297);
1013     S21 tmp1342 = S0_get6(tmp80);
1014     const int tmp1343 = (((0 + tmp1298) >> 0) & ((1 << 7) - 1));
1015     const int tmp1344 = (((0 + tmp1299) >> 0) & ((1 << 7) - 1));
1016     const int tmp8657 = int(tmp1344 * tmp8461);
1017     const int tmp8658 = int(tmp8459 + tmp8657);
1018     const int tmp8659 = 128;
1019     const int tmp8660 = int(tmp1343 * tmp8659);
1020     const int tmp8661 = int(tmp8658 + tmp8660);
1021     S21_ch tmp1346 = S21_children(tmp1342, tmp8661);
1022     S22 tmp1347 = S21_get0(tmp1346);
1023     #define _at_tmp1347 _mem_f32
1024     float tmp1349 = _Atm_(atomicAdd, _at_tmp1347, tmp1347, tmp1334);
1025     S23 tmp1361 = S21_get1(tmp1346);
1026     #define _at_tmp1361 _mem_f32
1027     float tmp1363 = _Atm_(atomicAdd, _at_tmp1361, tmp1361, tmp1335);
1028     const float tmp1364 = float(tmp1297 * tmp1243);
1029     S24 tmp1371 = S0_get7(tmp80);
1030     S24_ch tmp1375 = S24_children(tmp1371, tmp8661);
1031     S25 tmp1376 = S24_get0(tmp1375);
1032     #define _at_tmp1376 _mem_f32
1033     float tmp1378 = _Atm_(atomicAdd, _at_tmp1376, tmp1376, tmp1364);
1034     const float tmp1380 = float(tmp156 - tmp1290);
1035     const float tmp1381 = float(tmp1380 * tmp1292);
1036     const float tmp1382 = tmp198;
1037     const float tmp1383 = float(tmp1295 * tmp1382);
1038     const int tmp1384 = int(tmp1299 + tmp8461);
1039     const float tmp1385 = float(tmp1285 * tmp1381);
1040     const float tmp1386 = float(tmp1326 + tmp1385);
1041     const float tmp1387 = float(tmp1287 * tmp1381);
1042     const float tmp1388 = float(tmp1329 + tmp1387);
1043     const float tmp1389 = float(tmp1312 + tmp1386);
1044     const float tmp1390 = float(tmp1325 + tmp1388);
1045     const float tmp1391 = float(tmp1389 * tmp1383);
1046     const float tmp1392 = float(tmp1390 * tmp1383);
1047     const int tmp1401 = (((0 + tmp1384) >> 0) & ((1 << 7) - 1));
1048     const int tmp8681 = int(tmp1401 * tmp8461);
1049     const int tmp8682 = int(tmp8459 + tmp8681);
1050     const int tmp8685 = int(tmp8682 + tmp8660);
1051     S21_ch tmp1403 = S21_children(tmp1342, tmp8685);
1052     S22 tmp1404 = S21_get0(tmp1403);
1053     #define _at_tmp1404 _mem_f32
1054     float tmp1406 = _Atm_(atomicAdd, _at_tmp1404, tmp1404, tmp1391);
1055     S23 tmp1418 = S21_get1(tmp1403);
1056     #define _at_tmp1418 _mem_f32
1057     float tmp1420 = _Atm_(atomicAdd, _at_tmp1418, tmp1418, tmp1392);
1058     const float tmp1421 = float(tmp1383 * tmp1243);
1059     S24_ch tmp1432 = S24_children(tmp1371, tmp8685);
1060     S25 tmp1433 = S24_get0(tmp1432);
1061     #define _at_tmp1433 _mem_f32
1062     float tmp1435 = _Atm_(atomicAdd, _at_tmp1433, tmp1433, tmp1421);
1063     const float tmp1437 = float(tmp705 - tmp1290);
1064     const float tmp1438 = float(tmp1437 * tmp1292);
1065     const float tmp1439 = tmp202;
1066     const float tmp1440 = float(tmp1295 * tmp1439);
1067     const int tmp1441 = int(tmp1299 + tmp704);
1068     const float tmp1442 = float(tmp1285 * tmp1438);
1069     const float tmp1443 = float(tmp1326 + tmp1442);
1070     const float tmp1444 = float(tmp1287 * tmp1438);
1071     const float tmp1445 = float(tmp1329 + tmp1444);
1072     const float tmp1446 = float(tmp1312 + tmp1443);
1073     const float tmp1447 = float(tmp1325 + tmp1445);
1074     const float tmp1448 = float(tmp1446 * tmp1440);
1075     const float tmp1449 = float(tmp1447 * tmp1440);
1076     const int tmp1458 = (((0 + tmp1441) >> 0) & ((1 << 7) - 1));
1077     const int tmp8705 = int(tmp1458 * tmp8461);
1078     const int tmp8706 = int(tmp8459 + tmp8705);
1079     const int tmp8709 = int(tmp8706 + tmp8660);
1080     S21_ch tmp1460 = S21_children(tmp1342, tmp8709);
1081     S22 tmp1461 = S21_get0(tmp1460);
1082     #define _at_tmp1461 _mem_f32
1083     float tmp1463 = _Atm_(atomicAdd, _at_tmp1461, tmp1461, tmp1448);
1084     S23 tmp1475 = S21_get1(tmp1460);
1085     #define _at_tmp1475 _mem_f32
1086     float tmp1477 = _Atm_(atomicAdd, _at_tmp1475, tmp1475, tmp1449);
1087     const float tmp1478 = float(tmp1440 * tmp1243);
1088     S24_ch tmp1489 = S24_children(tmp1371, tmp8709);
1089     S25 tmp1490 = S24_get0(tmp1489);
1090     #define _at_tmp1490 _mem_f32
1091     float tmp1492 = _Atm_(atomicAdd, _at_tmp1490, tmp1490, tmp1478);
1092     const float tmp1493 = float(tmp156 - tmp1288);
1093     const float tmp1494 = float(tmp1493 * tmp1292);
1094     const float tmp1495 = tmp196;
1095     const float tmp1496 = float(tmp1495 * tmp1296);
1096     const int tmp1497 = int(tmp1298 + tmp8461);
1097     const float tmp1498 = float(tmp1284 * tmp1494);
1098     const float tmp1499 = float(tmp1498 + tmp1327);
1099     const float tmp1500 = float(tmp1286 * tmp1494);
1100     const float tmp1501 = float(tmp1500 + tmp1330);
1101     const float tmp1502 = float(tmp1312 + tmp1499);
1102     const float tmp1503 = float(tmp1325 + tmp1501);
1103     const float tmp1504 = float(tmp1502 * tmp1496);
1104     const float tmp1505 = float(tmp1503 * tmp1496);
1105     const int tmp1513 = (((0 + tmp1497) >> 0) & ((1 << 7) - 1));
1106     const int tmp8732 = int(tmp1513 * tmp8659);
1107     const int tmp8733 = int(tmp8658 + tmp8732);
1108     S21_ch tmp1516 = S21_children(tmp1342, tmp8733);
1109     S22 tmp1517 = S21_get0(tmp1516);
1110     #define _at_tmp1517 _mem_f32
1111     float tmp1519 = _Atm_(atomicAdd, _at_tmp1517, tmp1517, tmp1504);
1112     S23 tmp1531 = S21_get1(tmp1516);
1113     #define _at_tmp1531 _mem_f32
1114     float tmp1533 = _Atm_(atomicAdd, _at_tmp1531, tmp1531, tmp1505);
1115     const float tmp1534 = float(tmp1496 * tmp1243);
1116     S24_ch tmp1545 = S24_children(tmp1371, tmp8733);
1117     S25 tmp1546 = S24_get0(tmp1545);
1118     #define _at_tmp1546 _mem_f32
1119     float tmp1548 = _Atm_(atomicAdd, _at_tmp1546, tmp1546, tmp1534);
1120     const float tmp1549 = float(tmp1495 * tmp1382);
1121     const float tmp1550 = float(tmp1498 + tmp1385);
1122     const float tmp1551 = float(tmp1500 + tmp1387);
1123     const float tmp1552 = float(tmp1312 + tmp1550);
1124     const float tmp1553 = float(tmp1325 + tmp1551);
1125     const float tmp1554 = float(tmp1552 * tmp1549);
1126     const float tmp1555 = float(tmp1553 * tmp1549);
1127     const int tmp8757 = int(tmp8682 + tmp8732);
1128     S21_ch tmp1566 = S21_children(tmp1342, tmp8757);
1129     S22 tmp1567 = S21_get0(tmp1566);
1130     #define _at_tmp1567 _mem_f32
1131     float tmp1569 = _Atm_(atomicAdd, _at_tmp1567, tmp1567, tmp1554);
1132     S23 tmp1581 = S21_get1(tmp1566);
1133     #define _at_tmp1581 _mem_f32
1134     float tmp1583 = _Atm_(atomicAdd, _at_tmp1581, tmp1581, tmp1555);
1135     const float tmp1584 = float(tmp1549 * tmp1243);
1136     S24_ch tmp1595 = S24_children(tmp1371, tmp8757);
1137     S25 tmp1596 = S24_get0(tmp1595);
1138     #define _at_tmp1596 _mem_f32
1139     float tmp1598 = _Atm_(atomicAdd, _at_tmp1596, tmp1596, tmp1584);
1140     const float tmp1599 = float(tmp1495 * tmp1439);
1141     const float tmp1600 = float(tmp1498 + tmp1442);
1142     const float tmp1601 = float(tmp1500 + tmp1444);
1143     const float tmp1602 = float(tmp1312 + tmp1600);
1144     const float tmp1603 = float(tmp1325 + tmp1601);
1145     const float tmp1604 = float(tmp1602 * tmp1599);
1146     const float tmp1605 = float(tmp1603 * tmp1599);
1147     const int tmp8781 = int(tmp8706 + tmp8732);
1148     S21_ch tmp1616 = S21_children(tmp1342, tmp8781);
1149     S22 tmp1617 = S21_get0(tmp1616);
1150     #define _at_tmp1617 _mem_f32
1151     float tmp1619 = _Atm_(atomicAdd, _at_tmp1617, tmp1617, tmp1604);
1152     S23 tmp1631 = S21_get1(tmp1616);
1153     #define _at_tmp1631 _mem_f32
1154     float tmp1633 = _Atm_(atomicAdd, _at_tmp1631, tmp1631, tmp1605);
1155     const float tmp1634 = float(tmp1599 * tmp1243);
1156     S24_ch tmp1645 = S24_children(tmp1371, tmp8781);
1157     S25 tmp1646 = S24_get0(tmp1645);
1158     #define _at_tmp1646 _mem_f32
1159     float tmp1648 = _Atm_(atomicAdd, _at_tmp1646, tmp1646, tmp1634);
1160     const float tmp1649 = float(tmp705 - tmp1288);
1161     const float tmp1650 = float(tmp1649 * tmp1292);
1162     const float tmp1651 = tmp200;
1163     const float tmp1652 = float(tmp1651 * tmp1296);
1164     const int tmp1653 = int(tmp1298 + tmp704);
1165     const float tmp1654 = float(tmp1284 * tmp1650);
1166     const float tmp1655 = float(tmp1654 + tmp1327);
1167     const float tmp1656 = float(tmp1286 * tmp1650);
1168     const float tmp1657 = float(tmp1656 + tmp1330);
1169     const float tmp1658 = float(tmp1312 + tmp1655);
1170     const float tmp1659 = float(tmp1325 + tmp1657);
1171     const float tmp1660 = float(tmp1658 * tmp1652);
1172     const float tmp1661 = float(tmp1659 * tmp1652);
1173     const int tmp1669 = (((0 + tmp1653) >> 0) & ((1 << 7) - 1));
1174     const int tmp8804 = int(tmp1669 * tmp8659);
1175     const int tmp8805 = int(tmp8658 + tmp8804);
1176     S21_ch tmp1672 = S21_children(tmp1342, tmp8805);
1177     S22 tmp1673 = S21_get0(tmp1672);
1178     #define _at_tmp1673 _mem_f32
1179     float tmp1675 = _Atm_(atomicAdd, _at_tmp1673, tmp1673, tmp1660);
1180     S23 tmp1687 = S21_get1(tmp1672);
1181     #define _at_tmp1687 _mem_f32
1182     float tmp1689 = _Atm_(atomicAdd, _at_tmp1687, tmp1687, tmp1661);
1183     const float tmp1690 = float(tmp1652 * tmp1243);
1184     S24_ch tmp1701 = S24_children(tmp1371, tmp8805);
1185     S25 tmp1702 = S24_get0(tmp1701);
1186     #define _at_tmp1702 _mem_f32
1187     float tmp1704 = _Atm_(atomicAdd, _at_tmp1702, tmp1702, tmp1690);
1188     const float tmp1705 = float(tmp1651 * tmp1382);
1189     const float tmp1706 = float(tmp1654 + tmp1385);
1190     const float tmp1707 = float(tmp1656 + tmp1387);
1191     const float tmp1708 = float(tmp1312 + tmp1706);
1192     const float tmp1709 = float(tmp1325 + tmp1707);
1193     const float tmp1710 = float(tmp1708 * tmp1705);
1194     const float tmp1711 = float(tmp1709 * tmp1705);
1195     const int tmp8829 = int(tmp8682 + tmp8804);
1196     S21_ch tmp1722 = S21_children(tmp1342, tmp8829);
1197     S22 tmp1723 = S21_get0(tmp1722);
1198     #define _at_tmp1723 _mem_f32
1199     float tmp1725 = _Atm_(atomicAdd, _at_tmp1723, tmp1723, tmp1710);
1200     S23 tmp1737 = S21_get1(tmp1722);
1201     #define _at_tmp1737 _mem_f32
1202     float tmp1739 = _Atm_(atomicAdd, _at_tmp1737, tmp1737, tmp1711);
1203     const float tmp1740 = float(tmp1705 * tmp1243);
1204     S24_ch tmp1751 = S24_children(tmp1371, tmp8829);
1205     S25 tmp1752 = S24_get0(tmp1751);
1206     #define _at_tmp1752 _mem_f32
1207     float tmp1754 = _Atm_(atomicAdd, _at_tmp1752, tmp1752, tmp1740);
1208     const float tmp1755 = float(tmp1651 * tmp1439);
1209     const float tmp1756 = float(tmp1654 + tmp1442);
1210     const float tmp1757 = float(tmp1656 + tmp1444);
1211     const float tmp1758 = float(tmp1312 + tmp1756);
1212     const float tmp1759 = float(tmp1325 + tmp1757);
1213     const float tmp1760 = float(tmp1758 * tmp1755);
1214     const float tmp1761 = float(tmp1759 * tmp1755);
1215     const int tmp8853 = int(tmp8706 + tmp8804);
1216     S21_ch tmp1772 = S21_children(tmp1342, tmp8853);
1217     S22 tmp1773 = S21_get0(tmp1772);
1218     #define _at_tmp1773 _mem_f32
1219     float tmp1775 = _Atm_(atomicAdd, _at_tmp1773, tmp1773, tmp1760);
1220     S23 tmp1787 = S21_get1(tmp1772);
1221     #define _at_tmp1787 _mem_f32
1222     float tmp1789 = _Atm_(atomicAdd, _at_tmp1787, tmp1787, tmp1761);
1223     const float tmp1790 = float(tmp1755 * tmp1243);
1224     S24_ch tmp1801 = S24_children(tmp1371, tmp8853);
1225     S25 tmp1802 = S24_get0(tmp1801);
1226     #define _at_tmp1802 _mem_f32
1227     float tmp1804 = _Atm_(atomicAdd, _at_tmp1802, tmp1802, tmp1790);
1228   }
1229 }
1230 
1231 void main()
1232 {
1233   substep_c4_01();
1234 }
1235 layout(local_size_x = 1536, local_size_y = 1, local_size_z = 1) in;

0(1024) : error C1503: undefined variable "atomicAdd_at_tmp1347"
0(1027) : error C1503: undefined variable "atomicAdd_at_tmp1361"
0(1033) : error C1503: undefined variable "atomicAdd_at_tmp1376"
0(1054) : error C1503: undefined variable "atomicAdd_at_tmp1404"
0(1057) : error C1503: undefined variable "atomicAdd_at_tmp1418"
0(1062) : error C1503: undefined variable "atomicAdd_at_tmp1433"
0(1083) : error C1503: undefined variable "atomicAdd_at_tmp1461"
0(1086) : error C1503: undefined variable "atomicAdd_at_tmp1475"
0(1091) : error C1503: undefined variable "atomicAdd_at_tmp1490"
0(1111) : error C1503: undefined variable "atomicAdd_at_tmp1517"
0(1114) : error C1503: undefined variable "atomicAdd_at_tmp1531"
0(1119) : error C1503: undefined variable "atomicAdd_at_tmp1546"
0(1131) : error C1503: undefined variable "atomicAdd_at_tmp1567"
0(1134) : error C1503: undefined variable "atomicAdd_at_tmp1581"
0(1139) : error C1503: undefined variable "atomicAdd_at_tmp1596"
0(1151) : error C1503: undefined variable "atomicAdd_at_tmp1617"
0(1154) : error C1503: undefined variable "atomicAdd_at_tmp1631"
0(1159) : error C1503: undefined variable "atomicAdd_at_tmp1646"
0(1179) : error C1503: undefined variable "atomicAdd_at_tmp1673"
0(1182) : error C1503: undefined variable "atomicAdd_at_tmp1687"
0(1187) : error C1503: undefined variable "atomicAdd_at_tmp1702"
0(1199) : error C1503: undefined variable "atomicAdd_at_tmp1723"
0(1202) : error C1503: undefined variable "atomicAdd_at_tmp1737"
0(1207) : error C1503: undefined variable "atomicAdd_at_tmp1752"
0(1219) : error C1503: undefined variable "atomicAdd_at_tmp1773"
0(1222) : error C1503: undefined variable "atomicAdd_at_tmp1787"
0(1227) : error C1503: undefined variable "atomicAdd_at_tmp1802"

@archibate archibate self-assigned this Mar 27, 2020
@archibate
Copy link
Collaborator Author

What's your OpenGL implementation? My is mesa.

@yuanming-hu
Copy link
Member

My GL config:

>>> glxinfo                        
name of display: :1
display: :1  screen: 0
direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
server glx extensions:
    GLX_ARB_context_flush_control, GLX_ARB_create_context, 
    GLX_ARB_create_context_no_error, GLX_ARB_create_context_profile, 
    GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
    GLX_ARB_multisample, GLX_EXT_buffer_age, 
    GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile, 
    GLX_EXT_framebuffer_sRGB, GLX_EXT_import_context, GLX_EXT_libglvnd, 
    GLX_EXT_stereo_tree, GLX_EXT_swap_control, GLX_EXT_swap_control_tear, 
    GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, GLX_EXT_visual_rating, 
    GLX_NV_copy_image, GLX_NV_delay_before_swap, GLX_NV_float_buffer, 
    GLX_NV_robustness_video_memory_purge, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGI_swap_control, GLX_SGI_video_sync
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
client glx extensions:
    GLX_ARB_context_flush_control, GLX_ARB_create_context, 
    GLX_ARB_create_context_no_error, GLX_ARB_create_context_profile, 
    GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_buffer_age, 
    GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile, 
    GLX_EXT_fbconfig_packed_float, GLX_EXT_framebuffer_sRGB, 
    GLX_EXT_import_context, GLX_EXT_stereo_tree, GLX_EXT_swap_control, 
    GLX_EXT_swap_control_tear, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_NV_copy_buffer, 
    GLX_NV_copy_image, GLX_NV_delay_before_swap, GLX_NV_float_buffer, 
    GLX_NV_multisample_coverage, GLX_NV_present_video, 
    GLX_NV_robustness_video_memory_purge, GLX_NV_swap_group, 
    GLX_NV_video_capture, GLX_NV_video_out, GLX_SGIX_fbconfig, 
    GLX_SGIX_pbuffer, GLX_SGI_swap_control, GLX_SGI_video_sync
GLX version: 1.4
GLX extensions:
    GLX_ARB_context_flush_control, GLX_ARB_create_context, 
    GLX_ARB_create_context_no_error, GLX_ARB_create_context_profile, 
    GLX_ARB_create_context_robustness, GLX_ARB_fbconfig_float, 
    GLX_ARB_get_proc_address, GLX_ARB_multisample, GLX_EXT_buffer_age, 
    GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile, 
    GLX_EXT_framebuffer_sRGB, GLX_EXT_import_context, GLX_EXT_stereo_tree, 
    GLX_EXT_swap_control, GLX_EXT_swap_control_tear, 
    GLX_EXT_texture_from_pixmap, GLX_EXT_visual_info, GLX_EXT_visual_rating, 
    GLX_NV_copy_image, GLX_NV_delay_before_swap, GLX_NV_float_buffer, 
    GLX_NV_robustness_video_memory_purge, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer, 
    GLX_SGI_swap_control, GLX_SGI_video_sync
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 11264 MB
    Total available memory: 11264 MB
    Currently available dedicated video memory: 10043 MB
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 1080 Ti/PCIe/SSE2
OpenGL core profile version string: 4.6.0 NVIDIA 430.26
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
    GL_AMD_multi_draw_indirect, GL_AMD_seamless_cubemap_per_texture, 
    GL_AMD_vertex_shader_layer, GL_AMD_vertex_shader_viewport_index, 
    GL_ARB_ES2_compatibility, GL_ARB_ES3_1_compatibility, 
    GL_ARB_ES3_2_compatibility, GL_ARB_ES3_compatibility, 
    GL_ARB_arrays_of_arrays, GL_ARB_base_instance, GL_ARB_bindless_texture, 
    GL_ARB_blend_func_extended, GL_ARB_buffer_storage, 
    GL_ARB_clear_buffer_object, GL_ARB_clear_texture, GL_ARB_clip_control, 
    GL_ARB_color_buffer_float, GL_ARB_compressed_texture_pixel_storage, 
    GL_ARB_compute_shader, GL_ARB_compute_variable_group_size, 
    GL_ARB_conditional_render_inverted, GL_ARB_conservative_depth, 
    GL_ARB_copy_buffer, GL_ARB_copy_image, GL_ARB_cull_distance, 
    GL_ARB_debug_output, GL_ARB_depth_buffer_float, GL_ARB_depth_clamp, 
    GL_ARB_depth_texture, GL_ARB_derivative_control, 
    GL_ARB_direct_state_access, GL_ARB_draw_buffers, 
    GL_ARB_draw_buffers_blend, GL_ARB_draw_elements_base_vertex, 
    GL_ARB_draw_indirect, GL_ARB_draw_instanced, GL_ARB_enhanced_layouts, 
    GL_ARB_explicit_attrib_location, GL_ARB_explicit_uniform_location, 
    GL_ARB_fragment_coord_conventions, GL_ARB_fragment_layer_viewport, 
    GL_ARB_fragment_program, GL_ARB_fragment_program_shadow, 
    GL_ARB_fragment_shader, GL_ARB_fragment_shader_interlock, 
    GL_ARB_framebuffer_no_attachments, GL_ARB_framebuffer_object, 
    GL_ARB_framebuffer_sRGB, GL_ARB_geometry_shader4, 
    GL_ARB_get_program_binary, GL_ARB_get_texture_sub_image, GL_ARB_gl_spirv, 
    GL_ARB_gpu_shader5, GL_ARB_gpu_shader_fp64, GL_ARB_gpu_shader_int64, 
    GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, GL_ARB_imaging, 
    GL_ARB_indirect_parameters, GL_ARB_instanced_arrays, 
    GL_ARB_internalformat_query, GL_ARB_internalformat_query2, 
    GL_ARB_invalidate_subdata, GL_ARB_map_buffer_alignment, 
    GL_ARB_map_buffer_range, GL_ARB_multi_bind, GL_ARB_multi_draw_indirect, 
    GL_ARB_multisample, GL_ARB_multitexture, GL_ARB_occlusion_query, 
    GL_ARB_occlusion_query2, GL_ARB_parallel_shader_compile, 
    GL_ARB_pipeline_statistics_query, GL_ARB_pixel_buffer_object, 
    GL_ARB_point_parameters, GL_ARB_point_sprite, GL_ARB_polygon_offset_clamp, 
    GL_ARB_post_depth_coverage, GL_ARB_program_interface_query, 
    GL_ARB_provoking_vertex, GL_ARB_query_buffer_object, 
    GL_ARB_robust_buffer_access_behavior, GL_ARB_robustness, 
    GL_ARB_sample_locations, GL_ARB_sample_shading, GL_ARB_sampler_objects, 
    GL_ARB_seamless_cube_map, GL_ARB_seamless_cubemap_per_texture, 
    GL_ARB_separate_shader_objects, GL_ARB_shader_atomic_counter_ops, 
    GL_ARB_shader_atomic_counters, GL_ARB_shader_ballot, 
    GL_ARB_shader_bit_encoding, GL_ARB_shader_clock, 
    GL_ARB_shader_draw_parameters, GL_ARB_shader_group_vote, 
    GL_ARB_shader_image_load_store, GL_ARB_shader_image_size, 
    GL_ARB_shader_objects, GL_ARB_shader_precision, 
    GL_ARB_shader_storage_buffer_object, GL_ARB_shader_subroutine, 
    GL_ARB_shader_texture_image_samples, GL_ARB_shader_texture_lod, 
    GL_ARB_shader_viewport_layer_array, GL_ARB_shading_language_100, 
    GL_ARB_shading_language_420pack, GL_ARB_shading_language_include, 
    GL_ARB_shading_language_packing, GL_ARB_shadow, GL_ARB_sparse_buffer, 
    GL_ARB_sparse_texture, GL_ARB_sparse_texture2, 
    GL_ARB_sparse_texture_clamp, GL_ARB_spirv_extensions, 
    GL_ARB_stencil_texturing, GL_ARB_sync, GL_ARB_tessellation_shader, 
    GL_ARB_texture_barrier, GL_ARB_texture_border_clamp, 
    GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_object_rgb32, 
    GL_ARB_texture_buffer_range, GL_ARB_texture_compression, 
    GL_ARB_texture_compression_bptc, GL_ARB_texture_compression_rgtc, 
    GL_ARB_texture_cube_map, GL_ARB_texture_cube_map_array, 
    GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 
    GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_filter_anisotropic, GL_ARB_texture_filter_minmax, 
    GL_ARB_texture_float, GL_ARB_texture_gather, 
    GL_ARB_texture_mirror_clamp_to_edge, GL_ARB_texture_mirrored_repeat, 
    GL_ARB_texture_multisample, GL_ARB_texture_non_power_of_two, 
    GL_ARB_texture_query_levels, GL_ARB_texture_query_lod, 
    GL_ARB_texture_rectangle, GL_ARB_texture_rg, GL_ARB_texture_rgb10_a2ui, 
    GL_ARB_texture_stencil8, GL_ARB_texture_storage, 
    GL_ARB_texture_storage_multisample, GL_ARB_texture_swizzle, 
    GL_ARB_texture_view, GL_ARB_timer_query, GL_ARB_transform_feedback2, 
    GL_ARB_transform_feedback3, GL_ARB_transform_feedback_instanced, 
    GL_ARB_transform_feedback_overflow_query, GL_ARB_transpose_matrix, 
    GL_ARB_uniform_buffer_object, GL_ARB_vertex_array_bgra, 
    GL_ARB_vertex_array_object, GL_ARB_vertex_attrib_64bit, 
    GL_ARB_vertex_attrib_binding, GL_ARB_vertex_buffer_object, 
    GL_ARB_vertex_program, GL_ARB_vertex_shader, 
    GL_ARB_vertex_type_10f_11f_11f_rev, GL_ARB_vertex_type_2_10_10_10_rev, 
    GL_ARB_viewport_array, GL_ARB_window_pos, GL_ATI_draw_buffers, 
    GL_ATI_texture_float, GL_ATI_texture_mirror_once, 
    GL_EXTX_framebuffer_mixed_formats, GL_EXT_Cg_shader, GL_EXT_abgr, 
    GL_EXT_bgra, GL_EXT_bindable_uniform, GL_EXT_blend_color, 
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
    GL_EXT_blend_minmax, GL_EXT_blend_subtract, GL_EXT_compiled_vertex_array, 
    GL_EXT_depth_bounds_test, GL_EXT_direct_state_access, 
    GL_EXT_draw_buffers2, GL_EXT_draw_instanced, GL_EXT_draw_range_elements, 
    GL_EXT_fog_coord, GL_EXT_framebuffer_blit, GL_EXT_framebuffer_multisample, 
    GL_EXT_framebuffer_multisample_blit_scaled, GL_EXT_framebuffer_object, 
    GL_EXT_framebuffer_sRGB, GL_EXT_geometry_shader4, 
    GL_EXT_gpu_program_parameters, GL_EXT_gpu_shader4, 
    GL_EXT_import_sync_object, GL_EXT_memory_object, GL_EXT_memory_object_fd, 
    GL_EXT_multi_draw_arrays, GL_EXT_packed_depth_stencil, 
    GL_EXT_packed_float, GL_EXT_packed_pixels, GL_EXT_pixel_buffer_object, 
    GL_EXT_point_parameters, GL_EXT_polygon_offset_clamp, 
    GL_EXT_post_depth_coverage, GL_EXT_provoking_vertex, 
    GL_EXT_raster_multisample, GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_semaphore, GL_EXT_semaphore_fd, GL_EXT_separate_shader_objects, 
    GL_EXT_separate_specular_color, GL_EXT_shader_image_load_formatted, 
    GL_EXT_shader_image_load_store, GL_EXT_shader_integer_mix, 
    GL_EXT_shadow_funcs, GL_EXT_sparse_texture2, GL_EXT_stencil_two_side, 
    GL_EXT_stencil_wrap, GL_EXT_texture3D, GL_EXT_texture_array, 
    GL_EXT_texture_buffer_object, GL_EXT_texture_compression_dxt1, 
    GL_EXT_texture_compression_latc, GL_EXT_texture_compression_rgtc, 
    GL_EXT_texture_compression_s3tc, GL_EXT_texture_cube_map, 
    GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 
    GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 
    GL_EXT_texture_filter_anisotropic, GL_EXT_texture_filter_minmax, 
    GL_EXT_texture_integer, GL_EXT_texture_lod, GL_EXT_texture_lod_bias, 
    GL_EXT_texture_mirror_clamp, GL_EXT_texture_object, GL_EXT_texture_sRGB, 
    GL_EXT_texture_sRGB_R8, GL_EXT_texture_sRGB_decode, 
    GL_EXT_texture_shared_exponent, GL_EXT_texture_storage, 
    GL_EXT_texture_swizzle, GL_EXT_timer_query, GL_EXT_transform_feedback2, 
    GL_EXT_vertex_array, GL_EXT_vertex_array_bgra, GL_EXT_vertex_attrib_64bit, 
    GL_EXT_window_rectangles, GL_EXT_x11_sync_object, GL_IBM_rasterpos_clip, 
    GL_IBM_texture_mirrored_repeat, GL_KHR_blend_equation_advanced, 
    GL_KHR_blend_equation_advanced_coherent, GL_KHR_context_flush_control, 
    GL_KHR_debug, GL_KHR_no_error, GL_KHR_parallel_shader_compile, 
    GL_KHR_robust_buffer_access_behavior, GL_KHR_robustness, 
    GL_KTX_buffer_region, GL_NVX_blend_equation_advanced_multi_draw_buffers, 
    GL_NVX_conditional_render, GL_NVX_gpu_memory_info, GL_NVX_nvenc_interop, 
    GL_NV_ES1_1_compatibility, GL_NV_ES3_1_compatibility, 
    GL_NV_alpha_to_coverage_dither_control, GL_NV_bindless_multi_draw_indirect, 
    GL_NV_bindless_multi_draw_indirect_count, GL_NV_bindless_texture, 
    GL_NV_blend_equation_advanced, GL_NV_blend_equation_advanced_coherent, 
    GL_NV_blend_minmax_factor, GL_NV_blend_square, GL_NV_clip_space_w_scaling, 
    GL_NV_command_list, GL_NV_compute_program5, GL_NV_conditional_render, 
    GL_NV_conservative_raster, GL_NV_conservative_raster_dilate, 
    GL_NV_conservative_raster_pre_snap_triangles, GL_NV_copy_depth_to_color, 
    GL_NV_copy_image, GL_NV_depth_buffer_float, GL_NV_depth_clamp, 
    GL_NV_draw_texture, GL_NV_draw_vulkan_image, GL_NV_explicit_multisample, 
    GL_NV_feature_query, GL_NV_fence, GL_NV_fill_rectangle, 
    GL_NV_float_buffer, GL_NV_fog_distance, GL_NV_fragment_coverage_to_color, 
    GL_NV_fragment_program, GL_NV_fragment_program2, 
    GL_NV_fragment_program_option, GL_NV_fragment_shader_interlock, 
    GL_NV_framebuffer_mixed_samples, GL_NV_framebuffer_multisample_coverage, 
    GL_NV_geometry_shader4, GL_NV_geometry_shader_passthrough, 
    GL_NV_gpu_program4, GL_NV_gpu_program4_1, GL_NV_gpu_program5, 
    GL_NV_gpu_program5_mem_extended, GL_NV_gpu_program_fp64, 
    GL_NV_gpu_shader5, GL_NV_half_float, GL_NV_internalformat_sample_query, 
    GL_NV_light_max_exponent, GL_NV_memory_attachment, 
    GL_NV_multisample_coverage, GL_NV_multisample_filter_hint, 
    GL_NV_occlusion_query, GL_NV_packed_depth_stencil, 
    GL_NV_parameter_buffer_object, GL_NV_parameter_buffer_object2, 
    GL_NV_path_rendering, GL_NV_path_rendering_shared_edge, 
    GL_NV_pixel_data_range, GL_NV_point_sprite, GL_NV_primitive_restart, 
    GL_NV_query_resource, GL_NV_query_resource_tag, GL_NV_register_combiners, 
    GL_NV_register_combiners2, GL_NV_robustness_video_memory_purge, 
    GL_NV_sample_locations, GL_NV_sample_mask_override_coverage, 
    GL_NV_shader_atomic_counters, GL_NV_shader_atomic_float, 
    GL_NV_shader_atomic_float64, GL_NV_shader_atomic_fp16_vector, 
    GL_NV_shader_atomic_int64, GL_NV_shader_buffer_load, 
    GL_NV_shader_storage_buffer_object, GL_NV_shader_thread_group, 
    GL_NV_shader_thread_shuffle, GL_NV_stereo_view_rendering, 
    GL_NV_texgen_reflection, GL_NV_texture_barrier, 
    GL_NV_texture_compression_vtc, GL_NV_texture_env_combine4, 
    GL_NV_texture_multisample, GL_NV_texture_rectangle, 
    GL_NV_texture_rectangle_compressed, GL_NV_texture_shader, 
    GL_NV_texture_shader2, GL_NV_texture_shader3, GL_NV_transform_feedback, 
    GL_NV_transform_feedback2, GL_NV_uniform_buffer_unified_memory, 
    GL_NV_vdpau_interop, GL_NV_vdpau_interop2, GL_NV_vertex_array_range, 
    GL_NV_vertex_array_range2, GL_NV_vertex_attrib_integer_64bit, 
    GL_NV_vertex_buffer_unified_memory, GL_NV_vertex_program, 
    GL_NV_vertex_program1_1, GL_NV_vertex_program2, 
    GL_NV_vertex_program2_option, GL_NV_vertex_program3, 
    GL_NV_viewport_array2, GL_NV_viewport_swizzle, GL_OVR_multiview, 
    GL_OVR_multiview2, GL_S3_s3tc, GL_SGIS_generate_mipmap, 
    GL_SGIS_texture_lod, GL_SGIX_depth_texture, GL_SGIX_shadow, 
    GL_SUN_slice_accum

OpenGL version string: 4.6.0 NVIDIA 430.26
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)
OpenGL extensions:
    GL_AMD_multi_draw_indirect, GL_AMD_seamless_cubemap_per_texture, 
    GL_AMD_vertex_shader_layer, GL_AMD_vertex_shader_viewport_index, 
    GL_ARB_ES2_compatibility, GL_ARB_ES3_1_compatibility, 
    GL_ARB_ES3_2_compatibility, GL_ARB_ES3_compatibility, 
    GL_ARB_arrays_of_arrays, GL_ARB_base_instance, GL_ARB_bindless_texture, 
    GL_ARB_blend_func_extended, GL_ARB_buffer_storage, 
    GL_ARB_clear_buffer_object, GL_ARB_clear_texture, GL_ARB_clip_control, 
    GL_ARB_color_buffer_float, GL_ARB_compatibility, 
    GL_ARB_compressed_texture_pixel_storage, GL_ARB_compute_shader, 
    GL_ARB_compute_variable_group_size, GL_ARB_conditional_render_inverted, 
    GL_ARB_conservative_depth, GL_ARB_copy_buffer, GL_ARB_copy_image, 
    GL_ARB_cull_distance, GL_ARB_debug_output, GL_ARB_depth_buffer_float, 
    GL_ARB_depth_clamp, GL_ARB_depth_texture, GL_ARB_derivative_control, 
    GL_ARB_direct_state_access, GL_ARB_draw_buffers, 
    GL_ARB_draw_buffers_blend, GL_ARB_draw_elements_base_vertex, 
    GL_ARB_draw_indirect, GL_ARB_draw_instanced, GL_ARB_enhanced_layouts, 
    GL_ARB_explicit_attrib_location, GL_ARB_explicit_uniform_location, 
    GL_ARB_fragment_coord_conventions, GL_ARB_fragment_layer_viewport, 
    GL_ARB_fragment_program, GL_ARB_fragment_program_shadow, 
    GL_ARB_fragment_shader, GL_ARB_fragment_shader_interlock, 
    GL_ARB_framebuffer_no_attachments, GL_ARB_framebuffer_object, 
    GL_ARB_framebuffer_sRGB, GL_ARB_geometry_shader4, 
    GL_ARB_get_program_binary, GL_ARB_get_texture_sub_image, GL_ARB_gl_spirv, 
    GL_ARB_gpu_shader5, GL_ARB_gpu_shader_fp64, GL_ARB_gpu_shader_int64, 
    GL_ARB_half_float_pixel, GL_ARB_half_float_vertex, GL_ARB_imaging, 
    GL_ARB_indirect_parameters, GL_ARB_instanced_arrays, 
    GL_ARB_internalformat_query, GL_ARB_internalformat_query2, 
    GL_ARB_invalidate_subdata, GL_ARB_map_buffer_alignment, 
    GL_ARB_map_buffer_range, GL_ARB_multi_bind, GL_ARB_multi_draw_indirect, 
    GL_ARB_multisample, GL_ARB_multitexture, GL_ARB_occlusion_query, 
    GL_ARB_occlusion_query2, GL_ARB_parallel_shader_compile, 
    GL_ARB_pipeline_statistics_query, GL_ARB_pixel_buffer_object, 
    GL_ARB_point_parameters, GL_ARB_point_sprite, GL_ARB_polygon_offset_clamp, 
    GL_ARB_post_depth_coverage, GL_ARB_program_interface_query, 
    GL_ARB_provoking_vertex, GL_ARB_query_buffer_object, 
    GL_ARB_robust_buffer_access_behavior, GL_ARB_robustness, 
    GL_ARB_sample_locations, GL_ARB_sample_shading, GL_ARB_sampler_objects, 
    GL_ARB_seamless_cube_map, GL_ARB_seamless_cubemap_per_texture, 
    GL_ARB_separate_shader_objects, GL_ARB_shader_atomic_counter_ops, 
    GL_ARB_shader_atomic_counters, GL_ARB_shader_ballot, 
    GL_ARB_shader_bit_encoding, GL_ARB_shader_clock, 
    GL_ARB_shader_draw_parameters, GL_ARB_shader_group_vote, 
    GL_ARB_shader_image_load_store, GL_ARB_shader_image_size, 
    GL_ARB_shader_objects, GL_ARB_shader_precision, 
    GL_ARB_shader_storage_buffer_object, GL_ARB_shader_subroutine, 
    GL_ARB_shader_texture_image_samples, GL_ARB_shader_texture_lod, 
    GL_ARB_shader_viewport_layer_array, GL_ARB_shading_language_100, 
    GL_ARB_shading_language_420pack, GL_ARB_shading_language_include, 
    GL_ARB_shading_language_packing, GL_ARB_shadow, GL_ARB_sparse_buffer, 
    GL_ARB_sparse_texture, GL_ARB_sparse_texture2, 
    GL_ARB_sparse_texture_clamp, GL_ARB_spirv_extensions, 
    GL_ARB_stencil_texturing, GL_ARB_sync, GL_ARB_tessellation_shader, 
    GL_ARB_texture_barrier, GL_ARB_texture_border_clamp, 
    GL_ARB_texture_buffer_object, GL_ARB_texture_buffer_object_rgb32, 
    GL_ARB_texture_buffer_range, GL_ARB_texture_compression, 
    GL_ARB_texture_compression_bptc, GL_ARB_texture_compression_rgtc, 
    GL_ARB_texture_cube_map, GL_ARB_texture_cube_map_array, 
    GL_ARB_texture_env_add, GL_ARB_texture_env_combine, 
    GL_ARB_texture_env_crossbar, GL_ARB_texture_env_dot3, 
    GL_ARB_texture_filter_anisotropic, GL_ARB_texture_filter_minmax, 
    GL_ARB_texture_float, GL_ARB_texture_gather, 
    GL_ARB_texture_mirror_clamp_to_edge, GL_ARB_texture_mirrored_repeat, 
    GL_ARB_texture_multisample, GL_ARB_texture_non_power_of_two, 
    GL_ARB_texture_query_levels, GL_ARB_texture_query_lod, 
    GL_ARB_texture_rectangle, GL_ARB_texture_rg, GL_ARB_texture_rgb10_a2ui, 
    GL_ARB_texture_stencil8, GL_ARB_texture_storage, 
    GL_ARB_texture_storage_multisample, GL_ARB_texture_swizzle, 
    GL_ARB_texture_view, GL_ARB_timer_query, GL_ARB_transform_feedback2, 
    GL_ARB_transform_feedback3, GL_ARB_transform_feedback_instanced, 
    GL_ARB_transform_feedback_overflow_query, GL_ARB_transpose_matrix, 
    GL_ARB_uniform_buffer_object, GL_ARB_vertex_array_bgra, 
    GL_ARB_vertex_array_object, GL_ARB_vertex_attrib_64bit, 
    GL_ARB_vertex_attrib_binding, GL_ARB_vertex_buffer_object, 
    GL_ARB_vertex_program, GL_ARB_vertex_shader, 
    GL_ARB_vertex_type_10f_11f_11f_rev, GL_ARB_vertex_type_2_10_10_10_rev, 
    GL_ARB_viewport_array, GL_ARB_window_pos, GL_ATI_draw_buffers, 
    GL_ATI_texture_float, GL_ATI_texture_mirror_once, 
    GL_EXTX_framebuffer_mixed_formats, GL_EXT_Cg_shader, GL_EXT_abgr, 
    GL_EXT_bgra, GL_EXT_bindable_uniform, GL_EXT_blend_color, 
    GL_EXT_blend_equation_separate, GL_EXT_blend_func_separate, 
    GL_EXT_blend_minmax, GL_EXT_blend_subtract, GL_EXT_compiled_vertex_array, 
    GL_EXT_depth_bounds_test, GL_EXT_direct_state_access, 
    GL_EXT_draw_buffers2, GL_EXT_draw_instanced, GL_EXT_draw_range_elements, 
    GL_EXT_fog_coord, GL_EXT_framebuffer_blit, GL_EXT_framebuffer_multisample, 
    GL_EXT_framebuffer_multisample_blit_scaled, GL_EXT_framebuffer_object, 
    GL_EXT_framebuffer_sRGB, GL_EXT_geometry_shader4, 
    GL_EXT_gpu_program_parameters, GL_EXT_gpu_shader4, 
    GL_EXT_import_sync_object, GL_EXT_memory_object, GL_EXT_memory_object_fd, 
    GL_EXT_multi_draw_arrays, GL_EXT_packed_depth_stencil, 
    GL_EXT_packed_float, GL_EXT_packed_pixels, GL_EXT_pixel_buffer_object, 
    GL_EXT_point_parameters, GL_EXT_polygon_offset_clamp, 
    GL_EXT_post_depth_coverage, GL_EXT_provoking_vertex, 
    GL_EXT_raster_multisample, GL_EXT_rescale_normal, GL_EXT_secondary_color, 
    GL_EXT_semaphore, GL_EXT_semaphore_fd, GL_EXT_separate_shader_objects, 
    GL_EXT_separate_specular_color, GL_EXT_shader_image_load_formatted, 
    GL_EXT_shader_image_load_store, GL_EXT_shader_integer_mix, 
    GL_EXT_shadow_funcs, GL_EXT_sparse_texture2, GL_EXT_stencil_two_side, 
    GL_EXT_stencil_wrap, GL_EXT_texture3D, GL_EXT_texture_array, 
    GL_EXT_texture_buffer_object, GL_EXT_texture_compression_dxt1, 
    GL_EXT_texture_compression_latc, GL_EXT_texture_compression_rgtc, 
    GL_EXT_texture_compression_s3tc, GL_EXT_texture_cube_map, 
    GL_EXT_texture_edge_clamp, GL_EXT_texture_env_add, 
    GL_EXT_texture_env_combine, GL_EXT_texture_env_dot3, 
    GL_EXT_texture_filter_anisotropic, GL_EXT_texture_filter_minmax, 
    GL_EXT_texture_integer, GL_EXT_texture_lod, GL_EXT_texture_lod_bias, 
    GL_EXT_texture_mirror_clamp, GL_EXT_texture_object, GL_EXT_texture_sRGB, 
    GL_EXT_texture_sRGB_R8, GL_EXT_texture_sRGB_decode, 
    GL_EXT_texture_shared_exponent, GL_EXT_texture_storage, 
    GL_EXT_texture_swizzle, GL_EXT_timer_query, GL_EXT_transform_feedback2, 
    GL_EXT_vertex_array, GL_EXT_vertex_array_bgra, GL_EXT_vertex_attrib_64bit, 
    GL_EXT_window_rectangles, GL_EXT_x11_sync_object, GL_IBM_rasterpos_clip, 
    GL_IBM_texture_mirrored_repeat, GL_KHR_blend_equation_advanced, 
    GL_KHR_blend_equation_advanced_coherent, GL_KHR_context_flush_control, 
    GL_KHR_debug, GL_KHR_no_error, GL_KHR_parallel_shader_compile, 
    GL_KHR_robust_buffer_access_behavior, GL_KHR_robustness, 
    GL_KTX_buffer_region, GL_NVX_blend_equation_advanced_multi_draw_buffers, 
    GL_NVX_conditional_render, GL_NVX_gpu_memory_info, GL_NVX_nvenc_interop, 
    GL_NV_ES1_1_compatibility, GL_NV_ES3_1_compatibility, 
    GL_NV_alpha_to_coverage_dither_control, GL_NV_bindless_multi_draw_indirect, 
    GL_NV_bindless_multi_draw_indirect_count, GL_NV_bindless_texture, 
    GL_NV_blend_equation_advanced, GL_NV_blend_equation_advanced_coherent, 
    GL_NV_blend_minmax_factor, GL_NV_blend_square, GL_NV_clip_space_w_scaling, 
    GL_NV_command_list, GL_NV_compute_program5, GL_NV_conditional_render, 
    GL_NV_conservative_raster, GL_NV_conservative_raster_dilate, 
    GL_NV_conservative_raster_pre_snap_triangles, GL_NV_copy_depth_to_color, 
    GL_NV_copy_image, GL_NV_depth_buffer_float, GL_NV_depth_clamp, 
    GL_NV_draw_texture, GL_NV_draw_vulkan_image, GL_NV_explicit_multisample, 
    GL_NV_feature_query, GL_NV_fence, GL_NV_fill_rectangle, 
    GL_NV_float_buffer, GL_NV_fog_distance, GL_NV_fragment_coverage_to_color, 
    GL_NV_fragment_program, GL_NV_fragment_program2, 
    GL_NV_fragment_program_option, GL_NV_fragment_shader_interlock, 
    GL_NV_framebuffer_mixed_samples, GL_NV_framebuffer_multisample_coverage, 
    GL_NV_geometry_shader4, GL_NV_geometry_shader_passthrough, 
    GL_NV_gpu_program4, GL_NV_gpu_program4_1, GL_NV_gpu_program5, 
    GL_NV_gpu_program5_mem_extended, GL_NV_gpu_program_fp64, 
    GL_NV_gpu_shader5, GL_NV_half_float, GL_NV_internalformat_sample_query, 
    GL_NV_light_max_exponent, GL_NV_memory_attachment, 
    GL_NV_multisample_coverage, GL_NV_multisample_filter_hint, 
    GL_NV_occlusion_query, GL_NV_packed_depth_stencil, 
    GL_NV_parameter_buffer_object, GL_NV_parameter_buffer_object2, 
    GL_NV_path_rendering, GL_NV_path_rendering_shared_edge, 
    GL_NV_pixel_data_range, GL_NV_point_sprite, GL_NV_primitive_restart, 
    GL_NV_query_resource, GL_NV_query_resource_tag, GL_NV_register_combiners, 
    GL_NV_register_combiners2, GL_NV_robustness_video_memory_purge, 
    GL_NV_sample_locations, GL_NV_sample_mask_override_coverage, 
    GL_NV_shader_atomic_counters, GL_NV_shader_atomic_float, 
    GL_NV_shader_atomic_float64, GL_NV_shader_atomic_fp16_vector, 
    GL_NV_shader_atomic_int64, GL_NV_shader_buffer_load, 
    GL_NV_shader_storage_buffer_object, GL_NV_shader_thread_group, 
    GL_NV_shader_thread_shuffle, GL_NV_stereo_view_rendering, 
    GL_NV_texgen_reflection, GL_NV_texture_barrier, 
    GL_NV_texture_compression_vtc, GL_NV_texture_env_combine4, 
    GL_NV_texture_multisample, GL_NV_texture_rectangle, 
    GL_NV_texture_rectangle_compressed, GL_NV_texture_shader, 
    GL_NV_texture_shader2, GL_NV_texture_shader3, GL_NV_transform_feedback, 
    GL_NV_transform_feedback2, GL_NV_uniform_buffer_unified_memory, 
    GL_NV_vdpau_interop, GL_NV_vdpau_interop2, GL_NV_vertex_array_range, 
    GL_NV_vertex_array_range2, GL_NV_vertex_attrib_integer_64bit, 
    GL_NV_vertex_buffer_unified_memory, GL_NV_vertex_program, 
    GL_NV_vertex_program1_1, GL_NV_vertex_program2, 
    GL_NV_vertex_program2_option, GL_NV_vertex_program3, 
    GL_NV_viewport_array2, GL_NV_viewport_swizzle, GL_OVR_multiview, 
    GL_OVR_multiview2, GL_S3_s3tc, GL_SGIS_generate_mipmap, 
    GL_SGIS_texture_lod, GL_SGIX_depth_texture, GL_SGIX_shadow, 
    GL_SUN_slice_accum

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 430.26
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:
    GL_ANDROID_extension_pack_es31a, GL_EXT_EGL_image_external_wrap_modes, 
    GL_EXT_base_instance, GL_EXT_blend_func_extended, GL_EXT_blend_minmax, 
    GL_EXT_buffer_storage, GL_EXT_clear_texture, GL_EXT_clip_control, 
    GL_EXT_clip_cull_distance, GL_EXT_color_buffer_float, 
    GL_EXT_color_buffer_half_float, GL_EXT_compressed_ETC1_RGB8_sub_texture, 
    GL_EXT_conservative_depth, GL_EXT_copy_image, GL_EXT_debug_label, 
    GL_EXT_discard_framebuffer, GL_EXT_disjoint_timer_query, 
    GL_EXT_draw_buffers_indexed, GL_EXT_draw_elements_base_vertex, 
    GL_EXT_draw_transform_feedback, GL_EXT_float_blend, GL_EXT_frag_depth, 
    GL_EXT_geometry_point_size, GL_EXT_geometry_shader, GL_EXT_gpu_shader5, 
    GL_EXT_map_buffer_range, GL_EXT_memory_object, GL_EXT_memory_object_fd, 
    GL_EXT_multi_draw_indirect, GL_EXT_multisample_compatibility, 
    GL_EXT_multisampled_render_to_texture, 
    GL_EXT_multisampled_render_to_texture2, GL_EXT_occlusion_query_boolean, 
    GL_EXT_polygon_offset_clamp, GL_EXT_post_depth_coverage, 
    GL_EXT_primitive_bounding_box, GL_EXT_raster_multisample, 
    GL_EXT_render_snorm, GL_EXT_robustness, GL_EXT_sRGB, 
    GL_EXT_sRGB_write_control, GL_EXT_semaphore, GL_EXT_semaphore_fd, 
    GL_EXT_separate_shader_objects, GL_EXT_shader_group_vote, 
    GL_EXT_shader_implicit_conversions, GL_EXT_shader_integer_mix, 
    GL_EXT_shader_io_blocks, GL_EXT_shader_non_constant_global_initializers, 
    GL_EXT_shader_texture_lod, GL_EXT_shadow_samplers, GL_EXT_sparse_texture, 
    GL_EXT_sparse_texture2, GL_EXT_tessellation_point_size, 
    GL_EXT_tessellation_shader, GL_EXT_texture_border_clamp, 
    GL_EXT_texture_buffer, GL_EXT_texture_compression_bptc, 
    GL_EXT_texture_compression_dxt1, GL_EXT_texture_compression_rgtc, 
    GL_EXT_texture_compression_s3tc, GL_EXT_texture_cube_map_array, 
    GL_EXT_texture_filter_anisotropic, GL_EXT_texture_filter_minmax, 
    GL_EXT_texture_format_BGRA8888, GL_EXT_texture_mirror_clamp_to_edge, 
    GL_EXT_texture_norm16, GL_EXT_texture_rg, GL_EXT_texture_sRGB_R8, 
    GL_EXT_texture_sRGB_decode, GL_EXT_texture_storage, GL_EXT_texture_view, 
    GL_EXT_unpack_subimage, GL_EXT_window_rectangles, 
    GL_KHR_blend_equation_advanced, GL_KHR_blend_equation_advanced_coherent, 
    GL_KHR_context_flush_control, GL_KHR_debug, GL_KHR_no_error, 
    GL_KHR_parallel_shader_compile, GL_KHR_robust_buffer_access_behavior, 
    GL_KHR_robustness, GL_NVX_blend_equation_advanced_multi_draw_buffers, 
    GL_NV_bgr, GL_NV_bindless_texture, GL_NV_blend_equation_advanced, 
    GL_NV_blend_equation_advanced_coherent, GL_NV_blend_minmax_factor, 
    GL_NV_clip_space_w_scaling, GL_NV_conditional_render, 
    GL_NV_conservative_raster, GL_NV_conservative_raster_pre_snap_triangles, 
    GL_NV_copy_buffer, GL_NV_copy_image, GL_NV_draw_buffers, 
    GL_NV_draw_instanced, GL_NV_draw_texture, GL_NV_draw_vulkan_image, 
    GL_NV_explicit_attrib_location, GL_NV_fbo_color_attachments, 
    GL_NV_fill_rectangle, GL_NV_fragment_coverage_to_color, 
    GL_NV_fragment_shader_interlock, GL_NV_framebuffer_blit, 
    GL_NV_framebuffer_mixed_samples, GL_NV_framebuffer_multisample, 
    GL_NV_generate_mipmap_sRGB, GL_NV_geometry_shader_passthrough, 
    GL_NV_gpu_shader5, GL_NV_image_formats, GL_NV_instanced_arrays, 
    GL_NV_internalformat_sample_query, GL_NV_memory_attachment, 
    GL_NV_non_square_matrices, GL_NV_occlusion_query_samples, 
    GL_NV_pack_subimage, GL_NV_packed_float, GL_NV_packed_float_linear, 
    GL_NV_path_rendering, GL_NV_path_rendering_shared_edge, 
    GL_NV_pixel_buffer_object, GL_NV_polygon_mode, GL_NV_read_buffer, 
    GL_NV_read_depth, GL_NV_read_depth_stencil, GL_NV_read_stencil, 
    GL_NV_sRGB_formats, GL_NV_sample_locations, 
    GL_NV_sample_mask_override_coverage, GL_NV_shader_atomic_fp16_vector, 
    GL_NV_shader_noperspective_interpolation, GL_NV_shadow_samplers_array, 
    GL_NV_shadow_samplers_cube, GL_NV_stereo_view_rendering, 
    GL_NV_texture_array, GL_NV_texture_barrier, GL_NV_texture_border_clamp, 
    GL_NV_texture_compression_latc, GL_NV_texture_compression_s3tc, 
    GL_NV_texture_compression_s3tc_update, GL_NV_timer_query, 
    GL_NV_viewport_array, GL_NV_viewport_array2, GL_NV_viewport_swizzle, 
    GL_OES_compressed_ETC1_RGB8_texture, GL_OES_copy_image, GL_OES_depth24, 
    GL_OES_depth32, GL_OES_depth_texture, GL_OES_depth_texture_cube_map, 
    GL_OES_draw_buffers_indexed, GL_OES_draw_elements_base_vertex, 
    GL_OES_element_index_uint, GL_OES_fbo_render_mipmap, 
    GL_OES_geometry_point_size, GL_OES_geometry_shader, 
    GL_OES_get_program_binary, GL_OES_gpu_shader5, GL_OES_mapbuffer, 
    GL_OES_packed_depth_stencil, GL_OES_primitive_bounding_box, 
    GL_OES_rgb8_rgba8, GL_OES_sample_shading, GL_OES_sample_variables, 
    GL_OES_shader_image_atomic, GL_OES_shader_io_blocks, 
    GL_OES_shader_multisample_interpolation, GL_OES_standard_derivatives, 
    GL_OES_tessellation_point_size, GL_OES_tessellation_shader, 
    GL_OES_texture_border_clamp, GL_OES_texture_buffer, 
    GL_OES_texture_cube_map_array, GL_OES_texture_float, 
    GL_OES_texture_float_linear, GL_OES_texture_half_float, 
    GL_OES_texture_half_float_linear, GL_OES_texture_npot, 
    GL_OES_texture_stencil8, GL_OES_texture_storage_multisample_2d_array, 
    GL_OES_texture_view, GL_OES_vertex_array_object, GL_OES_vertex_half_float, 
    GL_OES_viewport_array, GL_OVR_multiview, GL_OVR_multiview2, 
    GL_OVR_multiview_multisampled_render_to_texture

@archibate
Copy link
Collaborator Author

Seems your version of GLSL can't deal with macros well, that they don't obey the expected behavior in ordinal C...

124 #define _At_(x) _Ax_(_at_##x(x))
125 #define _Atmf_Def(Add, _f_, _o_, mem, _32, float) float atomic##Add##_##mem##_f##_32(int addr, float rhs) {   int old, new, ret;   do {     old = _##mem##_i##_32(addr);     new = floatBitsToInt(_f_(intBitsToFloat(old) _o_ rhs));   } while (old != atomicCompSwap(_Ax_(_##mem##_i##_32(addr)), old, new));   return intBitsToFloat(old); }

Relied too much on macro system... GLSL don't have pointers, so I have to use macro to hold different memory types, which is very like virtual methods in C++, (eg. global_tmp, external_ptr, root_buf, all of which have different atomic functions)...

@archibate
Copy link
Collaborator Author

My:

OpenGL version string: 3.0 Mesa 19.3.4
OpenGL shading language version string: 1.30

Your:

OpenGL version string: 4.6.0 NVIDIA 430.26
OpenGL shading language version string: 4.60 NVIDIA

@archibate archibate changed the title [opengl] use GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS for portability [opengl] portability&compatibility improvements Mar 27, 2020
@archibate
Copy link
Collaborator Author

Fixed by reduce macro mech dep, could you test it now?

@archibate archibate requested a review from yuanming-hu March 27, 2020 17:28
@archibate archibate changed the title [opengl] portability&compatibility improvements [OpenGL] portability&compatibility improvements to ship it into taichi-nightly Mar 27, 2020
@archibate archibate changed the title [OpenGL] portability&compatibility improvements to ship it into taichi-nightly [OpenGL] portability&compatibility improvements to ship it with taichi-nightly Mar 27, 2020
@archibate
Copy link
Collaborator Author

Gonna sleepy, good night! Tell me if you run into a random one tmr!

@yuanming-hu
Copy link
Member

Thank you!! Now it compiles, yet the simulation gets stuck at the initial state:
Screenshot from 2020-03-27 13-34-33

@archibate
Copy link
Collaborator Author

archibate commented Mar 28, 2020

60fps?? Can you insert some print's to check if substep did execute?
Also may enable #define _GLSL_DEBUG to see /tmp/substep.comp.
I'm thinking about how to create that environment...

@archibate
Copy link
Collaborator Author

archibate commented Mar 28, 2020

May be we should first run ti test to test if gl function complete instead of rushing towards mpm99.
This maybe due to atomic can't write back result, try ti -v -a opengl test atomic.

[skip ci] fix typo

[skip ci] fix again
@yuanming-hu
Copy link
Member

Looks good to me in general. I think we are very close to v0.6!

@archibate
Copy link
Collaborator Author

Trying to make rand seed different in each kernel call, not done yet, see you tmr.

@yuanming-hu
Copy link
Member

Trying to make rand seed different in each kernel call, not done yet, see you tmr.

Actually I suggest we implement additional features (random seeds, multiple external arrays) in different PRs to make this one small and manageable...

@archibate
Copy link
Collaborator Author

Actually I suggest we implement additional features (random seeds, multiple external arrays) in different PRs to make this one small and manageable...

Right, I'll focus this PR on making it work for NV GL, thank for the reminder.

@archibate archibate requested a review from yuanming-hu April 2, 2020 03:46
Copy link
Member

@yuanming-hu yuanming-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just two minor places to fix now and I'll merge it afterward.

taichi/util/short_name.cpp Show resolved Hide resolved
new = floatBitsToInt((intBitsToFloat(old) + rhs));
} while (old != atomicCompSwap(_data_i32_[addr], old, new));
return intBitsToFloat(old);
} float atomicSub_data_f32(int addr, float rhs) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adopting the STR macro! Now it looks much cleaner. It seems that clang-format is slightly confused by it and you'll need to turn it on/off as in

so that it will format the contents correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should guard like:

// clang-format off
STR(...)
// clang-format on

To make clang-format not formatting my glsl code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should turn it on inside STR:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably have already noticed this, but using STR() will flatten all your code into one line. So your OpenGl shader source code becomes harder to read.. E.g. this is what i got in Metal:

template <typename T, typename G> T union_cast(G g) { static_assert(sizeof(T) == sizeof(G), "Size mismatch"); return *reinterpret_cast<thread const T *>(&g); } inline int ifloordiv(int lhs, int rhs) { const int intm = (lhs / rhs); return (((lhs * rhs < 0) && (rhs * intm != lhs)) ? (intm - 1) : intm); } int32_t pow_i32(int32_t x, int32_t n) { int32_t tmp = x; int32_t ans = 1; while (n) { if (n & 1) ans *= tmp; tmp *= tmp; n >>= 1; } return ans; } float fatomic_fetch_add(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val + operand); ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; } float fatomic_fetch_min(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val < operand) ? old_val : operand; ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; } float fatomic_fetch_max(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val > operand) ? old_val : operand; ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, but as long as we can compile, this is good. Also we'd better use /* clang-format on */ inside STR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a bit more investigation and unfortunately, there doesn't seem to be a way to perfectly resolve this (preserve change-of-line and spaces, while enabling clang-format) within C++.

It is possible to resolve this using a custom command in CMake, but that's a slightly more complex solution. Let's stick to the current solution until some better reason appears to force us to switch to the CMake + manual preprocessing approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, i searched for something like keeping the newlines inside macros before, but found no solution :( Like @archibate pointed out, so long as the compiler doesn't complain about the source code, we are good..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll take a final pass and if there're no other issues this will be merged in.

@archibate archibate requested review from yuanming-hu and removed request for yuanming-hu April 2, 2020 04:19
@archibate archibate requested a review from yuanming-hu April 2, 2020 04:27
@yuanming-hu
Copy link
Member

I'm going to sleep now, but I'll come back to this first thing tomorrow. Thanks so much for your hard work!

@archibate
Copy link
Collaborator Author

archibate commented Apr 2, 2020

What NV users thinking: https://community.khronos.org/t/precompiled-glsl/43095

@yuanming-hu yuanming-hu changed the title [OpenGL] portability&compatibility improvements to ship it with taichi-nightly [OpenGL] Support NVIDIA GLSL compiler Apr 2, 2020
@archibate archibate requested a review from yuanming-hu April 2, 2020 15:43
@yuanming-hu yuanming-hu merged commit 9536941 into taichi-dev:master Apr 2, 2020
@@ -30,7 +30,7 @@ class OpenglCodeGen {

Program *prog_;
Kernel *kernel_;
const StructCompiledResult *struct_compiled_;
StructCompiledResult *struct_compiled_;
Copy link
Member

@yuanming-hu yuanming-hu Apr 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@archibate let's add [[maybe_unused]] to StructCompiledResult *struct_compiled_, since it generates a warning on non-OpenGL build. Feel free to do this in any OpenGL-related PR since it's just one line deletion. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants