-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenGL] Support NVIDIA GLSL compiler #666
Conversation
Please checkout this branch and see if you get fixed this issue. |
@yuanming-hu See if you can reproduce that now. |
Thanks. Now I get
|
What's your OpenGL implementation? My is mesa. |
My GL config:
|
Seems your version of GLSL can't deal with macros well, that they don't obey the expected behavior in ordinal C...
Relied too much on macro system... GLSL don't have pointers, so I have to use macro to hold different memory types, which is very like virtual methods in C++, (eg. global_tmp, external_ptr, root_buf, all of which have different atomic functions)... |
My:
Your:
|
Fixed by reduce macro mech dep, could you test it now? |
Gonna sleepy, good night! Tell me if you run into a random one tmr! |
60fps?? Can you insert some |
May be we should first run |
[skip ci] fix typo [skip ci] fix again
Looks good to me in general. I think we are very close to v0.6! |
Trying to make rand seed different in each kernel call, not done yet, see you tmr. |
Actually I suggest we implement additional features (random seeds, multiple external arrays) in different PRs to make this one small and manageable... |
Right, I'll focus this PR on making it work for NV GL, thank for the reminder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just two minor places to fix now and I'll merge it afterward.
taichi/codegen/codegen_opengl.cpp
Outdated
new = floatBitsToInt((intBitsToFloat(old) + rhs)); | ||
} while (old != atomicCompSwap(_data_i32_[addr], old, new)); | ||
return intBitsToFloat(old); | ||
} float atomicSub_data_f32(int addr, float rhs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adopting the STR
macro! Now it looks much cleaner. It seems that clang-format is slightly confused by it and you'll need to turn it on/off as in
// clang-format off |
so that it will format the contents correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we should guard like:
// clang-format off
STR(...)
// clang-format on
To make clang-format not formatting my glsl code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we should turn it on inside STR
:
// clang-format on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably have already noticed this, but using STR()
will flatten all your code into one line. So your OpenGl shader source code becomes harder to read.. E.g. this is what i got in Metal:
template <typename T, typename G> T union_cast(G g) { static_assert(sizeof(T) == sizeof(G), "Size mismatch"); return *reinterpret_cast<thread const T *>(&g); } inline int ifloordiv(int lhs, int rhs) { const int intm = (lhs / rhs); return (((lhs * rhs < 0) && (rhs * intm != lhs)) ? (intm - 1) : intm); } int32_t pow_i32(int32_t x, int32_t n) { int32_t tmp = x; int32_t ans = 1; while (n) { if (n & 1) ans *= tmp; tmp *= tmp; n >>= 1; } return ans; } float fatomic_fetch_add(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val + operand); ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; } float fatomic_fetch_min(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val < operand) ? old_val : operand; ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; } float fatomic_fetch_max(device float *dest, const float operand) { bool ok = false; float old_val = 0.0f; while (!ok) { old_val = *dest; float new_val = (old_val > operand) ? old_val : operand; ok = atomic_compare_exchange_weak_explicit( (device atomic_int *)dest, (thread int *)(&old_val), *((thread int *)(&new_val)), metal::memory_order_relaxed, metal::memory_order_relaxed); } return old_val; }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, but as long as we can compile, this is good. Also we'd better use /* clang-format on */
inside STR
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a bit more investigation and unfortunately, there doesn't seem to be a way to perfectly resolve this (preserve change-of-line and spaces, while enabling clang-format) within C++.
It is possible to resolve this using a custom command in CMake, but that's a slightly more complex solution. Let's stick to the current solution until some better reason appears to force us to switch to the CMake + manual preprocessing approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, i searched for something like keeping the newlines inside macros before, but found no solution :( Like @archibate pointed out, so long as the compiler doesn't complain about the source code, we are good..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll take a final pass and if there're no other issues this will be merged in.
I'm going to sleep now, but I'll come back to this first thing tomorrow. Thanks so much for your hard work! |
What NV users thinking: https://community.khronos.org/t/precompiled-glsl/43095 |
@@ -30,7 +30,7 @@ class OpenglCodeGen { | |||
|
|||
Program *prog_; | |||
Kernel *kernel_; | |||
const StructCompiledResult *struct_compiled_; | |||
StructCompiledResult *struct_compiled_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@archibate let's add [[maybe_unused]]
to StructCompiledResult *struct_compiled_
, since it generates a warning on non-OpenGL build. Feel free to do this in any OpenGL-related PR since it's just one line deletion. Thanks!
Related issue = #633 #664
[Click here for the format server]