Promote bitfieldExtract and bitfieldInsert to become Slang intrinsics #5020
base: master
Conversation
This needs to come with a test.
Agreed. Looks like the current test suite is passing, which is good. I'll see if I can write something up which tests on a couple of different target backends.
@csyonghe What's the correct approach to constructing vectors for the CUDA backend? bitfieldExtract is a component-wise intrinsic, so I need to support a function like this as a right hand side which can be assigned to the left:
However, every time I try to construct an int4 from …
I've tried constructing an int4 as … I could try emitting a …
For the time being, it looks like a special-case lookup table works for CUDA.
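For illustration, the kind of constructor CUDA expects looks roughly like this (make_int4 is a real CUDA helper; the splat4 name and the splat-from-scalar shape are just my example, not the emitter's actual output):

__device__ int4 splat4(int x) { return make_int4(x, x, x, x); }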
While adding these tests, I noticed that the current logic in glsl.meta.slang for bitfield extraction in the signed-integer case was incorrect. For signed integers, the extracted bits are expected to be sign-extended, but the fallback wasn't doing sign extension. The logic now correctly accounts for this, and I have explicit tests to back that up too.
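For reference, a minimal sketch of the sign extension the signed fallback needs (not the literal glsl.meta.slang code; assumes 0 < bits, offset + bits <= 32, and an arithmetic right shift on signed values, as on typical targets):

int bitfieldExtractSigned(int value, int offset, int bits) {
    // Shift the field's top bit up to bit 31, then arithmetic-shift back down
    // so that bit fills the upper bits of the result.
    return int(unsigned(value) << (32 - bits - offset)) >> (32 - bits);
}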
@@ -0,0 +1,56 @@
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cpu -output-using-type
We need to cover all the other targets, with this line duplicated where -cpu is replaced with -vk, -d3d, -metal, and -cuda (see the sketch below).
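For illustration, the duplicated lines might look something like this (the exact per-target option spellings are an assumption on my part):

//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -output-using-type
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-d3d -output-using-type
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-metal -output-using-type
//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-cuda -output-using-type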
I'll add these in and see if the existing tests pass. We might also want to consider adding overrides for Metal and CUDA if those languages have direct bitfield extraction intrinsics.
For CUDA, at least for scalars, we could be using the following PTX:
unsigned int bitfieldExtract(unsigned int val, int pos, int len) {
    unsigned int r;
    asm("bfe.u32 %0, %1, %2, %3;" : "=r"(r) : "r"(val), "r"(pos), "r"(len));
    return r;
}

unsigned int bitfieldInsert(unsigned int src, unsigned int dst, int pos, int len) {
    unsigned int r;
    asm("bfi.b32 %0, %1, %2, %3, %4;" : "=r"(r) : "r"(src), "r"(dst), "r"(pos), "r"(len));
    return r;
}
We'd need to add a .s32 variant for the sign-extension case, but that's not bad. I can look into a solution for Metal as well, but hopefully in the meantime the C-like fallback will work there too.
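As a rough sketch of that signed variant, mirroring the unsigned version above (bfe.s32 sign-extends the extracted field, which is what the signed semantics require; the function name is just mine):

int bitfieldExtractSigned(int val, int pos, int len) {
    int r;
    asm("bfe.s32 %0, %1, %2, %3;" : "=r"(r) : "r"(val), "r"(pos), "r"(len));
    return r;
}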
Looks like there are some newly failing tests on my end after adding these flags for the other targets. I'll see if I can figure out what's going on. For i32 extract: …
For i64 extract: …
The same failure patterns occur for i32/i64 insertion. I'm not sure yet about Metal. Hopefully the CI will give us some answers there.
I'm a bit confused by the Vulkan spec... I think I might have found a typo? According to the spec: …
This appears even if I require …
It appears the two outstanding issues are: …
This appears to be a regression caused by this PR from Samsung: KhronosGroup/SPIRV-Tools#4758. Apparently their hardware does not support 64-bit values for certain intrinsics.
So for 1, unfortunately, because of these restrictions with Samsung's hardware, there is no way to really tap into the fast-path hardware on NVIDIA architectures for non-32-bit types with bitfield extraction/insertion, so I'll have to think about a workaround for non-32-bit types... For 2, this seems like a bug in the current Slang Metal backend, and it isn't directly tied to my changes as far as I can tell. @csyonghe For 2, do you know what's causing the release build on macOS to fail with …
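For reference, one possible shape for such a workaround is a plain shift-based fallback for the 64-bit case; this is only a sketch, not what the PR currently emits, and it assumes offset + bits <= 64 as the GLSL rules require:

unsigned long long bitfieldExtractU64(unsigned long long val, int offset, int bits) {
    if (bits == 0)
        return 0; // avoid an undefined shift by 64 below
    unsigned long long mask = (bits >= 64) ? ~0ull : ((1ull << bits) - 1ull);
    return (val >> offset) & mask;
}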
You can add -skip-spirv-validation in the //TEST line to disable validation for that specific test. You can search for it in the codebase to see how it is used elsewhere for similar situations.
Only the release build runs the full suite of tests; the debug build doesn't run most tests.
I did try to use this flag like so: … But this throws an error that … The other test using this flag follows a different pattern: …
For compute tests, you need -slang -skip-spirv-validation.
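So, sketching from the test line above, the Vulkan directive might end up looking something like this (the flag placement is my assumption):

//TEST(compute):COMPARE_COMPUTE(filecheck-buffer=CHECK):-vk -slang -skip-spirv-validation -output-using-type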
…g into natevm-bitfield-instructions
@csyonghe looks like this is the last major issue at the moment. I don't have a Mac, so I can't reproduce this locally to debug it. It looks like 64-bit integer vectors are broken in general in the Metal backend. Perhaps we should file an issue to resolve that.
You should be able to fix this by changing the way we emit vector types when the element type is a 64-bit int in slang-emit-metal.cpp; look for kIROp_VectorType in that file to find the place to emit the type. You can test this locally on a Windows machine by adding a …
…g into natevm-bitfield-instructions
@csyonghe looks like all relevant tests are passing. There's one Windows CI machine which is reporting that bash is missing, but I see the same error on ToT, so I'm assuming that's okay to see.
Quick pass over braces
m_writer->emitRawText(std::to_string(N).c_str());
}
// Special handling required for Metal target | ||
else if (isMetalTarget(getTargetReq())) |
We shouldn't need this; instead, we should make sure the Metal backend can emit all vector types correctly.
return true;
}
void CLikeSourceEmitter::emitVecNOrScalar(IRVectorType* vectorType, std::function<void()> emitComponentLogic) |
Naming suggestion: maybeEmitMakeVectorFromScalar. Instead of doing call-back style with std::function (which is a virtual function call), just use bool maybeEmitMakeVectorFromScalar(IRType* type); and void maybeCloseMakeVectorFromScalar(bool v).
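A rough sketch of how a call site could read with that shape (names taken from the suggestion above; the surrounding emit logic is just a placeholder, not actual Slang code):

bool opened = maybeEmitMakeVectorFromScalar(type);
// ... emit the scalar component expression here ...
maybeCloseMakeVectorFromScalar(opened);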
@@ -538,7 +538,8 @@ bool MetalSourceEmitter::tryEmitInstExprImpl(IRInst* inst, const EmitOpInfo& inO

void MetalSourceEmitter::emitVectorTypeNameImpl(IRType* elementType, IRIntegerValue elementCount)
{
    emitSimpleTypeImpl(elementType);
    // NM: Passing count here, as Metal 64-bit vector type names do not match their scalar equivalents.
    emitSimpleTypeKnowingCount(elementType, elementCount);
I think you can special-case the 64-bit types in the switch statement below, and emit vec<T, N> instead of TN for those types. We shouldn't need the emitSimpleTypeKnowingCount() change.
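Something along these lines, roughly; the opcode names and writer calls are assumptions mirroring the snippets above, not the actual slang-emit-metal.cpp code:

switch (elementType->getOp())
{
case kIROp_Int64Type:   // 64-bit vector type names don't match their scalar
case kIROp_UInt64Type:  // equivalents in MSL, so spell them as vec<T, N>.
    m_writer->emit("vec<");
    emitSimpleTypeImpl(elementType);
    m_writer->emit(", ");
    m_writer->emitRawText(std::to_string(elementCount).c_str());
    m_writer->emit(">");
    break;
default:
    // Everything else keeps the existing "TN" spelling, e.g. int4.
    emitSimpleTypeImpl(elementType);
    m_writer->emitRawText(std::to_string(elementCount).c_str());
    break;
}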
Apologies for the delay here. I've been a bit tied up with a move between states. I should have a bit more time to work on this in the upcoming week.
For context, see the related issue here.
In short, bitfield insertion and extraction are useful tools for reinterpreting structures where types have differing sizes, as they allow for simpler and more hardware-efficient instructions for byte manipulation.
Before this change, bitfieldExtract and bitfieldInsert were only supported outside of the compiler, when using -allow-glsl. Here, I'm introducing kIROp_BitfieldExtract and kIROp_BitfieldInsert instructions, which by default generate the original fallback bitfield-manipulation logic, but now in CLikeSourceEmitter. Then, when emitting SPIR-V or GLSL, these compilation paths override this default logic to instead tap into more specialized intrinsics.
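For reference, a minimal sketch of the C-like fallback semantics for insertion (matching GLSL's bitfieldInsert behavior; this is illustrative, not the literal emitter output, and assumes offset + bits <= 32):

unsigned int bitfieldInsertFallback(unsigned int base, unsigned int insert, int offset, int bits) {
    unsigned int mask = ((bits >= 32) ? ~0u : ((1u << bits) - 1u)) << offset;
    return (base & ~mask) | ((insert << offset) & mask);
}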