Native layer norm GPU kernel has conflicting return dtypes. #1623
base: main
Conversation
@ramiro050 Any comments here?
@silvasean could you take a look?
This patch is not correct -- the shape transfer function cannot look at the return dtypes (there is no concept of "override"). Is the IR you are showing even valid? Either our dtype inference is wrong, or the IR you are showing is invalid because it has a contradictory dtype.
@silvasean This is valid IR; it is generated from the CUDA version of Stable Diffusion's UNet.
Also, we only check the dtypes for the CPU version.
// Override if the result dtype is already known.
if (op->getResult(2).getType().cast<BaseTensorType>().hasDtype())
  result2Knowledge.dtype =
      op->getResult(2).getType().cast<BaseTensorType>().getDtype();
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result1Knowledge);
this should be result2Knowledge
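For clarity, a minimal sketch of those calls with that fix applied (the variable names are the ones from the snippet above; this is not the merged patch):

incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
// Use the knowledge computed for result 2, not the one for result 1.
incorporateKnowledge(op->getResult(2), result2Knowledge);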
It seems that the special case only happens when the input is f16.
Sure, I can go ahead with that.
@pashu123, will this patch be testable e2e once your patch for f16 support lands?
Nope, this will require the CUDA build of torch: native layer norm with the torch.half type is not supported in torch's CPU build. Let me know if you want me to add some other test.
I see. We will first need to develop a way of testing these types of ops before landing this patch. @powderluv, would it be possible to have a CI that has access to a GPU? cc: @silvasean
I've created an issue to track the e2e testing on GPU, so that this PR does not get filled with unrelated comments.
Yes, it's easy to add a GPU builder/tester. However, there are logistics and costs involved in standing it up, since it has to be done by someone with admin access to the LLVM project. We can find a GPU VM for the purpose. Another option would be to mark the GPU tests as experimental: not run as part of the CI, but available for anyone to try locally.
%result0, %result1, %result2 = torch.aten.native_layer_norm %821, %822, %668, %667, %float1.000000e-05 : !torch.tensor<[2,4096,320],f16>, !torch.list<int>, !torch.tensor, !torch.tensor, !torch.float -> !torch.tensor<[2,4096,320],f16>, !torch.tensor<[2,4096,1],f32>, !torch.tensor<[2,4096,1],f32>
This is the IR we obtain, and the shape/dtype information has not been dropped. The dtypes of the second and third results are not the same as the dtype of self (f32 vs. f16).
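A minimal sketch of the direction discussed above: special-case the dtype of the second and third results based on the input dtype instead of reading it off the results. This assumes, as the IR above shows, that an f16 input yields f32 mean/rstd; inputKnowledge is a hypothetical name, the other names come from the snippet earlier in the thread, and isF16()/Float32Type::get() are standard MLIR type APIs.

// Sketch only: derive the mean/rstd dtype from the input dtype.
Type statsDtype = inputKnowledge.dtype;
if (statsDtype && statsDtype.isF16())
  statsDtype = Float32Type::get(op->getContext()); // f16 inputs produce f32 stats
result1Knowledge.dtype = statsDtype;
result2Knowledge.dtype = statsDtype;
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result2Knowledge);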