
Native layer norm GPU kernel conflicts with return dtypes #1623

Open · wants to merge 1 commit into base: main

Conversation

pashu123 (Member)

```mlir
%result0, %result1, %result2 = torch.aten.native_layer_norm %821, %822, %668, %667, %float1.000000e-05
    : !torch.tensor<[2,4096,320],f16>, !torch.list<int>, !torch.tensor, !torch.tensor, !torch.float
    -> !torch.tensor<[2,4096,320],f16>, !torch.tensor<[2,4096,1],f32>, !torch.tensor<[2,4096,1],f32>
```

The IR above is what we obtain, with the shape/dtype information preserved. Note that the dtypes of the second and third results are not the same as the dtype of `self`.
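For context, this CUDA-path behavior can be reproduced directly with libtorch. Below is a minimal sketch, assuming a CUDA-enabled libtorch build; the shapes mirror the IR above, and the dtype comments describe the behavior reported in this PR rather than a guarantee for every backend.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Mirror the IR above: a [2,4096,320] f16 input on CUDA, normalizing
  // over the last dimension.
  auto opts = torch::dtype(torch::kHalf).device(torch::kCUDA);
  auto input = torch::randn({2, 4096, 320}, opts);
  auto weight = torch::randn({320}, opts);
  auto bias = torch::randn({320}, opts);

  // native_layer_norm returns (output, mean, rstd).
  auto [out, mean, rstd] = at::native_layer_norm(
      input, /*normalized_shape=*/{320}, weight, bias, /*eps=*/1e-5);

  std::cout << out.dtype() << "\n";  // Half
  std::cout << mean.dtype() << "\n"; // Float on the CUDA f16 path
  std::cout << rstd.dtype() << "\n"; // Float on the CUDA f16 path
}
```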

pashu123 requested a review from ramiro050 (November 21, 2022 15:47)
@pashu123 (Member Author)

@ramiro050 Any comments here?

pashu123 requested a review from silvasean (November 25, 2022 14:11)
@pashu123 (Member Author)

@silvasean could you take a look?

@silvasean (Contributor)

This patch is not correct: the shape transfer function cannot look at the return dtypes (there is no concept of "override"). Is the IR you are showing even valid? Either our dtype inference is wrong, or the IR you are showing is invalid because it has a contradictory dtype.

@pashu123 (Member Author)

@silvasean This is valid IR; it is generated from the CUDA version of Stable Diffusion's UNet.

@pashu123 (Member Author)

Also, we are only checking the dtypes for the CPU version:

```cpp
// Override if the result dtype is already known.
if (op->getResult(2).getType().cast<BaseTensorType>().hasDtype())
  result2Knowledge.dtype =
      op->getResult(2).getType().cast<BaseTensorType>().getDtype();
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result1Knowledge);
```
Collaborator (inline review comment):

This should be `result2Knowledge`.

@ramiro050 (Collaborator)

It seems that the special case only happens when the input is torch.float16. The approach should be to check whether self.dtype is that type and set the result types accordingly, rather than looking at the value of op->getResult(..).
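For concreteness, here is a minimal sketch of that approach in the style of the snippet quoted earlier; `selfKnowledge` and the surrounding visitor context are assumptions, not verbatim torch-mlir code:

```cpp
// Derive all three result dtypes from the input (`self`) dtype instead of
// reading them back from op->getResult(..).
Type selfDtype = selfKnowledge.dtype; // `selfKnowledge`: assumed lattice value of self
result0Knowledge.dtype = selfDtype;
result1Knowledge.dtype = selfDtype;
result2Knowledge.dtype = selfDtype;
// Special case observed on the CUDA path: for an f16 input, the mean and
// rstd results are computed and returned in f32.
if (selfDtype.isa<mlir::Float16Type>()) {
  result1Knowledge.dtype = mlir::Float32Type::get(op->getContext());
  result2Knowledge.dtype = mlir::Float32Type::get(op->getContext());
}
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result2Knowledge);
```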

@pashu123 (Member Author)

> It seems that the special case only happens when the input is torch.float16. The approach should be to check whether self.dtype is that type and set the result types accordingly, rather than looking at the value of op->getResult(..).

Sure, I can go ahead with that.

@ramiro050 (Collaborator)

@pashu123, will this patch be testable e2e once your patch for f16 support lands?

@pashu123 (Member Author) commented Dec 1, 2022

> @pashu123, will this patch be testable e2e once your patch for f16 support lands?

Nope, this will require the CUDA version of torch; native layer norm with the torch.half type is not supported in torch's CPU version. Let me know if you want me to add some other test.
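For reference, a minimal sketch of that limitation, assuming a CPU-only setting and the PyTorch of this era; the exact error type and message here are assumptions:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Build a small f16 tensor on CPU (created as f32 and then cast, since
  // that conversion is available everywhere).
  auto input = torch::randn({2, 8}).to(torch::kHalf);
  try {
    at::native_layer_norm(input, /*normalized_shape=*/{8},
                          /*weight=*/{}, /*bias=*/{}, /*eps=*/1e-5);
  } catch (const c10::Error &e) {
    // Expected at the time of this discussion: the CPU kernel rejects Half.
    std::cout << "CPU half native_layer_norm failed: " << e.what() << "\n";
  }
}
```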

@ramiro050 (Collaborator)

> Nope, this will require the CUDA version of torch; native layer norm with the torch.half type is not supported in torch's CPU version. Let me know if you want me to add some other test.

I see. We will first need to develop a way of testing these types of ops before landing this patch. @powderluv, would it be possible to have a CI that has access to a GPU?

cc: @silvasean

@ramiro050 (Collaborator)

I've created an issue to track the e2e testing on GPU, so that this PR does not get filled with unrelated comments.

@powderluv (Collaborator)

Yes, it is easy to add a GPU builder/tester. However, there are logistics and costs involved in standing it up, since it has to be done by someone with admin access to the LLVM project. We can find a GPU VM for the purpose.

However, another option would be to mark the GPU tests as experimental: not run as part of the CI, but available for anyone to try locally.
