
Native layer norm GPU kernel conflicts with return dtypes #1623

Open · wants to merge 1 commit into base: main

Conversation

pashu123 (Member)

```mlir
%result0, %result1, %result2 = torch.aten.native_layer_norm %821, %822, %668, %667, %float1.000000e-05
    : !torch.tensor<[2,4096,320],f16>, !torch.list<int>, !torch.tensor, !torch.tensor, !torch.float
    -> !torch.tensor<[2,4096,320],f16>, !torch.tensor<[2,4096,1],f32>, !torch.tensor<[2,4096,1],f32>
```

The IR above is what we obtain, with the shape/dtype information preserved. Note that the dtypes of the second and third results are not the same as the dtype of `self`.
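For context, this CUDA-path behavior can be reproduced directly with libtorch. Below is a minimal sketch, assuming a CUDA-enabled libtorch build; the shapes mirror the IR above, and the dtype comments describe the behavior reported in this PR rather than a guarantee for every backend.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Mirror the IR above: a [2,4096,320] f16 input on CUDA, normalizing
  // over the last dimension.
  auto opts = torch::dtype(torch::kHalf).device(torch::kCUDA);
  auto input = torch::randn({2, 4096, 320}, opts);
  auto weight = torch::randn({320}, opts);
  auto bias = torch::randn({320}, opts);

  // native_layer_norm returns (output, mean, rstd).
  auto [out, mean, rstd] = at::native_layer_norm(
      input, /*normalized_shape=*/{320}, weight, bias, /*eps=*/1e-5);

  std::cout << out.dtype() << "\n";  // Half
  std::cout << mean.dtype() << "\n"; // Float on the CUDA f16 path
  std::cout << rstd.dtype() << "\n"; // Float on the CUDA f16 path
}
```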

pashu123 requested a review from ramiro050 (November 21, 2022 15:47)
@pashu123 (Member Author)

@ramiro050 Any comments here?

pashu123 requested a review from silvasean (November 25, 2022 14:11)
@pashu123 (Member Author)

@silvasean could you take a look?

@silvasean (Contributor)

This patch is not correct: the shape transfer function cannot look at the return dtypes (there is no concept of "override"). Is the IR you are showing even valid? Either our dtype inference is wrong, or the IR you are showing is invalid because it has a contradictory dtype.

@pashu123 (Member Author)

@silvasean This is valid IR; it is generated from the CUDA version of Stable Diffusion's UNet.

@pashu123 (Member Author)

Also, we are only checking the dtypes for the CPU version:

```cpp
// Override if the result dtype is already known.
if (op->getResult(2).getType().cast<BaseTensorType>().hasDtype())
  result2Knowledge.dtype =
      op->getResult(2).getType().cast<BaseTensorType>().getDtype();
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result1Knowledge);
```
Collaborator (inline review comment):

This should be `result2Knowledge`.

@ramiro050 (Collaborator)

It seems that the special case only happens when the input is torch.float16. The approach should be to check whether self.dtype is that type and set the result types accordingly, rather than looking at the value of op->getResult(..).
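For concreteness, here is a minimal sketch of that approach in the style of the snippet quoted earlier; `selfKnowledge` and the surrounding visitor context are assumptions, not verbatim torch-mlir code:

```cpp
// Derive all three result dtypes from the input (`self`) dtype instead of
// reading them back from op->getResult(..).
Type selfDtype = selfKnowledge.dtype; // `selfKnowledge`: assumed lattice value of self
result0Knowledge.dtype = selfDtype;
result1Knowledge.dtype = selfDtype;
result2Knowledge.dtype = selfDtype;
// Special case observed on the CUDA path: for an f16 input, the mean and
// rstd results are computed and returned in f32.
if (selfDtype.isa<mlir::Float16Type>()) {
  result1Knowledge.dtype = mlir::Float32Type::get(op->getContext());
  result2Knowledge.dtype = mlir::Float32Type::get(op->getContext());
}
incorporateKnowledge(op->getResult(0), result0Knowledge);
incorporateKnowledge(op->getResult(1), result1Knowledge);
incorporateKnowledge(op->getResult(2), result2Knowledge);
```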

@pashu123 (Member Author)

> It seems that the special case only happens when the input is torch.float16. The approach should be to check whether self.dtype is that type and set the result types accordingly, rather than looking at the value of op->getResult(..).

Sure, I can go ahead with that.

@ramiro050 (Collaborator)

@pashu123, will this patch be testable e2e once your patch for f16 support lands?

@pashu123 (Member Author) commented Dec 1, 2022

> @pashu123, will this patch be testable e2e once your patch for f16 support lands?

Nope, this will require the CUDA version of torch; native layer norm with the torch.half type is not supported in torch's CPU version. Let me know if you want me to add some other test.
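For reference, a minimal sketch of that limitation, assuming a CPU-only setting and the PyTorch of this era; the exact error type and message here are assumptions:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Build a small f16 tensor on CPU (created as f32 and then cast, since
  // that conversion is available everywhere).
  auto input = torch::randn({2, 8}).to(torch::kHalf);
  try {
    at::native_layer_norm(input, /*normalized_shape=*/{8},
                          /*weight=*/{}, /*bias=*/{}, /*eps=*/1e-5);
  } catch (const c10::Error &e) {
    // Expected at the time of this discussion: the CPU kernel rejects Half.
    std::cout << "CPU half native_layer_norm failed: " << e.what() << "\n";
  }
}
```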

@ramiro050 (Collaborator)

> Nope, this will require the CUDA version of torch; native layer norm with the torch.half type is not supported in torch's CPU version. Let me know if you want me to add some other test.

I see. We will first need to develop a way of testing these types of ops before landing this patch. @powderluv, would it be possible to have a CI that has access to a GPU?

cc: @silvasean

@ramiro050 (Collaborator)

I've created an issue to track the e2e testing on GPU, so that this PR does not get filled with unrelated comments.

@powderluv (Collaborator)

Yes, it is easy to add a GPU builder/tester. However, there are logistics and costs involved in standing it up, since it has to be done by someone with admin access to the LLVM project. We can find a GPU VM for the purpose.

However, another option would be to mark the GPU tests as experimental: not run as part of the CI, but available for anyone to try locally.
