
[QST] Is there any INT8 GEMM with INT8 alpha and beta? #1157

Closed · jhss opened this issue Oct 25, 2023 · 8 comments

jhss commented Oct 25, 2023

What is your question?

using Gemm = cutlass::gemm::device::Gemm<
      int8_t, cutlass::layout::RowMajor, int8_t, cutlass::layout::ColumnMajor,
      ElementOutput, cutlass::layout::RowMajor, ElementAccumulator,
      cutlass::arch::OpClassTensorOp, cutlass::arch::Sm75>;

cutlass::gemm::GemmCoord problem_size(M, N, K);

cutlass::TensorRef<ElementInputA, LayoutInputA> input_ref(input.data_ptr<int8_t>(), LayoutInputA::packed(input_size));
cutlass::TensorRef<ElementInputB, LayoutInputB> weight_ref(weight.data_ptr<int8_t>(), LayoutInputB::packed(weight_size));
cutlass::TensorRef<ElementOutput, LayoutOutput> out_ref(out.data_ptr<int8_t>(), LayoutOutput::packed(output_size));

typename Gemm::Arguments arguments{
      problem_size, // <- problem size of matrix multiplication
      input_ref,    // <- reference to matrix A on device
      weight_ref,   // <- reference to matrix B on device
      out_ref,      // <- reference to matrix C on device
      out_ref,      // <- reference to matrix D on device
      {alpha, beta}, 1};

In the code above, if I set alpha and beta as INT8, I get a warning about a narrowing conversion from int to float.

Do alpha and beta have to be float? I want to set them to INT8 to increase inference speed.
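For reference, a minimal sketch of how the scalars can be constructed in the epilogue's compute type so the Arguments initializer compiles without the narrowing warning (ElementComputeEpilogue is an alias introduced here for illustration; for this default int8 tensor-op configuration the compute type is float):

using ElementComputeEpilogue = Gemm::EpilogueOutputOp::ElementCompute;  // float here

// Build alpha/beta explicitly in the epilogue compute type to avoid
// the int -> float narrowing warning in the braced initializer.
ElementComputeEpilogue alpha = ElementComputeEpilogue(1);
ElementComputeEpilogue beta  = ElementComputeEpilogue(0);

typename Gemm::Arguments arguments{
      problem_size,
      input_ref,
      weight_ref,
      out_ref,
      out_ref,
      {alpha, beta}, 1};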

hwu36 (Collaborator) commented Oct 25, 2023

Just using 8-bit alpha/beta is not going to make a performance difference.

mnicely (Collaborator) commented Dec 5, 2023

@jhss is your question resolved?

jhss (Author) commented Dec 8, 2023

I want to know why 8-bit int alpha/beta doesn't affect performance.

thakkarV (Collaborator) commented Dec 8, 2023

Because shaving a single load per tile from 4 bytes down to 1 byte does not change the perf at all. Changing the fp32 multiplication to int8 will also not move the needle much in the grand scheme of things. What problem size are you most interested in?
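As a rough back-of-the-envelope check (the 128x256 tile and K = 768 below are hypothetical numbers chosen only to show the orders of magnitude):

#include <cstdio>

int main() {
  const long k = 768;
  // int8 A and B fragments streamed in for one 128x256 output tile.
  const long tile_bytes = (128 + 256) * k;           // ~288 KB
  const long fp32_scalar_bytes = 2 * sizeof(float);  // alpha + beta as fp32
  const long int8_scalar_bytes = 2 * sizeof(char);   // alpha + beta as int8
  // Saving 6 bytes out of ~288 KB per tile is roughly 0.002% of the traffic.
  std::printf("tile: %ld B, fp32 scalars: %ld B, int8 scalars: %ld B\n",
              tile_bytes, fp32_scalar_bytes, int8_scalar_bytes);
  return 0;
}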

jhss (Author) commented Dec 8, 2023

Thank you for answering.

I'm looking at the SmoothQuant repository; they use matrix multiplications whose sizes are about (batch x 2048 x 768) * (768 x 768) in one layer. They perform the matmul with torch_int, which uses cutlass::epilogue::thread::LinearCombination at the end of the matrix multiplication. By using this, they convert the INT32 accumulator into float32 as follows:

weight_scale (alpha) * accumulator + bias_scale (beta) * bias

Suppose the accumulator shape is (batch x 2048 x 768); then the weight_scale shape is (2048,), which is broadcast and multiplied with the accumulator. I'm just thinking that the (1, 2048, 1) * (batch x 2048 x 768) multiplication becomes faster when the type of weight_scale is changed from fp32 to int8.
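For reference, the per-element computation that the linear-combination epilogue performs is roughly the following (a plain C++ sketch, not the actual CUTLASS code; the int32 accumulator is converted to the compute type before the scale and bias are applied):

#include <cstdint>

// D = weight_scale (alpha) * accum + bias_scale (beta) * bias, per element.
float epilogue_element(int32_t accum, float bias, float alpha, float beta) {
  return alpha * static_cast<float>(accum) + beta * bias;
}

The broadcast scale is applied as one multiply per element in registers, not as a separate matmul, so the same number of operations runs whether alpha is fp32 or int8.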

thakkarV (Collaborator) commented Dec 8, 2023

Although I doubt it, you can certainly try int8 alpha/beta to see if it would help in this case. What you would have to do is modify the epilogue thread functor's ElementCompute type and then use that to construct your epilogue.
https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/thread/linear_combination.h#L68
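An untested sketch of that change (the tile shapes below are the Sm75 int8 defaults, and the int32_t compute type is an assumption for illustration, not a validated configuration):

#include "cutlass/gemm/device/gemm.h"
#include "cutlass/epilogue/thread/linear_combination.h"
#include "cutlass/numeric_types.h"

// Epilogue functor with ElementCompute swapped from the default float to
// int32_t (int8_t compute would overflow almost immediately, so a wider
// integer type is the more plausible thing to experiment with).
using EpilogueOp = cutlass::epilogue::thread::LinearCombination<
    int8_t,                                     // ElementOutput
    128 / cutlass::sizeof_bits<int8_t>::value,  // elements per vectorized access
    int32_t,                                    // ElementAccumulator
    int32_t>;                                   // ElementCompute (default is float)

using GemmInt8Epilogue = cutlass::gemm::device::Gemm<
    int8_t, cutlass::layout::RowMajor,          // A
    int8_t, cutlass::layout::ColumnMajor,       // B
    int8_t, cutlass::layout::RowMajor,          // C/D
    int32_t,                                    // accumulator
    cutlass::arch::OpClassTensorOp, cutlass::arch::Sm75,
    cutlass::gemm::GemmShape<128, 256, 64>,     // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 64>,       // warp tile
    cutlass::gemm::GemmShape<8, 8, 16>,         // Sm75 int8 tensor-op shape
    EpilogueOp>;                                // custom epilogue plugs in here

With this, the alpha/beta scalars passed through Arguments would be int32_t rather than float, since LinearCombination takes its scalars in ElementCompute.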

mnicely (Collaborator) commented Jan 2, 2024

@jhss is your question resolved?

github-actions bot commented Feb 1, 2024

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
