[CUBLAS][FP8] Enable R.matmul + R.multiply offloading #16974

ibsidorenko · 2024-05-07T12:10:23Z

This commit enables offloading of the following pattern to cuBLAS:

  mm = R.linear(data, weights)
  scale = R.multiply(a_scale, w_scale)
  out = R.multiply(mm, scale)
  out = R.cast(out, dtype)
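
For reference, the arithmetic of the pattern above can be sketched in plain Python. This is only an illustration of what the four ops compute, not the cuBLAS path or the code added by this PR. It assumes `a_scale` and `w_scale` are scalars, that `R.linear` is used without a bias (so it computes `data @ weights^T`), and that the target `dtype` is float16. The inputs are ordinary floats rather than FP8 values, and the helper names `linear`, `to_float16`, and `scaled_linear` are hypothetical.

```python
import struct


def linear(data, weights):
    """Bias-free linear layer: data @ weights^T on nested lists."""
    return [
        [sum(a * w for a, w in zip(row, w_row)) for w_row in weights]
        for row in data
    ]


def to_float16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def scaled_linear(data, weights, a_scale, w_scale):
    mm = linear(data, weights)                      # mm = R.linear(data, weights)
    scale = a_scale * w_scale                       # scale = R.multiply(a_scale, w_scale)
    out = [[v * scale for v in row] for row in mm]  # out = R.multiply(mm, scale)
    return [[to_float16(v) for v in row] for row in out]  # out = R.cast(out, dtype)


data = [[1.0, 2.0], [3.0, 4.0]]
weights = [[0.5, -1.0], [2.0, 0.25]]
result = scaled_linear(data, weights, a_scale=0.5, w_scale=0.25)
print(result)
```

Because the two scales are combined into one multiplier before being applied to the matmul result, the whole sequence reduces to a GEMM followed by a single scaling step and a cast.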

cc @csullivan @JosephTheOctonaut

github-actions bot requested a review from csullivan May 7, 2024 12:11

ibsidorenko force-pushed the cublas-gemm-multiply-offloading branch from 8737903 to 61328ae Compare May 7, 2024 12:42

[CUBLAS][FP8] Enable R.matmul + R.multiply offloading

01ee40f

ibsidorenko force-pushed the cublas-gemm-multiply-offloading branch from 61328ae to 01ee40f Compare May 7, 2024 12:46

masahi approved these changes May 8, 2024

masahi merged commit c0a47ed into apache:main May 8, 2024
19 checks passed

ibsidorenko deleted the cublas-gemm-multiply-offloading branch May 8, 2024 12:27

ysh329 mentioned this pull request Jul 20, 2024

[Release] v0.17.0 Release Candidate Notes #17178

Closed
