Fuse MatMul + Mul/Div by constant #487

robertknight · 2024-12-26T19:05:42Z

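A subgraph of the form `Mul(MatMul(Mul(A, c), Mul(B, d)), e)`, where each of the `Mul`s is optional and `c`, `d` and `e` are constants, can be rewritten as `MatMul(A, B, alpha = c * d * e)`. Here `alpha` is the scale factor in the `C = alpha * AB + beta * C` update that GEMM already computes, so the separate element-wise multiplications can be eliminated. Division by a constant can be handled the same way, since dividing by `c` is equivalent to multiplying by `1 / c`. Such scaling is common in transformers as part of attention operations (scaled dot-product attention, SDPA).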
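The rewrite is valid because scalar constants commute with the matrix product: `(c·A)(d·B)·e = (c·d·e)·AB`. A minimal numeric check of this identity is sketched below. The `gemm` and `scale` helpers are hypothetical stand-ins written for illustration (a naive row-major GEMM with `beta = 0`), not the project's actual kernels, and the constants are assumed to be scalars.

```rust
/// Hypothetical naive GEMM: computes alpha * (A @ B) for row-major
/// A (m x k) and B (k x n), i.e. the GEMM update with beta = 0.
fn gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize, alpha: f32) -> Vec<f32> {
    let mut c = vec![0.0; m * n];
    for i in 0..m {
        for j in 0..n {
            let mut sum = 0.0;
            for p in 0..k {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * sum;
        }
    }
    c
}

/// Element-wise multiplication by a scalar constant (one of the optional `Mul`s).
fn scale(x: &[f32], s: f32) -> Vec<f32> {
    x.iter().map(|v| v * s).collect()
}

fn main() {
    let (m, k, n) = (2, 3, 2);
    let a = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = vec![0.5, -1.0, 1.5, 2.0, -0.5, 1.0];
    let (c, d, e) = (0.5f32, 0.25f32, 2.0f32);

    // Unfused: Mul(MatMul(Mul(A, c), Mul(B, d)), e)
    let unfused = scale(&gemm(&scale(&a, c), &scale(&b, d), m, k, n, 1.0), e);
    // Fused: MatMul(A, B, alpha = c * d * e)
    let fused = gemm(&a, &b, m, k, n, c * d * e);

    // The two agree up to floating-point rounding.
    for (u, f) in unfused.iter().zip(&fused) {
        assert!((u - f).abs() < 1e-5, "{u} != {f}");
    }
    println!("{:?}", fused);
}
```

In general the fused and unfused results may differ in the last few bits because the multiplications happen in a different order; the constants here are powers of two, so they match exactly.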

The initial implementation only handles two specific cases of this pattern that have been seen in real models. It should be generalized to cover all combinations of the optional `Mul`s.
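One possible way to generalize the fusion, rather than matching specific cases, is to peel constant scale factors off the `MatMul` output and off each operand independently, then multiply them into a single `alpha`. The sketch below assumes a hypothetical `Expr` tree and hypothetical `peel_scale` / `fuse_scaled_matmul` helpers (not the project's actual graph API), and assumes the constants are scalars: per-element or broadcast tensor constants cannot be folded into one `alpha`. It also accepts chains of several `Mul`/`Div` nodes, which is slightly more general than the pattern described above.

```rust
/// Hypothetical expression tree, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Input(&'static str),
    MatMul(Box<Expr>, Box<Expr>),
    /// Multiply by a scalar constant.
    MulConst(Box<Expr>, f32),
    /// Divide by a scalar constant.
    DivConst(Box<Expr>, f32),
    /// Fused form: alpha * (lhs @ rhs).
    ScaledMatMul { lhs: Box<Expr>, rhs: Box<Expr>, alpha: f32 },
}

/// Strips any chain of constant Mul/Div nodes, returning the inner expression,
/// the accumulated scale factor and the number of nodes removed.
fn peel_scale(expr: &Expr) -> (&Expr, f32, usize) {
    match expr {
        Expr::MulConst(inner, c) => {
            let (base, s, n) = peel_scale(inner);
            (base, s * *c, n + 1)
        }
        Expr::DivConst(inner, c) => {
            // Dividing by c is multiplying by 1 / c.
            let (base, s, n) = peel_scale(inner);
            (base, s / *c, n + 1)
        }
        other => (other, 1.0, 0),
    }
}

/// Rewrites `[scale](MatMul([scale](A), [scale](B)))`, where every scale is
/// optional, into a single `ScaledMatMul`. Returns `None` if there is no
/// MatMul or nothing to fuse.
fn fuse_scaled_matmul(expr: &Expr) -> Option<Expr> {
    let (inner, out_scale, n_out) = peel_scale(expr);
    let Expr::MatMul(lhs, rhs) = inner else {
        return None;
    };
    let (a, a_scale, n_a) = peel_scale(lhs);
    let (b, b_scale, n_b) = peel_scale(rhs);
    if n_out + n_a + n_b == 0 {
        return None; // Plain MatMul: nothing to fuse.
    }
    Some(Expr::ScaledMatMul {
        lhs: Box::new(a.clone()),
        rhs: Box::new(b.clone()),
        alpha: a_scale * b_scale * out_scale,
    })
}

fn main() {
    // Div(MatMul(Mul(A, 0.5), B), 4.0): the `Mul` on B is absent.
    let expr = Expr::DivConst(
        Box::new(Expr::MatMul(
            Box::new(Expr::MulConst(Box::new(Expr::Input("A")), 0.5)),
            Box::new(Expr::Input("B")),
        )),
        4.0,
    );
    println!("{:?}", fuse_scaled_matmul(&expr));
}
```

For the example in `main`, the result is a `ScaledMatMul` over `A` and `B` with `alpha = 0.5 / 4.0 = 0.125`. Handling each of the three positions independently covers every combination of present and absent scales without enumerating them.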


robertknight merged commit 7f74d0c into main Dec 26, 2024
2 checks passed

robertknight mentioned this pull request Dec 26, 2024

Generalize scaled MatMul fusion #488

Closed
