Sorry for the late reply! We don't plan to support AMP in the near future: since delayed scaling is stateful, the engineering cost of supporting it in an AMP-like API would be too high. For now we would like to keep a consistent API between dynamic and delayed scaling to make ablation studies easy. If the community converges on dynamic scaling (which is stateless) in the future, we could revisit this.
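For context, here is a minimal sketch of the stateless vs. stateful distinction (illustrative only, not this repo's implementation; the `DelayedScaler` name, the history length, and the helper functions are made up, and it assumes a PyTorch build with float8 dtypes):

```python
import torch

def dynamic_scale(x, float8_dtype=torch.float8_e4m3fn):
    # Dynamic scaling is stateless: the scale depends only on the current tensor.
    return torch.finfo(float8_dtype).max / x.abs().max().clamp(min=1e-12)

class DelayedScaler:
    # Delayed scaling is stateful: the scale is derived from a rolling amax
    # history that has to be stored (and checkpointed/synced) across iterations.
    def __init__(self, history_len=16, float8_dtype=torch.float8_e4m3fn):
        self.amax_history = torch.zeros(history_len)
        self.float8_dtype = float8_dtype

    def scale(self, x):
        # Record the current amax, then derive the scale from the history's max.
        self.amax_history = torch.roll(self.amax_history, 1)
        self.amax_history[0] = float(x.abs().max())
        amax = self.amax_history.max().clamp(min=1e-12)
        return torch.finfo(self.float8_dtype).max / amax

x = torch.randn(32, 32)
s_dynamic = dynamic_scale(x)          # recomputed from x alone on every call
s_delayed = DelayedScaler().scale(x)  # depends on the buffered amax history
```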
configurability
- Float8Linear to individual modules

performance
- torch._scaled_mm support for per-tensor scaled float8 gemm
- torch._scaled_mm support for rowwise scaled float8 gemm

distributed

other
- use_fast_accum (float8 accumulation of gemm) option to UX (Allow for modifying the scaled_mm compute #144)
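For reference, a rough sketch of what a per-tensor scaled float8 gemm through torch._scaled_mm can look like. This is not from this repo: torch._scaled_mm is a private API whose exact signature has shifted across PyTorch releases, it needs float8-capable hardware (e.g. H100), and the `to_float8` helper and shapes below are illustrative assumptions.

```python
import torch

def to_float8(x, dtype=torch.float8_e4m3fn):
    # Per-tensor scaling: map the tensor's amax to the float8 dtype's max value.
    scale = torch.finfo(dtype).max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x.float() * scale).clamp(
        torch.finfo(dtype).min, torch.finfo(dtype).max
    ).to(dtype)
    # torch._scaled_mm expects dequantization scales, i.e. the reciprocal.
    return x_fp8, scale.float().reciprocal()

a = torch.randn(64, 128, device="cuda", dtype=torch.bfloat16)
b = torch.randn(128, 32, device="cuda", dtype=torch.bfloat16)
a_fp8, a_scale = to_float8(a)
b_fp8, b_scale = to_float8(b)

# The second operand must be in column-major layout; on older PyTorch builds
# this call returned an (output, amax) tuple instead of a single tensor.
out = torch._scaled_mm(
    a_fp8,
    b_fp8.t().contiguous().t(),
    scale_a=a_scale,
    scale_b=b_scale,
    out_dtype=torch.bfloat16,
    use_fast_accum=True,
)
```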