Sorry for the late reply! We don't plan to support AMP in the near future: since delayed scaling is stateful, the engineering cost of supporting it in an AMP-like API would be too high. For now we would like to keep a consistent API between dynamic and delayed scaling to make ablation studies easy. If the community converges on dynamic scaling (which is stateless) in the future, we could revisit this.
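For context, here is a minimal sketch of the stateless vs. stateful distinction (illustrative only, not this repo's implementation; the `DelayedScaler` name, the history length, and the helper functions are made up, and it assumes a PyTorch build with float8 dtypes):

```python
import torch

def dynamic_scale(x, float8_dtype=torch.float8_e4m3fn):
    # Dynamic scaling is stateless: the scale depends only on the current tensor.
    return torch.finfo(float8_dtype).max / x.abs().max().clamp(min=1e-12)

class DelayedScaler:
    # Delayed scaling is stateful: the scale is derived from a rolling amax
    # history that has to be stored (and checkpointed/synced) across iterations.
    def __init__(self, history_len=16, float8_dtype=torch.float8_e4m3fn):
        self.amax_history = torch.zeros(history_len)
        self.float8_dtype = float8_dtype

    def scale(self, x):
        # Record the current amax, then derive the scale from the history's max.
        self.amax_history = torch.roll(self.amax_history, 1)
        self.amax_history[0] = float(x.abs().max())
        amax = self.amax_history.max().clamp(min=1e-12)
        return torch.finfo(self.float8_dtype).max / amax

x = torch.randn(32, 32)
s_dynamic = dynamic_scale(x)          # recomputed from x alone on every call
s_delayed = DelayedScaler().scale(x)  # depends on the buffered amax history
```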
configurability
- Float8Linear to individual modules

performance
- torch._scaled_mm support for per-tensor scaled float8 gemm
- torch._scaled_mm support for rowwise scaled float8 gemm

distributed

other
- use_fast_accum (float8 accumulation of gemm) option to UX (Allow for modifying the scaled_mm compute #144)
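For reference, a rough sketch of what a per-tensor scaled float8 gemm through torch._scaled_mm can look like. This is not from this repo: torch._scaled_mm is a private API whose exact signature has shifted across PyTorch releases, it needs float8-capable hardware (e.g. H100), and the `to_float8` helper and shapes below are illustrative assumptions.

```python
import torch

def to_float8(x, dtype=torch.float8_e4m3fn):
    # Per-tensor scaling: map the tensor's amax to the float8 dtype's max value.
    scale = torch.finfo(dtype).max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x.float() * scale).clamp(
        torch.finfo(dtype).min, torch.finfo(dtype).max
    ).to(dtype)
    # torch._scaled_mm expects dequantization scales, i.e. the reciprocal.
    return x_fp8, scale.float().reciprocal()

a = torch.randn(64, 128, device="cuda", dtype=torch.bfloat16)
b = torch.randn(128, 32, device="cuda", dtype=torch.bfloat16)
a_fp8, a_scale = to_float8(a)
b_fp8, b_scale = to_float8(b)

# The second operand must be in column-major layout; on older PyTorch builds
# this call returned an (output, amax) tuple instead of a single tensor.
out = torch._scaled_mm(
    a_fp8,
    b_fp8.t().contiguous().t(),
    scale_a=a_scale,
    scale_b=b_scale,
    out_dtype=torch.bfloat16,
    use_fast_accum=True,
)
```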