
8-bit float support #1759

Open
bjarthur opened this issue Feb 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@bjarthur
Contributor

bjarthur commented Feb 7, 2023

the H100 has native support for two flavors of 8-bit floats: e5m2 has five exponent bits and e4m3 has four. for more details, scroll down to "NVIDIA Hopper FP8 data format" here: https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/, or for even more details see https://arxiv.org/abs/2209.05433
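for concreteness, the dynamic range implied by each bit split can be worked out with a small helper. `fp8_max` below is a hypothetical illustration (not part of CUDA.jl, Float8s.jl, or any NVIDIA API) that computes the largest normal value of an IEEE-style mini-float:

```python
def fp8_max(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value of an IEEE-style mini-float where the
    top exponent code is reserved for Inf/NaN."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias           # highest finite exponent
    return (2 - 2.0 ** -mant_bits) * 2.0 ** max_exp

# e5m2 keeps IEEE conventions: largest finite value is 57344.0
assert fp8_max(5, 2) == 57344.0
# a strictly IEEE-style e4m3 would top out at 240.0; NVIDIA's e4m3
# drops Inf and reuses those codes for numbers, extending it to 448.0
assert fp8_max(4, 3) == 240.0
```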

what is the roadmap for supporting these types?

maybe we leverage https://github.com/milankl/Float8s.jl, which defines Float8 (3 exponent bits) and Float8_4 (4 exponent bits) and has efficient CPU methods. there is an issue there discussing adding a 5-exponent-bit type.

happy to contribute. don't have an H100 yet, but probably next year i will.

cc @milankl

bjarthur added the enhancement (New feature or request) label Feb 7, 2023
@maleadt
Member

maleadt commented Feb 7, 2023

No plans yet. Note that I'm guessing FP8 can, again, only be used with tensor cores; when using these types in regular kernel code they'll promote to and from FP32, just like FP16 does.

@milankl

milankl commented Feb 7, 2023

I'm happy to support this. While I do see room to support 3, 4, and 5 exponent bits (4, 3, and 2 mantissa bits, respectively) simultaneously, we should stick to one definition for each. Using Nvidia's notation: e3m4 (3 exponent, 4 mantissa bits) in Float8s.jl is IEEE-compliant, but their e4m3 isn't (no Inf, a single NaN), and their e5m2 is simply a truncated version of Float16 (which probably makes it the easiest to implement, analogous to the BFloat16/Float32 relationship). Do we want that too?
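The truncation relationship between e5m2 and Float16 can be sketched in a few lines; this is a hypothetical illustration (truncation only, no rounding), not an API from Float8s.jl or CUDA.jl:

```python
import struct

def f16_bits(x: float) -> int:
    # little-endian binary16: 1 sign, 5 exponent, 10 mantissa bits
    return struct.unpack('<H', struct.pack('<e', x))[0]

def to_e5m2(x: float) -> int:
    # e5m2 is binary16 with the low 8 mantissa bits dropped,
    # just as BFloat16 is binary32 with the low 16 bits dropped
    return f16_bits(x) >> 8

def from_e5m2(b: int) -> float:
    # expand back to binary16 by zero-padding the mantissa
    return struct.unpack('<e', struct.pack('<H', b << 8))[0]

# 1.0 is 0x3C00 in binary16, so its e5m2 byte is 0x3C
assert to_e5m2(1.0) == 0x3C
# values whose mantissa fits in 2 bits round-trip exactly
assert from_e5m2(to_e5m2(1.5)) == 1.5
```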


3 participants