Float16 design doc #5313
Conversation
doc/design/float16.md
Outdated
```cpp
float16 float_to_half_rn(float f); // convert to half precision in round-to-nearest-even mode
float half_to_float(float16 h);
```
which provides one-to-one conversion between float32 and float16. These two functions use different conversion routines based on the current hardware. CUDA/ARM intrinsics will be used when the corresponding hardware is available. When the hardware falls back to non-ARM cpu, software emulation will be performed to do the conversion.
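For reference, a minimal sketch of what the software-emulation path could look like, assuming float16 is stored as a raw uint16_t bit pattern; to keep the sketch short, results that would be half-precision subnormals are flushed to zero:

```cpp
#include <cstdint>
#include <cstring>

// Software fallback sketch for float_to_half_rn: convert a float32 to the
// raw IEEE 754 half-precision bit pattern with round-to-nearest-even.
// Simplification: values too small for a normal half are flushed to +/-0.
inline uint16_t float_to_half_rn_sw(float f) {
  uint32_t x;
  std::memcpy(&x, &f, sizeof(x));                    // reinterpret float bits
  uint16_t sign = static_cast<uint16_t>((x >> 16) & 0x8000u);
  uint32_t exp  = (x >> 23) & 0xFFu;                 // float32 biased exponent
  uint32_t mant = x & 0x7FFFFFu;                     // float32 mantissa bits

  if (exp == 0xFFu)                                  // Inf or NaN
    return sign | 0x7C00u | (mant ? 0x0200u : 0u);

  int32_t e = static_cast<int32_t>(exp) - 127 + 15;  // rebias the exponent
  if (e >= 0x1F) return sign | 0x7C00u;              // overflow -> +/-Inf
  if (e <= 0)    return sign;                        // underflow -> +/-0

  // Round the 23-bit mantissa down to 10 bits, ties to even.
  uint32_t half = (static_cast<uint32_t>(e) << 10) | (mant >> 13);
  uint32_t tail = mant & 0x1FFFu;                    // the 13 discarded bits
  if (tail > 0x1000u || (tail == 0x1000u && (half & 1u)))
    ++half;                                          // a carry into the exponent still yields the correct result
  return sign | static_cast<uint16_t>(half);
}
```

The hardware paths would replace this with a single instruction: on CUDA, the __float2half_rn intrinsic; on ARM, a compiler-generated conversion through __fp16 or the NEON vcvt_f16_f32 intrinsic.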
When the hardware falls back to non-ARM cpu -> If the hardware or compiler level does not support float32 to float16 conversion.
done
doc/design/float16.md
Outdated
A brief survey of float16 support on different hardware can be found [here](https://github.com/PaddlePaddle/Paddle/issues/4853). A brief survey of existing float16 implementations can be found [here](https://github.com/Xreki/Xreki.github.io/blob/master/multi_data_types_in_dl_framework/ppt/float16_and_quantized_type.md).

There are various natively supported float16 implementations on different hardware and linear algebra libraries, including half on CUDA, float16_t on ARM processors, and Eigen::half in Eigen.
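As a rough illustration (not the final design), a unified float16 could wrap the raw bit pattern and convert to and from each of these native types; the struct name, members, and constructors below are hypothetical:

```cpp
#include <cstdint>
#include <Eigen/Core>  // Eigen::half

// Hypothetical sketch of a unified float16 type: it stores the raw IEEE 754
// half-precision bits and converts to/from each natively supported type.
struct float16 {
  uint16_t x;  // raw half-precision bit pattern

  float16() : x(0) {}

  // Eigen::half exposes its bits in a public member `x`, so conversion is a bit copy.
  explicit float16(const Eigen::half& h) : x(h.x) {}
  explicit operator Eigen::half() const {
    Eigen::half h;
    h.x = x;
    return h;
  }

  // On CUDA (__half) and ARM (float16_t / __fp16) the same bit-copy idea
  // applies, guarded by availability checks such as the ones sketched below.
};
```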
I think a more detailed description is needed here. We need to describe float16 support at three levels: hardware, compiler, and library.
- For the nvcc compiler, the __half type is supported after CUDA 7.5.
- For NVIDIA GPUs, maybe after sm_6.0 (or sm_5.3?).
- For gcc/clang? On mobile, the compiler is usually clang.
- For libraries: currently, which libraries support float16 calculations?
This is important because if we upgrade the environment, we need to know what the minimum environment to support is. A rough sketch of such compile-time checks follows.
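For example (the macro names __CUDA_ARCH__, CUDA_VERSION, and __ARM_FP16_FORMAT_IEEE are the standard CUDA/ACLE ones; the PADDLE_* names and the exact minimum versions are only guesses):

```cpp
// Illustrative compile-time checks; the thresholds need verification.

// Compiler / toolkit level: __half is declared in <cuda_fp16.h> since CUDA 7.5.
#ifdef __CUDACC__
#include <cuda.h>  // defines CUDA_VERSION
#if CUDA_VERSION >= 7050
#define PADDLE_CUDA_FP16
#endif
#endif

// GPU hardware level: native half arithmetic instructions need sm_53 or later.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 530
#define PADDLE_CUDA_FP16_MATH
#endif

// ARM compiler level: gcc/clang define this ACLE macro when __fp16 uses the
// IEEE half-precision format (typical on ARMv7-A/ARMv8 mobile targets).
#ifdef __ARM_FP16_FORMAT_IEEE
#define PADDLE_ARM_FP16
#endif
```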
Great point! A detailed description has been added. Thanks!