Hey, opening an issue instead of a PR for this one because the work is still very rough at the moment.
On NEON aarch64 (M1 Mac), adding pure f16 intrinsics gives a sizeable speedup: roughly 2x on most matmuls.
LaurentMazare#4
However, this requires hacking in new intrinsics with the `arm!` macro, which seems to confuse the compiler (most likely because I didn't write them properly).