Commit

[cuda] Fix LLVM15 rsqrt perf regression (taichi-dev#7012)
turbo0628 authored and quadpixels committed May 13, 2023
1 parent d6f455f commit 39e4581
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion taichi/runtime/cuda/jit_cuda.cpp
@@ -165,7 +165,14 @@ std::string JITSessionCUDA::compile_module_to_ptx(

   if (kFTZDenorms) {
     for (llvm::Function &fn : *module) {
-      fn.addFnAttr("nvptx-f32ftz", "true");
+      /* nvptx-f32ftz was deprecated.
+       *
+       * https://github.com/llvm/llvm-project/commit/a4451d88ee456304c26d552749aea6a7f5154bde#diff-6fda74ef428299644e9f49a2b0994c0d850a760b89828f655030a114060d075a
+       */
+      fn.addFnAttr("denormal-fp-math-f32", "preserve-sign");
+
+      // Use unsafe fp math for sqrt.approx instead of sqrt.rn
+      fn.addFnAttr("unsafe-fp-math", "true");
     }
   }

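For readers unfamiliar with the "preserve-sign" value passed to "denormal-fp-math-f32" in the diff above: in LLVM it requests that single-precision subnormal values be flushed to zero while keeping their sign. The following is only a host-side sketch of that semantics, not code from this commit. The names flush_denormal_preserve_sign and self_check are hypothetical, and the sketch assumes a build without fast-math flags so that std::fpclassify reports subnormals.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper (not part of Taichi or LLVM): emulates, for one float,
// the flush-to-zero behavior described by "preserve-sign". Subnormal inputs
// become a zero carrying the input's sign; all other values pass through.
float flush_denormal_preserve_sign(float x) {
  if (std::fpclassify(x) == FP_SUBNORMAL) {
    return std::copysign(0.0f, x);
  }
  return x;
}

// Hypothetical self-check illustrating the expected behavior.
void self_check() {
  const float tiny = std::numeric_limits<float>::denorm_min();  // smallest subnormal
  const float smallest_normal = std::numeric_limits<float>::min();

  // Positive subnormal flushes to +0.0f.
  assert(flush_denormal_preserve_sign(tiny) == 0.0f);
  assert(!std::signbit(flush_denormal_preserve_sign(tiny)));

  // Negative subnormal flushes to -0.0f (sign preserved).
  assert(std::signbit(flush_denormal_preserve_sign(-tiny)));

  // Normal values are left untouched.
  assert(flush_denormal_preserve_sign(smallest_normal) == smallest_normal);
  assert(flush_denormal_preserve_sign(1.5f) == 1.5f);
}
```

Calling self_check() from a main function exercises the three cases. On the GPU the flushing is performed by the generated PTX instructions rather than by an explicit branch like this one.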