From 8933d9277877dc06b9032685751a413e6a32d373 Mon Sep 17 00:00:00 2001 From: Marshall Ward Date: Thu, 22 Feb 2024 13:11:05 -0500 Subject: [PATCH] Restore bit repro using FMA in selected runs This patch modifies select calculations of PR#1616 in order to preserve bit reproducibility when FMA optimization is enabled. We add parentheses and reorder terms in selected expressions which either direct or suppress FMAs, ensuring equivalence with the previous release. We address two specific equations in the PR. The first is associated with vertical friction coupling coupling coefficient. The diff is shown below. - a_cpl(i,K) = Kv_tot(i,K) / (h_shear*GV%H_to_Z + I_amax*Kv_tot(i,K)) + a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K)) The denominator is of the form `a*b + c*d`. A compiler may favor an FMA of the form `a*b + (c*d)`. However, the modified equation is of form which favors the `a + c*d` FMA. Each form gives different results in the final bits. We resolve this by expliciting wrapping the RHS in parentheses: a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax*Kv_tot(i,K))) Although this disables the FMA, it produces the same bit-equivalent answer as the original expression. ---- The second equation for TKE due to kappa shear is shown below. - tke_src = dz_Int(K) *(((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - & - (TKE(k) - q0)*TKE_decay(k)) - & + tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - & + (TKE(k) - q0)*TKE_decay(k)) - & ... The outer equation was of the form `b + c` but is promoted to `a*b + c`, transforming it to an FMA. We resolve this by suppressing this FMA optimization: tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - & (TKE(k) - q0)*TKE_decay(k)) - & ... ---- The following two changes are intended to be the smallest modification which preserves answers for known testing on target compilers. It does not encompass all equation changes in this PR. If needed, we could extend these changes to similar modifications of PR#1616. We do not expect to support bit reproducibility when FMAs are enabled. But this is an ongoing conversation, and the rules around FMAs should be expected to change as we learn more and agree on rules of reproducibility. --- src/parameterizations/vertical/MOM_kappa_shear.F90 | 3 ++- src/parameterizations/vertical/MOM_vert_friction.F90 | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/parameterizations/vertical/MOM_kappa_shear.F90 b/src/parameterizations/vertical/MOM_kappa_shear.F90 index 81ab0661cc..46b81388c7 100644 --- a/src/parameterizations/vertical/MOM_kappa_shear.F90 +++ b/src/parameterizations/vertical/MOM_kappa_shear.F90 @@ -1618,7 +1618,8 @@ subroutine find_kappa_tke(N2, S2, kappa_in, Idz, h_Int, dz_Int, dz_h_Int, I_L2_b ! Solve for dQ(K)... aQ(k) = (0.5*(kappa(K)+kappa(K+1))+kappa0) * Idz(k) dQdz(k) = 0.5*(TKE(K) - TKE(K+1))*Idz(k) - tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - & + ! NOTE: (dz_h_int*(K'*S2 - K*N2)) is bracketed to prevent FMA optimization + tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - & (TKE(k) - q0)*TKE_decay(k)) - & (aQ(k) * (TKE(K) - TKE(K+1)) - aQ(k-1) * (TKE(K-1) - TKE(K))) v1 = aQ(k-1) + dQdz(k-1)*dKdQ(K-1) diff --git a/src/parameterizations/vertical/MOM_vert_friction.F90 b/src/parameterizations/vertical/MOM_vert_friction.F90 index 8d41fcb63a..4a755e1ee8 100644 --- a/src/parameterizations/vertical/MOM_vert_friction.F90 +++ b/src/parameterizations/vertical/MOM_vert_friction.F90 @@ -2116,7 +2116,8 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i, endif ! Calculate the coupling coefficients from the viscosities. - a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K)) + ! NOTE: (I_amax * Kv_tot) is bracketed to prevent FMA optimization. + a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax * Kv_tot(i,K))) endif ; enddo ; enddo ! i & k loops elseif (abs(CS%Kv_extra_bbl) > 0.0) then ! There is a simple enhancement of the near-bottom viscosities, but no adjustment