Restore bit repro using FMA in selected runs
This patch modifies select calculations of PR#1616 in order to preserve bit
reproducibility when FMA optimization is enabled.  We add parentheses and
reorder terms in selected expressions which either direct or suppress FMAs,
ensuring equivalence with the previous release.

We address two specific equations in the PR.  The first is associated
with the vertical friction coupling coefficient.  The diff is shown
below.

-  a_cpl(i,K) = Kv_tot(i,K) / (h_shear*GV%H_to_Z + I_amax*Kv_tot(i,K))
+  a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K))

The original denominator is of the form `a*b + c*d`, for which a compiler
may favor an FMA of the form `a*b + (c*d)`.  However, the modified
denominator is of the form `a + c*d`, which favors the `a + c*d` FMA.
The two forms can give different results in the final bits.

We resolve this by explicitly wrapping the product term in parentheses:

  a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax*Kv_tot(i,K)))

Although this disables the FMA, it produces the same bit-equivalent
answer as the original expression.

----

The second equation for TKE due to kappa shear is shown below.

-  tke_src = dz_Int(K) *(((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
-                              (TKE(k) - q0)*TKE_decay(k)) - &
+  tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
+                              (TKE(k) - q0)*TKE_decay(k)) - &
    ...

The outer expression was of the form `b + c` but is promoted to `a*b + c`,
which a compiler may transform into an FMA.

We resolve this by bracketing the product to suppress this FMA:

  tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - &
                         (TKE(k) - q0)*TKE_decay(k)) - &
        ...

----

These two changes are intended to be the smallest modification
which preserves answers for known tests on the target compilers.  They do
not encompass all equation changes in this PR.  If needed, we could
extend these changes to similar modifications of PR#1616.

We do not expect to support bit reproducibility when FMAs are enabled.
However, this is an ongoing conversation, and the rules around FMAs may
change as we learn more and agree on rules of reproducibility.
marshallward committed Feb 22, 2024
1 parent 40134ed commit 8933d92
Showing 2 changed files with 4 additions and 2 deletions.
3 changes: 2 additions & 1 deletion src/parameterizations/vertical/MOM_kappa_shear.F90
@@ -1618,7 +1618,8 @@ subroutine find_kappa_tke(N2, S2, kappa_in, Idz, h_Int, dz_Int, dz_h_Int, I_L2_b
 ! Solve for dQ(K)...
 aQ(k) = (0.5*(kappa(K)+kappa(K+1))+kappa0) * Idz(k)
 dQdz(k) = 0.5*(TKE(K) - TKE(K+1))*Idz(k)
-tke_src = h_Int(K) * (dz_h_Int(K)*((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K)) - &
+! NOTE: (dz_h_int*(K'*S2 - K*N2)) is bracketed to prevent FMA optimization
+tke_src = h_Int(K) * ((dz_h_Int(K) * ((kappa(K) + kappa0)*S2(K) - kappa(k)*N2(K))) - &
 (TKE(k) - q0)*TKE_decay(k)) - &
 (aQ(k) * (TKE(K) - TKE(K+1)) - aQ(k-1) * (TKE(K-1) - TKE(K)))
 v1 = aQ(k-1) + dQdz(k-1)*dKdQ(K-1)
3 changes: 2 additions & 1 deletion src/parameterizations/vertical/MOM_vert_friction.F90
@@ -2116,7 +2116,8 @@ subroutine find_coupling_coef(a_cpl, hvel, do_i, h_harm, bbl_thick, kv_bbl, z_i,
 endif
 
 ! Calculate the coupling coefficients from the viscosities.
-a_cpl(i,K) = Kv_tot(i,K) / (h_shear + I_amax*Kv_tot(i,K))
+! NOTE: (I_amax * Kv_tot) is bracketed to prevent FMA optimization.
+a_cpl(i,K) = Kv_tot(i,K) / (h_shear + (I_amax * Kv_tot(i,K)))
 endif ; enddo ; enddo ! i & k loops
 elseif (abs(CS%Kv_extra_bbl) > 0.0) then
 ! There is a simple enhancement of the near-bottom viscosities, but no adjustment