-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIGFPE in ch4Mod with DEBUG=TRUE
#1729
Comments
I was getting a similar error message with DEBUG=TRUE recently, which seemed to trigger on the new year for some reason, (using tag dev_074). I was no longer trying to run in DEBUG mode, and turning this off resolved my issue so I didn't think much of it. Happy to resurrect the case if it's helpful. |
@wwieder can you give us the specs of that case? Where were you running, and what compiler versions and all that. Just pointing to the case directory would be sufficient. And didn't we have a different issue for the new year that's resolved now? |
Yes, this does look essentially the same as the issue in #1013 - the code only works if the compiler is using short-circuit logic. It looks like the conditional should be refactored to: if (.not. lake .and. usephfact) then
if (pH(c) > pHmin .and.pH(c) < pHmax) then
! body of conditional
end if
end if The problem is that pH is NaN if usephfact is false (which appears to be the default). I think we should fix this since it looks like an easy fix and it could appear with other compilers as well: The Fortran standard makes no guarantees about short circuiting. |
Bigger question: we don't appear to have any tests of usephfact true. Do we actually still want to support that option? |
@ckoven can you comment on @billsacks question above -- is usephfact a useful option to support? It looks like one problem with it is that we don't have soil PH data, but we will be getting soil PH with ctsm5.2, so perhaps this will be a useful option at that point? And if so maybe it should stay in and we'll activate it in ctsm5.2? |
@ekluzek Looking closer I'm not sure this will be helpful or similar. My case directory is here not sure which of the many log files may helpful here, but this is not a FATES case, and from what I recall the case seemed to fail with negative CH4 concentrations with DEBUG=TRUE. This was NOT on new years, however, but Dec 22... /glade/scratch/wwieder/archive/ctsm51d074_2deg_GSWP3V1_hist_cnSlope0/logs/cesm.log.3893142.chadmin1.ib0.cheyenne.ucar.edu.220422-205201 |
@billsacks I just tried out your suggestion at Erik's encouragement on my machine with a simple 1x1 brazil ctsm-fates one year case using the nuopc driver and it ran to completion. |
@ekluzek I am not familiar with the CH4 pH sensitivity code, I think that it came in via Lei Meng's paper https://bg.copernicus.org/articles/9/2793/2012/ ? |
Yes. It came in from Lei Meng's paper. I don't think we should remove
this unless it is clearly identified as a problem.
…On Fri, Apr 29, 2022 at 3:28 PM Charlie Koven ***@***.***> wrote:
@ekluzek <https://github.com/ekluzek> I am not familiar with the CH4 pH
sensitivity code, I think that it came in via Lei Meng's paper
https://bg.copernicus.org/articles/9/2793/2012/ ?
—
Reply to this email directly, view it on GitHub
<#1729 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFABYVFENFEW27FL5Q64YZLVHRIBRANCNFSM5UWSC6DA>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
@dlawrenncar I think it's fine to leave it in, especially because it will likely be useful with ctsm5.2. But, we should also use this example as something that illustrates the cost of non-functioning complexity. usephfact makes the code more complex and harder to read and understand, but until soil PH is in, won't be functional. But, in this case it's mere existence in the code caused an issue. We don't always have an example of the cost of maintaining complexity, but we can use this as an example of that. Maintaining complexity that's required is the cost of doing business, but maintaining complexity of something that's not functional and may only be used in the "future" isn't always good to do. I think sometimes we side too much on the side of leaving in complexity that might be useful, rather than making the code easier to maintain. An agile software development adage is to only implement what you need right now -- only add the stuff that you need in the future when it's actually needed. I just want to leave this as food for thought. This kind of decision on "should we keep this" is going to be constantly coming up in the code. |
Brief summary of bug
While rebuilding dependencies on my workstation I ran into a problem similar to an old issue: #1013. The code builds successfully, but crashes almost immediately upon run time with a floating point error in ch4Mod when
DEBUG=TRUE
.General bug information
CTSM version you are using: ctsm5.1.dev091
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: Debug mode with gnu compiler on local ubuntu workstation with both mct and nuopc drivers
Details of bug
The issue appears to be similar to #1013. It occurs in the code block just prior to the one reported in the referenced issue:
CTSM/src/biogeochem/ch4Mod.F90
Line 2635 in 82a63cc
Talking with @ekluzek, we surmise that this might be addressable with a newer gnu compiler version.
Important details of your setup / configuration so we can reproduce the bug
Machine: lobata
OS: Pop!_OS 20.04 (Ubuntu based distro)
Compiler: gfortran 9.4.0
Dependencies: openmpi 4.0.3, netcdf-c 4.8.1, netcdf-fortran 4.5.4, hdf5 1.12.1, esmf 8.2.0
Dependencies have been built with parallel enabling options.
Important output or errors that show the problem
The text was updated successfully, but these errors were encountered: