
(CLM-only) LAND and PFTS 1D vector history averaging blows up... #77

Closed
ekluzek opened this issue Dec 16, 2017 · 12 comments
Assignees
Labels
bug something is working incorrectly

Comments


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2010-03-31 17:03:18 -0600
Bugzilla Id: 1139
Bugzilla CC: dlawren, jedwards, mvertens, oleson, thoar,

Keith found that 1D vector history averaging for LAND and PFTS blows up...

so both this...

hist_dov2xy = .true.,.false.
hist_fincl2 = 'TG'
hist_mfilt = 1
hist_nhtfrq = 0,-24
hist_type1d_pertape = ' ','LAND'

and this...

hist_dov2xy = .true.,.false.
hist_fincl2 = 'TG'
hist_mfilt = 1
hist_nhtfrq = 0,-24
hist_type1d_pertape = ' ','PFTS'

FAIL, while setting that second field to ' ', 'GRID', or 'COLS' works.

Errors look like this for 'LAND' averaging...

 127:(seq_frac_check) [atm init] sum ncnt/maxerr =        0   0.00000000000000000
 116: WARNING:  Total sensible heat does not equal sum of scaled heat fluxes for urban columns  nstep =  1  indexl=  25509  eflx_err=  -33.8079913847853817
 116: clm model is stopping - error is greater than .01 W/m**2
 116: eflx_scale    =  28.7937518604499623
 116: eflx_sh_grnd_scale:  0.000000000000000000E+00 8.07671583705406526 5.25253898785417661 5.28264211229122260 10.1818549232504978
 116: eflx          =  62.6017432452353404
 116: ENDRUN: called without a message string
  51: Error: Forcing height is below canopy height for pft index  112310
  51: ENDRUN: called without a message string
 116:
 116:  Traceback:
 119: WARNING:  Total sensible heat does not equal sum of scaled heat fluxes for urban columns  nstep =  1  indexl=  26187  eflx_err=  -2.94875244007795345
 119: clm model is stopping - error is greater than .01 W/m**2
 119: eflx_scale    =  124.686394135918988
 119: eflx_sh_grnd_scale:  9.52250025157385593 12.5780280802565674 8.75716782178573538 29.2372776613224765 64.5914203209803475
 119: eflx          =  127.635146575996941
 119: ENDRUN: called without a message string
 119:
 119:  Traceback:
  51:
  51:  Traceback:
  49: energy balance in canopy  106865 , err= 0.171669314575057796
  49: energy balance in canopy  106865 , err= 0.171669314575057796
  33:Communication statistics of task 33 is associated with task key: 2071308156_33
  79: WARNING:  water balance error  nstep =  1  indexc=  28915  errh2o=  48.7947716632332060
  79: BalanceCheck: soil balance error nstep =         1 point = 28915 imbalance =************ W/m2
  47: WARNING:  water balance error  nstep =  1  indexc=  17210  errh2o=  -0.211449793149122363E-03

And for 'PFTS' averaging...

127:(seq_frac_check) [atm init] sum ncnt/maxerr =        0   0.00000000000000000
101: urban incident atmospheric longwave radiation balance error -52852.6710421718963
101: clm model is stopping
101: ENDRUN: called without a message string
101:
@ekluzek ekluzek added this to the clm5 milestone Dec 16, 2017

ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-25 14:09:10 -0600

We are opening this up again because we want to be able to do this with the latest model. We are currently outputting in vector format for all PFTs, which is costly both in postprocessing time and disk space for something we don't need. I thought there was a chance that Jim's fix for 2184 might have fixed this issue, but Keith showed that wasn't the case.


ekluzek commented Dec 16, 2017

Jim Edwards < jedwards > - 2017-04-25 14:17:40 -0600

This report does not provide adequate information to reproduce the problem.
Please provide a clm tag and testcase.


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-25 14:18:51 -0600

(In reply to Jim Edwards from comment #2)

This report does not provide adequate information to reproduce the problem.
Please provide a clm tag and testcase.

Yes, I am setting that test case up right now.


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-26 00:11:32 -0600

I set up a test case to demonstrate the problem, and Intel immediately showed a subscript overflow. So I figured I could probably track the problem down fairly quickly. I boiled it down to a single-point problem and have it running in DDT. I see the problems, and I also see that the code was never constructed in a way that would make these cases work. So it needs some work to enable them, but then it should work as advertised.


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-26 11:31:49 -0600

OK, so averaging up to the landunit or column level just wasn't implemented, even though the code claimed it was. The code allowed variables to be output in their native format or mapped up to the gridcell level, and there's some hardwiring that assumes any mapping is done to the gridcell level. However, there is enough generality in place that, when mapping is done, it can be done to the landunit or column level the same way it is done for gridcell.

So in the example below I'm showing what needs to be done to co-opt the gridcell-averaging code to also allow mapping to landunit or column. In the end a more general name should be used for it. I'm only showing what needs to be done for the 1D arrays in hist_update_hbuf_field_1d; the same sort of thing needs to happen for the 2D arrays in hist_update_hbuf_field_2d, but the code changes look almost identical apart from using 2D rather than 1D arrays.

[yongi:clm/src/main] erik% svn diff histFileMod.F90
Index: histFileMod.F90
===================================================================
--- histFileMod.F90	(revision 84583)
+++ histFileMod.F90	(working copy)
@@ -1012,6 +1012,7 @@
     integer  :: hpindex                 ! history pointer index
     integer  :: k                       ! gridcell, landunit, column or patch index
     integer  :: beg1d,end1d             ! beginning and ending indices
+    integer  :: beg1d_out,end1d_out     ! beginning and ending indices on output grid
     logical  :: check_active            ! true => check 'active' flag of each point (this refers to a point being active, NOT a history field being active)
     logical  :: valid                   ! true => history operation is valid
     logical  :: map2gcell               ! true => map clm pointer field to gridcell
@@ -1025,7 +1026,7 @@
     integer , pointer :: nacs(:,:)      ! accumulation counter
     real(r8), pointer :: field(:)       ! clm 1d pointer field
     logical , pointer :: active(:)      ! flag saying whether each point is active (used for type1d = landunit/column/pft) (this refers to a point being active, NOT a history field being active)
-    real(r8) :: field_gcell(bounds%begg:bounds%endg)  ! gricell level field (used if mapping to gridcell is done)
+    real(r8), allocatable :: field_gcell(:)  ! gricell level field (used if mapping to gridcell is done)
     integer j
     character(len=*),parameter :: subname = 'hist_update_hbuf_field_1d'
     integer k_offset                    ! offset for mapping sliced subarray pointers when outputting variables in PFT/col vector form
@@ -1038,6 +1039,8 @@
     hbuf           => tape(t)%hlist(f)%hbuf
     beg1d          =  tape(t)%hlist(f)%field%beg1d
     end1d          =  tape(t)%hlist(f)%field%end1d
+    beg1d_out      =  tape(t)%hlist(f)%field%beg1d_out
+    end1d_out      =  tape(t)%hlist(f)%field%end1d_out
     type1d         =  tape(t)%hlist(f)%field%type1d
     type1d_out     =  tape(t)%hlist(f)%field%type1d_out
     p2c_scale_type =  tape(t)%hlist(f)%field%p2c_scale_type
@@ -1075,13 +1078,48 @@
           map2gcell = .true.
        end if
     end if
+    if (type1d_out == namel ) then
+       if (type1d == namep) then
+          ! In this and the following calls, we do NOT explicitly subset field using
+          ! bounds (e.g., we do NOT do field(bounds%begp:bounds%endp). This is because,
+          ! for some fields, the lower bound has been reset to 1 due to taking a pointer
+          ! to an array slice. Thus, this code will NOT work properly if done within a
+          ! threaded region! (See also bug 1786)
+          call p2l(bounds, &
+               field, &
+               field_gcell(beg1d_out:end1d_out), &
+               p2c_scale_type, c2l_scale_type, l2g_scale_type)
+          map2gcell = .true.
+       else if (type1d == namec) then
+          call c2l(bounds, &
+               field, &
+               field_gcell(beg1d_out:end1d_out), &
+               c2l_scale_type, l2g_scale_type)
+          map2gcell = .true.
+       end if
+    end if
+    if (type1d_out == namec ) then
+       if (type1d == namep) then
+          ! In this and the following calls, we do NOT explicitly subset field using
+          ! bounds (e.g., we do NOT do field(bounds%begp:bounds%endp). This is because,
+          ! for some fields, the lower bound has been reset to 1 due to taking a pointer
+          ! to an array slice. Thus, this code will NOT work properly if done within a
+          ! threaded region! (See also bug 1786)
+          call p2c(bounds, &
+               field, &
+               field_gcell(beg1d_out:end1d_out), &
+               p2c_scale_type, c2l_scale_type, l2g_scale_type)
+          map2gcell = .true.
+       end if
+    end if
 
     if (map2gcell) then  ! Map to gridcell
+       allocate( field_gcell(beg1d_out:end1d_out) )
 
        ! note that in this case beg1d = begg and end1d=endg
        select case (avgflag)
        case ('I') ! Instantaneous
-          do k = bounds%begg,bounds%endg
+          do k = beg1d_out, end1d_out
              if (field_gcell(k) /= spval) then
                 hbuf(k,1) = field_gcell(k)
              else
@@ -1090,7 +1128,7 @@
              nacs(k,1) = 1
           end do
        case ('A', 'SUM') ! Time average / sum
-          do k = bounds%begg,bounds%endg
+          do k = beg1d_out, end1d_out
              if (field_gcell(k) /= spval) then
                 if (nacs(k,1) == 0) hbuf(k,1) = 0._r8
                 hbuf(k,1) = hbuf(k,1) + field_gcell(k)
@@ -1100,7 +1138,7 @@
              end if
           end do
        case ('X') ! Maximum over time
-          do k = bounds%begg,bounds%endg
+          do k = beg1d_out, end1d_out
              if (field_gcell(k) /= spval) then
                 if (nacs(k,1) == 0) hbuf(k,1) = -1.e50_r8
                 hbuf(k,1) = max( hbuf(k,1), field_gcell(k) )
@@ -1110,7 +1148,7 @@
              nacs(k,1) = 1
           end do
        case ('M') ! Minimum over time
-          do k = bounds%begg,bounds%endg
+          do k = beg1d_out, end1d_out
              if (field_gcell(k) /= spval) then
                 if (nacs(k,1) == 0) hbuf(k,1) = +1.e50_r8
                 hbuf(k,1) = min( hbuf(k,1), field_gcell(k) )
@@ -1123,6 +1161,7 @@
           write(iulog,*) trim(subname),' ERROR: invalid time averaging flag ', avgflag
           call endrun(msg=errMsg(sourcefile, __LINE__))
        end select
+       deallocate( field_gcell )
 
     else  ! Do not map to gridcell


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-26 14:26:00 -0600

There's also a dimensioning bug in p2l, which doesn't seem to be called anywhere else: a per-patch scale array was dimensioned over columns.

Index: subgridAveMod.F90
===================================================================
--- subgridAveMod.F90	(revision 84625)
+++ subgridAveMod.F90	(working copy)
@@ -301,7 +301,7 @@
     integer  :: p,c,l,index                     ! indices
     logical  :: found                              ! temporary for error check
     real(r8) :: sumwt(bounds%begl:bounds%endl)     ! sum of weights
-    real(r8) :: scale_p2c(bounds%begc:bounds%endc) ! scale factor for patch->column mapping
+    real(r8) :: scale_p2c(bounds%begp:bounds%endp) ! scale factor for patch->column mapping
     real(r8) :: scale_c2l(bounds%begc:bounds%endc) ! scale factor for column->landunit mapping
     !------------------------------------------------------------------------
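The fix matters because scale_p2c is indexed by patch, and there are typically far more patches than columns, so a column-sized array overruns. A small Python illustration of the same mismatch (hypothetical sizes, not CLM data):

```python
import numpy as np

n_col, n_patch = 3, 7            # typically many more patches than columns

scale_wrong = np.zeros(n_col)    # dimensioned like the buggy scale_p2c(begc:endc)
scale_right = np.zeros(n_patch)  # dimensioned like the fixed scale_p2c(begp:endp)

overflow = False
try:
    scale_wrong[n_patch - 1] = 1.0   # indexing by a patch index overruns
except IndexError:
    overflow = True                  # Python catches it; Fortran without bounds
                                     # checking silently corrupts memory

scale_right[n_patch - 1] = 1.0       # patch-dimensioned array is safe
```

In Fortran the out-of-bounds write is only caught with runtime bounds checking enabled (e.g., a debug build), which is why this surfaced as an Intel subscript-overflow report rather than a clean error.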


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-04-28 11:37:47 -0600

I have this working for a single-point case. There are additional changes I need to make to get a global case to work, but I can see that those can be resolved, so I'm now confident I can finish this out.


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-05-20 16:41:13 -0600

OK, so the remaining issue is that it has trouble when you ask for output averaged to a level finer than the native grid. For example, you ask for COLS on a field that is native on the landunit. As it stands this dies with an error because the subscripts aren't dimensioned properly.

We decided that this is OK; we just want it to die with a clear error when you ask for such a thing. To do this cleanly, I think we just need to add an endrun call in the new subroutine I added: is_mapping_upto_subgrid. The call to it in htape_addfld should get an additional optional argument to pass a helpful message for this case.

We also thought we should change PFTS to PATCH, and I think we should change ' ' to 'NATIVE'.
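The proposed check — refuse to average a native field "down" to a finer subgrid level — amounts to an ordering over the levels. A hypothetical Python stand-in for the is_mapping_upto_subgrid/endrun logic (a sketch, not the actual CTSM implementation):

```python
# Rank of each subgrid level, coarse to fine; the names follow the
# hist_type1d_pertape options discussed above ('PFTS' is finest).
LEVEL_RANK = {"GRID": 0, "LAND": 1, "COLS": 2, "PFTS": 3}

def check_mapping_upto_subgrid(type1d, type1d_out):
    """Abort (here: raise) if the requested output level is finer than
    the field's native level; averaging only goes coarse-ward."""
    if LEVEL_RANK[type1d_out] > LEVEL_RANK[type1d]:
        raise RuntimeError(
            f"cannot average a native {type1d} field down to {type1d_out}")
    return True
```

So PFTS-to-LAND is allowed, while LAND-to-COLS fails immediately with a message naming the field levels, instead of a subscript overflow deep in the history code.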


ekluzek commented Dec 16, 2017

Erik Kluzek < erik > - 2017-11-13 13:08:39 -0700

This was partially fixed in clm4_5_16_r243

@ekluzek ekluzek self-assigned this Jul 19, 2018
@ekluzek ekluzek modified the milestones: clm5, cmip6 Jul 19, 2018

ekluzek commented Jul 19, 2018

OK, this issue caused me trouble. I asked for some variables defined at the gridcell level to be output at the landunit level, and the model chokes in a horrible way trying to write them. In debug mode it shows a subscript overflow, but it's not obvious which variable is the problem. This was for CMIP6 output with 8 different tapes, and it took a while for me to identify that this was the actual underlying problem. So having the model die with a proper error will save tons of time for people who use 1D vector data.

@ekluzek ekluzek added priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations and removed severity: critical labels Jul 19, 2018
@billsacks billsacks added the bug something is working incorrectly label Aug 13, 2018
@ekluzek ekluzek modified the milestones: cmip6, cesm2.1.1 Nov 5, 2018
@billsacks billsacks removed the priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations label Jul 15, 2019
@billsacks billsacks removed this from the cesm2.1.1 milestone Jul 15, 2019
@billsacks billsacks added the priority: low Background task that doesn't need to be done right away. label Jul 15, 2019
billsacks commented

All that's left is better error checking.

samsrabin pushed a commit to samsrabin/CTSM that referenced this issue Apr 19, 2024
@ekluzek ekluzek removed the priority: low Background task that doesn't need to be done right away. label Jun 4, 2024

ekluzek commented Jun 4, 2024

Closing, as the main problem was solved; only the error checking wasn't, and that is now tracked in #2580.

@ekluzek ekluzek closed this as completed Jun 4, 2024