uvm initialization #61

Closed
eclare108213 opened this issue Dec 26, 2017 · 14 comments

@eclare108213
Contributor

eclare108213 commented Dec 26, 2017

The NAG compiler reports a missing initialization in the uvm halo at line 1399 of ice_grid.f90
(issue noted by DMI collaborators)

@dabail10
Contributor

I'm confused by this one. We do a halo update of uvm right after we compute it. Also, the NAG compiler with -nan on was not catching this.

@eclare108213
Contributor Author

Okay, let's not worry about it for now. They've been working on their version of the code, which we will try to merge into the main CICE sooner or later. I'll ask them to run the new code through their tests, and we'll see if it comes up again. I'll take this off of the project board but keep the issue open.

@eclare108213
Contributor Author

@mhrib @TillRasmussen
This is one of the issues that I entered based on your original text describing the various tests that you'd run, which turned up some problems. If you can, please run those tests again with the current code and see which problems persist. See also #59.

@eclare108213
Contributor Author

@dabail10 please test with your NAG compiler - if it comes back clean then we can close this issue.

@TillRasmussen
Contributor

I have looked into the compile flags that were used. For IEEE, which was the configuration that displayed the issue, the flags were "-g -O0 -ieee=stop -nan". Unfortunately, we don't have the compiler in-house at DMI anymore.

@apcraig
Contributor

apcraig commented Sep 5, 2019

I have been playing with the NAG compiler this morning and have not been able to duplicate this error. There is an interesting line at 1584, "! uvm = c0", which seems to have been modified (commented out) in 2018 by Dave. Not sure why, but given we are not able to duplicate the error from DMI, I think we can close this.

There is one other thing though. When I add -nan to the compile, I get a separate error,


/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90: In function ‘dmi_omp_MP_domp_get_domain_rlu’:
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90:110:58: error: ‘dmi_omp_MP_rdomp_iamTPI’ undeclared (first use in this function)
   subroutine domp_get_domain_rlu(lower,upper,d_lower,d_upper)
                                                          ^
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90:110:58: note: each undeclared identifier is reported only once for each function it appears in
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90:110:57: error: ‘dmi_omp_MP_rdomp_ntTPI’ undeclared (first use in this function)
   subroutine domp_get_domain_rlu(lower,upper,d_lower,d_upper)
                                                         ^
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90: In function ‘dmi_omp_MP_domp_initMP__omp459’:
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90:70:57: error: ‘dmi_omp_MP_rdomp_ntTPI’ undeclared (first use in this function)
     !$OMP PARALLEL DEFAULT(none)
                                                         ^
/home/tcraig/cice-consortium/cice.nagclean/cicecore/cicedynB/dynamics/ice_dyn_evp_1d.F90:70:58: error: ‘dmi_omp_MP_rdomp_iamTPI’ undeclared (first use in this function)
     !$OMP PARALLEL DEFAULT(none)
                                                          ^
/home/tcraig/cice-consortium/cice.nagclean/izumi_nag_smoke_gx3_8x1_debug_diag1_run1day_thread.chk02/Makefile:144: recipe for target 'ice_dyn_evp_1d.o' failed
gmake: *** [ice_dyn_evp_1d.o] Error 1

I spent a few minutes trying to understand this. It only happens when threading is enabled and -nan is set. The error went away if I removed the threadprivate OMP definition at line 51 and then changed the OMP PARALLEL DEFAULT(none) statement on line 70 to OMP PARALLEL PRIVATE(domp_iam,domp_nt,rdomp_iam,rdomp_nt).
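
For concreteness, here is a minimal sketch of that change. The variable names come from the comment above and the error log; the module layout and kinds are assumptions for illustration, not the actual ice_dyn_evp_1d.F90 source.

   ! Minimal sketch of the workaround described above (illustrative only).
   module domp_sketch
      use omp_lib, only: omp_get_thread_num, omp_get_num_threads
      implicit none
      integer :: domp_iam, domp_nt
      real(kind=8) :: rdomp_iam, rdomp_nt
      ! Removed:  !$OMP THREADPRIVATE(domp_iam, domp_nt, rdomp_iam, rdomp_nt)
   contains
      subroutine domp_init
         ! Changed from:  !$OMP PARALLEL DEFAULT(none)
         !$OMP PARALLEL PRIVATE(domp_iam, domp_nt, rdomp_iam, rdomp_nt)
         domp_iam  = omp_get_thread_num()
         rdomp_iam = real(domp_iam, 8)
         domp_nt   = omp_get_num_threads()
         rdomp_nt  = real(domp_nt, 8)
         !$OMP END PARALLEL
      end subroutine domp_init
   end module domp_sketch

Note that, unlike THREADPRIVATE data, PRIVATE copies do not persist after the parallel region ends, so this works around the compile error but is not necessarily equivalent to the original intent; that is part of why the evp_1d threading may deserve a closer look.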

At this time, I think we have to assume this is a compiler bug. It doesn't make a lot of sense that the -nan compiler argument breaks the threadprivate implementation. On the other hand, it would be nice to be able to use the -nan option, and I do have some lingering concerns about whether the threading implementation in evp_1d is completely correct.

What I will do is add a comment to #279 and reopen it (it probably should not have been closed). This might be part of the evp kernel=2 validation still needed. Separately, I suggest we close this issue.

@eclare108213
Contributor Author

We have not been able to reproduce the original problem, so I am closing the issue.

@TillRasmussen
Contributor

Now that I look at this again, is there a problem with the OpenMP?
!$OMP PARALLEL DO PRIVATE(iblk,i,j,ilo,ihi,jlo,jhi,this_block)
What are ilo, jlo, ... doing here?

@phil-blain
Member

For reference, these are the lines Till is talking about (right?):

!$OMP PARALLEL DO PRIVATE(iblk,i,j,ilo,ihi,jlo,jhi,this_block)
do iblk = 1, nblocks
   this_block = get_block(blocks_ice(iblk),iblk)
   ilo = this_block%ilo
   ihi = this_block%ihi
   jlo = this_block%jlo
   jhi = this_block%jhi
   do j = jlo, jhi
   do i = ilo, ihi
      uvm(i,j,iblk) = min (hm(i,j, iblk), hm(i+1,j, iblk), &
                           hm(i,j+1,iblk), hm(i+1,j+1,iblk))
      bm(i,j,iblk) = my_task + iblk/100.0_dbl_kind
   enddo
   enddo
enddo
!$OMP END PARALLEL DO

@phil-blain
Member

Above is the first loop; the second one is:

!$OMP PARALLEL DO PRIVATE(iblk,i,j,ilo,ihi,jlo,jhi,this_block)
do iblk = 1, nblocks
   do j = 1, ny_block
   do i = 1, nx_block
      tmask(i,j,iblk) = .false.
      umask(i,j,iblk) = .false.
      if ( hm(i,j,iblk) > p5) tmask(i,j,iblk) = .true.
      if (uvm(i,j,iblk) > p5) umask(i,j,iblk) = .true.
   enddo
   enddo
enddo
!$OMP END PARALLEL DO

and the uninitialized value in uvm I'm talking about in #560 is triggered at line 1673 (in the second loop).

@dabail10
Contributor

The variables in the OMP loop have to be private so they are unique to each block/thread. Variables that are already scattered and have iblk as an array index are fine.
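
A tiny self-contained example of that distinction (toy names and sizes, not CICE code): the per-block scratch scalars must be PRIVATE so each thread gets its own copy, while an array indexed by iblk can stay shared because each thread writes a disjoint slice.

   ! Toy illustration only: scratch scalars are PRIVATE, the block-indexed
   ! array is shared and each thread writes its own iblk slice.
   program omp_private_demo
      implicit none
      integer, parameter :: nblocks = 8, nx = 4
      real(kind=8) :: work(nx, nblocks)
      integer :: iblk, i, ilo, ihi

      !$OMP PARALLEL DO PRIVATE(iblk, i, ilo, ihi)
      do iblk = 1, nblocks
         ilo = 1                             ! per-thread scratch bounds
         ihi = nx
         do i = ilo, ihi
            work(i,iblk) = real(iblk, 8)     ! disjoint slice per thread
         enddo
      enddo
      !$OMP END PARALLEL DO

      print *, 'sum =', sum(work)            ! same result for any thread count
   end program omp_private_demo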

@eclare108213
Contributor Author

@apcraig @TillRasmussen
Is this issue relevant to the current conversation about NaNs in halos or padding? See #572

@TillRasmussen
Contributor

Yes, I think so. In the first loop from @phil-blain's comment on the 29th of January, uvm is calculated from ilo to ihi and jlo to jhi and then copied to the halo zones. I think that this is fine, except that now that I look at it there is an integer divided by a double in the calculation of bm.
After the loop, a halo update is called for uvm and bm.
If ice_haloupdate has a bug, then these values are not copied to the halo in the blocks with padding.
In the second loop, uvm and hm are used in the padding zone.
This loop could also be limited to ilo,jlo to ihi,jhi. This may require calls to ice_haloupdate for variables calculated within this loop, as sketched below.
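
For reference, a rough sketch of that restructuring (illustrative only; the halo-update calls for the masks are assumptions and would need to match the argument conventions ice_grid.F90 actually uses):

   ! Sketch only: compute the masks over the physical cells of each block
   ! rather than the full nx_block/ny_block extent, so uvm/hm are never
   ! read in uninitialized padding cells.
   !$OMP PARALLEL DO PRIVATE(iblk,i,j,ilo,ihi,jlo,jhi,this_block)
   do iblk = 1, nblocks
      this_block = get_block(blocks_ice(iblk),iblk)
      ilo = this_block%ilo
      ihi = this_block%ihi
      jlo = this_block%jlo
      jhi = this_block%jhi
      do j = jlo, jhi
      do i = ilo, ihi
         tmask(i,j,iblk) =  hm(i,j,iblk) > p5
         umask(i,j,iblk) = uvm(i,j,iblk) > p5
      enddo
      enddo
   enddo
   !$OMP END PARALLEL DO

   ! The masks would then need their own halo fill, e.g. something along
   ! these lines (argument choices are assumptions, not the existing calls):
   ! call ice_HaloUpdate(tmask, halo_info, field_loc_center,   field_type_scalar)
   ! call ice_HaloUpdate(umask, halo_info, field_loc_NEcorner, field_type_scalar)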

@apcraig
Contributor

apcraig commented Mar 16, 2021

This is being addressed in #560 and follow-on work. Will close now.

apcraig closed this as completed Mar 16, 2021
JFLemieux73 pushed a commit to JFLemieux73/CICE that referenced this issue Mar 2, 2022
* Change averaging for C grid

* Fix masking

* Use the A averager

* Fix halo updates

* - Add wall kmt_type
- Add blockep4, uniformp5, medblocke, blocke to ice_data_type
- Add ability to check if C/CD fields are on restart file and skip
- Add new tests to gridsys_suite
- Update documentation with new namelist

* Rename box2000 to boxwallp5

Co-authored-by: David Bailey <[email protected]>