-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace ocean/ice post NCL programs with Python program #1736
Replace ocean/ice post NCL programs with Python program #1736
Conversation
Ocnpost and icepost NCL code has been ported to Python so that the ocean and ice post can be supported on WCOSS2.
The script that calls the ocnpost program has been modifed so that the ocnpost python program can be executed instead of the obsolete ocnpost and icepost NCL programs.
Ocean and ice post parm files that the ocnpost Python program requires have been added.
Unneeded lines have been removed from the run_regrid script and the script has been polished.
Code that has been commented out has been removed from ocnpost.
The ocnpost Python code has been updated to account for both daily and hourly raw CICE NetCDF input files.
The ocnpost Python program has been updated so that all 40 levels from MOM6 can be processed.
The ocnpost parm files have been updated for MOM6 and CICE6 to include all needed ocean and ice variables.
The input netcdf filename that is needed for ocnpost has been modified so that it agrees with the names of the netcdf files produced from MOM6 and CICE6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shellcheck found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.
@NeilBarton-NOAA @jiandewang Hold off on reviewing until all the checks passed and Rahul or I have given the go-ahead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See first cut at the review comments.
try: | ||
import esmfwghtinterp_f90subroutine | ||
except ImportError or ModuleNotFoundError: | ||
print('Weighted interpolation fortran subroutine has not been compiled. Compiling...') | ||
np.f2py.compile(fsource1, modulename='esmfwghtinterp_f90subroutine', verbose=False) | ||
import esmfwghtinterp_f90subroutine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One cannot be compiling everytime this script is run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. As of now, a compilation is only performed if the .so file does not exist in any Python path. However, the compile step can be removed completely from the code if that is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this can't just be this python function:
def bilin(src, S, row, col, frac_b, dst_len):
dst = [0.0] * dst_len # Initialize destination field to 0.0
# Apply weights
for i in range(len(row)):
dst[row[i]] += S[i] * src[col[i]]
# Apply fraction correction
for i in range(dst_len):
if frac_b[i] != 0.0:
dst[i] /= frac_b[i]
return dst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried something very similar to this. The program ran less efficiently when this was a python function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are efficient bilinear interpolation methods in numpy
. Did you try any of those?
If this is the meat and potatoes of this program, IMO, it makes sense for this to be a Fortran program. The rest is just IO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant something from numpy
or scipy
e.g. https://numpy.org/devdocs/reference/generated/numpy.interp.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding, np.interp
does not have an option for a weighted bilinear interpolation. This Fortran subroutine, which uses the ESMF regridding weights in a bilinear or conservative interpolation is referenced here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally avoid numpy
and/or scipy
. I have also tried these methods and they are slow and do not take into account issues such as landmask and fractional grids.
To reiterate, once you have the ESMF weights, all you need to do is add a short subroutine to read in the source grid variable and remap it to the destination grid. It should be short (on the order of 10 lines).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HenryWinterbottom-NOAA Thank you for your feedback. Just to confirm, is this the short subroutine that you are referring to?
! Initialize destination field to 0.0
do i=1,dst_len
dst(i)=0.0
enddo
! Apply weights
do i=1, n_s
dst(row(i))=dst(row(i))+S(i)*src(col(i))
enddo
do i=1, dst_len
if (frac_b(i) .ne. 0.0) then
dst(i)=dst(i)/frac_b(i)
endif
enddo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EricSinsky-NOAA Yes.
import esmfwghtinterp_f90subroutine | ||
|
||
def esmfmanualregrid(src,S,row,col,frac_b,n_s,varmeth,slatd,slond,dlatd,dlond): | ||
import esmfwghtinterp_f90subroutine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all imports must be at the top
|
||
def main(): | ||
|
||
print('Main program has started') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all new python programs must use wxflow
to setup logging, execution etc.
if len(sys.argv) == 6: | ||
infiles0=sys.argv[1] | ||
outfiles0=sys.argv[2] | ||
nemsrc=sys.argv[3] | ||
parmfile=sys.argv[4] | ||
dstgrds=[sys.argv[5]] | ||
else: | ||
print('Incorrect number of arguments.') | ||
exit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be handled with argparse
Use raise
for exception handling and exit with an error message that is clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I wasn't entirely sure if argparse was supported on WCOSS2. I will use argparse instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argparse is core python
if infiles0.endswith('.nc') and outfiles0.endswith('.nc'): | ||
print('Input and output files are netcdf.') | ||
infiles1=[sys.argv[1]] | ||
outfiles1=[sys.argv[2]] | ||
dimfiles=1 | ||
else: | ||
with open(infiles0) as f1: | ||
infiles1 = f1.read().splitlines() | ||
with open(outfiles0) as f2: | ||
outfiles1 = f2.read().splitlines() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide comments or describe what the code/section is doing.
Break code into methods for readability and testing.
Add tests.
ush/ocnpost.py
Outdated
#the following NCO command will be issued at the end | ||
#to rename the variable mld to ePBL if the variable mld is found | ||
|
||
ncocmd=['ncrename -O -v MLD_003,mld'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used? Remove unused codes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. I will remove this since it will not be needed.
if 'SST' in ncvarlist: | ||
model='MOM' | ||
elif 'Tsfc_d' in ncvarlist or 'Tsfc_h' in ncvarlist: | ||
model='CICE' | ||
if 'Tsfc_d' in ncvarlist: | ||
samplevar='Tsfc_d' | ||
if 'Tsfc_h' in ncvarlist: | ||
samplevar='Tsfc_h' | ||
else: | ||
print('Product not supported. Exiting program...') | ||
exit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be turned into methods for testing and extensibility
if model == 'MOM': | ||
dimxh=ocnf.dimensions['xh'].size | ||
dimyh=ocnf.dimensions['yh'].size | ||
if model == 'CICE': | ||
dimxh=ocnf.dimensions['ni'].size | ||
dimyh=ocnf.dimensions['nj'].size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can model
be both MOM
and CICE
at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This program can only process MOM and CICE one at a time, therefore model
can only be either MOM
or CICE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, this program needs to be written in a way that leverages OOP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The program should be agnostic to the model. All it requires if the ESMF coefficients and a FORTRAN bit to remap the variables. ESMF is also agnostic to the model. All it cares about is mapping points from a source grid to a destination grid. The only scenario that I can think of in which you would need to specify the model is if you are generating a MOM6 or CICE tri-polar projection from scratch.
if model == 'MOM': | ||
if 'sin_rot' in ncvarlist: | ||
sinrot=ocnf['sin_rot'] | ||
else: | ||
sinrot=ocnf['sinrot'] | ||
if 'cos_rot' in ncvarlist: | ||
cosrot=ocnf['cos_rot'] | ||
else: | ||
cosrot=ocnf['cosrot'] | ||
|
||
z_l=ocnf['z_l'] | ||
z_i=ocnf['z_i'] | ||
nlevs=len(z_l) | ||
|
||
if model == 'CICE': | ||
angleT=ocnf['ANGLET'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle MOM
in MOM
related methods and CICE
likewise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary. Both MOM6 and CICE are on the same tri-polar grid for the UFS. Further, there is no rotation necessary for CICE since the variables are all Arakawa-C mass points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we confirm with @DeniseWorthen who wrote the original NCL scripts. While I don't remember specific details, I do know that Denise is an expert on the CICE grid and carefully confirmed every variable was being interpolated the correct way and I'm assuming Eric is following what was done in the original NCL scripts here. Although maybe somethings changed since then.
PS Thank you @EricSinsky-NOAA for this PR. I know this has been a long standing need (#923 was created a year ago) and it's greatly appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @JessicaMeixner-NOAA. The vector rotation is being performed the same way as in Denise's NCL code. It is possible, however, that something has changed since then (as you pointed out).
@@ -0,0 +1,646 @@ | |||
#------------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many questions from this script.
- Is this really efficient? Is python the right choice for doing this or is a compiled language (e.g. Fortran) better suited for the operations of this size.
- There are no tests for this. Every python program that is added must include tests and pass standards.
- Why are we using those
parm
files? Can that information not be described in a better form e.g.yaml
ornamelist
(if using Fortran).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I definitely agree that Fortran is more efficient with computation, which is why most of the intense computation is performed using a Fortran to Python interface generator (f2py) provided by Numpy.
- Sure, formal tests will be performed.
- There's no particular reason. I can change the format of the parm files to yaml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
From my experience, FORTRAN is going to be the most efficient way to do this. Further, there is no need to wrap the FORTRAN with
f2py
. If you have the weights already, you should only need to loop over the coefficients to do the remapping. Using Python only complicates things and makes the application less portable. -
A simple unit test would be using a static ESMF coefficients file, remapping from 5p0 to a comparable Gaussian grid nominal resolution and taking the difference of the fields. I would expect the different to be on the order of 0 aside from rounding error.
-
This is the perfect scenario for which to use a YAML configuration. The configuration file you provide is difficult to interpret. A YAML variation would be much more readable and easier to digest.
wgtsfile1 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.bilinear.nc' | ||
wgtsfile2 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.conserve.nc' | ||
if model == 'MOM': | ||
wgtsfile3 = nemsrc+'/'+'tripole.mx025.Cu.to.Ct.bilinear.nc' | ||
wgtsfile4 = nemsrc+'/'+'tripole.mx025.Cv.to.Ct.bilinear.nc' | ||
if model == 'CICE': | ||
wgtsfile3 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc' | ||
wgtsfile4 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these seem to be hard-wired here. Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will fix this so that the filenames not hard-wired. Thanks for catching this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these things that are different for MOM and CICE should ideally be moved to a yaml file. The python shouldn't need to know what it is working on, just a yaml with all the settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. yaml files will be used instead of the preliminary parm files that are being used now.
rgrdf1=nc.Dataset(wgtsfile1) | ||
S1=rgrdf1['S'][:].copy() | ||
row1=rgrdf1['row'][:] | ||
col1=rgrdf1['col'][:] | ||
frac_b1=rgrdf1['frac_b'][:] | ||
n_s1=len(S1) | ||
|
||
rgrdf2=nc.Dataset(wgtsfile2) | ||
S2=rgrdf2['S'][:].copy() | ||
row2=rgrdf2['row'][:] | ||
col2=rgrdf2['col'][:] | ||
frac_b2=rgrdf2['frac_b'][:] | ||
n_s2=len(S2) | ||
|
||
rgrdf3=nc.Dataset(wgtsfile3) | ||
S3=rgrdf3['S'][:].copy() | ||
row3=rgrdf3['row'][:] | ||
col3=rgrdf3['col'][:] | ||
frac_b3=rgrdf3['frac_b'][:] | ||
n_s3=len(S3) | ||
|
||
rgrdf4=nc.Dataset(wgtsfile4) | ||
S4=rgrdf4['S'][:].copy() | ||
row4=rgrdf4['row'][:] | ||
col4=rgrdf4['col'][:] | ||
frac_b4=rgrdf4['frac_b'][:] | ||
n_s4=len(S4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netCDF4
affords lazy-loading of the dataset. Why are we extracting that data upfront? This doesn't seem like a good use of the library.
rgmask3dn = f.createVariable('rgmask3d', 'f4', ('time', 'z_l','lat', 'lon')) | ||
time = f.createVariable('Time', 'i4', 'time') | ||
|
||
longitude[:] = lond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are things you would do in a Fortran code.
This is not proper use of the python language.
#Create Mask NETCDF File######### | ||
testfile='masks_'+dstgrds[jj]+'.nc' | ||
os.system('rm -vf '+testfile) | ||
f = nc.Dataset(testfile,'w', format='NETCDF4') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use contextmanagers
#Create output regridded netcdf file | ||
FILENAME_REGRID = outfile0 | ||
os.system('rm -f '+FILENAME_REGRID) | ||
outcdf = nc.Dataset(FILENAME_REGRID,'w', format='NETCDF3_CLASSIC') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we writing out NETCDF3
version filed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe NETCDF3 was the format that was used in the ocean and ice post NCL programs. However, there is no particular reason we need to keep it this way. I'll change this to NETCDF4, or this can be a user-configurable option in a yaml file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe NETCDF3 was the format that was used in the ocean and ice post NCL programs. However, there is no particular reason we need to keep it this way. I'll change this to NETCDF4, or this can be a user-configurable option in a yaml file.
it is netcdf4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks Jiande.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
netCDF3 is an artifact of NGGPS.
Additional code that is not needed in ocnpost has been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anytime that you need to run over large loops in Python, consideration should be given to using a lower-level language (e.g., FORTRAN). Even with efficient use of Python list-comprehensions FORTRAN will still be a faster solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this application can be leaned out considerably.
-
Use YAML files as noted throughout the comments.
-
Take the remapping step out. If you already have the ESMF remapping coefficient files, all you need is a FORTRAN program to read the source variable files, do the interpolation, and write the remapped variable out.
-
The MOM6 velocity vector Earth -> grid and grid-> Earth relative rotations can also be done more efficiently in FORTRAN.
-
As is, this application limits portability.
Unused code that was used for preliminary testing purposes has been removed.
Description
The ocean and ice post NCL programs can no longer be used in the global-workflow because NCL is obsolete. Therefore, the ocean and ice post NCL programs have been replaced by one Python program. The ocean/ice post Python program has the same capabilities as the NCL programs.
Refs #923
Type of change
How Has This Been Tested?
Checklist