Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ocean/ice post NCL programs with Python program #1736

Closed
wants to merge 13 commits into from
Closed

Replace ocean/ice post NCL programs with Python program #1736

wants to merge 13 commits into from

Conversation

EricSinsky-NOAA
Copy link
Contributor

@EricSinsky-NOAA EricSinsky-NOAA commented Jul 13, 2023

Description

The ocean and ice post NCL programs can no longer be used in the global-workflow because NCL is obsolete. Therefore, the ocean and ice post NCL programs have been replaced by one Python program. The ocean/ice post Python program has the same capabilities as the NCL programs.

Refs #923

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • EP4 on WCOSS2

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

EricSinsky-NOAA and others added 10 commits July 1, 2023 16:13
Ocnpost and icepost NCL code has been ported to Python so that the ocean and ice post can be supported on WCOSS2.
The script that calls the ocnpost program has been modifed so that the ocnpost python program can be executed instead of the obsolete ocnpost and icepost NCL programs.
Ocean and ice post parm files that the ocnpost Python program requires have been added.
Unneeded lines have been removed from the run_regrid script and the script has been polished.
Code that has been commented out has been removed from ocnpost.
The ocnpost Python code has been updated to account for both daily and hourly raw CICE NetCDF input files.
The ocnpost Python program has been updated so that all 40 levels from
MOM6 can be processed.
The ocnpost parm files have been updated for MOM6 and CICE6 to include
all needed ocean and ice variables.
The input netcdf filename that is needed for ocnpost has been modified so that it agrees with
the names of the netcdf files produced from MOM6 and CICE6.
Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shellcheck found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@WalterKolczynski-NOAA
Copy link
Contributor

@NeilBarton-NOAA @jiandewang Hold off on reviewing until all the checks passed and Rahul or I have given the go-ahead

Copy link
Contributor

@aerorahul aerorahul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See first cut at the review comments.

Comment on lines +137 to +142
try:
import esmfwghtinterp_f90subroutine
except ImportError or ModuleNotFoundError:
print('Weighted interpolation fortran subroutine has not been compiled. Compiling...')
np.f2py.compile(fsource1, modulename='esmfwghtinterp_f90subroutine', verbose=False)
import esmfwghtinterp_f90subroutine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One cannot be compiling everytime this script is run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. As of now, a compilation is only performed if the .so file does not exist in any Python path. However, the compile step can be removed completely from the code if that is preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this can't just be this python function:

def bilin(src, S, row, col, frac_b, dst_len):
    dst = [0.0] * dst_len  # Initialize destination field to 0.0

    # Apply weights
    for i in range(len(row)):
        dst[row[i]] += S[i] * src[col[i]]

    # Apply fraction correction
    for i in range(dst_len):
        if frac_b[i] != 0.0:
            dst[i] /= frac_b[i]

    return dst

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried something very similar to this. The program ran less efficiently when this was a python function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are efficient bilinear interpolation methods in numpy. Did you try any of those?
If this is the meat and potatoes of this program, IMO, it makes sense for this to be a Fortran program. The rest is just IO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something from numpy or scipy e.g. https://numpy.org/devdocs/reference/generated/numpy.interp.html

Copy link
Contributor Author

@EricSinsky-NOAA EricSinsky-NOAA Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, np.interp does not have an option for a weighted bilinear interpolation. This Fortran subroutine, which uses the ESMF regridding weights in a bilinear or conservative interpolation is referenced here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally avoid numpy and/or scipy. I have also tried these methods and they are slow and do not take into account issues such as landmask and fractional grids.

To reiterate, once you have the ESMF weights, all you need to do is add a short subroutine to read in the source grid variable and remap it to the destination grid. It should be short (on the order of 10 lines).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HenryWinterbottom-NOAA Thank you for your feedback. Just to confirm, is this the short subroutine that you are referring to?

       ! Initialize destination field to 0.0
       do i=1,dst_len
         dst(i)=0.0
       enddo

       ! Apply weights
       do i=1, n_s
         dst(row(i))=dst(row(i))+S(i)*src(col(i))
       enddo
       
       do i=1, dst_len
         if (frac_b(i) .ne. 0.0) then
           dst(i)=dst(i)/frac_b(i)
         endif
       enddo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import esmfwghtinterp_f90subroutine

def esmfmanualregrid(src,S,row,col,frac_b,n_s,varmeth,slatd,slond,dlatd,dlond):
import esmfwghtinterp_f90subroutine
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all imports must be at the top


def main():

print('Main program has started')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all new python programs must use wxflow to setup logging, execution etc.

Comment on lines +181 to +189
if len(sys.argv) == 6:
infiles0=sys.argv[1]
outfiles0=sys.argv[2]
nemsrc=sys.argv[3]
parmfile=sys.argv[4]
dstgrds=[sys.argv[5]]
else:
print('Incorrect number of arguments.')
exit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be handled with argparse
Use raise for exception handling and exit with an error message that is clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I wasn't entirely sure if argparse was supported on WCOSS2. I will use argparse instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

argparse is core python

Comment on lines +191 to +200
if infiles0.endswith('.nc') and outfiles0.endswith('.nc'):
print('Input and output files are netcdf.')
infiles1=[sys.argv[1]]
outfiles1=[sys.argv[2]]
dimfiles=1
else:
with open(infiles0) as f1:
infiles1 = f1.read().splitlines()
with open(outfiles0) as f2:
outfiles1 = f2.read().splitlines()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide comments or describe what the code/section is doing.
Break code into methods for readability and testing.
Add tests.

ush/ocnpost.py Outdated
#the following NCO command will be issued at the end
#to rename the variable mld to ePBL if the variable mld is found

ncocmd=['ncrename -O -v MLD_003,mld']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used? Remove unused codes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I will remove this since it will not be needed.

Comment on lines +272 to +282
if 'SST' in ncvarlist:
model='MOM'
elif 'Tsfc_d' in ncvarlist or 'Tsfc_h' in ncvarlist:
model='CICE'
if 'Tsfc_d' in ncvarlist:
samplevar='Tsfc_d'
if 'Tsfc_h' in ncvarlist:
samplevar='Tsfc_h'
else:
print('Product not supported. Exiting program...')
exit()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be turned into methods for testing and extensibility

Comment on lines +286 to +291
if model == 'MOM':
dimxh=ocnf.dimensions['xh'].size
dimyh=ocnf.dimensions['yh'].size
if model == 'CICE':
dimxh=ocnf.dimensions['ni'].size
dimyh=ocnf.dimensions['nj'].size
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can model be both MOM and CICE at the same time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This program can only process MOM and CICE one at a time, therefore model can only be either MOM or CICE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, this program needs to be written in a way that leverages OOP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The program should be agnostic to the model. All it requires if the ESMF coefficients and a FORTRAN bit to remap the variables. ESMF is also agnostic to the model. All it cares about is mapping points from a source grid to a destination grid. The only scenario that I can think of in which you would need to specify the model is if you are generating a MOM6 or CICE tri-polar projection from scratch.

Comment on lines +297 to +312
if model == 'MOM':
if 'sin_rot' in ncvarlist:
sinrot=ocnf['sin_rot']
else:
sinrot=ocnf['sinrot']
if 'cos_rot' in ncvarlist:
cosrot=ocnf['cos_rot']
else:
cosrot=ocnf['cosrot']

z_l=ocnf['z_l']
z_i=ocnf['z_i']
nlevs=len(z_l)

if model == 'CICE':
angleT=ocnf['ANGLET']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle MOM in MOM related methods and CICE likewise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary. Both MOM6 and CICE are on the same tri-polar grid for the UFS. Further, there is no rotation necessary for CICE since the variables are all Arakawa-C mass points.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we confirm with @DeniseWorthen who wrote the original NCL scripts. While I don't remember specific details, I do know that Denise is an expert on the CICE grid and carefully confirmed every variable was being interpolated the correct way and I'm assuming Eric is following what was done in the original NCL scripts here. Although maybe somethings changed since then.

PS Thank you @EricSinsky-NOAA for this PR. I know this has been a long standing need (#923 was created a year ago) and it's greatly appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JessicaMeixner-NOAA. The vector rotation is being performed the same way as in Denise's NCL code. It is possible, however, that something has changed since then (as you pointed out).

@@ -0,0 +1,646 @@
#------------------------------------------------------------------
Copy link
Contributor

@aerorahul aerorahul Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are too many questions from this script.

  1. Is this really efficient? Is python the right choice for doing this or is a compiled language (e.g. Fortran) better suited for the operations of this size.
  2. There are no tests for this. Every python program that is added must include tests and pass standards.
  3. Why are we using those parm files? Can that information not be described in a better form e.g. yaml or namelist (if using Fortran).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I definitely agree that Fortran is more efficient with computation, which is why most of the intense computation is performed using a Fortran to Python interface generator (f2py) provided by Numpy.
  2. Sure, formal tests will be performed.
  3. There's no particular reason. I can change the format of the parm files to yaml.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. From my experience, FORTRAN is going to be the most efficient way to do this. Further, there is no need to wrap the FORTRAN with f2py. If you have the weights already, you should only need to loop over the coefficients to do the remapping. Using Python only complicates things and makes the application less portable.

  2. A simple unit test would be using a static ESMF coefficients file, remapping from 5p0 to a comparable Gaussian grid nominal resolution and taking the difference of the fields. I would expect the different to be on the order of 0 aside from rounding error.

  3. This is the perfect scenario for which to use a YAML configuration. The configuration file you provide is difficult to interpret. A YAML variation would be much more readable and easier to digest.

Comment on lines +396 to +403
wgtsfile1 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.bilinear.nc'
wgtsfile2 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.conserve.nc'
if model == 'MOM':
wgtsfile3 = nemsrc+'/'+'tripole.mx025.Cu.to.Ct.bilinear.nc'
wgtsfile4 = nemsrc+'/'+'tripole.mx025.Cv.to.Ct.bilinear.nc'
if model == 'CICE':
wgtsfile3 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc'
wgtsfile4 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these seem to be hard-wired here. Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will fix this so that the filenames not hard-wired. Thanks for catching this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these things that are different for MOM and CICE should ideally be moved to a yaml file. The python shouldn't need to know what it is working on, just a yaml with all the settings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. yaml files will be used instead of the preliminary parm files that are being used now.

Comment on lines +411 to +437
rgrdf1=nc.Dataset(wgtsfile1)
S1=rgrdf1['S'][:].copy()
row1=rgrdf1['row'][:]
col1=rgrdf1['col'][:]
frac_b1=rgrdf1['frac_b'][:]
n_s1=len(S1)

rgrdf2=nc.Dataset(wgtsfile2)
S2=rgrdf2['S'][:].copy()
row2=rgrdf2['row'][:]
col2=rgrdf2['col'][:]
frac_b2=rgrdf2['frac_b'][:]
n_s2=len(S2)

rgrdf3=nc.Dataset(wgtsfile3)
S3=rgrdf3['S'][:].copy()
row3=rgrdf3['row'][:]
col3=rgrdf3['col'][:]
frac_b3=rgrdf3['frac_b'][:]
n_s3=len(S3)

rgrdf4=nc.Dataset(wgtsfile4)
S4=rgrdf4['S'][:].copy()
row4=rgrdf4['row'][:]
col4=rgrdf4['col'][:]
frac_b4=rgrdf4['frac_b'][:]
n_s4=len(S4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

netCDF4 affords lazy-loading of the dataset. Why are we extracting that data upfront? This doesn't seem like a good use of the library.

rgmask3dn = f.createVariable('rgmask3d', 'f4', ('time', 'z_l','lat', 'lon'))
time = f.createVariable('Time', 'i4', 'time')

longitude[:] = lond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are things you would do in a Fortran code.
This is not proper use of the python language.

#Create Mask NETCDF File#########
testfile='masks_'+dstgrds[jj]+'.nc'
os.system('rm -vf '+testfile)
f = nc.Dataset(testfile,'w', format='NETCDF4')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use contextmanagers

#Create output regridded netcdf file
FILENAME_REGRID = outfile0
os.system('rm -f '+FILENAME_REGRID)
outcdf = nc.Dataset(FILENAME_REGRID,'w', format='NETCDF3_CLASSIC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we writing out NETCDF3 version filed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe NETCDF3 was the format that was used in the ocean and ice post NCL programs. However, there is no particular reason we need to keep it this way. I'll change this to NETCDF4, or this can be a user-configurable option in a yaml file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe NETCDF3 was the format that was used in the ocean and ice post NCL programs. However, there is no particular reason we need to keep it this way. I'll change this to NETCDF4, or this can be a user-configurable option in a yaml file.

it is netcdf4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks Jiande.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

netCDF3 is an artifact of NGGPS.

eric sinsky and others added 2 commits July 14, 2023 12:08
Additional code that is not needed in ocnpost has been removed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anytime that you need to run over large loops in Python, consideration should be given to using a lower-level language (e.g., FORTRAN). Even with efficient use of Python list-comprehensions FORTRAN will still be a faster solution.

Copy link
Contributor

@HenryRWinterbottom HenryRWinterbottom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this application can be leaned out considerably.

  1. Use YAML files as noted throughout the comments.

  2. Take the remapping step out. If you already have the ESMF remapping coefficient files, all you need is a FORTRAN program to read the source variable files, do the interpolation, and write the remapped variable out.

  3. The MOM6 velocity vector Earth -> grid and grid-> Earth relative rotations can also be done more efficiently in FORTRAN.

  4. As is, this application limits portability.

Unused code that was used for preliminary testing purposes has been
removed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants