Replace ocean/ice post NCL programs with Python program #1736

EricSinsky-NOAA · 2023-07-13T17:52:43Z

Description

The ocean and ice post NCL programs can no longer be used in the global-workflow because NCL is obsolete. Therefore, the ocean and ice post NCL programs have been replaced by one Python program. The ocean/ice post Python program has the same capabilities as the NCL programs.

Refs #923

Type of change

New feature (non-breaking change which adds functionality)

How Has This Been Tested?

EP4 on WCOSS2

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes need updates to the documentation. I have made corresponding changes to the documentation
My changes generate no new warnings
New and existing tests pass with my changes
Any dependent changes have been merged and published

Ocnpost and icepost NCL code has been ported to Python so that the ocean and ice post can be supported on WCOSS2.

The script that calls the ocnpost program has been modified so that the ocnpost Python program can be executed instead of the obsolete ocnpost and icepost NCL programs.

Ocean and ice post parm files that the ocnpost Python program requires have been added.

Unneeded lines have been removed from the run_regrid script and the script has been polished.

Code that has been commented out has been removed from ocnpost.

The ocnpost Python code has been updated to account for both daily and hourly raw CICE NetCDF input files.

The ocnpost Python program has been updated so that all 40 levels from MOM6 can be processed.

The ocnpost parm files have been updated for MOM6 and CICE6 to include all needed ocean and ice variables.

The input netcdf filename that is needed for ocnpost has been modified so that it agrees with the names of the netcdf files produced from MOM6 and CICE6.

github-advanced-security

shellcheck found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

WalterKolczynski-NOAA · 2023-07-13T18:05:30Z

@NeilBarton-NOAA @jiandewang Hold off on reviewing until all the checks have passed and Rahul or I have given the go-ahead.

aerorahul

See first cut at the review comments.

aerorahul · 2023-07-13T18:05:05Z

ush/ocnpost.py

+    try:
+        import esmfwghtinterp_f90subroutine
+    except ImportError or ModuleNotFoundError:
+        print('Weighted interpolation fortran subroutine has not been compiled. Compiling...')
+        np.f2py.compile(fsource1, modulename='esmfwghtinterp_f90subroutine', verbose=False)
+        import esmfwghtinterp_f90subroutine


One cannot be compiling every time this script is run.

I agree. As of now, a compilation is only performed if the .so file does not exist in any Python path. However, the compile step can be removed completely from the code if that is preferred.

I don't understand why this can't just be this python function:

def bilin(src, S, row, col, frac_b, dst_len):
    dst = [0.0] * dst_len  # Initialize destination field to 0.0
    # Apply weights
    for i in range(len(row)):
        dst[row[i]] += S[i] * src[col[i]]
    # Apply fraction correction
    for i in range(dst_len):
        if frac_b[i] != 0.0:
            dst[i] /= frac_b[i]
    return dst

I tried something very similar to this. The program ran less efficiently when this was a python function.

There are efficient bilinear interpolation methods in numpy. Did you try any of those?
If this is the meat and potatoes of this program, IMO, it makes sense for this to be a Fortran program. The rest is just IO.

I meant something from numpy or scipy e.g. https://numpy.org/devdocs/reference/generated/numpy.interp.html

To my understanding, np.interp does not have an option for a weighted bilinear interpolation. This Fortran subroutine, which uses the ESMF regridding weights in a bilinear or conservative interpolation, is referenced here.

I would personally avoid numpy and/or scipy. I have also tried these methods and they are slow and do not take into account issues such as landmask and fractional grids.

To reiterate, once you have the ESMF weights, all you need to do is add a short subroutine to read in the source grid variable and remap it to the destination grid. It should be short (on the order of 10 lines).

@HenryWinterbottom-NOAA Thank you for your feedback. Just to confirm, is this the short subroutine that you are referring to?

! Initialize destination field to 0.0
do i=1,dst_len
  dst(i)=0.0
enddo
! Apply weights
do i=1, n_s
  dst(row(i))=dst(row(i))+S(i)*src(col(i))
enddo
do i=1, dst_len
  if (frac_b(i) .ne. 0.0) then
    dst(i)=dst(i)/frac_b(i)
  endif
enddo

@EricSinsky-NOAA Yes.
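
For comparison with the loop versions quoted in this thread, the same weight application can be written without an explicit Python loop. The following is only a sketch: the function name `apply_esmf_weights` is hypothetical, it assumes the `row` and `col` arrays are 1-based (as in the Fortran snippet), and whether it is fast enough on the real grids was not measured in this discussion.

```python
import numpy as np


def apply_esmf_weights(src, S, row, col, frac_b, dst_len):
    """Remap a flattened source field with sparse weights (hypothetical helper)."""
    src = np.asarray(src, dtype=float)
    S = np.asarray(S, dtype=float)
    # Assumption: row/col are 1-based indices; shift them to 0-based for NumPy.
    row0 = np.asarray(row, dtype=np.intp) - 1
    col0 = np.asarray(col, dtype=np.intp) - 1

    # Accumulate S[i] * src[col[i]] into dst[row[i]] for every weight i.
    dst = np.bincount(row0, weights=S * src[col0], minlength=dst_len)

    # Fraction correction, skipping destination cells where frac_b is zero.
    frac_b = np.asarray(frac_b, dtype=float)
    covered = frac_b != 0.0
    dst[covered] /= frac_b[covered]
    return dst
```

`np.bincount` with `weights` performs the same scatter-add as the `do i=1, n_s` loop, so repeated destination indices are summed rather than overwritten.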

aerorahul · 2023-07-13T18:05:21Z

ush/ocnpost.py

+        import esmfwghtinterp_f90subroutine
+
+def esmfmanualregrid(src,S,row,col,frac_b,n_s,varmeth,slatd,slond,dlatd,dlond):
+    import esmfwghtinterp_f90subroutine 


all imports must be at the top
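
A minimal sketch of what a single top-level import with a clear failure could look like, assuming the extension module is built ahead of time rather than compiled at run time. The helper name `require_module` is hypothetical. Note that `except ImportError or ModuleNotFoundError` evaluates the `or` expression first and is therefore the same as `except ImportError`, which already covers `ModuleNotFoundError` because it is a subclass.

```python
import importlib


def require_module(name):
    """Import a prebuilt module or fail with an actionable message."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so one clause covers both.
        raise ImportError(
            f"Module '{name}' is not available; build it before running the post."
        ) from exc
```

At the top of the script this might be used as `interp = require_module('esmfwghtinterp_f90subroutine')`, so the failure happens once, at start-up, instead of inside `esmfmanualregrid`.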

aerorahul · 2023-07-13T18:05:51Z

ush/ocnpost.py

+
+def main():
+
+    print('Main program has started')


all new python programs must use wxflow to set up logging, execution etc.

aerorahul · 2023-07-13T18:06:37Z

ush/ocnpost.py

+    if len(sys.argv) == 6:
+        infiles0=sys.argv[1]
+        outfiles0=sys.argv[2]
+        nemsrc=sys.argv[3]
+        parmfile=sys.argv[4]
+        dstgrds=[sys.argv[5]]
+    else:
+        print('Incorrect number of arguments.')
+        exit()


This needs to be handled with argparse
Use raise for exception handling and exit with an error message that is clear.

Sure. I wasn't entirely sure if argparse was supported on WCOSS2. I will use argparse instead.

argparse is core python
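
As an illustration of the request above, the five positional arguments could be declared with `argparse` roughly as follows. The argument names mirror the variables in the diff; the help strings and the `dstgrd` name are assumptions inferred from how the values are used, not taken from the PR.

```python
import argparse


def parse_args(argv=None):
    """Parse the five positional arguments the script currently reads from sys.argv."""
    parser = argparse.ArgumentParser(description="Regrid MOM6/CICE6 output (sketch).")
    parser.add_argument("infiles", help="input NetCDF file, or a text file listing inputs")
    parser.add_argument("outfiles", help="output NetCDF file, or a text file listing outputs")
    parser.add_argument("nemsrc", help="directory holding the ESMF weight files")
    parser.add_argument("parmfile", help="parm file listing the variables to process")
    parser.add_argument("dstgrd", help="destination grid label")
    return parser.parse_args(argv)
```

With this in place a wrong argument count produces a usage message and a non-zero exit status, rather than a bare `exit()`.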

aerorahul · 2023-07-13T18:07:38Z

ush/ocnpost.py

+    if infiles0.endswith('.nc') and outfiles0.endswith('.nc'):
+        print('Input and output files are netcdf.')
+        infiles1=[sys.argv[1]]
+        outfiles1=[sys.argv[2]]
+        dimfiles=1
+    else:
+        with open(infiles0) as f1:
+            infiles1 = f1.read().splitlines()
+        with open(outfiles0) as f2:
+            outfiles1 = f2.read().splitlines()


Please provide comments or describe what the code/section is doing.
Break code into methods for readability and testing.
Add tests.
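
One possible way to split this section into small, testable functions is sketched below. The function names are hypothetical, and skipping blank lines and checking that the two lists have equal length are additions that the original hunk does not perform.

```python
from pathlib import Path


def read_list(path):
    """Return the non-empty lines of a text file as a list of file names."""
    return [line for line in Path(path).read_text().splitlines() if line.strip()]


def pair_inputs_outputs(infile_arg, outfile_arg):
    """Return (input, output) pairs for a single file or for two list files."""
    # A pair of .nc arguments means one input and one output file.
    if infile_arg.endswith(".nc") and outfile_arg.endswith(".nc"):
        return [(infile_arg, outfile_arg)]
    # Otherwise both arguments are text files with one file name per line.
    infiles = read_list(infile_arg)
    outfiles = read_list(outfile_arg)
    if len(infiles) != len(outfiles):
        raise ValueError(
            f"{len(infiles)} input files listed but {len(outfiles)} output files"
        )
    return list(zip(infiles, outfiles))
```

Because the function takes and returns plain strings, it can be exercised in a unit test with temporary files and no NetCDF data.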

aerorahul · 2023-07-13T18:08:07Z

ush/ocnpost.py

+    #the following NCO command will be issued at the end 
+    #to rename the variable mld to ePBL if the variable mld is found
+
+    ncocmd=['ncrename -O -v MLD_003,mld']


Where is this used? Remove unused code.

Thanks for catching this. I will remove this since it will not be needed.

aerorahul · 2023-07-13T18:09:37Z

ush/ocnpost.py

+    if 'SST' in ncvarlist:
+        model='MOM'
+    elif 'Tsfc_d' in ncvarlist or 'Tsfc_h' in ncvarlist:
+        model='CICE'
+        if 'Tsfc_d' in ncvarlist:
+            samplevar='Tsfc_d'
+        if 'Tsfc_h' in ncvarlist:
+            samplevar='Tsfc_h' 
+    else:
+        print('Product not supported. Exiting program...')
+        exit()


This should be turned into methods for testing and extensibility
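
A sketch of the detection logic as a standalone function that raises instead of calling `exit()`. The name `detect_model` is hypothetical. Returning `None` as the sample variable for MOM is an assumption, since the hunk does not show one being set; when both `Tsfc_d` and `Tsfc_h` are present, `Tsfc_h` wins, matching the order of the two `if` statements in the diff.

```python
def detect_model(varnames):
    """Return (model, sample_variable) inferred from the variable names in a file."""
    names = set(varnames)
    if "SST" in names:
        return "MOM", None
    # Tsfc_h is checked first so it takes precedence when both are present.
    for candidate in ("Tsfc_h", "Tsfc_d"):
        if candidate in names:
            return "CICE", candidate
    raise ValueError("Product not supported: expected SST, Tsfc_d or Tsfc_h.")
```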

aerorahul · 2023-07-13T18:10:04Z

ush/ocnpost.py

+    if model == 'MOM':
+        dimxh=ocnf.dimensions['xh'].size
+        dimyh=ocnf.dimensions['yh'].size
+    if model == 'CICE':
+        dimxh=ocnf.dimensions['ni'].size
+        dimyh=ocnf.dimensions['nj'].size


Can model be both MOM and CICE at the same time?

This program can only process MOM and CICE one at a time, therefore model can only be either MOM or CICE.

In that case, this program needs to be written in a way that leverages OOP.

The program should be agnostic to the model. All it requires is the ESMF coefficients and a FORTRAN bit to remap the variables. ESMF is also agnostic to the model. All it cares about is mapping points from a source grid to a destination grid. The only scenario that I can think of in which you would need to specify the model is if you are generating a MOM6 or CICE tri-polar projection from scratch.
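
One way to remove the paired `if model == ...` blocks is to keep the model-specific names in a small table and look them up. This is a sketch only: `GridDims`, `DIMS` and `grid_shape` are hypothetical names, and the table could equally be loaded from a configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridDims:
    """Names of the horizontal dimensions used by one model's output files."""
    x: str
    y: str


# Dimension names taken from the diff: xh/yh for MOM, ni/nj for CICE.
DIMS = {"MOM": GridDims("xh", "yh"), "CICE": GridDims("ni", "nj")}


def grid_shape(model, dim_sizes):
    """Return (nx, ny) given a model label and a mapping of dimension name to size."""
    try:
        dims = DIMS[model]
    except KeyError:
        raise ValueError(f"Unsupported model: {model!r}") from None
    return dim_sizes[dims.x], dim_sizes[dims.y]
```

With netCDF4, `dim_sizes` could be built as `{name: dim.size for name, dim in ocnf.dimensions.items()}`; the rest of the code then never branches on the model name.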

aerorahul · 2023-07-13T18:10:27Z

ush/ocnpost.py

+    if model == 'MOM':
+        if 'sin_rot' in ncvarlist:
+            sinrot=ocnf['sin_rot']
+        else:
+            sinrot=ocnf['sinrot']
+        if 'cos_rot' in ncvarlist:
+            cosrot=ocnf['cos_rot']
+        else:
+            cosrot=ocnf['cosrot']
+
+        z_l=ocnf['z_l']
+        z_i=ocnf['z_i']
+        nlevs=len(z_l)
+
+    if model == 'CICE':
+        angleT=ocnf['ANGLET'] 


Handle MOM in MOM related methods and CICE likewise.

This is not necessary. Both MOM6 and CICE are on the same tri-polar grid for the UFS. Further, there is no rotation necessary for CICE since the variables are all Arakawa-C mass points.

Can we confirm with @DeniseWorthen, who wrote the original NCL scripts? While I don't remember specific details, I do know that Denise is an expert on the CICE grid and carefully confirmed every variable was being interpolated the correct way, and I'm assuming Eric is following what was done in the original NCL scripts here. Although maybe something's changed since then.

PS Thank you @EricSinsky-NOAA for this PR. I know this has been a long standing need (#923 was created a year ago) and it's greatly appreciated.

Thanks @JessicaMeixner-NOAA. The vector rotation is being performed the same way as in Denise's NCL code. It is possible, however, that something has changed since then (as you pointed out).
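
For readers unfamiliar with the rotation under discussion, a generic sketch of rotating grid-relative vector components using stored cosine and sine fields is shown below. The sign convention here is an assumption and is not confirmed anywhere in this thread; it would need to be checked against the original NCL code and the model documentation before use. The function name is hypothetical.

```python
import numpy as np


def rotate_to_earth(u, v, cos_rot, sin_rot):
    """Rotate grid-relative (u, v) to eastward/northward components.

    Assumed convention (unverified): u_east = u*cos + v*sin, v_north = v*cos - u*sin.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos_rot = np.asarray(cos_rot, dtype=float)
    sin_rot = np.asarray(sin_rot, dtype=float)
    u_east = u * cos_rot + v * sin_rot
    v_north = v * cos_rot - u * sin_rot
    return u_east, v_north
```

Whatever the sign convention, a pure rotation leaves the vector magnitude unchanged, which makes a convenient sanity check in a unit test.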

aerorahul · 2023-07-13T18:13:32Z

ush/ocnpost.py

@@ -0,0 +1,646 @@
+#------------------------------------------------------------------


There are too many questions from this script.

Is this really efficient? Is Python the right choice for doing this, or is a compiled language (e.g. Fortran) better suited for operations of this size?

There are no tests for this. Every python program that is added must include tests and pass standards.

Why are we using those parm files? Can that information not be described in a better form, e.g. yaml or namelist (if using Fortran)?

I definitely agree that Fortran is more efficient with computation, which is why most of the intense computation is performed using a Fortran to Python interface generator (f2py) provided by Numpy.

Sure, formal tests will be performed.

There's no particular reason. I can change the format of the parm files to yaml.

From my experience, FORTRAN is going to be the most efficient way to do this. Further, there is no need to wrap the FORTRAN with f2py. If you have the weights already, you should only need to loop over the coefficients to do the remapping. Using Python only complicates things and makes the application less portable.

A simple unit test would be using a static ESMF coefficients file, remapping from 5p0 to a comparable Gaussian grid nominal resolution and taking the difference of the fields. I would expect the difference to be on the order of 0 aside from rounding error.

This is the perfect scenario for which to use a YAML configuration. The configuration file you provide is difficult to interpret. A YAML variation would be much more readable and easier to digest.

aerorahul · 2023-07-13T18:33:50Z

ush/ocnpost.py

+        wgtsfile1 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.bilinear.nc'        
+        wgtsfile2 = nemsrc+'/'+'tripole.mx025.Ct.to.'+dsttype[0]+dstgrds[jj]+'.conserve.nc'
+        if model == 'MOM':
+            wgtsfile3 = nemsrc+'/'+'tripole.mx025.Cu.to.Ct.bilinear.nc'    
+            wgtsfile4 = nemsrc+'/'+'tripole.mx025.Cv.to.Ct.bilinear.nc'
+        if model == 'CICE':
+            wgtsfile3 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc'    
+            wgtsfile4 = nemsrc+'/'+'tripole.mx025.Bu.to.Ct.bilinear.nc'


these seem to be hard-wired here. Why?

Yes, I will fix this so that the filenames are not hard-wired. Thanks for catching this.

All these things that are different for MOM and CICE should ideally be moved to a yaml file. The python shouldn't need to know what it is working on, just a yaml with all the settings.

Agreed. yaml files will be used instead of the preliminary parm files that are being used now.
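
To illustrate the idea of moving model-specific settings out of the code, the weight file names could be built from a template plus a settings mapping. In this sketch the mapping is an inline dict standing in for whatever a YAML file would provide; `WEIGHT_TEMPLATE`, `SETTINGS`, the key names and both function names are hypothetical. Only the file-name pattern and the stagger labels come from the diff.

```python
import os

# Pattern taken from the hard-wired names in the diff.
WEIGHT_TEMPLATE = "tripole.{res}.{src}.to.{dst}.{method}.nc"

# Stand-in for settings that would be read from a YAML file (assumed layout).
SETTINGS = {
    "MOM": {"u_stagger": "Cu", "v_stagger": "Cv"},
    "CICE": {"u_stagger": "Bu", "v_stagger": "Bu"},
}


def weight_file(fix_dir, res, src, dst, method):
    """Build the path of one ESMF weight file."""
    name = WEIGHT_TEMPLATE.format(res=res, src=src, dst=dst, method=method)
    return os.path.join(fix_dir, name)


def vector_weight_files(fix_dir, res, model_settings):
    """Return the (u, v) weight files that map velocity points to Ct points."""
    return tuple(
        weight_file(fix_dir, res, model_settings[key], "Ct", "bilinear")
        for key in ("u_stagger", "v_stagger")
    )
```

The resolution (`mx025` in the diff) becomes an ordinary argument, so supporting another tripole resolution would not require a code change.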

aerorahul · 2023-07-13T18:34:40Z

ush/ocnpost.py

+        rgrdf1=nc.Dataset(wgtsfile1)
+        S1=rgrdf1['S'][:].copy()
+        row1=rgrdf1['row'][:]
+        col1=rgrdf1['col'][:]
+        frac_b1=rgrdf1['frac_b'][:]
+        n_s1=len(S1)
+
+        rgrdf2=nc.Dataset(wgtsfile2)
+        S2=rgrdf2['S'][:].copy()
+        row2=rgrdf2['row'][:]
+        col2=rgrdf2['col'][:]
+        frac_b2=rgrdf2['frac_b'][:]
+        n_s2=len(S2)
+
+        rgrdf3=nc.Dataset(wgtsfile3)
+        S3=rgrdf3['S'][:].copy()
+        row3=rgrdf3['row'][:]
+        col3=rgrdf3['col'][:]
+        frac_b3=rgrdf3['frac_b'][:]
+        n_s3=len(S3)
+
+        rgrdf4=nc.Dataset(wgtsfile4)
+        S4=rgrdf4['S'][:].copy()
+        row4=rgrdf4['row'][:]
+        col4=rgrdf4['col'][:]
+        frac_b4=rgrdf4['frac_b'][:]
+        n_s4=len(S4)


netCDF4 affords lazy-loading of the dataset. Why are we extracting that data upfront? This doesn't seem like a good use of the library.
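
The four near-identical blocks in this hunk could collapse into one helper that is called only when a particular weight set is actually needed. The sketch below is deliberately independent of netCDF4: it accepts any open dataset that supports `dataset[name][:]`, and the names `Weights` and `read_weights` are hypothetical.

```python
from typing import NamedTuple, Sequence


class Weights(NamedTuple):
    """The four arrays read from one weight file in the diff."""
    S: Sequence
    row: Sequence
    col: Sequence
    frac_b: Sequence


def read_weights(dataset):
    """Copy the weight arrays out of an open dataset that supports dataset[name][:]."""
    return Weights(*(dataset[name][:] for name in Weights._fields))


# Possible use with netCDF4 (not executed here):
#     with nc.Dataset(wgtsfile1) as ds:
#         bilinear = read_weights(ds)
#     n_s = len(bilinear.S)
```

Reading each weight set inside its own `with` block also closes the file as soon as the arrays have been copied.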

aerorahul · 2023-07-13T18:35:29Z

ush/ocnpost.py

+            rgmask3dn = f.createVariable('rgmask3d', 'f4', ('time', 'z_l','lat', 'lon')) 
+        time = f.createVariable('Time', 'i4', 'time')
+
+        longitude[:] = lond    


these are things you would do in a Fortran code.
This is not proper use of the python language.

aerorahul · 2023-07-13T18:36:09Z

ush/ocnpost.py

+        #Create Mask NETCDF File#########      
+        testfile='masks_'+dstgrds[jj]+'.nc'
+        os.system('rm -vf '+testfile)
+        f = nc.Dataset(testfile,'w', format='NETCDF4') 


use contextmanagers
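
`netCDF4.Dataset` can itself be used in a `with` statement, which closes the file even if an exception is raised. The sketch below goes one step further and also replaces the `os.system('rm -vf ...')` call with a standard-library equivalent. The name `fresh_output` and the `opener` parameter are hypothetical, and `Path.unlink(missing_ok=True)` requires Python 3.8 or later.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def fresh_output(path, opener):
    """Delete any stale file at path, open a new one, and always close it."""
    target = Path(path)
    target.unlink(missing_ok=True)  # replaces os.system('rm -f ...')
    handle = opener(target)
    try:
        yield handle
    finally:
        handle.close()


# Possible use with netCDF4 (not executed here):
#     with fresh_output(testfile, lambda p: nc.Dataset(p, 'w', format='NETCDF4')) as f:
#         ...
```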

aerorahul · 2023-07-13T18:36:34Z

ush/ocnpost.py

+        #Create output regridded netcdf file 
+            FILENAME_REGRID = outfile0
+            os.system('rm -f '+FILENAME_REGRID)
+            outcdf = nc.Dataset(FILENAME_REGRID,'w', format='NETCDF3_CLASSIC')


Why are we writing out NETCDF3 version files?

I believe NETCDF3 was the format that was used in the ocean and ice post NCL programs. However, there is no particular reason we need to keep it this way. I'll change this to NETCDF4, or this can be a user-configurable option in a yaml file.

it is netcdf4

Got it, thanks Jiande.

netCDF3 is an artifact of NGGPS.

Additional code that is not needed in ocnpost has been removed.

HenryRWinterbottom · 2023-07-14T17:30:03Z

ush/ocnpost.py

Anytime that you need to run over large loops in Python, consideration should be given to using a lower-level language (e.g., FORTRAN). Even with efficient use of Python list-comprehensions FORTRAN will still be a faster solution.

HenryRWinterbottom

I think this application can be leaned out considerably.

Use YAML files as noted throughout the comments.
Take the remapping step out. If you already have the ESMF remapping coefficient files, all you need is a FORTRAN program to read the source variable files, do the interpolation, and write the remapped variable out.
The MOM6 velocity vector Earth -> grid and grid-> Earth relative rotations can also be done more efficiently in FORTRAN.
As is, this application limits portability.

Unused code that was used for preliminary testing purposes has been removed.

EricSinsky-NOAA and others added 10 commits July 1, 2023 16:13

Add ocnpost Python code to replace NCL code

aeeeab3

Ocnpost and icepost NCL code has been ported to Python so that the ocean and ice post can be supported on WCOSS2.

Modify run_regrid script to accommodate updated ocnpost

59031dd

The script that calls the ocnpost program has been modified so that the ocnpost Python program can be executed instead of the obsolete ocnpost and icepost NCL programs.

Add ocnpost and icepost parm files for new ocnpost

bb877fc

Ocean and ice post parm files that the ocnpost Python program requires have been added.

Clean up run_regrid script

1178237

Unneeded lines have been removed from the run_regrid script and the script has been polished.

Clean up ocnpost

524bb3d

Code that has been commented out has been removed from ocnpost.

Account for daily and hourly CICE files in ocnpost

79a93ab

The ocnpost Python code has been updated to account for both daily and hourly raw CICE NetCDF input files.

Modify ocnpost to process 40 levels from MOM6

27e6126

The ocnpost Python program has been updated so that all 40 levels from MOM6 can be processed.

Update ocnpost parm files

e6e653b

The ocnpost parm files have been updated for MOM6 and CICE6 to include all needed ocean and ice variables.

Modify input netcdf filenames for ocnpost program

5558b55

The input netcdf filename that is needed for ocnpost has been modified so that it agrees with the names of the netcdf files produced from MOM6 and CICE6.

Merge branch 'develop' into feature/ocnicepost_wcoss2_port

ae74918

github-advanced-security bot found potential problems Jul 13, 2023

WalterKolczynski-NOAA requested review from WalterKolczynski-NOAA, NeilBarton-NOAA, jiandewang and aerorahul July 13, 2023 18:02

aerorahul requested changes Jul 13, 2023

aerorahul reviewed Jul 13, 2023

eric sinsky and others added 2 commits July 14, 2023 12:08

Merge branch 'develop' into feature/ocnicepost_wcoss2_port

a5a764c

Remove unused code

4d835d8

Additional code that is not needed in ocnpost has been removed.

HenryRWinterbottom reviewed Jul 14, 2023

Remove more unused code

e91b3fe

Unused code that was used for preliminary testing purposes has been removed.

JessicaMeixner-NOAA mentioned this pull request Jul 17, 2023

Port NCL scripts to python #923

Closed

HenryRWinterbottom closed this Oct 20, 2023
