Need a process to subset ESMF mesh files from global ones for regional grids #1513

Closed
ekluzek opened this issue Oct 7, 2021 · 23 comments · Fixed by #1892
Labels: enhancement (new capability or improved behavior of existing capability)
@ekluzek (Collaborator) commented Oct 7, 2021

With the update to NUOPC we'll need a way for users to create ESMF mesh files from domain files (or from their grid) for any regional grids they create. This needs to be done for regional cases for CTSM, but also when using LILAC to run WRF with CTSM.

The current subset_data process creates domain files as well as surface datasets for regional areas. We'll need some steps to convert the domain file into an ESMF mesh. Or possibly we should directly create an ESMF mesh file, or maybe the global ESMF mesh file should also be subset (but because ESMF mesh files are 1D vector files and not 2D, this may be difficult).
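For illustration, here is a minimal numpy sketch of what subsetting a 1D element list by center coordinates could look like (the arrays, bounds, and variable names here are hypothetical, not taken from an actual mesh file):

```python
import numpy as np

# Hypothetical 1D mesh: element centers as (lon, lat) pairs; in a real ESMF
# mesh file these would come from the centerCoords variable.
centerCoords = np.array([[283.5, 43.5], [285.0, 44.5], [290.0, 47.0],
                         [297.0, 50.0], [295.5, 52.5]])
ln1, ln2, lt1, lt2 = 284., 296., 44., 53.  # example regional bounds

# Because the mesh is a flat list rather than a 2D grid, the subset is just
# a boolean mask over element indices, not a 2D slice.
inside = ((centerCoords[:, 0] >= ln1) & (centerCoords[:, 0] <= ln2) &
          (centerCoords[:, 1] >= lt1) & (centerCoords[:, 1] <= lt2))
keep = np.where(inside)[0]
print(keep)  # indices of elements whose centers fall in the box
```

The harder part, and the reason this may be difficult, is renumbering the node connectivity after dropping nodes: the surviving elements must point at a re-indexed node list.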

This isn't a problem for single-point simulations, since there we provide the latitude and longitude via PTS_LAT and PTS_LON, so a mesh file isn't used. But it is a problem for regional grids larger than a single point.

@ekluzek ekluzek added the enhancement new capability or improved behavior of existing capability label Oct 7, 2021
@billsacks (Member) commented
This likely isn't the way we want to go long-term, but in case it's helpful, here is the process I inherited from Mariana to convert a SCRIP grid file to an ESMF mesh file:

#+begin_src shell
#!/bin/bash
#PBS -A P93300606
#PBS -N job_name
#PBS -j oe
#PBS -q share
#PBS -l walltime=05:00
#PBS -l select=1:mpiprocs=2
#PBS -o log.out

module load esmf_libs/8.1.1
module load esmf-8.1.1-ncdfio-mpiuni-O
export TMPDIR=/glade/scratch/$USER/temp
mkdir -p $TMPDIR
file_i="/glade/p/cesmdata/cseg/inputdata/glc/cism/griddata/SCRIPgrid_antarctica_8km_epsg3031_c210613.nc"
file_o="antarctica_8km_epsg3031_ESMFmesh_c210613.nc"
mpiexec_mpt -np 2 ESMF_Scrip2Unstruct $file_i $file_o  0 ESMF
#+end_src

@ekluzek (Collaborator, Author) commented Oct 22, 2021

Here's a script from @swensosc that creates a mesh based on the surface dataset that was created. This bypasses domain files and bypasses subsetting a SCRIP grid file, so it is a good approach to take. It should also work for unstructured grids, since the surface dataset is just a 1D vector in that case, but we should test that.

#! /usr/bin/env python
import subprocess
import time

import numpy as np
import netCDF4 as netcdf4

doTimer = False

# input file
infile = '/fs/cgd/csm/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c190214.nc'

# output directory
dir_output='./'

# creation date  
command='date "+%y%m%d"'
timetag = (subprocess.run(command, stdout=subprocess.PIPE, shell=True).stdout.decode('utf-8')).strip()

# output file
outfile = dir_output+'ESMFmesh_ctsm_0.9x1.25_'+timetag+'.nc'

#--  create mesh file from surface data file
# (other file types may have different coordinate variables)
f1 = netcdf4.Dataset(infile, 'r')
lon_vname = 'LONGXY'
lat_vname = 'LATIXY'

xc = np.asarray(f1.variables[lon_vname][:,], dtype=np.float64)
yc = np.asarray(f1.variables[lat_vname][:,], dtype=np.float64)
xc_units=f1.variables[lon_vname].units
xc_longname=f1.variables[lon_vname].long_name
yc_units=f1.variables[lat_vname].units
yc_longname=f1.variables[lat_vname].long_name
f1.close()
delx = np.abs(xc[0,1] - xc[0,0])
dely = np.abs(yc[1,0] - yc[0,0])

#--  convert coordinates to 1d
lon = np.asarray(xc[0,:])
lat = np.asarray(yc[:,0])
im,jm = lon.size,lat.size

#--  subset coordinates
tagnum = 1
if tagnum == 1: # NE.US
    ln1=284.
    ln2=296.
    lt1=44.
    lt2=53.

xind=np.where((lon >= ln1) & (lon <= ln2))[0]
yind=np.where((lat >= lt1) & (lat <= lt2))[0]

lonc=np.copy(xc) ; latc=np.copy(yc)
lonc=lonc[yind,:]; lonc=lonc[:,xind]
latc=latc[yind,:]; latc=latc[:,xind]
    
#--  Compute grid information
num_corners = 4
nj, ni = lonc.shape
grid_size = ni * nj

grid_center_lat = np.reshape(latc,grid_size)
grid_center_lon = np.reshape(lonc,grid_size)

latn = grid_center_lat+0.5*dely
lats = grid_center_lat-0.5*dely
lone = grid_center_lon+0.5*delx
lonw = grid_center_lon-0.5*delx

grid_corner_lat = np.empty((grid_size,num_corners))
grid_corner_lat[:,0]=latn
grid_corner_lat[:,1]=latn
grid_corner_lat[:,2]=lats
grid_corner_lat[:,3]=lats

grid_corner_lon = np.empty((grid_size,num_corners))
grid_corner_lon[:,0]=lone
grid_corner_lon[:,1]=lonw
grid_corner_lon[:,2]=lonw
grid_corner_lon[:,3]=lone

grid_imask = np.ones(grid_size, dtype = int)

grid_dims = np.array((ni,nj), dtype = int)
nm = grid_dims.shape[0]

#------------------------------------------
coordDim = 2
elementCount    = grid_center_lat.size
maxNodePElement = grid_corner_lon.shape[1]
numElementConn  = [maxNodePElement for _ in range(elementCount)]

numElementConn = np.asarray(numElementConn,dtype='b')

# create list of all nodes
all_nodes_list = []
for i in range(elementCount):
    for j in range(maxNodePElement):
        all_nodes_list.append((grid_corner_lon[i,j],grid_corner_lat[i,j]))

num_all_nodes = len(all_nodes_list)
all_nodes = np.asarray(all_nodes_list)

if doTimer:
    stime = time.time()

# Identify unique nodes
dtr = np.pi/180
index_radius = 1e-6
all_unique_nodes = []
for i in range(num_all_nodes):
    if np.mod(i,1000) == 0:
        print('i in num_all_nodes ',i,num_all_nodes)

    dlon = all_nodes[i+1:,0] - all_nodes[i,0]
    dlat = all_nodes[i+1:,1] - all_nodes[i,1]
    dlon = dtr*dlon
    dlat = dtr*dlat
    dist = np.power(np.sin(dlat/2),2) + np.cos(dtr*all_nodes[i+1:,1]) * np.cos(dtr*all_nodes[i,1]) * np.power(np.sin(dlon/2),2)
    dist = np.arctan2( np.sqrt(dist), np.sqrt(1-dist))
    npoints = np.sum(np.where(dist < index_radius,1,0))
    # if no duplicate points found, add to list
    if npoints < 1:
        all_unique_nodes.append(all_nodes[i,:])

nodeCount = len(all_unique_nodes)
print('nodeCount,elementCount, maxNodePElement ', nodeCount,elementCount, maxNodePElement)
#stop

if doTimer:
    etime = time.time()
    print('Time to create all_unique_nodes: ', etime-stime)
    print('')

if doTimer:
    stime2 = time.time()

nodeCoords = np.zeros((nodeCount, coordDim))
for i in range(nodeCount):
    pt = all_unique_nodes[i]
    nodeCoords[i,:] = [pt[0],pt[1]]
    
# Create array describing connectivity (1-based indices)
elementConn  = np.zeros((elementCount, maxNodePElement),dtype='i')
elementConn[:,] = int(-1)
elementArea  = np.zeros((elementCount),dtype='d')
elementMask  = np.ones((elementCount),dtype='i')
centerCoords = np.zeros((elementCount, coordDim))
dtr = np.pi/180
for i in range(elementCount):
    centerCoords[i,:] = [grid_center_lon[i],grid_center_lat[i]]
    dlon = np.max(grid_corner_lon[i,:]) - np.min(grid_corner_lon[i,:])
    dlat = np.max(grid_corner_lat[i,:]) - np.min(grid_corner_lat[i,:])
    # convert to radians^2
    pArea = dlon*dlat*np.sin(dtr*grid_center_lat[i])*dtr*dtr
    elementArea[i] = pArea
    # record element as indices rather than points
    u_list = []
    for j in range(maxNodePElement):
        pt = [grid_corner_lon[i,j],grid_corner_lat[i,j]]

        dlon = nodeCoords[:,0] - grid_corner_lon[i,j]
        dlat = nodeCoords[:,1] - grid_corner_lat[i,j]
        dlon = dtr*dlon
        dlat = dtr*dlat
        dist = np.power(np.sin(dlat/2),2) + np.cos(dtr*nodeCoords[:,1]) * np.cos(dtr*grid_corner_lat[i,j]) * np.power(np.sin(dlon/2),2)
        dist = np.arctan2( np.sqrt(dist), np.sqrt(1-dist))
        u_list.append(np.argmin(dist)+1)

    # order of points should be counter-clockwise
    elementConn[i,:numElementConn[i]] = u_list
    #elementConn[i,:numElementConn[i]] = np.flip(u_list)

if doTimer:
    etime2 = time.time()
    print('Time to create elementConn: ', etime2-stime2)
    print('')

print('writing file: ', outfile, '\n')
#w = netcdf4.Dataset(outfile, 'w', format='NETCDF4')
w = netcdf4.Dataset(outfile, 'w', format='NETCDF3_64BIT')

w.creation_date = timetag 
w.source_file = infile
w.title = 'ESMF mesh file'
w.gridType = 'unstructured'
w.createDimension('nodeCount',int(nodeCount))
w.createDimension('elementCount',int(elementCount))
w.createDimension('maxNodePElement',int(maxNodePElement))
w.createDimension('coordDim',int(coordDim))

w_nodeCoords  = w.createVariable('nodeCoords','d',('nodeCount', 'coordDim'))
w_nodeCoords.units = 'degrees'

w_elementConn  = w.createVariable('elementConn','i',('elementCount', 'maxNodePElement'),fill_value=-1)
w_elementConn.long_name = 'Node Indices that define the element connectivity'
#w_elementConn._FillValue = -1

w_numElementConn = w.createVariable('numElementConn','b',('elementCount'))
w_numElementConn.long_name = 'Number of nodes per element'

w_centerCoords = w.createVariable('centerCoords','d',('elementCount', 'coordDim'))
w_centerCoords.units = 'degrees'

w_elementArea = w.createVariable('elementArea','d',('elementCount'))
w_elementArea.units = 'radians^2'
w_elementArea.long_name = 'area weights'

w_elementMask = w.createVariable('elementMask','i',('elementCount'),fill_value=-9999)

# convert longitude to [0-360] if needed
tmp = nodeCoords[:,0]
tmp[tmp < 0] += 360
nodeCoords[:,0] = tmp
tmp = centerCoords[:,0]
tmp[tmp < 0] += 360
centerCoords[:,0] = tmp

# write to file  --------------------------------------------
w_nodeCoords[:,]   = nodeCoords
w_elementConn[:,]  = elementConn
w_numElementConn[:,] = numElementConn
w_centerCoords[:,] = centerCoords
w_elementArea[:,]  = elementArea
w_elementMask[:,]  = elementMask

w.close()

print(outfile,' created')

@billsacks (Member) commented

That's great – thanks @swensosc and @ekluzek! From looking quickly through this, I have a few thoughts. Sorry, I feel like I'm jumping to some detailed critiques, but overall this seems like a fantastic thing to have! I just can't help noticing details like this when I read through code:

(1) Regarding unstructured grids: it looks like this assumes that the corners are equidistant from the center. My understanding is that this is not always the case for irregular grids. For example, see the note about corners at https://www.ncl.ucar.edu/Document/Functions/ESMF/curvilinear_to_SCRIP.shtml (which refers to HOMME grids specifically).

(2) What is the performance like? In particular, I see that there is an O(N^2) loop over corners (i.e., a doubly nested loop) to identify the unique corners, and I'm wondering how that performs on large grids. If the point of this script is just small grids, this may be a non-issue, but it would be good to know. If an O(N^2) loop is unavoidable but leads to unacceptable performance in some situations, I'm wondering if this could be reworked a bit to use built-in numpy operations, like the rounding-then-unique solution from https://stackoverflow.com/questions/5426908/find-unique-elements-of-floating-point-array-in-numpy-with-comparison-using-a-d.
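For reference, a minimal sketch of that rounding-then-unique approach (the sample coordinates are made up, and `decimals` plays the role of the tolerance):

```python
import numpy as np

# Hypothetical corner coordinates: each element contributes 4 (lon, lat)
# pairs, so shared corners appear multiple times with tiny float jitter.
all_nodes = np.array([
    [284.0, 44.0], [285.0, 44.0], [285.0, 45.0], [284.0, 45.0],
    [285.0 + 1e-12, 44.0], [286.0, 44.0], [286.0, 45.0], [285.0, 45.0 + 1e-12],
])

# Round to a tolerance (here 1e-6 degrees), then let np.unique do the
# deduplication in O(N log N) instead of an explicit O(N^2) pairwise scan.
decimals = 6
rounded = np.round(all_nodes, decimals=decimals)
unique_nodes, inverse = np.unique(rounded, axis=0, return_inverse=True)
# 'inverse' maps each original corner to its unique-node index, which is
# what elementConn needs (plus 1 for the 1-based ESMF convention).
print(unique_nodes.shape[0])  # 6 unique corners out of 8
```

One caveat: plain rounding can still split two nearly identical corners that straddle a rounding boundary, so a robust tool would need to handle that edge case.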

(3) Should the tolerance for finding identical corners be based somehow on the typical grid cell size?

(4) Most importantly, it would be great if we could ask the ESMF group to provide something like this so that they are the ones to be responsible for maintaining it, ensuring it's robust in a variety of cases, etc. That way we don't have to be the ones to deal with things like (2) and (3).

@swensosc (Contributor) commented Oct 22, 2021 via email

@billsacks (Member) commented

I should have clarified, too, that my comments were not really meant as criticisms of the script as is: for internal and possibly some external use, this seems like a FANTASTIC thing to have! It was more that I was mentally jumping a step ahead to, "Could this be our standard, supported mechanism for creating mesh files?", so was thinking about what might be needed to get to that point. My mental context was some discussions we've been having in CSEG about the process for mesh file generation, so I was thinking, "could this be a good general process?" And my main thought at this point is that this script could very possibly help in many cases with the general mesh file creation process, even if it needs some adjustments to make it robust in a wider variety of scenarios. Anyway, thank you for sharing this!

@mvertens @uturuncoglu - FYI - see the discussion above.

@ekluzek (Collaborator, Author) commented Oct 22, 2021

@swensosc shared with me another similar script to be used for unstructured grids. So our first use of this might have both scripts in place, and the user will have to figure out which one to use.

@mvertens commented
@billsacks @ekluzek @swensosc - I think having the script that Sean wrote, targeted at regular grids, would be very useful. However, I think we need to proceed very cautiously when we talk about unstructured grids, and this requires input from a much larger group. Particularly as CAM already has a process for creating meshes for refined grids that could then be leveraged by CTSM. I believe that whatever unstructured mesh CTSM would require would also be required by CAM. When would this not be the case?

@billsacks (Member) commented

@mvertens and I talked about this issue this morning. We both feel that, in general, it makes sense to stick with our current, two-step process:

  1. Create a SCRIP grid file that defines the grid
  2. Convert the SCRIP grid file to an ESMF mesh file

The reason for splitting this is that step (1) is really grid-dependent, particularly in how you define the corners, so for example, there would be a different script used for CAM-SE grids than for the hydrological basin grids that @swensosc has been experimenting with. Step (2), on the other hand, is more generic, but also needs more software engineering to do it in a robust and performant way (e.g., my comments above), so it makes sense to have this be done by a generic tool maintained by the ESMF group. Then you can think of the SCRIP grid file as a relatively simple intermediate representation that is used to communicate between these two steps.

@ekluzek (Collaborator, Author) commented Dec 3, 2021

@adrifoster and @jkshuman here's the issue about creating mesh files from domain files. There's a bunch of discussion here covering both short-term and longer-term desires. I think we need a plan that includes a short-term stopgap that lets people do the regional science they need to do now, as well as steps toward the longer-term goal.

@ekluzek (Collaborator, Author) commented Dec 3, 2021

We are going to have a group meet next Wednesday afternoon to go over this and come up with a plan for a short term solution that will eventually merge into the longer term CESM wide solution.

@mvertens commented Dec 4, 2021 via email

@ekluzek ekluzek changed the title Need a process to create ESMF mesh files from domain files for regional grids Need a process to subset ESMF mesh files from global ones for regional grids Dec 9, 2021
@ekluzek (Collaborator, Author) commented Dec 9, 2021

OK, a group of us met to discuss this, including: Ufuk, @mvertens, @billsacks, @negin513, @adrifoster, @swensosc, and @jkshuman. We decided the long-term CESM-wide issue is creating mesh files that go with our surface and other datasets. A CTSM-specific issue is being able to subset from those datasets (which we already need to ensure are compatible). @swensosc explained that he has a script that subsets from a mesh file. This is a robust process because the connectivity information and the corners are kept; the subsetting just finds the indices of the center coordinates. This is actually more robust than using a domain file, for example, because you don't have to define the corners. So the script above isn't the one we would use, but rather a separate one he developed that works on ESMF mesh files.

@ekluzek (Collaborator, Author) commented Dec 9, 2021

OK, here's the script from @swensosc that subsets the ESMF mesh file. This is the one that we want to bring in for use with this process.

# Subset an existing ESMF mesh file to a regional domain.
# Excerpt; assumes the enclosing script defines meshfile (input path),
# meshfile2 (output path), ln1/ln2 and lt1/lt2 (lon/lat bounds), and
# verbose (bool).
import copy

import numpy as np
import netCDF4 as netcdf4

# Open original mesh file
f1 = netcdf4.Dataset(meshfile, 'r')

elementCount   = len(f1.dimensions['elementCount'])
elementConn    = f1.variables['elementConn'][:,]
numElementConn = f1.variables['numElementConn'][:,]
centerCoords   = f1.variables['centerCoords'][:,]
nodeCount      = len(f1.dimensions['nodeCount'])
nodeCoords     = f1.variables['nodeCoords'][:,]

# Keep only elements whose nodes all lie within the bounds
subsetElement = []
for n in range(elementCount):
    endx = elementConn[n,:numElementConn[n]] - 1  # convert to zero-based index
    nlon = nodeCoords[endx,0]
    nlat = nodeCoords[endx,1]

    l1 = np.logical_or(nlon <= ln1, nlon >= ln2)
    l2 = np.logical_or(nlat <= lt1, nlat >= lt2)
    if not np.any(np.logical_or(l1,l2)):
        subsetElement.append(n)

# Keep only nodes within the bounds, and build a map from old (1-based)
# node indices to new ones
subsetNode = []
connDict = {}
cnt = 1
for n in range(nodeCount):
    nlon = nodeCoords[n,0]
    nlat = nodeCoords[n,1]

    l1 = np.logical_or(nlon <= ln1, nlon >= ln2)
    l2 = np.logical_or(nlat <= lt1, nlat >= lt2)
    if np.logical_or(l1,l2):
        connDict[n+1] = -9999
    else:
        subsetNode.append(n)
        connDict[n+1] = cnt
        cnt += 1

print(np.min(nodeCoords[:,0]), np.max(nodeCoords[:,0]))
print(np.min(nodeCoords[:,1]), np.max(nodeCoords[:,1]))
# create new mesh file
elementCount = len(subsetElement)
nodeCount    = len(subsetNode)
print('new elementCount, nodeCount ', elementCount, nodeCount)

dimensions = f1.dimensions
variables  = f1.variables
global_attributes = f1.ncattrs()

#--  Open output file
w = netcdf4.Dataset(meshfile2, 'w')

#--  Set global attributes
for ga in global_attributes:
    setattr(w, ga, f1.getncattr(ga))
#--  Set dimensions of output file
for dim in dimensions.keys():
    if dim == 'elementCount':
        w.createDimension(dim, elementCount)
    elif dim == 'nodeCount':
        w.createDimension(dim, nodeCount)
    else:
        w.createDimension(dim, len(dimensions[dim]))

#--  Write variables
for var in variables.keys():
    vdim   = f1.variables[var].dimensions
    vtype  = f1.variables[var].datatype
    vattrs = f1.variables[var].ncattrs()
    useFillValue = False
    if '_FillValue' in vattrs:
        useFillValue = True
        fill_value = f1.variables[var].getncattr('_FillValue')

    if verbose:
        print(var, vtype, vdim)

    wvdim = copy.copy(vdim)

    if useFillValue:
        wvar = w.createVariable(var, vtype, wvdim, fill_value=fill_value)
    else:
        wvar = w.createVariable(var, vtype, wvdim)

    #--  Set attribute values (_FillValue was set at creation time above)
    for attname in vattrs:
        if attname != '_FillValue':
            if verbose:
                print('attribute: ', attname, ' value: ', f1.variables[var].getncattr(attname))
            w.variables[var].setncattr(attname, f1.variables[var].getncattr(attname))

    # Subset along the nodeCount or elementCount dimension, wherever it appears
    if 'nodeCount' in vdim:
        dindex = vdim.index('nodeCount')
        if dindex == 0:
            w.variables[var][:,] = f1.variables[var][subsetNode,]
        if dindex == 1:
            w.variables[var][:,] = f1.variables[var][:,subsetNode,]
        if dindex == 2:
            w.variables[var][:,] = f1.variables[var][:,:,subsetNode,]
    elif 'elementCount' in vdim:
        dindex = vdim.index('elementCount')
        if dindex == 0:
            w.variables[var][:,] = f1.variables[var][subsetElement,]
        if dindex == 1:
            w.variables[var][:,] = f1.variables[var][:,subsetElement,]
        if dindex == 2:
            w.variables[var][:,] = f1.variables[var][:,:,subsetElement,]
    else:
        w.variables[var][:,] = f1.variables[var][:,]

    # Renumber node indices in elementConn to the new (1-based) numbering
    if var == 'elementConn':
        for n in range(elementCount):
            for m in range(len(f1.dimensions['maxNodePElement'])):
                if w.variables[var][n,m] != -1:
                    ndx = int(w.variables[var][n,m])
                    if connDict[ndx] < 0:
                        print('conndict ', connDict[ndx])
                    w.variables[var][n,m] = connDict[ndx]
#--  Close output file
w.close()
f1.close()
print(meshfile2, ' created')

@ekluzek (Collaborator, Author) commented Feb 2, 2022

Polly and Sanjiv from the community are willing to be guinea pigs for this work on regional scripts.

@wwieder (Contributor) commented Apr 21, 2022

This issue needs to be addressed before we end MCT support for CTSM.

@ekluzek (Collaborator, Author) commented Apr 21, 2022

In our software meeting we thought that @negin513 and @adrifoster should work together on this.

@wwieder (Contributor) commented Apr 21, 2022

Also likely a good idea to loop in @slevisconsulting at the outset for a discussion, to see if any of the tools he's been working on are relevant / helpful.

@billsacks billsacks removed the nuopc label Apr 21, 2022
@slevis-lmwg (Contributor) commented

In the big picture, the mesh_mask_modifier tool that I'm working on (#1677) relates to this issue in a similar way that the fsurdat_modifier tool relates to subset_data; i.e., neither mesh_mask_modifier nor fsurdat_modifier performs subsetting.

I'm happy to discuss more.

@wwieder (Contributor) commented Jun 2, 2022

@adrifoster and @negin513, is this work getting close enough to merge #1735 and start working on documentation for running regional cases? @ckoven would like to try this out for a high-resolution CA-FATES case.

@billsacks billsacks moved this from Todo ~ weeks to In Progress in CESM: infrastructure / cross-component SE priorities Jun 30, 2022
@slevis-lmwg (Contributor) commented

In stand-up meeting 2022/7/26 @jkshuman recommended that I share this post, which contains a 2-line method for generating mesh files:

  1. Generate a SCRIP file from a lat/lon file (wrfinput_d01_foo.nc contains only metadata and is not used further):
     ncks --rgr infer --rgr scrip=wrfinput_d01_scrip.nc wrfinput_d01_2d_lat_lon.nc wrfinput_d01_foo.nc
  2. Generate a mesh file from the SCRIP grid file:
     /glade/u/apps/ch/opt/esmf-netcdf/8.0.0/intel/19.0.5/bin/bing/Linux.intel.64.mpiuni.default/ESMF_Scrip2Unstruct wrfinput_d01_scrip.nc wrfinput_d01_unstruct.nc 0

@ekluzek (Collaborator, Author) commented Jan 9, 2023

#1892 handles this issue.

@ekluzek (Collaborator, Author) commented Apr 26, 2024

This is completed with #1892; it will auto-close when b4b-dev is next merged to master.

@ekluzek (Collaborator, Author) commented Nov 12, 2024

This was supposed to auto-close but didn't, so closing now.

@ekluzek ekluzek closed this as completed Nov 12, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done (or no longer holding things up) in CESM: infrastructure / cross-component SE priorities Nov 12, 2024
8 participants