Need a process to subset ESMF mesh files from global ones for regional grids #1513
This likely isn't the way we want to go long-term, but in case it's helpful, here is the process I inherited from Mariana to convert a SCRIP grid file to an ESMF mesh file.
Here's a script that @swensosc created to build a mesh based on the surface dataset that was created. This bypasses domain files and bypasses subsetting a SCRIP grid file, so it is a good approach to take. It should work for unstructured grids as well, since the surface dataset is just a 1D vector in that case, but we should test it on unstructured grids too.

```python
#!/usr/bin/env python
import subprocess
import time
import numpy as np
import netCDF4 as netcdf4
doTimer = False
# input file
infile = '/fs/cgd/csm/inputdata/lnd/clm2/surfdata_map/release-clm5.0.18/surfdata_0.9x1.25_hist_16pfts_Irrig_CMIP6_simyr2000_c190214.nc'
# output directory
dir_output='./'
# creation date
command='date "+%y%m%d"'
timetag = (subprocess.run(command, stdout=subprocess.PIPE, shell=True).stdout.decode('utf-8')).strip()
# output file
outfile = dir_output+'ESMFmesh_ctsm_0.9x1.25_'+timetag+'.nc'
#-- create mesh file from surface data file
# (other file types may have different coordinate variables)
f1 = netcdf4.Dataset(infile, 'r')
lon_vname = 'LONGXY'
lat_vname = 'LATIXY'
xc = np.asarray(f1.variables[lon_vname][:,], dtype=np.float64)
yc = np.asarray(f1.variables[lat_vname][:,], dtype=np.float64)
xc_units=f1.variables[lon_vname].units
xc_longname=f1.variables[lon_vname].long_name
yc_units=f1.variables[lat_vname].units
yc_longname=f1.variables[lat_vname].long_name
f1.close()
delx = np.abs(xc[0,1] - xc[0,0])
dely = np.abs(yc[1,0] - yc[0,0])
#-- convert coordinates to 1d
lon = np.asarray(xc[0,:])
lat = np.asarray(yc[:,0])
im,jm = lon.size,lat.size
#-- subset coordinates
tagnum = 1
if tagnum == 1:  # NE.US
    ln1 = 284.
    ln2 = 296.
    lt1 = 44.
    lt2 = 53.
xind = np.where((lon >= ln1) & (lon <= ln2))[0]
yind = np.where((lat >= lt1) & (lat <= lt2))[0]
lonc=np.copy(xc) ; latc=np.copy(yc)
lonc=lonc[yind,:]; lonc=lonc[:,xind]
latc=latc[yind,:]; latc=latc[:,xind]
#-- Compute grid information
num_corners = 4
nj, ni = lonc.shape
grid_size = ni * nj
grid_center_lat = np.reshape(latc,grid_size)
grid_center_lon = np.reshape(lonc,grid_size)
latn = grid_center_lat+0.5*dely
lats = grid_center_lat-0.5*dely
lone = grid_center_lon+0.5*delx
lonw = grid_center_lon-0.5*delx
grid_corner_lat = np.empty((grid_size,num_corners))
grid_corner_lat[:,0]=latn
grid_corner_lat[:,1]=latn
grid_corner_lat[:,2]=lats
grid_corner_lat[:,3]=lats
grid_corner_lon = np.empty((grid_size,num_corners))
grid_corner_lon[:,0]=lone
grid_corner_lon[:,1]=lonw
grid_corner_lon[:,2]=lonw
grid_corner_lon[:,3]=lone
grid_imask = np.ones(grid_size, dtype = int)
grid_dims = np.array((ni,nj), dtype = int)
nm = grid_dims.shape[0]
#------------------------------------------
coordDim = 2
elementCount = grid_center_lat.size
maxNodePElement = grid_corner_lon.shape[1]
numElementConn = [maxNodePElement for _ in range(elementCount)]
numElementConn = np.asarray(numElementConn,dtype='b')
# create list of all nodes
all_nodes_list = []
for i in range(elementCount):
    for j in range(maxNodePElement):
        all_nodes_list.append((grid_corner_lon[i,j], grid_corner_lat[i,j]))
num_all_nodes = len(all_nodes_list)
all_nodes = np.asarray(all_nodes_list)
if doTimer:
    stime = time.time()
# Identify unique nodes
dtr = np.pi/180
index_radius = 1e-6
all_unique_nodes = []
for i in range(num_all_nodes):
    if np.mod(i,1000) == 0:
        print('i in num_all_nodes ', i, num_all_nodes)
    # haversine distance from node i to every later node in the list
    dlon = dtr*(all_nodes[i+1:,0] - all_nodes[i,0])
    dlat = dtr*(all_nodes[i+1:,1] - all_nodes[i,1])
    dist = np.power(np.sin(dlat/2),2) + np.cos(dtr*all_nodes[i+1:,1]) * np.cos(dtr*all_nodes[i,1]) * np.power(np.sin(dlon/2),2)
    dist = np.arctan2(np.sqrt(dist), np.sqrt(1-dist))
    npoints = np.sum(np.where(dist < index_radius, 1, 0))
    # keep this node only if it is not duplicated later in the list
    # (so the last occurrence of each distinct corner is retained)
    if npoints < 1:
        all_unique_nodes.append(all_nodes[i,:])
nodeCount = len(all_unique_nodes)
print('nodeCount, elementCount, maxNodePElement ', nodeCount, elementCount, maxNodePElement)
if doTimer:
    etime = time.time()
    print('Time to create all_unique_nodes: ', etime-stime)
    print('')
if doTimer:
    stime2 = time.time()
nodeCoords = np.zeros((nodeCount, coordDim))
for i in range(nodeCount):
    pt = all_unique_nodes[i]
    nodeCoords[i,:] = [pt[0], pt[1]]
# Create array describing connectivity (1-based indices)
elementConn = np.zeros((elementCount, maxNodePElement),dtype='i')
elementConn[:,] = int(-1)
elementArea = np.zeros((elementCount),dtype='d')
elementMask = np.ones((elementCount),dtype='i')
centerCoords = np.zeros((elementCount, coordDim))
dtr = np.pi/180
for i in range(elementCount):
    centerCoords[i,:] = [grid_center_lon[i], grid_center_lat[i]]
    dlon = np.max(grid_corner_lon[i,:]) - np.min(grid_corner_lon[i,:])
    dlat = np.max(grid_corner_lat[i,:]) - np.min(grid_corner_lat[i,:])
    # approximate cell area in radians^2 (cos(lat) is the spherical area element)
    pArea = dlon*dlat*np.cos(dtr*grid_center_lat[i])*dtr*dtr
    elementArea[i] = pArea
    # record element as indices rather than points
    u_list = []
    for j in range(maxNodePElement):
        # haversine distance from this corner to every unique node
        dlon = nodeCoords[:,0] - grid_corner_lon[i,j]
        dlat = nodeCoords[:,1] - grid_corner_lat[i,j]
        dlon = dtr*dlon
        dlat = dtr*dlat
        dist = np.power(np.sin(dlat/2),2) + np.cos(dtr*nodeCoords[:,1]) * np.cos(dtr*grid_corner_lat[i,j]) * np.power(np.sin(dlon/2),2)
        dist = np.arctan2(np.sqrt(dist), np.sqrt(1-dist))
        u_list.append(np.argmin(dist)+1)  # 1-based node index
    # order of points should be counter-clockwise
    elementConn[i,:numElementConn[i]] = u_list
    #elementConn[i,:numElementConn[i]] = np.flip(u_list)
if doTimer:
    etime2 = time.time()
    print('Time to create elementConn: ', etime2-stime2)
    print('')
print('writing file: ', outfile, '\n')
#w = netcdf4.Dataset(outfile, 'w', format='NETCDF4')
w = netcdf4.Dataset(outfile, 'w', format='NETCDF3_64BIT')
w.creation_date = timetag
w.source_file = infile
w.title = 'ESMF mesh file'
w.gridType = 'unstructured'
w.createDimension('nodeCount',int(nodeCount))
w.createDimension('elementCount',int(elementCount))
w.createDimension('maxNodePElement',int(maxNodePElement))
w.createDimension('coordDim',int(coordDim))
w_nodeCoords = w.createVariable('nodeCoords','d',('nodeCount', 'coordDim'))
w_nodeCoords.units = 'degrees'
w_elementConn = w.createVariable('elementConn','i',('elementCount', 'maxNodePElement'),fill_value=-1)
w_elementConn.long_name = 'Node Indices that define the element connectivity'
#w_elementConn._FillValue = -1
w_numElementConn = w.createVariable('numElementConn','b',('elementCount'))
w_numElementConn.long_name = 'Number of nodes per element'
w_centerCoords = w.createVariable('centerCoords','d',('elementCount', 'coordDim'))
w_centerCoords.units = 'degrees'
w_elementArea = w.createVariable('elementArea','d',('elementCount'))
w_elementArea.units = 'radians^2'
w_elementArea.long_name = 'area weights'
w_elementMask = w.createVariable('elementMask','i',('elementCount'),fill_value=-9999)  # integer fill value for an integer variable
# convert longitude to [0-360] if needed
tmp = nodeCoords[:,0]
tmp[tmp < 0] += 360
nodeCoords[:,0] = tmp
tmp = centerCoords[:,0]
tmp[tmp < 0] += 360
centerCoords[:,0] = tmp
# write to file --------------------------------------------
w_nodeCoords[:,] = nodeCoords
w_elementConn[:,] = elementConn
w_numElementConn[:,] = numElementConn
w_centerCoords[:,] = centerCoords
w_elementArea[:,] = elementArea
w_elementMask[:,] = elementMask
w.close()
print(outfile, ' created')
```
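For what it's worth, a quick sanity check of the file this writes (a minimal sketch; the placeholder file name stands in for whatever `outfile` was above):

```python
import netCDF4 as netcdf4

outfile = 'ESMFmesh_ctsm_0.9x1.25_YYMMDD.nc'  # placeholder for the file written above
with netcdf4.Dataset(outfile, 'r') as f:
    print(f.gridType)                                # expect 'unstructured'
    print(len(f.dimensions['elementCount']),
          len(f.dimensions['nodeCount']))            # element and node counts
    print(f.variables['elementConn'][:].min() >= 1)  # connectivity is 1-based
```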
That's great – thanks @swensosc and @ekluzek! From looking quickly through this, I have a few thoughts. Sorry, I feel like I'm jumping to some detailed critiques, but overall this seems like a fantastic thing to have! I just can't help noticing details like this when I read through code:

(1) Regarding unstructured grids: it looks like this assumes that the corners are equidistant from the center. My understanding is that that is not always the case for irregular grids. For example, see the note about corners at https://www.ncl.ucar.edu/Document/Functions/ESMF/curvilinear_to_SCRIP.shtml (which refers to HOMME grids specifically).

(2) What is the performance like? In particular, there is an O(N^2) loop over corners (i.e., a doubly nested loop over corners) to identify the unique corners, and I'm wondering how that performs on large grids. If the point of this script is just small grids, this may be a non-issue, but it would be good to know. If an O(N^2) loop is unavoidable but leads to unacceptable performance in some situations, this could perhaps be reworked to use built-in numpy operations, like the rounding-then-unique solution from https://stackoverflow.com/questions/5426908/find-unique-elements-of-floating-point-array-in-numpy-with-comparison-using-a-d (see the sketch after this comment).

(3) Should the tolerance for finding identical corners be based somehow on the typical grid cell size?

(4) Most importantly, it would be great if we could ask the ESMF group to provide something like this, so that they are the ones responsible for maintaining it, ensuring it's robust in a variety of cases, etc. That way we don't have to be the ones dealing with things like (2) and (3).
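To make point (2) concrete, here is a minimal sketch of that rounding-then-unique idea. This is an illustration rather than part of the script above; it assumes the `all_nodes` (N, 2) corner array the script builds, and a fixed decimal tolerance (which is exactly the question raised in point (3)).

```python
import numpy as np

def unique_nodes(all_nodes, decimals=6):
    """Collapse near-duplicate corners by rounding, then np.unique.

    all_nodes: (N, 2) array of (lon, lat) corner pairs.
    decimals:  rounding tolerance in degrees (a fixed, assumed value).
    """
    rounded = np.round(all_nodes, decimals=decimals)
    # np.unique with axis=0 dedups rows in roughly O(N log N);
    # 'inverse' maps each original corner to the index of its unique node
    uniq, inverse = np.unique(rounded, axis=0, return_inverse=True)
    return uniq, inverse
```

Reshaping `inverse` to (elementCount, maxNodePElement) and adding 1 would also give the 1-based `elementConn` directly, replacing the per-corner nearest-node search.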
Valid criticism; this script is not optimized for speed and is meant to be used on standard CTSM rectilinear grids.
I should have clarified, too, that my comments were not really meant as criticisms of the script as is: for internal and possibly some external use, this seems like a FANTASTIC thing to have! It was more that I was mentally jumping a step ahead to, "Could this be our standard, supported mechanism for creating mesh files?", and thinking about what might be needed to get to that point. My mental context was some discussions we've been having in CSEG about the process for mesh file generation, so I was wondering, "Could this be a good general process?" My main thought at this point is that this script could very possibly help in many cases with the general mesh file creation process, even if it needs some adjustments to be robust in a wider variety of scenarios. Anyway, thank you for sharing this! @mvertens @uturuncoglu, FYI: see the discussion above.
@swensosc shared with me another similar script to be used for unstructured grids. So our first use of this might have both scripts in place, and the user will have to figure out which one to use.
@billsacks @ekluzek @swensosc - I think having the script that Sean targeted at regular grids would be very useful. However, I think we need to proceed very cautiously when we talk about unstructured grids; that requires input from a much larger group, particularly as CAM already has a process for creating meshes for refined grids that could then be leveraged by CTSM. I believe that whatever unstructured mesh CTSM would require would also be required by CAM. When would this not be the case?
@mvertens and I talked about this issue this morning. We both feel that, in general, it makes sense to stick with our current, two-step process:

1. Create a SCRIP grid file for the grid of interest.
2. Convert the SCRIP grid file to an ESMF mesh file.
The reason for splitting this is that step (1) is really grid-dependent, particularly in how you define the corners; for example, there would be a different script for CAM-SE grids than for the hydrological basin grids that @swensosc has been experimenting with. Step (2), on the other hand, is more generic, but also needs more software engineering to do in a robust and performant way (e.g., my comments above), so it makes sense to have it done by a generic tool maintained by the ESMF group (a sketch follows below). Then you can think of the SCRIP grid file as a relatively simple intermediate representation used to communicate between these two steps.
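For reference, a minimal sketch of step (2) using the converter that ESMF already distributes; this assumes the ESMF_Scrip2Unstruct binary is on your PATH, and the file names are hypothetical:

```python
import subprocess

scrip_file = 'scrip_grid.nc'  # hypothetical SCRIP file produced by step (1)
mesh_file = 'esmf_mesh.nc'    # hypothetical ESMF mesh file to create

# ESMF_Scrip2Unstruct writes an ESMF unstructured mesh from a SCRIP grid
# file; the third argument (0) selects the regular rather than the dual mesh
subprocess.run(['ESMF_Scrip2Unstruct', scrip_file, mesh_file, '0'], check=True)
```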
@adrifoster and @jkshuman here's the issue about creating mesh files from domain files. There's a bunch of discussion here covering short-term as well as longer-term desires. I think we need a plan that includes a short-term stopgap that lets people do the regional science they need to do now, as well as steps toward the longer-term goal we want to reach.
We are going to have a group meet next Wednesday afternoon to go over this and come up with a plan for a short-term solution that will eventually merge into the longer-term, CESM-wide solution.
Ufuk already has a utility for this, so it would be very helpful to include him in this email chain.
OK, a group of us met to discuss this, including Ufuk, @mvertens, @billsacks, @negin513, @adrifoster, @swensosc, and @jkshuman. We decided the long-term, CESM-wide issue is creating mesh files that go with our surface and other datasets. A CTSM-specific issue is being able to subset from those datasets (which we already need to ensure are compatible). @swensosc explained that he has a script that subsets from a mesh file. This is a robust process because the connectivity information and the corners are kept; you just find the indices of the center coordinates. It is actually more robust than using a domain file, for example, because you don't have to define the corners. So the script above isn't the one we would use, but rather a separate one he developed that works on ESMF mesh files.
OK, here's the script from @swensosc that subsets the ESMF mesh file. This is the one that we want to bring in for use with this process. The imports and the settings at the top of the listing are placeholders added so the snippet stands alone; the original assumed the file names, the ln1/ln2/lt1/lt2 bounds, and `verbose` were already defined.

```python
import copy
import numpy as np
import netCDF4 as netcdf4

# placeholder user settings the original snippet assumed were defined
meshfile = 'ESMFmesh_global.nc'   # placeholder: global mesh file to subset
meshfile2 = 'ESMFmesh_subset.nc'  # placeholder: subset mesh file to write
ln1, ln2 = 284., 296.             # placeholder longitude bounds (NE US example)
lt1, lt2 = 44., 53.               # placeholder latitude bounds
verbose = False

# Open original mesh file
f1 = netcdf4.Dataset(meshfile, 'r')
elementCount = len(f1.dimensions['elementCount'])
elementConn = f1.variables['elementConn'][:,]
numElementConn = f1.variables['numElementConn'][:,]
centerCoords = f1.variables['centerCoords'][:,]
nodeCount = len(f1.dimensions['nodeCount'])
nodeCoords = f1.variables['nodeCoords'][:,]
subsetElement = []
cnt = 0
for n in range(elementCount):
    endx = elementConn[n,:numElementConn[n]]
    endx[:,] -= 1  # convert to zero-based index
    nlon = nodeCoords[endx,0]
    nlat = nodeCoords[endx,1]
    l1 = np.logical_or(nlon <= ln1, nlon >= ln2)
    l2 = np.logical_or(nlat <= lt1, nlat >= lt2)
    # keep the element only if every node lies inside the subset bounds
    if not np.any(np.logical_or(l1, l2)):
        subsetElement.append(n)
        cnt += 1
subsetNode = []
connDict = {}
cnt = 1
for n in range(nodeCount):
    nlon = nodeCoords[n,0]
    nlat = nodeCoords[n,1]
    l1 = np.logical_or(nlon <= ln1, nlon >= ln2)
    l2 = np.logical_or(nlat <= lt1, nlat >= lt2)
    if np.logical_or(l1, l2):
        # node is outside the subset region; mark it for removal
        connDict[n+1] = -9999
    else:
        subsetNode.append(n)
        connDict[n+1] = cnt
        cnt += 1
print(np.min(nodeCoords[:,0]),np.max(nodeCoords[:,0]))
print(np.min(nodeCoords[:,1]),np.max(nodeCoords[:,1]))
# create new meshfile
elementCount = len(subsetElement)
nodeCount = len(subsetNode)
print('new elementCount, nodeCount ',elementCount, nodeCount)
dimensions = f1.dimensions
variables = f1.variables
global_attributes = f1.ncattrs()
#-- Open output file
w = netcdf4.Dataset(meshfile2, 'w')
#-- Set global attributes
for ga in global_attributes:
    setattr(w, ga, f1.getncattr(ga))
#-- Set dimensions of output file
for dim in dimensions.keys():
    if dim == 'elementCount':
        w.createDimension(dim, elementCount)
    elif dim == 'nodeCount':
        w.createDimension(dim, nodeCount)
    else:
        w.createDimension(dim, len(dimensions[dim]))
#-- Write variables
for var in variables.keys():
    vdim = f1.variables[var].dimensions
    vtype = f1.variables[var].datatype
    vattrs = f1.variables[var].ncattrs()
    useFillValue = False
    if '_FillValue' in vattrs:
        useFillValue = True
        fill_value = f1.variables[var].getncattr('_FillValue')
    if verbose:
        print(var, vtype, vdim)
    wvdim = copy.copy(vdim)
    if useFillValue:
        wvar = w.createVariable(var, vtype, wvdim, fill_value=fill_value)
    else:
        wvar = w.createVariable(var, vtype, wvdim)
    #-- Set attribute values (the fill value is handled at creation above)
    for attname in vattrs:
        if attname == '_FillValue':
            continue
        if verbose:
            print('attribute: ', attname, ' value: ', f1.variables[var].getncattr(attname))
        w.variables[var].setncattr(attname, f1.variables[var].getncattr(attname))
    #-- Subset along the node or element dimension, whichever applies
    if 'nodeCount' in vdim:
        dindex = vdim.index('nodeCount')
        if dindex == 0:
            w.variables[var][:,] = f1.variables[var][subsetNode,]
        if dindex == 1:
            w.variables[var][:,] = f1.variables[var][:,subsetNode,]
        if dindex == 2:
            w.variables[var][:,] = f1.variables[var][:,:,subsetNode,]
    elif 'elementCount' in vdim:
        dindex = vdim.index('elementCount')
        if dindex == 0:
            w.variables[var][:,] = f1.variables[var][subsetElement,]
        if dindex == 1:
            w.variables[var][:,] = f1.variables[var][:,subsetElement,]
        if dindex == 2:
            w.variables[var][:,] = f1.variables[var][:,:,subsetElement,]
    else:
        w.variables[var][:,] = f1.variables[var][:,]
    #-- Renumber elementConn node indices to the subset (1-based) node list
    if var == 'elementConn':
        for n in range(elementCount):
            for m in range(len(f1.dimensions['maxNodePElement'])):
                if w.variables[var][n,m] != -1:
                    ndx = int(w.variables[var][n,m])
                    if connDict[ndx] < 0:
                        print('conndict ', connDict[ndx])
                    w.variables[var][n,m] = connDict[ndx]
#-- Close output file
w.close()
f1.close()
print(meshfile2, ' created')
```
Polly and Sanjiv from the community are willing to be guinea pigs for this work on regional scripts.
This issue needs to be addressed before we end MCT support for CTSM.
In our software meeting we thought that @negin513 and @adrifoster should work together on this.
Also likely a good idea to loop in @slevisconsulting at the outset for a discussion, to see if any of the tools he's been working on are relevant / helpful.
In the big picture, the mesh_mask_modifier tool that I'm working on (#1677) relates to this issue in a similar way that the fsurdat_modifier tool relates to subset_data; i.e., neither mesh_mask_modifier nor fsurdat_modifier performs subsetting. I'm happy to discuss more.
@adrifoster and @negin513, is this work getting close enough to merge #1735 and start working on documentation for running regional cases? @ckoven would like to try this out for a high-resolution CA-FATES case.
In the 2022-07-26 stand-up meeting, @jkshuman recommended that I share this post, which contains a 2-line method for generating mesh files: generate a SCRIP file from a lat/lon file, then convert it to an ESMF mesh.
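The post itself isn't quoted above, but the commonly cited recipe (assuming it is the NCO-based one) infers the SCRIP file with ncks and then runs the ESMF_Scrip2Unstruct call sketched earlier in this thread. Driven from Python, as in the scripts above, with hypothetical file names:

```python
import subprocess

latlon_file = 'surfdata_region.nc'  # hypothetical file with lat/lon coordinate variables
scrip_file = 'scrip.nc'             # SCRIP grid file to infer

# infer a SCRIP grid file from the lat/lon coordinates; the trailing
# argument is a scratch output file that ncks requires
subprocess.run(['ncks', '--rgr', 'infer', '--rgr', 'scrip=' + scrip_file,
                latlon_file, 'ncks_scratch.nc'], check=True)
```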
#1892 handles this issue.
This is completed with #1892; it will auto-close when b4b-dev is next merged to master.
This was supposed to auto-close but didn't, so closing now.
With the update to NUOPC, we'll need a way for users to create ESMF mesh files from domain files (or from their grid) for any regional grids they create. This needs to happen for regional CTSM cases, but also when using LILAC to run WRF with CTSM.
The current subset_data process creates domain files as well as surface datasets for regional areas. We'll need some steps to convert the domain file into an ESMF mesh (a sketch of reading the needed fields follows below). Alternatively, we could create an ESMF mesh file directly, or subset the global ESMF mesh file itself; because ESMF mesh files are 1D vector files rather than 2D, though, that may be difficult.
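To illustrate the domain-file route, here is a minimal sketch of reading the fields a converter would need, assuming the standard CESM domain-file variable names (xc, yc, xv, yv, mask); this is an illustration, not an agreed-on tool:

```python
import numpy as np
import netCDF4 as netcdf4

def read_domain(domainfile):
    """Read the fields of a CESM domain file needed to build an ESMF mesh."""
    with netcdf4.Dataset(domainfile, 'r') as f:
        xc = np.asarray(f.variables['xc'][:])      # cell-center longitudes
        yc = np.asarray(f.variables['yc'][:])      # cell-center latitudes
        xv = np.asarray(f.variables['xv'][:])      # cell-corner longitudes
        yv = np.asarray(f.variables['yv'][:])      # cell-corner latitudes
        mask = np.asarray(f.variables['mask'][:])  # 0/1 land mask
    return xc, yc, xv, yv, mask
```

The centers would become centerCoords, the deduplicated corners nodeCoords and elementConn, and the mask elementMask; the corner-definition problem is avoided because the corners are already stored in the domain file.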
This isn't a problem for single-point simulations, since there we provide the latitude and longitude with PTS_LAT and PTS_LON, so a mesh file isn't used. But it is a problem for regional grids larger than a single point.
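(For completeness, a hypothetical single-point setup; PTS_LAT and PTS_LON are CIME XML variables, set with the usual xmlchange call from the case directory:)

```python
import subprocess

# set a hypothetical point; run from within the case directory
subprocess.run(['./xmlchange', 'PTS_LAT=44.5,PTS_LON=287.5'], check=True)
```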