Refactoring mksurfdata_esmf tool to support user-defined urban surface property dataset #2807

Charlotte1891 · 2024-10-02T16:17:06Z

Background and motivation

The current urban surface property dataset (Jackson et al., 2010; Oleson and Feddema, 2020) used in CLMU clusters global urban areas into 33 regions with similar climates, socio-economic characteristics, and architectural practices, and defines properties within each region for up to four urban density classes: low density (LD), medium density (MD), high density (HD), and tall building district (TBD). The dataset then prescribes uniform surface properties to each density class within a region. These simplistic, coarse-grained, region-based constraints limit its ability to resolve the true heterogeneity of cities and their interactions with the background climate, which matters especially for high-resolution urban climate modeling.

To address this long-standing challenge of representing urban areas at large scales, and to facilitate next-generation kilometer-scale (k-scale) urban-resolving Earth system modeling, we have developed a first-of-its-kind global high-resolution (1 km) urban surface property dataset, U-Surf, to support urban climate modeling across scales. Using the urban canopy model (UCM) in CTSM as the base model for defining dataset requirements, U-Surf leverages the latest advances in remote sensing, machine learning, and cloud computing to provide the most relevant urban surface biophysical parameters, including radiative, morphological, and thermal properties, for UCMs at the facet and canopy levels. The descriptive preprint is available at https://essd.copernicus.org/preprints/essd-2024-416/.

Functions of the new tool

Incorporating U-Surf, or more broadly any user-supplied local or regional high-resolution urban surface property dataset, into the surface dataset generation process will require significant refactoring of the current mksurfdata_esmf tool. At least two major capabilities need to be added, while ensuring that the tool can still generate surface datasets (surfdata) at user-defined resolutions.

1. Support for user-defined, spatially continuous urban surface property datasets

The current method for reading in the urban surface property dataset (mkurbanparMod.F90) iterates through each density class, calling lookup_2d_netcdf for each class to fill the corresponding slice of the data array. However, when working with spatially continuous datasets (even ones that are not at high resolution), the lookup table becomes unnecessary. Instead, each gridcell should take its property values directly from the original input (mapped to the output grid where the resolutions differ), so that the data reflect the input's spatial granularity rather than being filtered through region and density-class lookups.
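The difference between the two approaches can be sketched in Python (this is not the Fortran in mkurbanparMod.F90). The function names, array shapes, and the use of a weighted mean to map fine-resolution input pixels onto output gridcells are assumptions for illustration only; a real implementation would repeat this per density class and per vertical level.

```python
import numpy as np

def fill_from_lookup(region_id, table):
    """Lookup approach: every gridcell in a region gets the same value for
    a given density class.

    region_id : (ncell,) 1-based region index of each output gridcell
    table     : (n_regions, n_density) property value per region and class
    returns   : (n_density, ncell), one slice per density class
    """
    n_density = table.shape[1]
    out = np.empty((n_density, region_id.size))
    for d in range(n_density):            # one lookup per density class
        out[d, :] = table[region_id - 1, d]
    return out

def fill_from_continuous(values, cell_of_pixel, weight, ncell):
    """Continuous approach (hypothetical): each output gridcell takes the
    weighted mean of the input pixels inside it; no region/class lookup.

    values        : (npix,) property value at each input pixel
    cell_of_pixel : (npix,) 0-based output gridcell containing each pixel
    weight        : (npix,) pixel weight, e.g. urban area in the pixel
    returns       : (ncell,) NaN where a gridcell has no weighted input
    """
    num = np.bincount(cell_of_pixel, weights=weight * values, minlength=ncell)
    den = np.bincount(cell_of_pixel, weights=weight, minlength=ncell)
    out = np.full(ncell, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return out

# Cells 0 and 1 share region 5, so the lookup gives them identical values.
table = np.arange(33 * 4, dtype=float).reshape(33, 4)
lk = fill_from_lookup(np.array([5, 5, 12]), table)

# With continuous input, each cell reflects the pixels that fall inside it.
cont = fill_from_continuous(np.array([1.0, 3.0, 10.0]), np.array([0, 0, 1]),
                            np.array([1.0, 1.0, 2.0]), ncell=3)
print(lk[:, :2])
print(cont)
```

The lookup version mirrors the per-density-class loop described above; the continuous version needs no region or class information at all, only a pixel-to-gridcell mapping.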

2. Support for high-resolution urban fraction data for high-resolution simulations

Currently, the highest spatial resolution that can be used for the urban input (mksrf_furban) to the mksurfdata_esmf tool is 0.05 degree. As a result, for simulations at finer resolutions (e.g., 1 km), the 1 km urban fraction (PCT_URBAN) is derived by downscaling the 0.05 deg data, which in turn was aggregated from the original 1 km data. This double interpolation (aggregation followed by downscaling) introduces additional bias and uncertainty. To minimize information loss, the tool should be able to ingest high-resolution urban fraction data directly, without prior aggregation.
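To make the double-interpolation concern concrete, here is a minimal Python sketch on a toy patch. It assumes a simple block average for the aggregation step and nearest-neighbor copying for the downscaling step; the actual tool regrids with ESMF and may use other methods, and the 3 x 3 block size is arbitrary.

```python
import numpy as np

def block_mean(fine, k):
    """Aggregate a 2-D field by averaging non-overlapping k x k blocks."""
    ny, nx = fine.shape
    if ny % k or nx % k:
        raise ValueError("field shape must be divisible by k")
    return fine.reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))

def downscale_nearest(coarse, k):
    """Map a coarse field back to the fine grid by copying each coarse
    value into its k x k footprint (nearest-neighbor downscaling)."""
    return np.repeat(np.repeat(coarse, k, axis=0), k, axis=1)

# Toy "1 km" urban fraction (percent): a fully urban 2 x 2 core
# in the corner of an otherwise rural 6 x 6 patch.
pct_fine = np.zeros((6, 6))
pct_fine[:2, :2] = 100.0

k = 3  # fine cells per coarse cell along each axis (illustrative)
pct_roundtrip = downscale_nearest(block_mean(pct_fine, k), k)

print(pct_roundtrip[0, 0])                     # core cell after the round trip
print(np.abs(pct_roundtrip - pct_fine).max())  # worst-case cell error
```

The round trip conserves the patch total, but the 100% urban core is smeared to about 44% across its whole coarse block, which is the kind of information loss that ingesting the 1 km data directly would avoid.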

Technical challenges

1. Computational cost

Shifting from a lookup table approach to directly ingesting high-resolution, spatially continuous datasets may significantly increase the computational cost of generating surface datasets. Efficiently mapping gridcell-level properties across large domains is also challenging, so the workflow must be optimized to scale with larger datasets. However, because regions can be processed independently, we think parallelization on Derecho could help reduce the computational burden.
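Because regions can be processed independently, the work maps naturally onto a split, process, and stitch pattern. The Python sketch below assumes a purely local operation (block averaging stands in for the real per-region work) and uses a thread pool only to stay self-contained; in the Fortran tool this would more likely be an MPI domain decomposition. Operations that need neighboring cells, such as some regridding methods, would also need halo exchange at band edges, which this sketch does not handle.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def block_mean(field, k):
    """Stand-in for per-region work: average non-overlapping k x k blocks."""
    ny, nx = field.shape
    return field.reshape(ny // k, k, nx // k, k).mean(axis=(1, 3))

def process_in_bands(field, k, nbands, workers=4):
    """Split the domain into latitude bands, process them concurrently,
    and stitch the results back together in their original order."""
    nblk = field.shape[0] // k
    # Place band edges on multiples of k so no k x k block straddles two bands.
    edges = [k * (nblk * i // nbands) for i in range(1, nbands)]
    bands = np.split(field, edges, axis=0)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, not completion order
        results = list(pool.map(lambda band: block_mean(band, k), bands))
    return np.concatenate(results, axis=0)

rng = np.random.default_rng(0)
field = rng.random((12, 9))
serial = block_mean(field, 3)
parallel = process_in_bands(field, 3, nbands=3)
print(np.allclose(serial, parallel))
```

Aligning band edges with the aggregation blocks is what makes the parallel result identical to the serial one.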

2. Memory issue

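Directly ingesting high-resolution input, rather than aggregating it to 0.05 deg beforehand, may also cause memory problems. For example, if the urban properties input mksrf_furban is at 1 km resolution, each variable would require roughly 30 times more memory during processing than at 0.05 deg (0.05 deg is about 5.6 km at the equator, so one 0.05 deg cell covers about 5.6 x 5.6 cells of a 1 km grid, or exactly 6 x 6 = 36 cells of a 30 arc-second grid).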
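A back-of-the-envelope estimate for a single global field, assuming "1 km" means a regular 30 arc-second (1/120 deg) grid and 4-byte values. Both are assumptions, and multi-level properties (per density class, per roof or wall layer) multiply the result.

```python
def global_cells(res_deg):
    """Number of cells in a global regular lat-lon grid with spacing res_deg."""
    return round(360 / res_deg) * round(180 / res_deg)

def gib(ncells, bytes_per_value=4, nfields=1):
    """Memory in GiB for nfields 2-D fields with the given element size."""
    return ncells * bytes_per_value * nfields / 2**30

coarse = global_cells(0.05)    # 3 arc-minute grid
fine = global_cells(1 / 120)   # 30 arc-second (~1 km) grid

print(coarse, fine, fine // coarse)
print(f"{gib(coarse):.2f} GiB vs {gib(fine):.2f} GiB per float32 field")
```

Under these assumptions the fine grid has 36 times as many cells: about 3.5 GiB per single-precision field versus about 0.1 GiB at 0.05 deg.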

We would like to hear everyone’s input or feedback on these proposed changes, particularly if there are additional factors that you believe should be considered. Any thoughts on how best to approach these issues, or suggestions for alternative solutions, would be greatly appreciated!

ekluzek · 2024-10-02T21:14:29Z

Maintainer

Pinging @olyson and @wwieder

ekluzek · 2024-10-02T21:21:57Z

Maintainer

I want to note an issue that goes in the opposite direction, #633, whose main goal is to save disk space on surface datasets by relying more heavily on the lookup table for urban data.

This discussion is about bringing in higher-resolution datasets, which would give better data for each gridcell. Since surface datasets already store the urban data at the model resolution, this doesn't increase their size. It would, however, provide better data, rather than storing smooth data at high resolution, which wastes disk space.
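A rough way to see the trade-off between this proposal and #633: with a lookup table, the stored size is one small table plus a region index per gridcell, while per-gridcell storage scales with gridcells x density classes x levels. The sketch assumes 33 regions, 3 density classes, 10 levels, 8-byte values, a 4-byte region index, and a global 0.05 deg output grid, all chosen only for illustration.

```python
def lookup_bytes(n_regions, n_density, n_levels, ncell,
                 value_bytes=8, id_bytes=4):
    """Lookup approach: one (region, density, level) table plus a
    region index for every gridcell."""
    return n_regions * n_density * n_levels * value_bytes + ncell * id_bytes

def per_cell_bytes(n_density, n_levels, ncell, value_bytes=8):
    """Per-gridcell approach: a full value for every gridcell, density
    class, and level."""
    return ncell * n_density * n_levels * value_bytes

ncell = 7200 * 3600  # illustrative: global 0.05 deg output grid
print(lookup_bytes(33, 3, 10, ncell) / 1e9, "GB with a lookup table")
print(per_cell_bytes(3, 10, ncell) / 1e9, "GB stored per gridcell")
```

The per-gridcell cost depends only on the output grid, not on the input resolution, so spatially varying input does not make the surface dataset larger; it replaces region-uniform values with informative ones.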

olyson · 2024-10-10T23:05:20Z

Collaborator

Thanks for this, @Charlotte1891.
We did have problems using mksurfdata_map with a 1 km urban input dataset, which is why we went with a 0.05 deg file, but maybe the mksurfdata_esmf tool could handle 1 km input? It might be useful to test a 1 km input dataset, at least one with PCT_URBAN at 1 km resolution. One could be constructed by combining the lat/lon-dimensioned variables in the original 1 km urban file:

/glade/campaign/cgd/tss/people/oleson/urban_sfcdata/Feddema_urban_data_080410/HIGH_RES_VERSION2/URBAN_PROPERTIES_THESIS_TOOL/JUN_22_2018_Release/urban_properties_data.1km.191226-201120.nc

with the (region, density_class) dimensioned variables in the new urban input files, e.g.,

/glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/rawdata/gao_oneill_urban/historical/urban_properties_GaoOneil_05deg_ThreeClass_1850_cdf5_c20220910.nc

The URBAN_DENSITY_CLASS variable in the original 1km file would need to be converted to PCT_URBAN and "byte" would need to be changed to "int" for some variables, to be compatible with the new tool.
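
A minimal Python sketch of the conversion described above. It assumes URBAN_DENSITY_CLASS is a byte (int8) lat/lon map in which 0 means non-urban and 1..n index the density classes, and it treats each urban pixel as 100% urban in its class. The real encoding should be checked with ncdump -h on the 1 km file, the function name is hypothetical, and mapping the original file's classes onto the three classes in the GaoOneil files is not addressed here.

```python
import numpy as np

def density_class_to_pct_urban(density_class, n_density):
    """Convert a categorical density-class map into PCT_URBAN.

    Assumed encoding: 0 = not urban, 1..n_density = density class.
    Returns int32 (n_density, nlat, nlon): 100 where a pixel belongs
    to that class, 0 elsewhere.
    """
    dc = density_class.astype(np.int32)             # widen from byte to int
    classes = np.arange(1, n_density + 1).reshape(-1, 1, 1)
    pct = np.where(dc[np.newaxis, :, :] == classes, 100, 0)
    return pct.astype(np.int32)

density = np.array([[0, 1], [2, 3]], dtype=np.int8)  # toy 2 x 2 map
pct_urban = density_class_to_pct_urban(density, n_density=3)
print(pct_urban.dtype, pct_urban.shape)
```

Each pixel contributes to at most one class, so the class percentages in any pixel never sum to more than 100.
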
One would also need a new grid file, similar to the urban one currently being used but at 1km:

/glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/mappingdata/grids/UNSTRUCTgrid_3x3min_nomask_cdf5_c200129.nc
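
Building the 1 km grid file mostly means computing cell centers and corners for every cell. The sketch below assumes the new file should mirror a SCRIP-style unstructured layout like the existing 3x3min file (one entry per cell, four corners ordered counterclockwise); that assumption and the function name should be checked against the existing file with ncdump -h. It only builds the arrays for a small box, because a global 30 arc-second grid has 43200 x 21600 cells and its two float64 corner arrays alone would take roughly 60 GB.

```python
import numpy as np

def regular_grid_cells(lat0, lat1, lon0, lon1, res):
    """Centers and corners of a regular lat-lon grid, flattened to one
    entry per cell (row-major, latitude outermost). Corners are ordered
    counterclockwise: SW, SE, NE, NW."""
    nlat = round((lat1 - lat0) / res)
    nlon = round((lon1 - lon0) / res)
    lat_edges = lat0 + res * np.arange(nlat + 1)
    lon_edges = lon0 + res * np.arange(nlon + 1)

    # South/west and north/east edges of every cell, each shaped (nlat, nlon)
    south, west = np.meshgrid(lat_edges[:-1], lon_edges[:-1], indexing="ij")
    north, east = np.meshgrid(lat_edges[1:], lon_edges[1:], indexing="ij")

    center_lat = (0.5 * (south + north)).ravel()
    center_lon = (0.5 * (west + east)).ravel()
    corner_lat = np.stack([south, south, north, north], axis=-1).reshape(-1, 4)
    corner_lon = np.stack([west, east, east, west], axis=-1).reshape(-1, 4)
    return center_lat, center_lon, corner_lat, corner_lon

# Example: one 0.05 deg box split into 30 arc-second cells
clat, clon, crn_lat, crn_lon = regular_grid_cells(40.0, 40.05, 0.0, 0.05, 1 / 120)
print(clat.size, crn_lat.shape)
```
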
