Refactoring mksurfdata_esmf tool to support user-defined urban surface property dataset #2807
Replies: 3 comments
-
I want to note a related issue, #633, that goes in the opposite direction: its goal is mainly to save disk space on surface datasets by leaning into the lookup table for urban data. This discussion is instead about bringing in higher-resolution datasets, which would give better data for gridcells. Since the data on surface datasets is already at the model resolution, this doesn't increase data size. It would, however, provide better data and would avoid storing smooth data at high resolution, which is a waste of disk.
-
Thanks for this, @Charlotte1891. /glade/campaign/cgd/tss/people/oleson/urban_sfcdata/Feddema_urban_data_080410/HIGH_RES_VERSION2/URBAN_PROPERTIES_THESIS_TOOL/JUN_22_2018_Release/urban_properties_data.1km.191226-201120.nc with the (region, density_class) dimensioned variables in the new urban input files, e.g., /glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/rawdata/gao_oneill_urban/historical/urban_properties_GaoOneil_05deg_ThreeClass_1850_cdf5_c20220910.nc The URBAN_DENSITY_CLASS variable in the original 1km file would need to be converted to PCT_URBAN and "byte" would need to be changed to "int" for some variables, to be compatible with the new tool. /glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/mappingdata/grids/UNSTRUCTgrid_3x3min_nomask_cdf5_c200129.nc
-
The current urban surface property dataset (Jackson et al., 2010; Oleson and Feddema, 2020) used in CLMU clusters the global urban areas into 33 distinct regions of similar climates, socio-economic characteristics, and architectural practices, with properties defined within each region for up to four urban density classes: low density (LD), medium density (MD), high density (HD), and tall building district (TBD). The dataset then prescribes uniform surface properties to each density type within a region. These simplistic, coarse-grained, and region-based urban property constraints impede its application in resolving the true heterogeneity of cities and their interactions with background climate, especially relevant for high-resolution urban climate modeling. To address this long-standing urban representation challenge at large scales and to facilitate next-generation kilometer-scale (k-scale) urban-resolving Earth system modeling, we develop a first-of-its-kind global high-resolution (1 km) urban surface property dataset, namely U-Surf, to support urban climate modeling across scales. Using the urban canopy model (UCM) in CTSM as a base model for developing dataset requirements, U-Surf leverages the latest advances in remote sensing, machine learning, and cloud computing to provide the most relevant urban surface biophysical parameters, including radiative, morphological, and thermal properties, for UCMs at the facet- and canopy-level. The descriptive preprint is available at https://essd.copernicus.org/preprints/essd-2024-416/.
Incorporating U-Surf, or more broadly, any user-supplied local/regional high-resolution urban surface property dataset, into the surface dataset generation process will require significant refactoring of the current mksurfdata_esmf tool. At least two major functions need to be added while ensuring the tool retains the capability to generate surfdata at user-defined resolutions.
1. Support for user-defined spatially-continuous urban surface property dataset
The current method for reading in the urban surface property dataset involves iterating through each density class, calling lookup_2d_netcdf for each class and filling the corresponding slice of the data array (mkurbanparMod.F90). However, when working with spatially continuous datasets (even if not at high resolution), the lookup table becomes unnecessary. Instead, each gridcell should be directly assigned property values from the original input, ensuring that the data reflects the spatial granularity without interpolation through density classes.
2. Support for high-resolution urban fraction data for high-resolution simulations
Currently, the highest spatial resolution that can be used as urban input (mksrf_furban) for the mksurfdata_esmf tool is 0.05 degree. That is to say, for simulations conducted at finer resolutions (e.g., 1 km), the urban fraction data (PCT_URBAN) at 1 km resolution is derived by downscaling the 0.05 deg data, which, in turn, has been aggregated from the original 1 km data. This introduces additional bias and uncertainty due to double interpolation. To minimize the information loss, the tool needs to be able to ingest high-resolution urban fraction data directly, without prior aggregation.
1. Computational cost
Shifting from a lookup table approach to directly ingesting high-resolution, spatially continuous datasets may significantly increase the computational cost of generating surface datasets. Efficiently mapping grid-level properties across large domains is also challenging, making it crucial to optimize the workflow for scalability with larger datasets. However, because regions can be processed independently, we think leveraging parallelization on Derecho could help reduce the computational burden.
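To make the trade-off above concrete, here is a minimal Python sketch (not the actual Fortran code in mkurbanparMod.F90) contrasting a lookup-table fill with direct ingestion of spatially continuous values, and showing how independent tiles could be processed concurrently. Everything in it is an assumption for illustration: the function names, the made-up LOOKUP values, the flattened 1-D layout, the simple block average standing in for the tool's regridding, and the thread pool standing in for whatever parallelization would actually be used on Derecho.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookup table: (region_id, density_class) -> one property value.
# The numbers are invented for illustration only.
LOOKUP = {
    (1, 0): 0.20, (1, 1): 0.25,
    (2, 0): 0.30, (2, 1): 0.35,
}

def fill_from_lookup(region_ids, n_classes):
    """Lookup-table approach: each gridcell gets one value per density class,
    determined only by the region it belongs to."""
    return [[LOOKUP[(r, c)] for c in range(n_classes)] for r in region_ids]

def fill_direct(raw_values, cells_per_gridcell):
    """Direct approach: average the raw high-resolution cells that fall inside
    each model gridcell (a stand-in for a real regridding step)."""
    out = []
    for start in range(0, len(raw_values), cells_per_gridcell):
        block = raw_values[start:start + cells_per_gridcell]
        out.append(sum(block) / len(block))
    return out

def process_in_tiles(raw_values, cells_per_gridcell, gridcells_per_tile, workers=2):
    """Split the raw input into independent tiles and process them concurrently.
    Only one tile per worker needs to be handled at a time."""
    tile_len = cells_per_gridcell * gridcells_per_tile
    tiles = [raw_values[i:i + tile_len] for i in range(0, len(raw_values), tile_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: fill_direct(t, cells_per_gridcell), tiles))
    # pool.map preserves tile order, so concatenation restores the original layout.
    return [value for tile in results for value in tile]
```

In this toy setup the lookup fill costs one table access per gridcell and class, while the direct fill has to touch every raw cell, which is where the extra cost comes from. Because each tile is aggregated without reference to its neighbors, the tiled result matches the single-pass result, which is the property that would make region-by-region parallel processing possible.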
2. Memory issue
Directly ingesting high-resolution input, rather than aggregating it to 0.05 deg beforehand, may also lead to potential memory issues. For example, if the urban properties input mksrf_furban is at 1 km resolution, it would require approximately 25 (5x5) times more storage during processing.
We would like to hear everyone’s input or feedback on these proposed changes, particularly if there are additional factors that you believe should be considered. Any thoughts on how best to approach these issues, or suggestions for alternative solutions, would be greatly appreciated!