Rework CTSM so that parameters can come in easily with different dimensionality: scalar, by-PFT, by gridcell, by-gridcell-and-PFT #2395

ekluzek · 2024-02-29T16:43:03Z

ekluzek
Feb 29, 2024
Maintainer

An idea from @dlawrenncar at the LMWG meeting today is to be able to easily try an experiment where a parameter comes in with a different dimensionality than the one hardcoded in the parameter file, and specifically to allow a parameter to come in as a map by gridcell. With that, it seems like you might as well allow it by gridcell and PFT as well. This would be experimental and would just let you easily try different dimensionalities for Perturbed Parameter Ensemble (PPE) type experiments, or just for work with specific parameters. Through experimentation, the default parameter might be tuned to work with a specific dimensionality and kept at that level for future default work.

ekluzek · 2024-02-29T16:48:21Z

ekluzek
Feb 29, 2024
Maintainer Author

@katiedagon @djk2120 @adrifoster and others working in the PPE space, how useful does this all sound for your work? Especially the addition I made above, where I go beyond @dlawrenncar's suggestion so that the map is also dimensioned by PFT? I also generalized his request so that a parameter could easily be moved between any dimensionality (including scalar and by-PFT).


adrifoster · 2024-02-29T16:52:29Z

adrifoster
Feb 29, 2024
Maintainer

This would be very useful! But in terms of the time scale of usage, I'm not sure. For my part, I think it might be a future need rather than a short-term one. I will let others chime in on when they might be interested in using this.


ekluzek Feb 29, 2024
Maintainer Author

This would definitely have to be post CESM3. But, it would be good to gauge how soon after CESM3 it would be useful to bring in.

linniahawkins Mar 5, 2024
Collaborator

Echoing Adrianna, the PPE posse (thanks Rosie) is definitely interested in this capability. It would open up alternative calibration methods (e.g., CARDAMOM) and enable new science questions (what can we learn from emergent gradients of 'optimal' parameter settings?). Matt Williams gave a presentation at the ILMF parameter estimation webinar that highlights some of the scientific value of this capability. Recording here.

ekluzek · 2024-02-29T17:18:41Z

ekluzek
Feb 29, 2024
Maintainer Author

OK, looking at how parameters are handled now, I think there would need to be some redesign brought in to make this easy and flexible.

There's paramUtilMod.F90, which is sometimes used to read in parameters of different dimensions. It recognizes the dimension level and can read in scalar, 1D, or 2D variables. Only FATES uses it for anything beyond scalars, and it's not used consistently in the code.

readParams calls all the subroutines that read in parameters for the different physics packages. Each package has its own list of variables, with their dimensionality explicitly defined as native Fortran types.


ekluzek Feb 29, 2024
Maintainer Author

Using paramUtilMod consistently in the code seems to me like a good first step. Another part of that would be to have a parameter type that can have different dimensionalities, so that the user can choose at run time what the dimensionality of each variable is.

ekluzek · 2024-02-29T17:24:56Z

ekluzek
Feb 29, 2024
Maintainer Author

One way to do this would be to have four different parameter files: one for pure scalars, one by PFT, a streams file by gridcell, and a 3D streams file by gridcell and PFT. The user then moves variables to another file to change their dimension.

I suspect scientists might NOT like this, though, so maybe there is one file and you change the dimension of the variables on the file? Going from 1D to 2D seems like a big shift to me, though. So maybe there have to be two: one for scalar and by-PFT, and another for stream files of either 2D or 3D?

Thoughts on this idea?


ekluzek · 2024-02-29T17:31:24Z

ekluzek
Feb 29, 2024
Maintainer Author

I'm thinking the way to go about this would be to figure out a design to work towards, and then take small refactoring steps toward the full design. The first steps wouldn't add new functionality and only do some refactoring to make future steps easier. Later steps would start to allow flexibility between scalar and by-PFT. Then an extension for 2D and 3D maps on the model resolution. And then an extension for 2D streams would be a later step, with the possibility of adding 3D streams later.

I'm thinking we'd use OO classes to have a Base parameter class, a scalar class, a by-PFT class, a 2D fixed resolution class, a 3D fixed resolution class, a 2D stream class, and then a 3D stream class. There would need to be some infrastructure for it to decide what dimensionality each parameter actually is. I think it would just read in the different files and use the dimension from the file it's on, and only fail if it can't find a parameter on any of the files.
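To make the proposed class hierarchy a little more concrete, here is a minimal Python sketch. CTSM itself is Fortran, where this would presumably become an abstract derived type with type-bound procedures. The sketch makes several assumptions:

- The parameter files have already been read into in-memory dictionaries.
- Only the four fixed-resolution cases are covered (no streams).
- All class, function, and parameter names are hypothetical.

The first file that contains a variable decides its dimensionality, and lookup fails only if no file has it.

```python
from abc import ABC, abstractmethod


class Parameter(ABC):
    """Hypothetical base class: callers ask for a value at (gridcell g, PFT p)
    and never need to know how the underlying data is dimensioned."""

    def __init__(self, name, data):
        self.name = name
        self.data = data

    @abstractmethod
    def value(self, g, p):
        """Return the parameter value for gridcell g and PFT p."""


class ScalarParameter(Parameter):
    def value(self, g, p):
        return self.data            # one value everywhere


class PftParameter(Parameter):
    def value(self, g, p):
        return self.data[p]         # varies by PFT only


class GridcellParameter(Parameter):
    def value(self, g, p):
        return self.data[g]         # map on the model grid, same for all PFTs


class GridcellPftParameter(Parameter):
    def value(self, g, p):
        return self.data[g][p]      # varies by gridcell and PFT


# Which class to build for a variable found on each (hypothetical) kind of file.
CLASS_FOR_FILE = {
    "scalar": ScalarParameter,
    "pft": PftParameter,
    "gridcell": GridcellParameter,
    "gridcell_pft": GridcellPftParameter,
}


def load_parameter(name, files):
    """Search (kind, contents) pairs in order; the first file holding the
    variable determines its dimensionality. Fail only if no file has it."""
    for kind, contents in files:
        if name in contents:
            return CLASS_FOR_FILE[kind](name, contents[name])
    raise KeyError(f"parameter {name!r} not found on any parameter file")


# Usage with made-up parameter names and values:
files = [
    ("scalar", {"scalar_param": 1.5}),
    ("pft", {"pft_param": [0.1, 0.2, 0.3]}),
    ("gridcell_pft", {"map_param": [[4.0, 5.0, 6.0], [4.5, 5.5, 6.5]]}),
]
p = load_parameter("map_param", files)
print(p.value(1, 2))  # 6.5
```

Taking the first match means file order acts as precedence if a name appears on more than one file. Whether a duplicate should instead be an error is a design choice the idea above leaves open.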


glemieux Mar 2, 2024
Collaborator

Another clarifying question @ekluzek: is there any particular benefit (aside from code reuse, which to be clear is important) in CTSM reading in netcdf file formats? I'm assuming there is a performance and portability benefit over reading in human-readable formats.

I ask because I'm wondering whether there is a case to be made for facilitating @rgknox's suggestion through the development of intermediate tooling that could be both user-facing and callable during the build process. Perhaps it could align with work towards #2126 (and maybe even #585)? But maybe my question is getting out of the scope of this issue.

ekluzek Mar 2, 2024
Maintainer Author

This is a good question and something I was going to ask as well. Way back, we decided at one point to move ALL scalar parameters to the namelist and leave the netCDF parameter file for parameters by PFT. But the feedback from scientists at the time was that it was easier to have everything on one netCDF file, so we made efforts to move scalar parameters back to the netCDF file. Now nearly everything is on the same netCDF file (both scalar and by-PFT). We also thought that with 78 PFTs it was better to have this in netCDF than to have long lists in, for example, namelist format.

One thing we thought of in the context of #585, which you point out above, is an ASCII interface for users to modify the values in the parameter file. It would be something like a user_nl_clm_params file that works similarly to user_nl_clm. We'd probably use a more advanced config file format (YAML, paramGen, JSON, XML, etc.). CTSM would then take that file, modify the parameter file accordingly, and write it out in netCDF for you. We'd probably also have a mode where you could use a user-defined netCDF parameter file. This might be getting at what you're saying above, @glemieux and @rgknox.

But, yes @adrifoster @linniahawkins @katiedagon @djk2120 and anyone else -- is it still desirable to have ALL parameters on the same netCDF file? And do you have a preferred format for parameters?
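A minimal sketch of the user_nl_clm_params idea. It assumes a simple syntax modeled on user_nl_clm: `name = value`, with `!` comments and comma-separated values for by-PFT parameters. The real format was still open (YAML, paramGen, JSON, XML, etc.).

- Overrides are applied to parameters already read from the default file into a dictionary.
- Unknown names are rejected, and so are value lists whose length doesn't match the default.
- Writing the result back out to netCDF is left out.
- The function names are hypothetical.

```python
def parse_user_params(text):
    """Parse 'name = value' lines; by-PFT values are comma-separated.
    Everything after '!' on a line is treated as a comment."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()
        if not line:
            continue
        name, sep, rhs = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'name = value', got {raw!r}")
        values = [float(v) for v in rhs.split(",")]
        overrides[name.strip()] = values[0] if len(values) == 1 else values
    return overrides


def apply_overrides(params, overrides):
    """Return a copy of params with overrides applied, checking that each
    override exists and has the same shape (scalar vs. by-PFT length)."""
    updated = dict(params)
    for name, new in overrides.items():
        if name not in params:
            raise KeyError(f"{name!r} is not on the parameter file")
        old = params[name]
        old_len = len(old) if isinstance(old, list) else None
        new_len = len(new) if isinstance(new, list) else None
        if old_len != new_len:
            raise ValueError(f"shape mismatch for {name!r}")
        updated[name] = new
    return updated


defaults = {"scalar_param": 1.0, "pft_param": [0.1, 0.2, 0.3]}
user_text = """
scalar_param = 2.5          ! tuned value
pft_param    = 0.15, 0.25, 0.35
"""
updated = apply_overrides(defaults, parse_user_params(user_text))
print(updated)  # {'scalar_param': 2.5, 'pft_param': [0.15, 0.25, 0.35]}
```

Checking shape against the default file keeps this compatible with fixed dimensionality. Allowing a user to change a parameter's dimensionality through this file would need a richer syntax than the one assumed here.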

ekluzek Mar 2, 2024
Maintainer Author

@billsacks has a comment in the above issue that outlines some of the thinking we had about this.

#585 (comment)

@billsacks, when you have a chance, it would be great if you could chime in on what you remember about some of this.

rosiealice Mar 2, 2024
Collaborator

Having all parameters in the same format is v. much more useful than having them spread across different formats, for the (pleasingly) expanding PPE posse.

78-entry-long lists seem to store neatly in netCDF, but maybe there are other formats where it wouldn't be super awful? Idk...

Having the means to easily script modifications to the parameter files, e.g. using python is quite critical. Can one do that with xml?

I find that having the default parameter file as part of the fates codebase (admittedly as a .cdl) is VERY very useful from a development and communication perspective.

Unsure where the Venn diagram intersects here, but that's my feedback anyway. :)

glemieux Mar 2, 2024
Collaborator

@rosiealice Ryan has facilitated this for the fates parameters with the UpdateParamAPI python script with the use of the xml "patch" files (example).

@ekluzek yep, that's what I was thinking. Thanks for fleshing out the background info on this subject!
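On @rosiealice's question of whether XML can be scripted from Python: the standard-library `xml.etree.ElementTree` module makes that straightforward. The patch format below is invented for illustration and is not the format used by the FATES UpdateParamAPI script. The sketch also assumes that:

- parameters are already loaded into a dictionary;
- PFT indices are 0-based;
- the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_xml_patch(params, patch_xml):
    """Return a copy of params with the values in a (hypothetical) XML patch applied."""
    # Copy lists so the caller's defaults are not modified in place.
    updated = {k: list(v) if isinstance(v, list) else v for k, v in params.items()}
    for elem in ET.fromstring(patch_xml).iter("param"):
        name = elem.get("name")
        if name not in updated:
            raise KeyError(f"{name!r} is not a known parameter")
        value = float(elem.text)
        pft = elem.get("pft")
        if isinstance(updated[name], list):
            if pft is None:
                raise ValueError(f"{name!r} is by-PFT; a pft attribute is required")
            updated[name][int(pft)] = value
        else:
            if pft is not None:
                raise ValueError(f"{name!r} is a scalar; drop the pft attribute")
            updated[name] = value
    return updated


PATCH = """<params>
  <param name="scalar_param">3.0</param>
  <param name="pft_param" pft="2">0.4</param>
</params>"""

defaults = {"scalar_param": 1.0, "pft_param": [0.1, 0.2, 0.3]}
patched = apply_xml_patch(defaults, PATCH)
print(patched)  # {'scalar_param': 3.0, 'pft_param': [0.1, 0.2, 0.4]}
```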

dlawrenncar · 2024-03-01T00:21:21Z

dlawrenncar
Mar 1, 2024
Collaborator

I think the hydrology community would be very interested in this. As far as I understand, not having this capability is actually deterring some in the hydrology community from using the model. And we saw the example this week of how it could likely be useful for the crop model, essentially to account for different crop varieties. Dave


On Thu, Feb 29, 2024 at 10:34 AM Erik Kluzek wrote: @adrifoster @samsrabin @slevis-lmwg @glemieux and @rgknox do you have thoughts on this as it's the more SE oriented issue.


rosiealice Mar 1, 2024
Collaborator

Just a +1 from me on what Dave said. I have had several conversations in the not-so-distant past with people interested in this type of thing (mortality rates, leaf reflectance, whatever). Though it further muddies the water on what a parameter is, etc. (I feel sure there is a musical context I could put that in, but I can't quite grasp it...)

ekluzek Mar 1, 2024
Maintainer Author

@adrifoster pointed out to me that the same species of evergreen tree lives in Mexico and Alaska but acts a lot differently in each. So the parameters for it should really differ by region. I'll let her add to this, but that makes sense to me as a reason to do this.

adrifoster Mar 5, 2024
Maintainer

not the same species of tree, but the same PFT! And the "same" species of tree ( specifically I was talking about AK birch vs. paper birch) lives in Alaska (AK birch) and also across Canada (paper birch). We can make those different species (they are really sub-species) but where do we draw the line between the two ranges? At the Alaska-Canada border? I have never heard of trees obeying border control.

So essentially allowing for geographical variance of parameters would allow us to actually simulate a gradation between AK birch and paper birch.

samsrabin · 2024-03-01T16:55:24Z

samsrabin
Mar 1, 2024
Maintainer

@ekluzek I think what you wrote in terms of incremental development makes a lot of sense, as does extending paramUtilMod. As @rosiealice mentioned, it's starting to blur the lines between what's a parameter and what's just a regular model input. I could imagine a "parameter" like crop maturity requirements that's 4D: latitude, longitude, CFT, and time. I don't think that's necessarily a bad thing, though.

One thing I'd like us to shoot for is to avoid having to write much new code whenever a new parameter is added. The more we can make this automatic, the better.


ekluzek Mar 1, 2024
Maintainer Author

Quoting @samsrabin: "One thing I'd like us to shoot for is to avoid having to write much new code whenever a new parameter is added. The more we can make this automatic, the better."

Exactly, that's one of the important design requirements we want to meet as well as we can. Part of my reason for having a parameter type with different classes is so that the dimensionality of the parameter is NOT known to the code outside of the parameter class. That way the user can define the parameter with any dimensionality, based on the input files, and the code just does the right thing based on what you give it.

Another design consideration, though, is going to be making sure this doesn't make the code dog slow. Depending on how we design the accessors to the parameter data, there could be a slowdown. But I also think some slowdown might be worth it for the flexibility it gives scientists in experimentation and in using ML to fit parameters.
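One possible way to limit that slowdown is to pay the dimensionality dispatch once, at initialization. Each parameter is expanded to a full gridcell-by-PFT table, so inner loops only do plain indexing. This is a Python sketch under assumptions, not a measured result:

- The `kind` labels and the function name are hypothetical.
- In Fortran, the equivalent would be filling an allocated (gridcell, PFT) array.

```python
def expand_to_gridcell_pft(kind, data, ngridcells, npfts):
    """Return an ngridcells x npfts table holding the parameter's value at
    every (gridcell, PFT) pair, whatever its original dimensionality."""
    if kind == "scalar":
        return [[data] * npfts for _ in range(ngridcells)]
    if kind == "pft":
        if len(data) != npfts:
            raise ValueError("by-PFT data must have one value per PFT")
        return [list(data) for _ in range(ngridcells)]
    if kind == "gridcell":
        if len(data) != ngridcells:
            raise ValueError("gridcell data must have one value per gridcell")
        return [[data[g]] * npfts for g in range(ngridcells)]
    if kind == "gridcell_pft":
        if len(data) != ngridcells or any(len(row) != npfts for row in data):
            raise ValueError("gridcell-by-PFT data has the wrong shape")
        return [list(row) for row in data]
    raise ValueError(f"unknown dimensionality {kind!r}")


# Expand once at initialization; hot loops then use table[g][p] directly.
table = expand_to_gridcell_pft("pft", [0.1, 0.2, 0.3], ngridcells=2, npfts=3)
print(table[1][2])  # 0.3
```

The trade-off is memory: every expanded parameter costs ngridcells × npfts values. Whether that is acceptable compared with per-call dispatch would need to be measured in the real model.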

samsrabin · 2024-03-02T17:26:39Z

samsrabin
Mar 2, 2024
Maintainer

This might be a separate discussion, but it's been nagging at me and seems relevant here.

Lots of crop development ideas (e.g., having one patch in a corn/soy rotation, or ) require either (a) rethinking how we handle CFTs/patches or (b) making new CFTs. The latter is more realistic given engineering resource constraints, but it's currently a pain to make new CFTs. Each requires a new integer identifier like nirrig_rice to be defined and used in various places. It's also error-prone, and annoying on principle, to have to mess around with code just because you want to add a new CFT.

We should make this a lot more flexible. PFTs should be defined—not just parameterized, but defined—in model inputs, not the code itself.

This would require changing everything that uses PFT identifiers (e.g., nirrig_rice, nbrdlf_evr_shrub, etc.) to instead use new PFT-specific parameters. However, in addition to easing the introduction of new PFTs, it would make the code a lot safer—resolving this long-standing issue for instance.


rosiealice Mar 2, 2024
Collaborator

@samsrabin, noting that lots of this would be easier in FATES, which has neither of these issues (neither a definitive patch/PFT correspondence nor inflexible PFT definitions). You are right that it is a bigger/different discussion, though!

ekluzek Mar 2, 2024
Maintainer Author

Yep, you are definitely correct here @samsrabin. That is totally something it would really help to work on. A nice thing about FATES, for example, is that the FATES parameter file does define your PFTs, so it's easy to change them. The CTSM parameter file also has an assumed order: trees first, shrubs next, grasses next, and then crops. I think the code will abort if you get this wrong, but you have to create parameter files with that ordering. That means you can't do the simple thing of, for example, adding a new PFT at the end of the file (unless it's a crop). So it really limits the science that can be done in CTSM. It's both bad software design and limiting for the science, so bad from all angles.

Basically, you are advocating for a data-driven approach rather than a code-driven approach, which, as you point out, is far easier to understand. It's easier for everyone to understand what the code is doing just by looking at the input files.

In terms of this initial idea, and in terms of software, I would call this a separate discussion. But I could also see this effort growing into that space. If you don't know what dimensionality a parameter is, you have to access it via an accessor method, so you have to pass in the indices you want. Initially you'd have to pass in those PFT identifiers (just because that's what the current code looks like). But eventually that parameter-reading class could take over what pftconMod.F90 does, with ways of querying the parameter object to do everything in pftconMod (is_tree, is_shrub, etc.). I think that makes sense as a way to go. And we might as well think about it in the initial design, at least to assess whether it makes sense as something to do eventually. So there is a reason to talk about this here.
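A minimal sketch of that data-driven direction:

- PFTs come from input rows.
- Code asks trait questions such as `is_tree` (named after the pftconMod queries mentioned above) instead of comparing against integer identifiers like `nirrig_rice`.
- No ordering of trees, shrubs, grasses, and crops is assumed.

The row format, class names, and PFT names are hypothetical, and only two of the trait queries are shown.

```python
from dataclasses import dataclass

PFT_KINDS = ("tree", "shrub", "grass", "crop")


@dataclass(frozen=True)
class Pft:
    name: str
    kind: str  # one of PFT_KINDS


class PftTable:
    """PFTs defined entirely by input data (a list of (name, kind) rows),
    queried by name or trait rather than hard-coded integer identifiers."""

    def __init__(self, rows):
        self.pfts = []
        for name, kind in rows:
            if kind not in PFT_KINDS:
                raise ValueError(f"unknown kind {kind!r} for PFT {name!r}")
            self.pfts.append(Pft(name, kind))
        self._index = {p.name: i for i, p in enumerate(self.pfts)}
        if len(self._index) != len(self.pfts):
            raise ValueError("duplicate PFT names")

    def index(self, name):
        return self._index[name]

    def is_kind(self, i, kind):
        return self.pfts[i].kind == kind

    def is_tree(self, i):
        return self.is_kind(i, "tree")

    def is_crop(self, i):
        return self.is_kind(i, "crop")


# A new tree PFT can simply be appended after the crops.
table = PftTable([
    ("example_evergreen_tree", "tree"),
    ("example_shrub", "shrub"),
    ("example_c3_grass", "grass"),
    ("example_rice", "crop"),
    ("example_new_tree", "tree"),
])
print(table.is_tree(table.index("example_new_tree")))  # True
```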

andywood · 2024-07-26T14:26:52Z

andywood
Jul 26, 2024

Hi, I just discovered this thread (thanks Erik). We hydrologists have indeed discussed upgrading parameter-handling in CTSM, as calibrating models (aka 'history matching' and 'iterative updating' in ESM world) is part of our DNA, going back to the 1960s.

There is an effective and straightforward example of this functionality in SUMMA (ie, https://summa.readthedocs.io/en/latest/input_output/SUMMA_input/#attribute-and-parameter-files). SUMMA is an NCAR-developed modeling framework and notably was the design template invoked in the proposal to evolve CLM to CTSM, mainly related to ideas about numerical solver separation from physics. I use SUMMA for nearly all my agency stakeholder-oriented research and applications, and I'm trying to bring CTSM into that arena.

Basically, all meaningful parameters in SUMMA are hierarchically exposed to different levels of specification. The default (for users with no information) is a global specification. It is provided in a text file that lists the parameters (applied identically everywhere) and their theoretical ranges. These are always read in (as the default), but are immediately overwritten by the same parameters if they are specified in an index/type library (e.g., soil types, veg types, i.e., PFTs) that attaches a suite of process parameters to each type number. The libraries can also be text files (they're not large), but are complex enough to warrant netCDF format. They just have a squirrelly structure because the type libraries can offer multiple options, e.g., IGBP-MODIS for veg-type parameter classification systems, not all of which include consistent parameters. I think this is about as far as CTSM currently goes in terms of parameter control. That's not uncommon -- a lot of global models stop about here in terms of user control. Note that parameters are distinct from geophysical attributes (e.g., elevation, slope, veg type, soil type), which are always distributed (and in netCDF).

Following this type-based 'read/overwrite', all parameters are overwritten again by a fully distributed parameter file. That file holds the parameters a user seeks to calibrate, or those for which they feel they have information that goes beyond the type libraries. These local parameters are stored in fully distributed 'trialParameter' files (by cell/grid/polygon). This hierarchy is extremely useful. We decide which parameters to adjust, and at what level of granularity. Then the trialParameter file (or library, or global file) can be updated in PPE or optimization runs. Nearly all parameters have a potentially globally distributed scope, whether or not that scope is invoked by the source the user specifies. Internal data array allocation can be done in various ways depending on preference (efficiency vs. code complexity).

It mainly takes I/O engineering for a variety of static inputs and types. I think some of us who work on other Fortran models (SUMMA, Noah-MP) could probably implement it if we had Erik or a CTSM code expert looking over our shoulders for guidance, tips, and QC.

Like 'history matching', this capability probably doesn't need a lot of reconceptualizing and invention, as opposed to getting a jump by leveraging what exists. Hydrologists have good examples and long experience with parameter optimization in models. The functionality seems like a small lift for CTSM to become a more usable model for water security studies, and I presume the same is true for all the other applications (carbon, climate, ecology, etc.). Guoqiang Tang has just completed a powerful emulator-based parameter estimation workflow for CTSM hydrology, but its global application and value in CTSM will be limited without this kind of flexibility. Hence my (renewed) interest! :)

It would be great to have an initial high-level discussion to scope out an effort to do this. We can think about how to map what SUMMA does, for instance, into CTSM.
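As a possible starting point for mapping SUMMA's approach into CTSM, the three-level hierarchy can be sketched as successive overrides. This is a minimal Python illustration under simplifying assumptions, not SUMMA's actual implementation:

- All inputs are already in memory.
- Each cell has a single type index.
- The theoretical ranges from the global file are ignored.
- The function, argument, and parameter names are hypothetical.

```python
def resolve_parameters(global_defaults, type_library, cell_types, trial_params):
    """Return one {parameter: value} dict per cell.

    global_defaults: {name: value}, applied everywhere (level 1)
    type_library:    {type_id: {name: value}}, overrides by veg/soil type (level 2)
    cell_types:      type_id of each cell
    trial_params:    {name: [value per cell]}, fully distributed overrides (level 3)
    """
    ncells = len(cell_types)
    known = set(global_defaults)
    for overrides in [*type_library.values(), trial_params]:
        unknown = set(overrides) - known
        if unknown:
            raise KeyError(f"no global default for {sorted(unknown)}")
    for name, per_cell in trial_params.items():
        if len(per_cell) != ncells:
            raise ValueError(f"{name!r} needs one value per cell")

    resolved = []
    for c in range(ncells):
        values = dict(global_defaults)                       # level 1: global
        values.update(type_library.get(cell_types[c], {}))   # level 2: by type
        for name, per_cell in trial_params.items():          # level 3: by cell
            values[name] = per_cell[c]
        resolved.append(values)
    return resolved


cells = resolve_parameters(
    global_defaults={"param_a": 1.0, "param_b": 2.0},
    type_library={1: {"param_a": 1.5}},
    cell_types=[0, 1, 1],
    trial_params={"param_b": [2.1, 2.2, 2.3]},
)
print(cells[1])  # {'param_a': 1.5, 'param_b': 2.2}
```

Requiring every overridden name to have a global default mirrors the idea that the global file lists all parameters. A PPE or calibration run would then only need to rewrite the `trial_params` input.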

