History Refactor Wish-list #630
Also: all history diagnostics written by FATES should have a "FATES_" namespace prefix on the variable. No output variables should have a "COL" or "PA" to indicate whether they are patch or column, because the refactor will remove patch-level averaging; everything is implicitly a calculation on fates columns. EDIT (9/20): As far as suffixes go, all variables will be implicitly "site" level, and thus for consistency's sake we should either include a "_si" suffix for all variables or for none. If a variable has more than one dimension, such as size-class for instance, we would append si onto the existing suffix, e.g. _sc -> _sisc, or _scpf -> _siscpf. I'm in favor of just dropping the "_si" since it is now implied.
@rgknox per your comment here about the _pa history variables: about half of these are fire-related. I am going through some of the fire history variables right now and will add to that effort the task of moving all of those currently indexed directly by patch to the standard way we handle site-level variables.
Noting here that #640 and #642 have taken care of this particular item.
To do: review the variable list in both E3SM and CTSM tests. In CTSM we have a test called "AllVars" which sets the flag hist_empty_htapes = .false. When this flag is set to false, all variables in the history that are defined with use_default='active' are included. But we have many variables that are set 'inactive' because they have a large second dimension (e.g. everything with _scpf). This list of inactive variables needs to be updated (periodically) in testmods_dirs/clm/FatesAllVars/user_nl_clm.
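For concreteness, that testmod boils down to a namelist along these lines (a sketch only; the hist_fincl1 entries here are placeholders, not the actual inactive-variable list):

```
! sketch of testmods_dirs/clm/FatesAllVars/user_nl_clm:
! keep all default-active variables, then add the inactive
! large-second-dimension variables explicitly (placeholder names)
hist_empty_htapes = .false.
hist_fincl1 = 'FATES_EXAMPLE_SCPF', 'FATES_ANOTHER_EXAMPLE_SCPF'
```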
If we do not fix the FATES namespacing of history variables, we at least have to make sure we are not using the same diagnostic names that CLM/ELM are using. "NEP", for instance, is part of the standard output for ELM/CLM.
Units on several logging history variables are wrong or incorrectly constructed. See here: the cohort-level mortality terms are fractions that died that day, but the diagnostic is given a temporal unit of per year. See here: logging mortality terms (fractions that died that day) are being multiplied by seconds_per_year. Same as the previous issue; the time units need to be corrected.
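To illustrate the size of that discrepancy, a tiny standalone sketch (invented names, not FATES code):

```fortran
! Hypothetical sketch: a daily mortality fraction needs days-per-year,
! not seconds-per-year, to become an annual rate; using seconds inflates
! the value by a factor of 86400.
program mort_units_demo
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), parameter :: days_per_year    = 365.0_r8
  real(r8), parameter :: seconds_per_year = 365.0_r8 * 86400.0_r8
  real(r8) :: frac_died_today = 1.0e-4_r8   ! fraction of cohort dying today

  print *, 'correct   [yr-1]:', frac_died_today * days_per_year
  print *, 'incorrect [yr-1]:', frac_died_today * seconds_per_year
end program mort_units_demo
```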
@KatieMurenbeeld tagging you to take note of these issues with logging output
Just a note: I wanted to try to put all of our current history variables into a spreadsheet to better assess how to make them more uniform in terms of units and namespacing, and also to see what is there and not there, which ones are on by default, etc. After some emacs-wrangling, I turned the relevant Fortran into this; it should be useful as a reference and starting point for a more systematic overhaul.
Just commenting here to note that, per a conversation at a CTSM SE meeting and a follow-up conversation with @rgknox, I am tackling some of these history variable update requests. Right now I am going through and updating all the output variable names to have 'FATES_' as a prefix, and also renaming them to have suffixes according to their scale (see below). We also discussed changing the flushval to be the hlm_hio_ignore_val everywhere, so @rgknox and I discussed needing to go through where the history variables are actually counted/summed/etc. and making sure that things are zeroed where they need to be. Suffixes:
Also, in going through this I am noticing that it is fairly tedious to update all these calls to set_history_var, and there are a few typos/incorrect units/etc. that I am already encountering. I was thinking it might be easier to have all of these in some sort of easily read and manipulated input file (csv, etc.?) that we would then parse inside the define_history_vars subroutine. We could then just call set_history_var in a loop over the list/array of these input arguments (i.e. variable name, units, longname, etc.) that get read in, e.g.:
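A minimal sketch of that idea, assuming a three-column csv of name, units, and long name; the file name, layout, and parsing here are hypothetical, not existing FATES code:

```fortran
program read_hvar_defs
  implicit none
  character(len=256) :: line, vname, units, longname
  integer :: unit_no, ios, c1, c2

  open(newunit=unit_no, file='fates_history_vars.csv', &
       status='old', action='read')
  do
     read(unit_no, '(A)', iostat=ios) line
     if (ios /= 0) exit                 ! stop at end of file
     c1 = index(line, ',')              ! first comma: name | units
     c2 = index(line(c1+1:), ',') + c1  ! second comma: units | long name
     vname    = adjustl(line(:c1-1))
     units    = adjustl(line(c1+1:c2-1))
     longname = adjustl(line(c2+1:))
     ! the real code would register the variable here, along the lines of
     ! call this%set_history_var(vname=trim(vname), units=trim(units), ...)
     print *, trim(vname), ' [', trim(units), ']: ', trim(longname)
  end do
  close(unit_no)
end program read_hvar_defs
```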
It may work to have some kind of hash map so we can still index the variable locations. Do we think this would be more manageable for future changes and further variable updates?
@adrifoster thanks for doing this. It's an interesting idea, and certainly the current code has a lot of repetition. I guess I see there being some tradeoffs with the proposal of moving this to some sort of csv file:

1. We would no longer have the metadata and data (or, more specifically, the calculation of each variable) together in one file as we do now, which would make it trickier to search within that one file to see what a given variable actually is and how it is calculated.
2. If you were to simultaneously do both the updates that you are making and a switch to some other file, then we'd lose the git continuity and provenance needed to keep track of things like changing units and to see what all the changes are.
3. Possibly the biggest one is related to @rgknox's point 3 in #630 (comment): ideally we want to go more in the direction of how CLM & ELM handle this stuff and try to move the history writes from the current monolithic block of code, all called at the end of the timestep, to a set of more modular history update subroutines that would be called at different times. Pulling out the variable definitions seems like it would be in conflict with that goal.
@adrifoster the idea of having an external file to read in for history variables is something that I've thought about as well and felt would be a good way to go for CTSM. So I totally agree with this approach, and if you adopt it, I hope it'll make it easier to bring into CTSM. I was thinking it should be something like XML, as I think csv doesn't handle enough complexity. You could also keep it in XML, for example, and have build scripts that convert it to a different format as needed by whatever needs to read it in. The XML format would, for example, help with our master history list. But for the Fortran it might be best to have that simple csv file. I might have some notes on this idea in a CTSM issue, so I'll look for that.
@adrifoster I'm trying to envision the code that connects the text string to the actual named integer that is used as the index for each variable. Since the named integers (e.g. ih_npp_leaf_si) live in the module itself, won't it get messy trying to maintain these definitions in two places, in the module and in a csv file? Also, I'm just not sure how you can take a string and use it to choose a named variable (I suppose it can be done, I'm just not sure how). The alternative is to do away with the named integers altogether and use the variable's text name to identify the variable's location in memory; however, that would slow down the model, because we would need to do text searches every time we want to fill a variable.
I'm also thinking that we don't really need the following 3 arguments to all of those set_history_var() calls:
keep?:
Another to consider:
@rgknox Yes, I think we would need to redesign the code a bit. I would suggest using a hash map so that the current index would turn into the key. Though perhaps this would introduce more fragility, because we would need to have the correct key string in the code.
@ckoven that makes sense. I am forging ahead with making these variable name/units updates, and we can decide if we want to just push that first and come back to the potential redesign later, or do it all at once.
@rgknox I don't know of a way to turn the variable name into a string, unless you have some code preprocessing happen, which could be done. We do that in some cases with files named *.F90.in. But I actually don't think that's the way to go here. What I think you do is separate the setting of the variable pointer from the metadata surrounding it (the long name, units, etc.). So in one place you read the metadata from the input file, keyed on the history fieldname; elsewhere you assign the history fieldname to the array that contains the data. I'll illustrate how this would work for CTSM, since I'm more familiar with it. A current history call looks like this:

```fortran
call hist_addfld(fname='GRAINC_TO_FOOD', units='gC/m^2/s', &
     avgflag='A', long_name='grain C to food', &
     ptr_patch=this%grainc_to_food_patch)
```

Changing it would split things up so that the call to assign the pointer would look like this:

```fortran
call hist_addfld(fname='GRAINC_TO_FOOD', ptr_patch=this%grainc_to_food_patch)
```

There would be another line that reads in the metadata file containing the attributes for GRAINC_TO_FOOD. You would also have error checking to make sure all fields that have a hist_addfld call also have metadata associated with them, and vice versa. I don't know that a hash helps here. You do need the list of fieldnames to be unique, and I think it's OK to use character comparisons to go through the list of fields. There are hundreds of fields, not thousands or millions where having a hash would be critically important. @adrifoster does that picture work for the kind of thing you are envisioning?
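A minimal standalone sketch of that character-comparison lookup, with the error check for a field that has no metadata entry (all names here are invented, not CTSM code):

```fortran
program fieldname_lookup_demo
  implicit none
  character(len=32) :: fieldnames(3)
  integer :: idx

  fieldnames = [character(len=32) :: 'GRAINC_TO_FOOD', 'TLAI', 'GPP']
  idx = find_field(fieldnames, 'GRAINC_TO_FOOD')
  if (idx < 0) then
     stop 'ERROR: hist_addfld call has no matching metadata entry'
  end if
  print *, 'metadata entry for GRAINC_TO_FOOD is at index ', idx

contains

  ! plain linear search; idx = -1 flags a missing metadata entry
  integer function find_field(names, fname) result(idx)
    character(len=*), intent(in) :: names(:)
    character(len=*), intent(in) :: fname
    integer :: i
    idx = -1
    do i = 1, size(names)
       if (trim(names(i)) == trim(fname)) then
          idx = i
          return
       end if
    end do
  end function find_field

end program fieldname_lookup_demo
```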
That makes sense @ekluzek. I don't know if I'm a fan of splitting it up though. I like the original method because everything that describes the variable (name, units, and data pointer) is in one place. I think I'm more in favor of us trying to streamline the way we currently do things, like removing unnecessary arguments and clutter.
Back to a point about whether or not to maintain averaging options: I would definitely support maintaining that capability, which it seems like is the way you are going. Additionally, for those that are not aware, in CTSM we have brought in a new capability to get data (and averages) for a particular local solar time (like leaf water potential at 8am); see ESCOMP/CTSM#1374. This may not fit into this particular rework of FATES history output, but it seems like a feature that would also be quite useful for FATES.
This is great @adrifoster, thanks. I've also noticed quite a bit of inconsistency in units, both in how they are labeled (the same units have different character strings for different variables) and in the units themselves for the same variable types; e.g. fluxes can be in per second or per day, and both fluxes and pools can be in per m2 or per ha. Maybe something to bring some consistency to as well, if you're taking the time to overhaul the history code.
@walkeranthonyp that is a really good point and one of the things we had discussed a while ago. I had started to put all the current variables together in the spreadsheet that I linked to earlier in the thread (#630 (comment)) for precisely this purpose, but ran into a conceptual roadblock that I didn't know how to deal with. Ideally we should be using the standard CMIP conventions for units (so everything in kg, m, s, etc., and just the number 1 for dimensionless quantities like LAI). But I think that some of the more forestry-oriented or population-focused variables are in conflict with that goal. E.g. there is no SI unit for individual trees, so do we report plant populations in units of "m-2"? Or report basal area in units of "1"? We probably need some discussion on this point.
My sense is that people who think about dimensions and their standardization for a living may give me a slap on the wrist for the following, but... the dimensions appear to cancel out; should they really? It is m2 of one quantity (leaf) per m2 of another quantity (ground), and the same concept applies to basal area. It seems most appropriate to describe these as m2/m2.
I see what you're saying, Ryan. I'm also happy with dimensionless units but don't feel strongly one way or another, just so long as we're consistent. My main concern is that "per meter squared" is currently represented in at least three ways.
I agree with @rgknox on the basal area front; normally (in the forestry etc. literature) I see those reported in units of cm2/m2 or m2/ha, and since I think we do want to standardize our per-area to m2, I think it is fine to say the units are m2/m2. @ckoven I think for numbers most forestry applications use "stems/m2", but I have been writing "nindivs/m2" since that seemed to be used most often in the code already. @walkeranthonyp, @rgknox and I had also discussed standardizing the units themselves (area always in m2, mass always in g, etc.). I will be doing this as well as I go through the file. As for the exponent notation etc., I also agree that should be standardized. So far I have been changing everything to …
I think there are two slightly separate unit issues here: one is consistency and standardization in how we report units, and the other is consistency and standardization in how we calculate things. On the second one: right now we have area in both m2 and ha, time in sec, day, and year, and mass in g and kg. On the first one, as Anthony says, we report things inconsistently and in general make use of nonstandard conventions; e.g. we report individuals variously as N, indiv, and plants, when in reality none of those are meaningful units from a convention standpoint, and if we adhere to those standards then we should omit any reference to population in the units at all (the closest example I could find is aerosol number densities, which are just in m-2 here: https://cfconventions.org/Data/cf-standard-names/60/build/cf-standard-name-table.html). So I see a few possibilities:
N.b. in looking quickly through the CF conventions in the link above, I already see they do some things differently than we do. E.g. they want burned area in actual area (which requires knowing the size of the gridcell), or alternatively in burned fraction but without per-time units, which means we'd need to integrate burned fraction over the history interval rather than averaging in time as we do now...
PS: the burned fraction solution is easy if we want to follow the standard; we just switch the avgflag to sum instead of average...
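To spell out the arithmetic (assuming the quantity recorded each timestep is the fraction burned during that step, which is my reading of the above rather than anything the CF document states): over a history interval of $N$ timesteps,

$$F_{\mathrm{interval}} = \sum_{i=1}^{N} f_i = N\,\bar{f},$$

so a sum over the interval yields the dimensionless CF-style burned fraction directly, and it differs from the current time average $\bar{f}$ only by the factor $N$.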
If it's easy to get the sum, then it makes more sense to report burned area fraction. Otherwise it could be confused with the fire-front ROS (rate of spread).
On the FATES call today we had a good discussion of this topic and made some decisions via Zoom polls about how to proceed on a few issues:
Thanks to all who participated!
Adding another suffix:
Hey all, I want to confirm that these ED_ variables can be deleted if we already have a FATES_ version of them. See fates/main/FatesHistoryInterfaceMod.F90, line 2285, at 70a3e13.
Yes, I think you are right.
I have to look it up every time, so I support moving away from the numeric indexing.
@adrifoster agree that giving these mortality terms human readable names would be great.
Yes!
A question about some variables for @ckoven, @rgknox, @jkshuman, @rosiealice, or anyone else: This variable, which is called …
It's calculated as … in both cases here …
However, we have another …
However, this is calculated without the multiplication by the …
I understand the units for the …
Hi @adrifoster, the intention here is just to be able to weight the trim output variable by size and canopy layers, since we expect it to definitely vary strongly between canopy and understory, and possibly within the canopy by size (and really by PFT too, but this doesn't even get into that). So the idea is to then divide by … In practice the …
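For what it's worth, a small standalone sketch of the number-weighting being described (all names and values invented, not FATES code):

```fortran
! accumulate trim weighted by plant number, then divide by the summed
! number to recover the density-weighted mean trim for a bin
program weighted_trim_demo
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8) :: cohort_trim(2)   = [0.9_r8, 0.4_r8]   ! canopy vs understory
  real(r8) :: cohort_nplant(2) = [0.02_r8, 0.10_r8] ! plants per m2
  real(r8) :: trim_x_nplant, nplant

  trim_x_nplant = sum(cohort_trim * cohort_nplant)  ! weighted accumulation
  nplant        = sum(cohort_nplant)
  print *, 'density-weighted mean trim = ', trim_x_nplant / nplant
end program weighted_trim_demo
```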
@ckoven okay, that makes sense. How should we define the long name for those? 'Trimming term weighted by plant density'?
Another question about units: the variables …
But it seems like they should actually be dimensionless (calculated here and here)?
And more questions (nearing the end here...)
I'm not sure what these should be converted to...
So @billsacks brought up the conflict that CTSM/CLM has its mass units in grams, whereas with the switch to the CF table that we voted on at the FATES call on September 16, FATES will be going with kilograms for our output variables. I am working on making these updates here, and am currently in the testing phase. However, given what @billsacks and @ekluzek were pointing out, I want to confirm that we will move forward with this switch to the CF conventions table for FATES output. Please note that these changes are only for the output and will not affect any internal variables within FATES or the HLM. Tagging people here in the issue so we can capture discussion on this change:
Thanks for pointing out this discrepancy, @adrifoster. I know there's been some movement in CESM component models to move towards greater CF compliance. Having FATES help us move towards this goal seems appropriate. I'm not sure how much time we want to spend doing a parallel effort for analogous output from the big-leaf model, especially if FATES is eventually going to replace the older (non-CF compliant) scheme. I guess we want to avoid confusing users who may be using both FATES and big-leaf models in the short/medium term.
@wwieder and others: to clarify, I don't think this units question is a question of CF compliance. CF gives canonical units (which are SI units), but I believe doesn't dictate what units you use to be CF compliant. To be CF compliant, you just need to explicitly specify your units in a particular format. I'm not positive about this, but that is the understanding of @ekluzek and myself.
Thanks for clarifying @billsacks. I am not a CF expert, but from the FATES conversation I took away that units of kg, m^2, and s were preferred for mass, area, and time, respectively. @ckoven may have a more informed perspective here. Separately, @jenniferholm, I'd assume this is potentially an issue for E3SM too?
Right @billsacks, CF specifies "canonical units", which are also the units that must be used when reporting to MIPs such as CMIP, TRENDY, etc. Certainly it is in principle fine to continue doing what CESM has always done, which is to use units that are convertible to, but different from, the canonical ones in the model history fields, and then just convert the units during the CMOR-ization process. But since (a) standardization of units is a headache, particularly for MIPs with less infrastructural support than CMIP, and (b) FATES hasn't been keeping to any specific convention, but we want to do so before everyone's scripts get baked in, we decided that we'd standardize to the canonical CF units, which has the benefit that it removes the unit-conversion step when we participate in MIPs.
Thanks for the clarification @ckoven. That makes sense; so this is more about removing the CMOR-ization step (or similar standardization for other inter-model comparisons) rather than CMIP compliance per se.
For cross-reference with a relevant CTSM issue, I just remembered this one that @ekluzek started last year: ESCOMP/CTSM#973
There are a few things that bother me about history output:

1. My position is that if the output variable is on the fates side of the code, the diagnostic is inherently relevant only to naturally vegetated columns within FATES, and all other columns should be ignored. If we want a diagnostic that uses zeros from other columns, or real information from other columns, it should be defined on the host-model side, and FATES should pass the boundary conditions necessary to make the calculation.
2. Get rid of these patch-scale diagnostic types (i.e. _PA). These are confusing because it looks like we can have patch-scale output, but what is really happening is that we are just asking CLM/ELM to upscale from patches to grid for us. We already do this, and having two ways of going about the same thing is confusing.
3. The filling of all history diagnostics happens in one central location, FatesHistoryInterfaceMod. This makes things complicated, particularly because the daily calls happen after all of our dynamics completes. If we want a diagnostic on growth or mortality rates (for example), we have to upscale the rates and save them to a site-level intermediary variable so that the information does not get messed up during disturbance. Another example: if a cohort is terminated, we can't diagnose its rates because it ceases to exist. Long story short, if we were allowed to update the history pointer arrays defined in FatesHistoryInterfaceMod immediately, when the salient variable is meaningfully updated, we would be able to remove all of these site-level intermediary arrays (see the sketch after this list).
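For illustration, a minimal sketch of that last idea, with invented names and a made-up diagnostic; the point is only that the dynamics code writes into the history buffer at the moment the event happens:

```fortran
! Hypothetical sketch: the termination code itself accumulates into the
! history buffer, so the cohort's contribution is captured before it
! ceases to exist and no site-level intermediary array is needed.
module hist_update_sketch
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), allocatable :: hio_mort_term_si(:)  ! history buffer [plants m-2]
contains
  subroutine record_termination(io_si, nplant, site_area)
    integer,  intent(in) :: io_si      ! site index into the history buffer
    real(r8), intent(in) :: nplant     ! plants in the terminated cohort
    real(r8), intent(in) :: site_area  ! site area [m2]
    hio_mort_term_si(io_si) = hio_mort_term_si(io_si) + nplant / site_area
  end subroutine record_termination
end module hist_update_sketch
```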