History Refactor Wish-list #630

Closed
rgknox opened this issue Apr 9, 2020 · 52 comments · Fixed by #802
Comments

@rgknox
Contributor

rgknox commented Apr 9, 2020

There are a few things that bother me about history output.

  1. It is neither clear to the user nor implemented consistently how we treat non-FATES columns when upscaling to grid-cell-level output. All of our output in the history file ultimately gets upscaled to the grid scale. This issue is roughly the same as #535 (change history flushing behavior), because how we flush values on non-FATES columns essentially controls how/if those values are part of the final averaging. In some cases we tell non-FATES columns to be zero (which adds a zero to the average), and in some cases we tell them to use an ignore flag.

My position is that if the output variable is on the fates side of the code, the diagnostic is inherently relevant only to naturally vegetated columns within FATES, and all other columns should be ignored. If we want a diagnostic that uses zeros from other columns, or real information from other columns, this should be defined on the host-model side, and FATES should pass the boundary conditions necessary to make the calculation.

  2. Get rid of these patch-scale diagnostic types (i.e., _PA). These are confusing because it looks like we can have patch-scale output, but what is really happening is that we are just asking CLM/ELM to upscale from patches to grid for us. We already do this ourselves, and having two ways of going about the same thing is confusing.

  3. The filling of all history diagnostics happens in one central location, FatesHistoryInterfaceMod. This makes things complicated, particularly because the daily calls happen after all of our dynamics completes. If we want a diagnostic on growth or mortality rates (for example), we have to upscale the growth rates and save them to a site-level intermediary variable so that the information does not get messed up during disturbance. Another example: if a cohort is terminated, we can't diagnose its rates because it ceases to exist. Long story short, if we were allowed to immediately update the history pointer arrays defined in FatesHistoryInterfaceMod when the salient variable is meaningfully updated, we could remove all of these site-level intermediary arrays (a minimal sketch of that pattern follows).
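To make point 3 concrete, a minimal sketch of the kind of call that would replace the intermediaries, reusing the existing hvars/r82d conventions but with a hypothetical accumulation routine and index name (this is not current FATES code):

    subroutine record_mortality_rate(this, io_si, i_scpf, dndt)
       ! Hypothetical sketch: accumulate a cohort's mortality rate directly
       ! into the history buffer at the moment it is computed, while the
       ! cohort still exists, instead of buffering in a site-level array.
       class(fates_history_interface_type), intent(inout) :: this
       integer,  intent(in) :: io_si   ! site index in the history output
       integer,  intent(in) :: i_scpf  ! size-class x pft bin
       real(r8), intent(in) :: dndt    ! mortality rate for this cohort
       this%hvars(ih_mortality_si_scpf)%r82d(io_si,i_scpf) = &
            this%hvars(ih_mortality_si_scpf)%r82d(io_si,i_scpf) + dndt
    end subroutine record_mortality_rate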

@rgknox
Contributor Author

rgknox commented Apr 22, 2020

Also:

All history diagnostics written by FATES should have a "FATES_" namespace prefix on the variable name.

No output variable should have a "COL" or "PA" tag to indicate whether it is patch- or column-scale; the refactor will remove patch-level averaging, so everything is implicitly a calculation on FATES columns.

EDIT (9/20): As far as suffixes go, all variables will be implicitly "site" level, and thus for consistency's sake we should either include a "_si" suffix on all variables or on none. If a variable has more than one dimension, such as size class for instance, we would append si onto the existing suffix, e.g. _sc -> _sisc, or _scpf -> _siscpf. I'm in favor of just dropping the "_si" since it is now implied.

@ckoven
Contributor

ckoven commented Apr 24, 2020

@rgknox per your comment here about the _pa history variables: about half of these are fire-related. I am going through some of the fire history variables right now and will add to that effort the task of moving all of those currently indexed directly by patch over to the standard way we handle site-level variables.

@glemieux
Contributor

  2. Get rid of these patch-scale diagnostic types (i.e., _PA). These are confusing because it looks like we can have patch-scale output, but what is really happening is that we are just asking CLM/ELM to upscale from patches to grid for us. We already do this ourselves, and having two ways of going about the same thing is confusing.

Noting here that #640 and #642 have taken care of this particular item.

@rgknox
Contributor Author

rgknox commented Sep 18, 2020

To do: Review the variable list in both E3SM and CTSM tests.

In CTSM we have a test called "AllVars" which sets the flag hist_empty_htapes = .false. When this flag is set to false, all history variables defined with use_default='active' are included. But we have many variables that are set 'inactive' because they have a large second dimension (e.g. everything with _scpf). This list of inactive variables needs to be updated (periodically) in testmods_dirs/clm/FatesAllVars/user_nl_clm. An illustrative fragment is below.
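For reference, such a testmod fragment might look something like this (hist_empty_htapes and hist_fincl1 are real CLM namelist settings; the variable names here are placeholders, not the actual current list):

    ! illustrative user_nl_clm fragment; the real FatesAllVars list is much longer
    hist_empty_htapes = .false.
    hist_fincl1 = 'FATES_NPP_SZPF', 'FATES_MORTALITY_BACKGROUND_SZPF'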

@rgknox
Contributor Author

rgknox commented Sep 29, 2020

If we do not fix the FATES namespacing of history variables, we at least have to make sure we are not using the same diagnostic names that CLM/ELM are using. "NEP", for instance, is part of the standard output for ELM/CLM.

@rgknox
Contributor Author

rgknox commented Oct 13, 2020

Units on several logging history variables are wrong or incorrectly constructed:

See here: the cohort-level mortality terms are fractions that died that day, but the diagnostic has a temporal unit of per year:
https://github.com/NGEET/fates/blob/master/main/FatesHistoryInterfaceMod.F90#L2437

See here: logging mortality terms (fractions that died that day) are being multiplied by seconds_per_year. Same as the previous issue; the time units need correcting.
https://github.com/NGEET/fates/blob/master/main/FatesHistoryInterfaceMod.F90#L2548-L2549
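To illustrate the intended correction (a sketch; the variable names are placeholders, not the code at the links above): a fraction that died over one day converts to a per-year rate via days per year, not seconds per year:

    ! [fraction/day] * [days/year] = [fraction/year]
    mort_rate_per_year = mort_frac_per_day * days_per_year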

@jkshuman
Contributor

@KatieMurenbeeld tagging you to take note of these issues with logging output.

@ckoven
Contributor

ckoven commented Feb 5, 2021

Just a note: I wanted to put all of our current history variables into a spreadsheet to better assess how to make them more uniform in terms of units and namespacing, and also to see what is there and not there, which ones are on by default, etc. After some emacs-wrangling, I turned the relevant Fortran into this:
https://docs.google.com/spreadsheets/d/1gLhbJV45-Ptd65iI9W2LZtZj6JcEJSTxaicYfpGBiaM/edit#gid=0

It should be useful as a reference and a starting point for a more systematic overhaul.

@adrifoster
Contributor

Just commenting here to note that, per a conversation at a CTSM SE meeting and a follow-up conversation with @rgknox, I am tackling some of these history variable update requests. Right now I am going through and updating all the output variable names to have 'FATES_' as a prefix, and also renaming them to have suffixes according to their scale (see below). We also discussed changing the flush value to be hlm_hio_ignore_val everywhere, so @rgknox and I discussed needing to go through where the history variables are actually counted/summed/etc. and making sure that things are zeroed where they need to be.

Suffixes:
_SI: site-level (should we still do this? I see a note above that says this should be implicit; I am not including _SI on variables that are averaged at a finer scale, e.g. GPP_PF, not GPP_SIPF)
_PF: PFT
_SZ: size
_HT: height
_AP: patch/disturbance age
_AC: cohort/plant age
_FC: fuel class
_EL: element
_CL: canopy layer
_LL: leaf layer
_DC: coarse woody debris size
_SL: soil layer

@adrifoster
Contributor

adrifoster commented Sep 9, 2021

Also, in going through this I am noticing that it is fairly tedious to update all these calls to set_history_var, and there are a few typos/incorrect units/etc. that I am already encountering.

I was thinking it might be easier to have all of these in some sort of easily read and manipulated input file (csv, etc.?) that we would then parse inside the define_history_vars subroutine. We could then just call set_history_var in a loop over the list/array of these input arguments (i.e. variable name, units, longname, etc.) that get read in.

e.g.:

! Some code here that reads in and parses an input file with variable
! names, units, long names, etc., into a config(:) array
! ...

do i = 1, n_history_vars
    call this%set_history_var(vname=config(i)%varname, units=config(i)%units, &
         long=config(i)%longname, use_default=config(i)%use_default, &
         avgflag=config(i)%avgflag, vtype=config(i)%vtype, hlms=config(i)%hlms, &
         flushval=hlm_hio_ignore_val, upfreq=config(i)%upfreq, ivar=i, &
         initialize=initialize_variables, index=hash_index)
end do
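For concreteness, a couple of rows of such a file might look like this (the column set and values are hypothetical, just to show the shape):

    varname,units,longname,use_default,avgflag,vtype,hlms,upfreq
    FATES_GPP,kg m-2 s-1,gross primary production,active,A,site_r8,CLM:ALM,2
    FATES_NPP_PF,kg m-2 s-1,net primary production by pft,inactive,A,site_pft_r8,CLM:ALM,2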

It may work to have some kind of hash map so we can still index the variable locations.

Do we think this would be more manageable for future changes and further variable updates?

@ckoven
Contributor

ckoven commented Sep 9, 2021

@adrifoster thanks for doing this. It's an interesting idea, and certainly the current code has a lot of repetition. I guess I see some tradeoffs with the proposal of moving this to some sort of csv file. First, we would no longer have the metadata and data (or, more specifically, the calculation of each variable) together in one file as we do now, which would make it trickier to search within that one file to see what a given variable actually is and how it is calculated. A second issue is that if you were to simultaneously do both the updates that you are making and a switch to some other file, we'd lose the git continuity and provenance needed to keep track of things like changing units and see what all the changes are. A third, and possibly the biggest, is related to @rgknox's point 3 in #630 (comment) -- ideally we want to go more in the direction of how CLM & ELM handle this stuff and move the history writes from the current monolithic block of code, called at the end of the timestep, to a set of more modular history update subroutines called at different times, and pulling out the variable definitions seems like it would be in conflict with that goal.

@ekluzek
Collaborator

ekluzek commented Sep 9, 2021

@adrifoster the idea of having an external file to read in for history variables is something that I've thought about as well, and I felt it would be a good way to go for CTSM. So I totally agree with this approach, and if you adopt it I hope it'll make it easier to bring into CTSM. I was thinking it should be something like XML, as I think csv doesn't handle enough complexity. You could have it in XML, for example, and have build scripts that convert it to a different format as needed by whatever reads it in. The XML format would, for example, help with our master history list. But for the Fortran it might be best to have a simple csv file.

I might have some notes on this idea in a CTSM issue so I'll look for that.

@rgknox
Contributor Author

rgknox commented Sep 10, 2021

@adrifoster I'm trying to envision the code that connects the text string to the actual named integer that is used as the index for each variable. Since the named integers (i.e. ih_npp_leaf_si, etc.) live in the module itself, won't it get messy trying to maintain these definitions in two places, the module and a csv file? Also, I'm just not sure how you can take a string and use it to choose a named variable (I suppose it can be done, I'm just not sure how). The alternative is to do away with the named integers altogether and use the variable's text name to identify the variable's location in memory, but that would slow down the model, because we would need to do text searches every time we want to fill a variable.
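For what it's worth, the string-based alternative would look something like the sketch below (the type and component names are assumptions, not existing FATES code); the linear search on every fill is the slowdown in question:

    function get_hvar_index(this, vname) result(ivar)
       ! Hypothetical: locate a history variable's buffer index by name.
       ! Doing this text search on every fill call is the performance concern.
       class(fates_history_interface_type), intent(in) :: this
       character(len=*), intent(in) :: vname
       integer :: ivar
       integer :: i
       ivar = -1
       do i = 1, size(this%hvars)
          if (trim(this%hvars(i)%vname) == trim(vname)) then
             ivar = i
             return
          end if
       end do
    end function get_hvar_index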

@rgknox
Contributor Author

rgknox commented Sep 10, 2021

I'm also thinking that we don't really need the following 3 arguments to all of those set_history_var() calls:
flushval: this will soon be implicitly set to ignore in all cases
ivar, initialize: I think the routine is called at the machine level, and not the thread level, so we can use scalar globals for these

keep?:
avgflag: Right now we don't seem to do anything other than average, but there is no reason we couldn't use the other averaging types in the future (maxes and mins, right?), so I see no reason not to keep including this.

Another to consider:
upfreq: If we removed this argument, we would be changing much of how we do things. This argument only controls when we flush the output arrays. Some variables we want to flush at the model timestep prior to filling (photosynthesis, hydraulics; upfreq=2), and some we want to flush daily prior to filling (dynamics, etc.; upfreq=1). So at the model timestep we loop through all variables and flush the ones with upfreq=2, and at the daily timestep we loop through and flush everything with upfreq=1. An alternative would be to have a generic flush routine that takes an array of the history indices as an argument (sketched below). I'm on the fence over which system I like better. The latter option is a little more verbose, which is good in that it is more obvious what is being flushed, and when.
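A rough sketch of that generic flush routine, under the assumption that each hvars entry can flush itself (the routine and method names are illustrative, not existing code):

    subroutine flush_hvars_by_index(this, indices)
       ! Hypothetical: flush only the listed history variables, so each call
       ! site states explicitly what gets flushed and when.
       class(fates_history_interface_type), intent(inout) :: this
       integer, intent(in) :: indices(:)
       integer :: i
       do i = 1, size(indices)
          call this%hvars(indices(i))%flush(hlm_hio_ignore_val)
       end do
    end subroutine flush_hvars_by_index

    ! e.g., at the daily step:
    ! call this%flush_hvars_by_index([ih_gpp_si, ih_npp_si])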

@adrifoster
Contributor

@adrifoster I'm trying to envision the code that connects the text string to the actual named integer that is used as the index for each variable. Since the named integers (i.e. ih_npp_leaf_si, etc.) live in the module itself, won't it get messy trying to maintain these definitions in two places, the module and a csv file? Also, I'm just not sure how you can take a string and use it to choose a named variable (I suppose it can be done, I'm just not sure how). The alternative is to do away with the named integers altogether and use the variable's text name to identify the variable's location in memory, but that would slow down the model, because we would need to do text searches every time we want to fill a variable.

@rgknox Yes, I think we would need to redesign the code a bit. I would suggest using a hash map so that the current index would turn into the key. Though perhaps this would introduce more fragility because we would need to have the correct key string in the code.

@adrifoster
Contributor

A second issue is if you were to simultaneously do both the updates that you are making and a switch to some other file, then we'd lose the git continuity and provenance to be able to keep track of things like changing units to see what all the changes are.

@ckoven that makes sense, I am forging ahead with making these variable name/units updates and we can decide if we want to just push that first and come back to the potential redesign later, or do it all at once.

@ekluzek
Collaborator

ekluzek commented Sep 10, 2021

@rgknox I don't know of a way to turn the variable name into a string -- unless you have some code preprocessing happen, which could be done. We do that in some cases with files named *.F90.in. But I actually don't think that's the way to go here.

What I think you do is separate the association of the variable with its name from the metadata surrounding it (the long name, units, etc.). So in one place you read the metadata from the input file into a lookup keyed by the history fieldname. Elsewhere you assign the history fieldname to the array that contains the data.

I'll illustrate how this would work for CTSM, since I'm more familiar with it.

A current history call looks like this...

          call hist_addfld (fname='GRAINC_TO_FOOD', units='gC/m^2/s', &
               avgflag='A', long_name='grain C to food', &
               ptr_patch=this%grainc_to_food_patch)

Changing it would split it up so the call to assign the pointer would be like this...

          call hist_addfld (fname='GRAINC_TO_FOOD',  ptr_patch=this%grainc_to_food_patch)

There would be another line that reads in the metadata file that contains the attributes for GRAINC_TO_FOOD. You would also have error checking to make sure that all fields that have a "hist_addfld" call also have metadata associated with them, and vice versa.

I don't know that a hash helps here. You do need the list of fieldnames to be unique, and I think it's OK to use character comparisons to go through the list of fields. There are hundreds of fields, but not the thousands or millions where having a hash would be critically important.
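A sketch of what the metadata side might look like under this split (the derived type and the find_metadata helper are hypothetical, named here only for illustration):

    type hist_metadata_type
       ! Hypothetical container for attributes read from the metadata file
       character(len=64)  :: fname      ! history fieldname, e.g. 'GRAINC_TO_FOOD'
       character(len=32)  :: units
       character(len=128) :: long_name
       character(len=1)   :: avgflag
    end type hist_metadata_type

    ! At the hist_addfld call site, the attributes would be found by simple
    ! character comparison over the (unique) fieldname list, e.g.:
    ! meta = find_metadata(metadata_list, 'GRAINC_TO_FOOD')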

@adrifoster does that picture work for the kind of thing you are envisioning?

@rgknox
Contributor Author

rgknox commented Sep 10, 2021

That makes sense @ekluzek
I don't know if I'm a fan of splitting it up, though. I like the original method because everything that describes the variable (name, units, and data pointer) is in one place.
I think I'm more in favor of streamlining the way we currently do things, like removing unnecessary arguments and clutter.

@dlawrenncar

dlawrenncar commented Sep 10, 2021 via email

@walkeranthonyp

walkeranthonyp commented Sep 13, 2021

This is great @adrifoster , thanks.

I've also noticed quite a bit of inconsistency in units, both in how they are labeled (the same units have different character strings for different variables) and in the units themselves for the same variable types; e.g. fluxes can be per second or per day, and both fluxes and pools can be per m2 or per ha. Maybe something to bring some consistency to as well, if you're taking the time to overhaul the history code.

@ckoven
Contributor

ckoven commented Sep 13, 2021

@walkeranthonyp that is a really good point and one of the things we discussed a while ago. I had started to put all the current variables together in the spreadsheet that I linked earlier in the thread (#630 (comment)) for precisely this purpose, but ran into a conceptual roadblock that I didn't know how to deal with. Ideally we should be using the standard CMIP conventions for units (so everything in kg, m, s, etc., and just the number 1 for dimensionless quantities like LAI). But I think some of the more forestry-oriented or population-focused variables are in conflict with that goal. E.g. there is no SI unit for individual trees, so do we report plant populations in units of "m-2"? Or report basal area in units of "1"? We probably need some discussion on this point.

@rgknox
Contributor Author

rgknox commented Sep 13, 2021

My sense is that people who think about dimensions and their standardization for a living may give me a slap on the wrist for the following, but... the dimensions appear to cancel out; should they really? It is m2 of one quantity (leaf) per m2 of another quantity (ground), and the same concept applies to basal area. It seems most appropriate to describe these as m2/m2.

@walkeranthonyp

I see what you're saying, Ryan. I'm also happy with dimensionless units but don't feel strongly one way or another, just so long as we're consistent.

My main concern is that per square meter is represented in at least three ways: /m2, /m^2, and m-2. I think udunits uses the latter, and it is my preference. Also, we should probably convert hectares to m2 and report fluxes over consistent time steps.

@adrifoster
Contributor

adrifoster commented Sep 14, 2021

I agree with @rgknox on the basal area front; normally (in the forestry literature) I see those reported in units of cm2/m2 or m2/ha, and since I think we do want to standardize our per-area to m2, I think it is fine to say the units are m2/m2. @ckoven I think for numbers most forestry applications do "stems/m2", but I have been writing "nindivs/m2" since that seemed to be used most often in the code already.

@walkeranthonyp, @rgknox and I had also discussed standardizing the units (area always in m2, mass always in g, etc.). I will be doing this as well as I go through the file.

As for the exponent etc., I also agree that should be standardized. So far I have been changing everything to /m2, but should we do m-2 instead?

@ckoven
Contributor

ckoven commented Sep 14, 2021

I think there are two slightly separate unit issues here: one is consistency and standardization in how we report units, and the other is consistency and standardization in how we calculate things. For the second one, right now we have area in both m2 and ha, time in sec, day, and year, and mass in g and kg. For the first one, as Anthony says, we report things inconsistently and in general make use of nonstandard conventions; e.g. we report individuals variously as N, indiv, and plants. In reality none of those are meaningful units from a convention standpoint, and if we adhere to those standards then we should omit any reference to population in the units at all (the closest example I could find is aerosol number densities, which are just in m-2 here: https://cfconventions.org/Data/cf-standard-names/60/build/cf-standard-name-table.html)

So I see a few possibilities:

  1. we adhere to the CF conventions as closely as possible
  2. we choose some other convention and adhere to it
  3. we adhere mathematically to the CF or other convention but leave more detailed info in the units (e.g. we keep a reference to population, we say m2 of what per m2 of what, etc.)
  4. we adhere to the CF or other convention strictly, and keep a more verbose description of units in the long name

@ckoven
Contributor

ckoven commented Sep 14, 2021

N.b. -- in looking quickly through the CF conventions in the link above, I already see they do some things differently than we do. E.g. they want burned area as actual area (which requires knowing the size of the gridcell), or alternatively as burned fraction but without per-time units, which means we'd need to integrate burned fraction over the history interval rather than averaging in time as we do now...

@ckoven
Contributor

ckoven commented Sep 14, 2021

PS the burned fraction solution is easy if we want to follow the standard: we just switch the avgflag to sum instead of average...
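Presumably something like the following change in the set_history_var call (illustrative only; the variable and index names are placeholders, and this assumes the host models accept a 'SUM' averaging flag as CLM's history machinery does):

    ! hypothetical: integrate burned fraction over the history interval by summing
    call this%set_history_var(vname='FATES_BURNFRAC', units='1',              &
         long='burned fraction, summed over the history interval',            &
         use_default='active', avgflag='SUM', vtype=site_r8, hlms='CLM:ALM',  &
         upfreq=1, ivar=ivar, initialize=initialize_variables,                &
         index=ih_burnfrac_si)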

@mpaiao
Contributor

mpaiao commented Sep 14, 2021

If it's easy to get the sum, then it makes more sense to report burned area fraction. Otherwise it could be confused with the fire front rate of spread (ROS).

@ckoven
Contributor

ckoven commented Sep 16, 2021

On the FATES call today, we had good discussion of this topic and made some decisions via zoom polls about how to proceed on a few issues:

  • We decided to use the CMIP conventions (https://cfconventions.org/) as the basis of what we do: report unit denominators as, e.g., m-2 rather than /m2 or m^-2, and in general use, e.g., kg for mass, m for length, and s for time.
  • We decided to allow a specific carve-out for demographic rates to be reported in y-1 rather than s-1, since that is closer to their generally used units, and another specific carve-out for GDD in C d rather than K s. But otherwise we want to keep such carve-outs as limited as possible and keep to the convention.
  • We decided to keep the description in the units field concise and to put more verbose, unambiguous descriptions in the long name field where needed. So, e.g., we will report populations as m-2 rather than something like nindivs m-2, but then specify in the long name that it is population number per square meter; or refer to aboveground biomass as kg m-2 but then say in the long name something like "aboveground biomass in kg of carbon per m2 land area".
  • We decided to allow reporting unitless quantities as, e.g., m2 m-2 rather than the CMIP convention of 1.

Thanks to all who participated!

@adrifoster
Contributor

Just commenting here to note that, per a conversation at a CTSM SE meeting and a follow-up conversation with @rgknox, I am tackling some of these history variable update requests. Right now I am going through and updating all the output variable names to have 'FATES_' as a prefix, and also renaming them to have suffixes according to their scale (see below). We also discussed changing the flush value to be hlm_hio_ignore_val everywhere, so @rgknox and I discussed needing to go through where the history variables are actually counted/summed/etc. and making sure that things are zeroed where they need to be.

Suffixes:
_SI: site-level (should we still do this? I see a note above that says this should be implicit; I am not including _SI on variables that are averaged at a finer scale, e.g. GPP_PF, not GPP_SIPF)
_PF: PFT
_SZ: size
_HT: height
_AP: patch/disturbance age
_AC: cohort/plant age
_FC: fuel class
_EL: element
_CL: canopy layer
_LL: leaf layer
_DC: coarse woody debris size
_SL: soil layer

Adding another suffix:
_AF: for fuel age

@adrifoster
Contributor

Hey all,

I want to confirm that these ED_ variables can be deleted if we already have a FATES_ version of them. See hio_bstore_si and ih_storec_si. I believe they are the same?

hio_bstore_si(io_si) = hio_bstore_si(io_si) + n_perm2 * store_m * g_per_kg

@rgknox
Contributor Author

rgknox commented Sep 27, 2021

yes, I think you are right

@adrifoster
Contributor

adrifoster commented Sep 28, 2021

Hey @rgknox, @ckoven and others, is there a reason to keep the mortality variable names M1_SZPF, M2_SZPF, etc? Or can/should we rename them to reflect what they actually are? (i.e. MORTALITY_BACKGROUND, etc.).

@ckoven
Contributor

ckoven commented Sep 28, 2021

I have to look it up every time, so I support moving away from the numeric indexing.

@jkshuman
Contributor

@adrifoster agree that giving these mortality terms human readable names would be great.

@adrifoster
Contributor

Okay great thanks @ckoven and @jkshuman. Unless anyone else objects I'm going to give them verbose names.

@rosiealice
Contributor

Yes!

@adrifoster
Contributor

adrifoster commented Oct 8, 2021

A question about some variables for @ckoven, @rgknox, @jkshuman, @rosiealice or anyone else:

These variables, TRIMMING_CANOPY_SCLS and TRIMMING_UNDERSTORY_SCLS, are given units of indiv/ha (code here):

call this%set_history_var(vname='TRIMMING_CANOPY_SCLS', units = 'indiv/ha',               &
    long='trimming term of canopy plants by size class', use_default='inactive',   &
    avgflag='A', vtype=site_size_r8, hlms='CLM:ALM',     &
    upfreq=1, ivar=ivar, initialize=initialize_variables, index = ih_trimming_canopy_si_scls )

In both cases it's calculated here as:

hio_trimming_canopy_si_scls(io_si,scls) = hio_trimming_canopy_si_scls(io_si,scls) + &
    ccohort%n * ccohort%canopy_trim

However, we have another TRIMMING variable with units '1':

    call this%set_history_var(vname='FATES_TRIMMING', units='1',              &
         long='degree to which canopy expansion is limited by leaf economics (0-1)', &
         use_default='active', avgflag='A', vtype=site_r8, hlms='CLM:ALM',     &
         upfreq=1, ivar=ivar, initialize=initialize_variables,                 &
         index=ih_trimming_si)

However, this one is calculated without the multiplication by ccohort%n:

hio_trimming_si(io_si) = hio_trimming_si(io_si) + cpatch%tallest%canopy_trim * cpatch%area * AREA_INV

I understand the units for TRIMMING_CANOPY_SCLS, because we are multiplying [1] by [indiv/ha], but I just want to make sure the long name is as accurate/descriptive as possible. What are we actually calculating with that variable? The number of trimmed canopy plants?

@ckoven
Contributor

ckoven commented Oct 8, 2021

Hi @adrifoster, the intention here is just to be able to weight the trim output variable by size and canopy layer, since we expect it to vary strongly between canopy and understory, and possibly within the canopy by size (and really by PFT too, but this doesn't even get into that). So the idea is to then divide by nplant_scls in postprocessing to be able to get mean values of the trim parameter. The site-level one specifically takes just the tallest cohort in each patch and area-weights that by the patch area, so the idea there is to use the tallest-plant trim level as a proxy for the canopy trimming variable.

In practice TRIMMING_CANOPY_SCLS and TRIMMING_UNDERSTORY_SCLS are probably over-informative (except that they don't resolve PFTs) and FATES_TRIMMING is probably under-informative, since what we would really want is a single number for each PFT and each canopy layer, as those should explain the most variation in cohort-level trim. So there's probably a better way to do it, perhaps by weighting each cohort by crown area and dividing by that PFT's total crown area in canopy and understory (see the sketch below).
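A hedged sketch of that crown-area weighting, in the accumulation style of the snippets above (the hio_* array names here are hypothetical, not existing history variables):

    ! accumulate trim weighted by crown area, per pft x canopy layer
    hio_trimming_si_pft_can(io_si,ipft,ican) = hio_trimming_si_pft_can(io_si,ipft,ican) + &
         ccohort%canopy_trim * ccohort%c_area
    hio_crownarea_si_pft_can(io_si,ipft,ican) = hio_crownarea_si_pft_can(io_si,ipft,ican) + &
         ccohort%c_area
    ! mean trim per pft x canopy layer = hio_trimming_si_pft_can / hio_crownarea_si_pft_can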

@adrifoster
Contributor

@ckoven okay, that makes sense. How should we define the long name for those?

'trimming term weighted by plant density'?
units: 'm-2'?

@adrifoster
Contributor

Another question about units:

The variables STOREN_TFRAC_CANOPY_SCPF and STOREN_TFRAC_UNDERSTORY_SCPF are set as having units of kgN/ha (code here):

       call this%set_history_var(vname='STOREN_TFRAC_CANOPY_SCPF', units='kgN/ha', &
            long='storage nitrogen fraction of target,in canopy, by size-class x pft', use_default='inactive', &
            avgflag='A', vtype=site_size_pft_r8, hlms='CLM:ALM',     &
            upfreq=1, ivar=ivar, initialize=initialize_variables, index = ih_storentfrac_canopy_scpf )

       call this%set_history_var(vname='STOREN_TFRAC_UNDERSTORY_SCPF', units='kgN/ha', &
            long='storage nitrogen fraction of target,in understory, by size-class x pft', use_default='inactive', &
            avgflag='A', vtype=site_size_pft_r8, hlms='CLM:ALM',     &
            upfreq=1, ivar=ivar, initialize=initialize_variables, index = ih_storentfrac_understory_scpf )

But it seems like they should actually be dimensionless (calculated here and here):

                     if (ccohort%canopy_layer .eq. 1) then
                        this%hvars(ih_storentfrac_canopy_scpf)%r82d(io_si,i_scpf) = & 
                             this%hvars(ih_storentfrac_canopy_scpf)%r82d(io_si,i_scpf) + store_m/store_max * ccohort%n
                     else
                        this%hvars(ih_storentfrac_understory_scpf)%r82d(io_si,i_scpf) = & 
                             this%hvars(ih_storentfrac_understory_scpf)%r82d(io_si,i_scpf) + store_m/store_max * ccohort%n
                     end if

                  if(  hio_nplant_canopy_si_scpf(io_si,i_scpf)>nearzero ) then
                        this%hvars(ih_storentfrac_canopy_scpf)%r82d(io_si,i_scpf) = &
                             this%hvars(ih_storentfrac_canopy_scpf)%r82d(io_si,i_scpf) / &
                             hio_nplant_canopy_si_scpf(io_si,i_scpf)
                     end if
                     
                     if( hio_nplant_understory_si_scpf(io_si,i_scpf)>nearzero ) then
                        this%hvars(ih_storentfrac_understory_scpf)%r82d(io_si,i_scpf) = &
                             this%hvars(ih_storentfrac_understory_scpf)%r82d(io_si,i_scpf) / &
                             hio_nplant_understory_si_scpf(io_si,i_scpf)
                     end if

@adrifoster
Contributor

And more questions (nearing the end here...)

       call this%set_history_var(vname='FATES_ITERH1_SZPF', units='count/indiv/step', &
             long='water balance error iteration diagnostic 1', &
             use_default='inactive', &
             avgflag='A', vtype=site_size_pft_r8, hlms='CLM:ALM',     &
             upfreq=4, ivar=ivar, initialize=initialize_variables, index = ih_iterh1_scpf )

       call this%set_history_var(vname='FATES_ITERH2_SCPF', units='count/indiv/step', &
             long='water balance error iteration diagnostic 2', &
             use_default='inactive', &
             avgflag='A', vtype=site_size_pft_r8, hlms='CLM:ALM',     &
             upfreq=4, ivar=ivar, initialize=initialize_variables, index = ih_iterh2_scpf )

I'm not sure what these should be converted to...

@adrifoster
Contributor

So @billsacks brought up the conflict that CTSM/CLM has its mass units in grams, whereas with the switch to the CF table that we voted on at the FATES call on September 16, FATES will be going with kilograms for our output variables.

I am working on making these updates here, and am currently in testing phase.

However, given what @billsacks and @ekluzek were pointing out, I want to confirm that we will move forward with this switch to the CF conventions table for FATES output. Please note that these changes are only for the output, and will not affect any internal variables within FATES or the HLM.

Tagging people here in the issue so we can capture discussion on this change:
@rgknox @glemieux @ckoven @dlawrenncar @billsacks @ekluzek @danicalombardozzi @wwieder @klindsay28

@wwieder

wwieder commented Oct 12, 2021

Thanks for pointing out this discrepancy, @adrifoster. I know there's been some movement in CESM component models to move towards greater CF compliance. Having FATES help us move towards this goal seems appropriate. I'm not sure how much time we want to spend doing a parallel effort for analogous output from the big-leaf model, especially if FATES is eventually going to replace the older (non-CF compliant) scheme. I guess we want to avoid confusing users who may be using both FATES and big-leaf models in the short/medium term.

@billsacks
Member

@wwieder and others: to clarify, I don't think this units question is a question of CF compliance. CF gives canonical units (which are SI units), but I believe it doesn't dictate what units you use to be CF compliant. To be CF compliant, you just need to specify your units explicitly in a particular format. I'm not positive about this, but that is the understanding of @ekluzek and myself.

@wwieder

wwieder commented Oct 12, 2021

Thanks for clarifying @billsacks. I am not a CF expert, but from the FATES conversation I took away that units of kg, m^2, and s were preferred for mass, area, and time, respectively. @ckoven may have a more informed perspective here.

Separately, @jenniferholm, I'd assume this is potentially an issue for E3SM too?

@ckoven
Contributor

ckoven commented Oct 12, 2021

Right @billsacks, CF specifies "canonical units", which are also the units that must be used when reporting to MIPs such as CMIP, TRENDY, etc. Certainly it is in principle fine to continue doing what CESM has always done, which is to use units in the model history fields that are convertible to, but different from, the canonical ones, and then just convert the units during the CMOR-ization process. But since (a) standardization of units is a headache, particularly for MIPs with less infrastructural support than CMIP, and (b) FATES hasn't been keeping to any specific convention, but we want to do so before everyone's scripts get baked in, we decided to standardize to the canonical CF units, which has the benefit of removing the unit-conversion step when we participate in MIPs.

@billsacks
Member

Thanks for the clarification @ckoven. That makes sense - so this is more about removing the CMOR-ization step (or similar standardization for other inter-model comparisons) rather than CMIP compliance per se.

@billsacks
Member

For cross-reference with a relevant CTSM issue, I just remembered this one that @ekluzek started last year: ESCOMP/CTSM#973
