Reference periods for variables derived from climatology #70
Comments
Dear @davidhassell and @sethmcg David, thanks for reminding us about #188, which does seem to be the answer to the question - would you agree, @sethmcg, or have I misunderstood? In fact, #27 mentions
For example
indicates that It occurs to me that it's not recorded what sort of climatology this was: is it annual means, monthly means, monthly maxima, maxima for the day of the year? This would be apparent from the Best wishes Jonathan |
Ah, perfect! Yes, that's exactly what I was looking for. Thanks much @davidhassell for the pointer to those issues and @JonathanGregory for the excellent example usage! I note that #27 is still open, so I could go jump in there and add a couple more standard names for consideration to have that reference epoch sentence added to their description. That might also jumpstart the conversation and move the issue forward. For the specific use case that prompted this issue (reference period for calculating KBDI), I can follow Jonathan's example and we're all set. (And that applies regardless of whether and when we update the However, I have thought of two use cases for other data that I work with that have me wondering whether we need to add something to the Conventions about reference epochs in general, and not just have it mentioned in the description of certain standard names. First: bias-correction. When you apply a bias-correction to model output, you're doing some kind of transformation (maybe simple, maybe hideously complex) to make your model dataset look (statistically) more like your observational dataset. That observational dataset has some finite temporal coverage, and Second -- and I've actually been doing this recently -- if you're looking at something like the change in average zg500 anomaly on rainy days, you want to track two reference periods: one for the past and one for the future. So do you need two such variables with different names? Do you need to add a categorical dimension (past, future) to the variable? Something else entirely? In any case, I think it warrants some consideration. What's the best way to proceed from here? Close this ticket and bring these issues up in #27? Open a new ticket with a better name? Keep the discussion here going and cross-reference it over there? |
Hi @sethmcg However, for your second use case, do you really need two reference periods? Wouldn't it be enough to make the first period the Regarding your last question, would it be possible to split your ideas and suggestions so that what is directly relevant for #27 goes there, and then we can continue the broader conversation here? Regarding your first use case (on bias adjustment), I have been thinking about this for some time now and the reference period ( |
I've added the FAQ label to this issue to mark it for inclusion in the FAQ when we have finished answering it. |
@larsbarring I like your suggestion to move the issues of the standard names to #27 and continue the general discussion here. I will do that. You're right that for the second use case, a On the more complex topic of bias-correction, one point we could start with is whether the different components can be separated, or whether a solution should encompass all of them. I think they can and should be separated, so that e.g., we address the issue of the @JonathanGregory Thanks. The FAQ does seem like the right place to put that explanation. Is there anything in particular we should make sure to address to generate a good Frequently Provided Answer? |
Dear @sethmcg
No, there are no guidelines for this. Anything which you, as a questioner, might find useful in the answer! We haven't been adding to the FAQ, and I feel that it would be useful to do so. Best wishes and thanks Jonathan |
Dear @JonathanGregory , In your answer Jul 19th you note:
Isn't the issue that the reference_epoch variable is tasked to record the time period, and not how the reference value (aka climatology) was computed? If we look at the way CF allows climatologies to be stored, we see that the :cell_methods attribute describing how the climatology was computed is on the variable storing the climatological value, while the time period over which the climatology is computed is encoded in the time variable with the attribute :climatology. Should the variable holding the anomaly also store how the climatology it refers to was prepared? For example a reference_cell_methods, in addition to the reference_epoch? At a more general level, I am surprised that the way to store an anomaly (standard_name x_anomaly + reference_epoch) and the way to store climatologies (attribute :climatology) have so little in common. Anomalies and climatologies are not the same thing, but their descriptions in the CF world could probably be more streamlined. Best wishes, |
Dear Thomas @TomLav Thanks for your comment. It might be helpful to distinguish two meanings of "climatology". (1) The reference data variable that was used to compute the anomalies stored in another data variable. (2) A data variable which has a climatological time dimension. The The climatology in sense (1) doesn't have to be a climatology in sense (2), but it can be. For instance, you could calculate anomalies for monthly means wrt a 30-year mean or wrt 30-year monthly means. The latter is a climatology in sense (2) and the former is not. The solution we've discussed in this issue proposes to use the If the above is correct (I'm not sure it is!) then to be clear about the climatology that was used to calculate the anomalies, we need the time bounds and the
In this example, I've kept the Does this make sense and would it be sufficient? It's just a suggestion. Best wishes Jonathan |
I hate to muddy the waters, but the more I think about this problem, the messier it gets. I started working on an example to move #27 forward (before @JonathanGregory beat me to it above) and realized that there are a number of different cases we need to consider, and I'm not sure whether or not the proposed solution can handle all of them. (Plus, as @TomLav points out, there's a bit of a disconnect between how we store the anomaly / derived value and the reference it's relative to.) So maybe it will be useful to spell them out, so that we can at least be clear on the cases we need to address. I have actual use cases for all of them, so this is not a purely theoretical exercise. You can calculate statistics relative to:
Okay, so far so good. Cases 1, 2, and 4 are what the I think Case 5 is also covered, although it requires quite a lot of metadata to record something that can be expressed very simply as "±15 days". (A 2xN array the full length of the original time coordinate.) Cases 3 and 6 point out the ugly wrinkle, which is that the operations used to construct the 'climatology' may not all be the same and can be compounded. If I want to look at the change from past to future in the standard deviation of the monthly average daily maximum temperature, which is not a tremendously exotic quantity to consider, that's an anomaly using 2 reference periods that have been aggregated in 3 different ways at 3 different frequencies. We have examples doing complex aggregations with the existing machinery in section 7.4 (though some of them are kind of hard to understand), but they're only ever for 2 aggregations. Can we handle 3 or more aggregations, plus an anomaly-type operation where you're calculating against two different periods, and still retain all the relevant info? |
Upon further thought, I think this can almost be handled by the existing machinery, but that it needs one or two additional pieces. Consider a quantity like the standard deviation of monthly average daily maximum temperatures. I think we can clearly record that in a way that's self-contained and easy to interpret (by both humans and computers) if we add the frequency/spacing that we're aggregating to. In other words, in addition to the aggregation method, Currently, if you encounter (That syntax could also be straightforwardly extended to include moving windows, like: However, there's still one piece missing, which is combining Would it make sense to consider the possibility of uniting climatology and cell_methods under a single umbrella, maybe something like |
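Editorial sketch (not part of the original comment): the compound quantity mentioned above, the standard deviation of monthly average daily maximum temperature, can be written as an ordered chain of (method, frequency) steps. The code below is only an illustration in plain Python; the synthetic hourly series, the `aggregate` and `run_chain` helpers, and the step labels are all hypothetical and are not CF syntax.

```python
import math
import statistics
from collections import defaultdict
from datetime import datetime, timedelta

def synthetic_temp(t):
    """Made-up hourly temperature (degC): diurnal plus seasonal cycle."""
    diurnal = 5.0 * math.cos(2 * math.pi * (t.hour - 15) / 24)
    seasonal = 10.0 * math.cos(2 * math.pi * (t.timetuple().tm_yday - 200) / 365.25)
    return 15.0 + diurnal + seasonal

def aggregate(series, method, key):
    """Group (label, value) pairs by key(label) and reduce each group with method."""
    groups = defaultdict(list)
    for label, value in series:
        groups[key(label)].append(value)
    return [(k, method(vs)) for k, vs in sorted(groups.items())]

def run_chain(series, steps):
    """Apply each (name, frequency, method, key) step in order; return every stage."""
    stages = []
    for _name, _frequency, method, key in steps:
        series = aggregate(series, method, key)
        stages.append(series)
    return stages

# One non-leap year of hourly data (hypothetical values).
start = datetime(2001, 1, 1)
hourly = [(start + timedelta(hours=h), synthetic_temp(start + timedelta(hours=h)))
          for h in range(365 * 24)]

# Three aggregations at three frequencies, each recorded with its target spacing.
steps = [
    ("maximum", "day", max, lambda t: t.date()),
    ("mean", "month", statistics.mean, lambda d: (d.year, d.month)),
    ("standard_deviation", "whole period", statistics.pstdev, lambda _: "all"),
]

stages = run_chain(hourly, steps)
print("daily maxima:", len(stages[0]))
print("monthly means:", len(stages[1]))
print("std dev of monthly means:", stages[2][0][1])
```

Keeping the frequency alongside each method is what makes the chain unambiguous here: swapping the order of the steps, or dropping a frequency, changes the result.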
Hello, I would like to re-open this issue / question as I do not see that we concluded, and I do need an answer for some datasets we are preparing. I will first try to summarize what I understand the status is, then propose a way forward and ask questions.

(Partial) summary

The thread started with @sethmcg asking how to store the reference period an anomaly value was computed against. @JonathanGregory answered that the current mechanism in CF is to use the

(Suggested) way forward for this thread

This thread identified (the issue at hand) that the current solution ( Think of the difference between a 20th September maximum temperature wrt a reference_epoch of 1990-2019: is it a difference from the 30-year (yearly) mean, from the 30-year September mean, from the 30-year mean of 20th September, or even from a 20th September maximum daily temperature over 30 years, etc.?

Although I do have questions, I think the solution suggested by Jonathan of introducing a dummy climatology variable is an elegant one, particularly because it brings the definitions of climatology and anomaly closer together in the CF world, and reuses the concept of dummy variables (e.g. thinking of My approach would be to start from file structures where the anomaly variable and the climatology variable are both present (in the same file). Once we decide how to link the anomaly variable to the corresponding climatology variable, we would think about how to "empty" the climatology variable to only keep its definition (and turn it into a dummy climatology variable). As far as I am aware, even the starting situation (the anomaly and climatology variables both in the file and the anomaly pointing to the climatology variable) is not described in the conventions today.

I admit I do not understand the implications of your final posts in the thread above, @sethmcg. It seems to me that they could lead to a much larger overhaul of 7.4 and cell methods, but correct me if I am wrong. 
I have no idea what the efforts would be to solve all the cases you raise in your two final posts. I would however be interested in first fixing the issue identified above (a fix for the Questions:
|
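Editorial sketch (not part of the original comment): the ambiguity described above for a 20th September value and a 1990-2019 reference period can be shown numerically. The temperature series below is entirely made up, and the helper names are hypothetical; the point is only that the same reference period yields four different baselines, and so four different anomalies.

```python
import math
from datetime import date, timedelta

def synthetic_tmax(day):
    """Made-up daily maximum temperature (degC): seasonal cycle plus slow trend."""
    seasonal = 10.0 * math.cos(2 * math.pi * (day.timetuple().tm_yday - 200) / 365.25)
    trend = 0.03 * (day.year - 1990)
    return 15.0 + seasonal + trend

def days(start_year, end_year):
    """Yield every calendar day from 1 Jan start_year to 31 Dec end_year."""
    d = date(start_year, 1, 1)
    while d.year <= end_year:
        yield d
        d += timedelta(days=1)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Reference period 1990-2019, as in the comment above.
ref = {d: synthetic_tmax(d) for d in days(1990, 2019)}
sep20 = [v for d, v in ref.items() if (d.month, d.day) == (9, 20)]

baselines = {
    "30-year mean": mean(ref.values()),
    "30-year September mean": mean(v for d, v in ref.items() if d.month == 9),
    "30-year mean of 20 September": mean(sep20),
    "30-year maximum of 20 September": max(sep20),
}

# The value whose anomaly we want (hypothetical target date).
target = synthetic_tmax(date(2023, 9, 20))
anomalies = {name: target - base for name, base in baselines.items()}

for name, value in anomalies.items():
    print(f"anomaly wrt {name}: {value:+.2f} degC")
```

A reference period alone cannot distinguish these four results, which is why some description of how the baseline was aggregated is needed alongside it.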
Thanks for the excellent summary, @TomLav . I'd like to comment, but will wait until we decide where. I vote for starting a new thread in "Discussions", seeded with this summary (and linking back to here). But I don't mind. |
Thanks @davidhassell . I like the idea of starting / continuing in a Discussion. |
Dear @TomLav I'm also grateful for your clear and comprehensive summary. I'm sorry I didn't answer @sethmcg's two most recent contributions, because of lack of time to think about it. I agree that it is of interest to devise a solution for definite use-cases that you have, and that we don't currently have a way to distinguish the cases you describe. I agree with @davidhassell that it would be sensible to continue this in a new Discussion. Best wishes Jonathan |
I have now opened a Discussion where we can follow this up : #305 . |
This issue is considered to have been converted into a Discussion (although not reflected by GitHub as a new standalone Discussion post was created for the purpose). Therefore this issue will now be closed, as conversation is continuing on at the following link: https://github.com/orgs/cf-convention/discussions/305. The issue history and contributions will still be visible. |
There are some variables, like air_temperature_anomaly and keetch_byram_drought_index, that use a climatological variable (like monthly average air temperature or average annual precip) as an input.

The climatology itself has its reference period recorded in a bounds variable referenced by the climatology attribute, but that information gets lost when you calculate these derived variables.

Does CF have a mechanism for recording these climatological reference periods associated with non-climatological variables? And if not, do we need one?

I think the answers are "no" and "yes". Can anyone think of a way to do it within the existing rules? And if not, are there other use cases to consider?

The only thing I've been able to come up with is that if you had an anomaly-type variable with both bounds and climatology attributes, you could assume that bounds applies to the anomaly variable and climatology to its input, but that's confusing, and I think CF forbids having both attributes on a single variable anyway.

Thoughts?
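
Editorial sketch (not part of the original post): to make the gap concrete, the snippet below models two CF-style files as plain Python dicts. The variable names tas_clim and tas_anom, the dates, the data values, and the `reference_period` helper are all hypothetical; only the use of the climatology and bounds attributes on the time coordinate, and the cell_methods string for a multi-year monthly mean, are taken from CF section 7.4.

```python
def reference_period(dataset, var_name):
    """Return the climatology bounds reachable from var_name, or None if absent."""
    for dim in dataset[var_name]["dimensions"]:
        coord = dataset.get(dim, {})
        bounds_name = coord.get("attributes", {}).get("climatology")
        if bounds_name is not None:
            return dataset[bounds_name]["data"]
    return None

# A climatological variable: January mean over 1991-2020 (hypothetical values).
clim_ds = {
    "time": {
        "dimensions": ("time",),
        "attributes": {"climatology": "climatology_bounds"},
        "data": ["2005-01-16"],
    },
    "climatology_bounds": {
        "dimensions": ("time", "nv"),
        "attributes": {},
        "data": [("1991-01-01", "2020-02-01")],
    },
    "tas_clim": {
        "dimensions": ("time",),
        "attributes": {
            "standard_name": "air_temperature",
            "cell_methods": "time: mean within years time: mean over years",
        },
        "data": [271.3],
    },
}

# A derived anomaly for January 2023: only ordinary time bounds survive.
anom_ds = {
    "time": {
        "dimensions": ("time",),
        "attributes": {"bounds": "time_bnds"},
        "data": ["2023-01-16"],
    },
    "time_bnds": {
        "dimensions": ("time", "nv"),
        "attributes": {},
        "data": [("2023-01-01", "2023-02-01")],
    },
    "tas_anom": {
        "dimensions": ("time",),
        "attributes": {"standard_name": "air_temperature_anomaly"},
        "data": [1.2],
    },
}

print(reference_period(clim_ds, "tas_clim"))
print(reference_period(anom_ds, "tas_anom"))
```

In this sketch the reference period is recoverable from the climatology file but not from the anomaly file, which is the loss of information the question is about.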