Modify naming and attributes of time variables on history files to be consistent with other CESM components #75

Open
phillips-ad opened this issue Mar 30, 2022 · 23 comments

@phillips-ad

These changes are being requested to improve the ease-of-access of CESM data and to normalize the look of the metadata across CESM components. See ESCOMP/CESM#194 and especially the google doc referenced from ESCOMP/CESM#194 (comment).

Current CISM time array output:

double time(time) ;
	time:long_name = "Model time" ;
	time:standard_name = "time" ;
	time:units = "common_year since 0000-01-01 0:0:0" ;
	time:calendar = "noleap" ;

Proposed CISM time array output:

double time(time) ;
	time:long_name = "time" ;
	time:units = "days since 0001-01-01 00:00:00" ;
	time:calendar = "noleap" ;
	time:bounds = "time_bounds" ;
double time_bounds(time, nbnd) ;
	time_bounds:long_name = "time interval endpoints" ;
	time_bounds:units = "days since 0001-01-01 00:00:00" ;
	time_bounds:calendar = "noleap" ;

Changes made: Add time_bounds variable, alter time@units to match other components, add time@bounds, remove time@standard_name, change time@long_name to “time”.
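
As a rough illustration of the unit change (not CISM code), the sketch below converts a value in "common_year since 0000-01-01" to "days since 0001-01-01" on a noleap calendar, where every year has 365 days. The function names are hypothetical, and the one-year interval used for the bounds is an assumption about how time_bounds might be filled for annual output.

```python
DAYS_PER_COMMON_YEAR = 365  # noleap calendar: every year has 365 days


def common_years_to_days(years_since_0000):
    """Convert 'common_year since 0000-01-01' to 'days since 0001-01-01' (noleap)."""
    # Moving the baseline from year 0 to year 1 subtracts one year.
    return (years_since_0000 - 1) * DAYS_PER_COMMON_YEAR


def annual_time_bounds(end_of_interval_days):
    """Hypothetical bounds for a one-year interval ending at the given time."""
    return (end_of_interval_days - DAYS_PER_COMMON_YEAR, end_of_interval_days)


# A time of 2.0 common years since year 0 is the start of year 2.
time_days = common_years_to_days(2.0)
bounds = annual_time_bounds(time_days)
```

Under these assumptions, a record written at the end of model year 1 would carry a time of 365 days and bounds spanning the whole of year 1.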

@billsacks @phillips-ad @strandwg are filing similar issues for each component.

@billsacks
Member

@whlipscomb @gunterl @Katetc - see issue above. @phillips-ad thanks for opening it!

Regarding the units on time, I remember that we had long discussions on this in the lead-up to CESM2. If I remember correctly, the main hesitation at the time about changing the units to "days since" is that it would require changes to many of CISM's post-processing scripts.

@whlipscomb
Contributor

@phillips-ad, Thanks for detailing these changes. In the coming weeks, I can work with @gunterl and @Katetc to make CISM consistent with the other components.

@billsacks, CISM has always given the time in years, and I hesitate to switch to days for all applications, given the typical time scales for ice sheets. However, we already have two time variables: 'time' and 'internal_time'. Maybe we could make 'time' the CESM-friendly variable and 'internal_time' the ice-sheet-friendly variable, with appropriate unit conversions. I'm open to other suggestions.

@billsacks
Member

I agree that it probably makes sense to keep internal_time as is (I remember how challenging it was for us to get this working a few years ago) and just change the 'time' variable that – if I remember correctly – is written but not read by CISM.

When we talked about this back in 2016, one other possibility under consideration was allowing "time" to be in either common_years or days. Then it could be in days for CESM simulations and common_years for standalone CISM runs if you wanted. The advantage would be that it would be in the most fitting units for each application, but the disadvantage is that output files from CESM runs would differ from standalone runs in this respect, which might be overly problematic for post-processing scripts.

I imagine that the implementation of this conversion may need to be revisited once CISM supports running with leap years. (I'll add a note about this to ESCOMP/CISM#12 .)

@whlipscomb
Contributor

@billsacks, I think this option is worth considering. I'll ask @gunterl and @Katetc: For post-processing and diagnostics, do you think it's better to have a single 'time' variable that can have two different units, or two variables ('time' and 'internal_time') with different units (days and common years, where 1 common year = 365 days)?

@strandwg

I recommend using "days since" as the units for "time" since that's the standard for all the other CESM components. Also, if CESM is run with leap years, then CISM's "common year" unit will need to be changed anyway. If the use of "internal time" makes CISM postprocessing tools work, that's completely fine.

@Katetc
Contributor

Katetc commented Mar 31, 2022

I think for post processing scripts it would definitely be better to have two different variables with different, but consistent units. So CESM-based scripts could always expect to work with one variable, and CISM-standalone scripts would always be expected to work with the second, and everything would always be expected to work (and not fail in certain cases).

@billsacks
Member

If I remember correctly, "internal_time" is needed in CESM cases (and I think standalone cases, unless some code changes are made) no matter what we do with "time". @whlipscomb I'm not sure if you were suggesting a possibility of getting rid of that. Sorry if my suggestion was unclear on this: the alternative I suggested implied keeping internal_time, but having the conventions / units on "time" potentially differ for standalone vs. coupled cases.

@whlipscomb
Contributor

@billsacks, thanks for clarifying that CESM still needs 'internal_time'. Following Kate's suggestion, I was imagining that standalone cases could use 'internal_time', with units of common years, but I can see that this would have issues too.

Since I don't fully understand the requirements, I wonder if we should schedule a call where we can ask questions and brainstorm.

@billsacks
Member

> Since I don't fully understand the requirements, I wonder if we should schedule a call where we can ask questions and brainstorm.

I don't remember all the details either, but I can be part of a discussion and share what I remember or can dig up from old emails.

@gunterl
Contributor

gunterl commented Mar 31, 2022 via email

@billsacks
Member

Looking back at some old emails, it looks like I introduced internal_time in response to a request to make the time variable have a reference point of year 0 (i.e., time since year 0), and I wasn't able to get CISM to restart properly with that change, at least in some circumstances. So the motivation was slightly different from what is suggested in this thread, although its existence can support this, too – basically, it can support any use case where the conventions wanted for the time variable on output files differ from what CISM needs for its own internal operation.

@hgoelzer

hgoelzer commented Apr 3, 2022

Possibly I am confusing something or doing something wrong, but in my understanding the time recording in current CISM is off by a year between model and output. Say I run an experiment with SMB forcing starting in 1960; the first entry in the forcing file contains annual average values for the year 1960 that would typically be registered to July 1960. In the config I have tstart = 1960 and write_init = F. The model reads the forcing for 1960, runs a year and writes output at the end of 1960 (or rather 1961-01-01). In scalars.nc, I get the first entry as 1961. However, with the interpretation "common_year since 0001-01-01 0:0:0" I think this translates to (1961+0001)-01-01, which is the end of 1961 (or rather 1962-01-01).
I am not sure at what point this would be fixed (how the forcing is read or how the time is counted). But if we want tstart = 1960 to mean "start with the beginning of the year 1960", wouldn't that imply a calendar baseline year 0000 with "common_year since 0000-01-01 0:0:0"?
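
The arithmetic behind this off-by-one can be sketched as follows. This is only an illustration of how a whole-year value decodes under the two baselines; the function name is hypothetical, and 1961 is the scalars.nc entry from the example above.

```python
def decoded_year(value_in_common_years, baseline_year):
    """Calendar year implied by 'common_year since <baseline_year>-01-01'."""
    # For whole-year values the decoded date is January 1 of this year.
    return baseline_year + value_in_common_years


recorded = 1961  # first entry in scalars.nc in the example above

year_with_baseline_1 = decoded_year(recorded, 1)  # "since 0001-01-01"
year_with_baseline_0 = decoded_year(recorded, 0)  # "since 0000-01-01"
```

With a baseline of year 1 the entry decodes to 1962-01-01, one year later than the model time it was written at; with a baseline of year 0 it decodes to 1961-01-01.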

@billsacks
Member

@hgoelzer - when running CISM inside CESM, on the CISM output (history) files, I see time:units = "common_year since 0000-01-01 0:0:0". Are you seeing "since 0001-01-01 0:0:0" somewhere? (It sounds like maybe you're referring to a file that appears in a standalone CISM run and not in a CESM-CISM run, in which case I should let others respond since I'm not familiar with that setup.)

@hgoelzer

hgoelzer commented Apr 3, 2022

@billsacks - yes, I was referring to standalone CISM output.
I find the definition of baseline_year in CISM/libglimmer/glimmer_ncio.F90
subroutine glimmer_nc_createfile(outfile, model, baseline_year)
The default value of 1 seems to be used in all calls.

@whlipscomb
Contributor

@hgoelzer, Thanks for joining the conversation. Do you have a preference for the definition(s) of 'time' in CISM? This seems like a good time to make sure our time conventions are consistent among ESMs (CESM and NorESM), and also consistent with best practice for standalone models (if you think there is a best practice). We have a call scheduled Monday April 4 at 10:30 am MT, if you're available and would like to join.

@hgoelzer

hgoelzer commented Apr 4, 2022

I have a couple of thoughts, but no firm recommendation due to lack of expertise.
I think there are at least two aspects to this. 1) physical conservation, 2) writing meaningful netcdfs.
As a component of CESM I think it would be necessary for CISM to know the length of each year it is simulating. That could be done either by pushing that information to CISM when it is called or by having CISM know about calendars itself. The latter is more work, but with the perspective of one day running monthly time steps it is probably useful to develop. To be able to run fully comparable standalone and coupled experiments this is also needed, I think.
Because CISM may also be used standalone, and over long timescales where the calendar matters less, it may also be good to have at least two different calendars to work with: one that can follow CESM and a noleap alternative for long-term runs. Happy to discuss this later today.

@whlipscomb
Contributor

@hgoelzer, I just sent you a Zoom invitation to today's call, if you're able to join. I agree we should think about giving CISM some calendar capabilities, with two different calendars to work with.

@strandwg

strandwg commented Apr 4, 2022

There's a post on the CESM Forum regarding an error reading the time:units attribute for CISM output:

https://bb.cgd.ucar.edu/cesm/threads/time-decoding-error-in-reading-cesm-outputs-netcdf-files.7229

@billsacks
Member

@strandwg it's unclear to me from that post what the fundamental issue is: is the problem the use of common_year since, the use of a baseline date of year 0, both, or something else? Do you know?

@strandwg

strandwg commented Apr 4, 2022

It looks like "common_year" isn't well-understood by xarray, apparently. The data in question doesn't use year 0 as a baseline, e.g. "ValueError: unable to decode time units 'common_year since 1-1-1 0:0:0' with "calendar 'noleap'".
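
One possible workaround on the reader side, sketched here with plain Python so it does not depend on any particular library, is to open the file without time decoding (for example with xarray's decode_times=False), rescale the values to days, and rewrite the units string before decoding. The function name is hypothetical, the 365-day factor assumes the noleap calendar, and whether a given decoder accepts the resulting reference-date format is not checked here.

```python
import re

DAYS_PER_COMMON_YEAR = 365  # assumes the noleap calendar


def common_year_to_days(values, units):
    """Rescale 'common_year since <date>' values to 'days since <date>'."""
    match = re.fullmatch(r"common_years?\s+since\s+(.+)", units.strip())
    if match is None:
        raise ValueError(f"not a common_year unit string: {units!r}")
    reference = match.group(1)  # reference date is kept unchanged
    new_values = [v * DAYS_PER_COMMON_YEAR for v in values]
    return new_values, f"days since {reference}"


values, units = common_year_to_days(
    [1961.0, 1962.0], "common_year since 1-1-1 0:0:0"
)
```

The rescaled values and the new units string could then be written back onto the time variable before asking the library to decode it.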

@billsacks
Member

Notes from discussion today with Bill Sacks, Bill Lipscomb, Gunter Leguy, Kate Thayer-Calder, Heiko Goelzer, Gary Strand, Adam Phillips:

One desired change that might not have come through clearly is to change the baseline date for the time variable: using "since 0001-01-01" rather than "since 0000-01-01".

Plan for now is to leave internal_time as it currently is, but change the time variable to output in days rather than years.

One possible issue is with rounding errors in the time output: Bill S recalls that there are currently potential small errors in the time variable, which accumulate over time. This could potentially become a bigger issue when we convert to days, in that it will take less model run time before the errors push you to the adjacent day. Bill S's memory isn't great here, so it may not actually be an issue, but if it is, Bill L suggests having a once-per-year reconciliation to correct these rounding errors.
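
A small sketch of why this could matter more in days than in years. It does not reproduce CISM's actual time stepping: the 0.1-year step is a made-up value chosen because it is not exactly representable in floating point, and rounding to the nearest day is just one way the suggested once-per-year reconciliation might look.

```python
DAYS_PER_COMMON_YEAR = 365

# Accumulating a fractional-year time step in floating point drifts slightly.
dt_years = 0.1  # hypothetical time step
accumulated_years = 0.0
for _ in range(10):
    accumulated_years += dt_years

# Truncating to whole days turns the tiny drift into an off-by-one day.
truncated_day = int(accumulated_years * DAYS_PER_COMMON_YEAR)

# One possible reconciliation: snap to the nearest whole day at year end.
reconciled_day = round(accumulated_years * DAYS_PER_COMMON_YEAR)
```

After ten steps the accumulated time falls just short of 1.0 year, so truncation lands on day 364 while the reconciled value is day 365.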

@billsacks
Member

Actually, I was just following the code, and I think the potential rounding issue might not actually be an issue, at least in the CESM context – though it probably still is an issue for standalone CISM runs. I thought I remembered that the time output variable was derived from internal_time. But looking back at the code, I see that, for CESM runs, the time output variable actually comes from the CESM clock, not from CISM's internal clock. (There are three relevant time managers in CISM: The CESM driver-level clock, the CISM wrapper clock implemented in glc_time_management.F90, and CISM's internal time management.)

Here is the relevant code:

call glimmer_nc_checkwrite(oc, instance%model, forcewrite=.true., &
                           time=instance%glide_time, &
                           external_time = real(cesmYR, r8))

(I do remember Alice Bertini running into problems with CISM's time having rounding issues, but maybe that was just before I separated time and internal_time.)

Since cesmYR comes from an ESMF clock object, I assume that it is well-behaved in the sense of not accumulating rounding errors. The current code writes this cesmYR to the output time variable. I think this works currently, given a baseline year of 0, if we assume that output files are written only at the end of each year: if cesmYR is 2 (for example), then it is exactly 2 years since the start of year 0. So I think the solution when running in the CESM context is to change the meaning of this external_time argument to glimmer_nc_checkwrite: currently it is interpreted as the number of years since a baseline date of 0000-01-01; it should be changed to be the number of days since a baseline date of 0001-01-01.

One nice thing about the way this is currently done is that the conversion to days since a baseline date can happen in the CISM-wrapper layer. CISM-wrapper has access to the ESMF library, which it already uses for a variety of ESMF clock-related calls. I think we could use ESMF's TimeInterval class (see sections 42.4.4 and 43 here: https://earthsystemmodeling.org/docs/nightly/develop/ESMF_refdoc/node6.html#SECTION06033000000000000000 ) to get the number of days from the baseline of 0001-01-01 to the current date. This should be robust to whatever calendar is being used.
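
For reference, the value such a calculation should return on a noleap calendar can be sketched in a few lines of Python. This is not ESMF and handles only the noleap case, which is exactly the limitation an ESMF-based implementation would avoid; the function name is hypothetical.

```python
# Month lengths on a noleap calendar (February always has 28 days).
DAYS_IN_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_days_since_0001(year, month, day):
    """Days from 0001-01-01 to the given date on a noleap calendar."""
    if not 1 <= month <= 12 or not 1 <= day <= DAYS_IN_MONTH[month - 1]:
        raise ValueError("invalid noleap date")
    days_from_years = (year - 1) * 365
    days_from_months = sum(DAYS_IN_MONTH[: month - 1])
    return days_from_years + days_from_months + (day - 1)


# A history file written at the end of year 1, i.e. at 0002-01-01.
days = noleap_days_since_0001(2, 1, 1)
```

For the cesmYR = 2 example above, the current output of 2 years since 0000-01-01 would become 365 days since 0001-01-01.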

(Of course, that doesn't help if the same thing needs to be done in standalone CISM.)

@hgoelzer

hgoelzer commented Apr 4, 2022

Yes, my afterthought from the meeting is along those lines: CISM-CESM and CISM-standalone may be very different in their time requirements. While CISM in CESM is told what date it is and can respond to that, CISM-standalone has to work that out itself. I am wondering if and how they can be made consistent without CISM getting a full-blown time manager.

@billsacks billsacks moved this from Needs Prioritization to Todo ~ months in CESM: infrastructure / cross-component SE priorities Apr 16, 2024

7 participants