Interpretation of negative years in the units attribute #298
Comments
@peterkuma, CF has never standardized the meaning of year zero or negative years. As you have discovered, different software developers have implemented various and conflicting interpretations. Best practice is to encode all timekeeping so that zero and negative are never encountered in either recorded times or in the reference time in the units attribute. Furthermore, when recording modern real world times, use only years later than 1582 in both positions, to avoid the Julian/Gregorian crossover. Please see CF section Time Coordinate for more details and caveats. For most applications, I recommend a reference date of January 1, hour zero, of the first year of your data domain, or some other round year number close to but earlier than the start. It seems like you have an application that is well aware of correct real world dates and times when the data are originally being recorded. Is there something that would prevent this application from encoding times in this unambiguous way, relative to a modern reference time? |
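The recommendation above (a reference date of January 1, hour zero, of the first year of the data domain) can be sketched as follows. This is a minimal illustration using only the Python standard library, not part of any CF tool; the function name `encode_days_since` and the sample observation times are hypothetical. Python's `datetime` uses the proleptic Gregorian calendar without leap seconds, so the sketch is only meant for dates after 1582.

```python
from datetime import datetime

def encode_days_since(times, reference):
    """Return a CF-style units string and the times as fractional days since reference.

    Hypothetical helper for illustration; assumes naive datetimes after 1582.
    """
    units = f"days since {reference:%Y-%m-%d %H:%M:%S}"
    values = [(t - reference).total_seconds() / 86400.0 for t in times]
    return units, values

# Hypothetical observation times.
obs = [datetime(2015, 3, 1, 6, 0), datetime(2015, 3, 2, 18, 0)]

# Reference time: January 1, hour zero, of the first year of the data domain.
ref = datetime(obs[0].year, 1, 1)

units, values = encode_days_since(obs, ref)
print(units)
print(values)
```

With a positive, modern reference year like this, the year-zero question never arises for either the reference time or the encoded values.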
@Dave-Allured, thank you for your response. As you say the issue can be avoided by using a reference time after 1582, which is fine for end users who can choose the reference time. However, I think the situation for the developers of generic software which uses NetCDF is still unresolved, because they don't have concrete guidance on how to use these reference times. My proposal would be to add text to the CF Conventions saying how negative years should be interpreted, even if it is something as short as: "When the year of the reference time is negative, year 0 is not counted in calculations involving this reference time." (or a similar statement), appended to the second paragraph of Section 4.4. Time Coordinate. To answer your question, I write relatively generic software for use with climate observations and climate modeling, which doesn't know what time range is going to be used by the user. I prefer to use Julian date everywhere in my code because it makes it easy to perform any calculations when all time variables have the same reference time. It would be great if NetCDF had good support for this use case. |
@peterkuma, I share your interest in standardizing the treatment of zero and negative years. However, I am afraid your use case may not be appropriate for this task. My experience so far is that almost all climate-related obs and model data sets that might use CF encoding are in the domain of only positive year numbers. I speculate that most climate model makers deliberately avoided zero and negative years because of this uncertainty. I might be wrong, I have not checked lately. You are writing generic software for climate obs and modeling. If you plan to use a fixed reference time for internal software purposes, then I suggest 1 January +0001 00:00, rather than the astronomical calendar base that you mentioned, to reduce problems. Take appropriate care with the Julian/Gregorian discontinuity. If that is not satisfactory, then I am glad to keep discussing a CF amendment. |
@Dave-Allured : I think this does deserve some further discussion because the study of climate in the distant past is a very important part of climate science, even if it is small in scale compared to the study of present day climate. Our problem is that the distant past is more important for our community than it is for many of the people who work on standards for dates and times. The NetCDF4 library may have resolved this for us, unless we make an active decision to depart from the treatment of negative reference times in NetCDF4. For example, if I create a file from the following CDL using
and then run |
This has been discussed in CF before but without conclusion. It would certainly be useful to adopt a convention for it, because there are use-cases, as @peterkuma demonstrates. It's not a problem with the reference year itself, but with the definition of the calendar. Given the lengthy debates about what "calendar" means when we were discussing leap-seconds, I use that word with some nervousness! I mean by "calendar" the set of valid dates (DD-MM-YYYY), which is implied by the choice of the In the standard calendar, as we all know, there is no year between 1 AD (CE) and 1 BC (BCE). I suppose it's because year 0 doesn't exist that COARDS chose year 0 to indicate climatological time. (CF supports that convention for compatibility with COARDS, which only deals with the real-world standard calendar.) I'm interested to see what @martinjuckes reports about NetCDF-4. If you accept 0 as a valid year number, it means you have to write 2 BC as year -1, 3 BC as year -2, etc. That seems rather confusing to me, and likely to lead to mistakes. However, it seems that this is what is done for the proleptic Julian calendar, which is used in astronomy. Wikipedia says, "year 1 of the Julian Period was 4713 BC (−4712)." It seems that there is a year 0 in that calendar. Is that correct? For model calendars, I guess that year zero probably does exist, because it's an inconvenience to arithmetic if you leave it out! If we decide there isn't a well-defined best answer, and there are divergent use-cases, we could define different CF calendars with and without year zero. |
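The numbering described above, in which accepting 0 as a valid year number means 2 BC is written as year -1 and 3 BC as year -2, can be sketched as a pair of conversions. The function names are hypothetical and the code only illustrates the arithmetic; it does not describe any particular library.

```python
def astronomical_to_era(year):
    """Map a year number on an axis that includes year 0 to (year, era)."""
    if year > 0:
        return year, "CE"
    # Year 0 is 1 BCE, year -1 is 2 BCE, and so on.
    return 1 - year, "BCE"

def era_to_astronomical(year, era):
    """Inverse mapping: (year, era) with no year zero to a signed year number."""
    if year < 1:
        raise ValueError("era years start at 1; there is no year 0 BCE or CE")
    if era == "CE":
        return year
    if era == "BCE":
        return 1 - year
    raise ValueError(f"unknown era: {era!r}")

for y in (1, 0, -1, -2, -4712):
    print(y, astronomical_to_era(y))
```

Under this mapping the Wikipedia statement quoted above is consistent: 4713 BC corresponds to year number -4712.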
I'm pretty sure that netCDF-C's support of dates is really just for minimal convenience and not intended to be any kind of standard. I'll poke @WardF and @lesserwhirls to chime in here... |
It might be reasonable to define the standard calendar as not permitting years less than 1. Whether or not year 0 exists is a choice for the proleptic calendars, I think. |
I initially liked Jonathan's idea of introducing a new proleptic calendar(s) to make it explicit when people care about the interpretation of negative years in the reference time, but, after thinking over the point discussed below, I would prefer to use a qualifier, e.g. If you take an etymological approach, I think The question, I think, is not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?". Given that positive I'm not sure whether there is a good mnemonic term, but I suggest |
netCDF-C support for dates/times/calendars is limited to the |
I really appreciate that others have joined the discussion. From my understanding the ISO 8601 standard should be considered as a mapping to the underlying calendar, i.e. year 0 and -1 in ISO 8601 are mapped to year 1 BCE and 2 BCE, respectively. In that sense, there is no conflict between ISO 8601 and the calendar, even though it is slightly confusing. The ISO 8601 document itself is probably a better source of information than Wikipedia. In section B.1 (Date and time representations) of the ISO 8601:2004 version it has an example:
Other than that it is relatively short of good explanation about how negative years should be treated. Two other relevant fragments in the document are:
The newer version of the ISO standard from 2019 defines extensions in ISO 8601-2:2019:
Therefore, it looks like both ways of counting/not counting year 0 are supported by the standard, and they are distinguished by adding "YB" to the year number without leading zeros in the "explicit form". I think it would be desirable to follow ISO 8601 (but I am not sure about the "YB" form because it may be too complicated to parse all variations of the format), unless there are historical reasons not to, such as how the handling of negative years has been implemented in udunits. I will try to do a short survey of how existing NetCDF libraries implement zero and negative years. I have been in contact with the developer of Panoply (Robert B. Schmunk, @rschmunk - I hope that is his GitHub username), who said that handling of dates in Panoply follows standard Java classes, which is according to the ISO 8601 standard of including year 0 in the calculations. @Dave-Allured, I wouldn't be trying to only solve my use case here. I know I could work around it easily by choosing a reference time with a positive year. I am quite interested in solving the issue for all users of NetCDF, as much as I can contribute to the discussion. It looks like there is enough interest from others too. |
@peterkuma thanks for the detail, I was looking at ISO 8601-2014 (accessed in 2018) ... and the treatment of years -9999 to 0 has clearly been considerably enhanced in the 2019 version, especially in the extension (8601-2). I've just downloaded a new version. The NetCDF Java library has quite extensive calendar support .. but the only thing I could find in the NUG that our convention references was a link to the CDL functionality which I've illustrated above. I agree completely with @lesserwhirls that the NetCDF libraries, and libraries in general, should not be treated as a standard or convention, but we have to consider the consequences of recommending anything that conflicts with a widely used library. Using the new explicit ISO 8601 form for dates may help to make the distinction between the two mappings clearer, with |
Thank you, @peterkuma, for the useful information, and to @martinjuckes for his correction to the question - not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?". I think these are linked in the CF convention. To be precise, I think we could say that the CF Therefore I suggest that for the default/
Any other calendar where the same ambiguity exists could have the same treatment, if there are use-cases which need them. This would be for Julian and proleptic Gregorian calendars, since the other two are model calendars in which I think we can assume year 0 is valid. |
Well I see that my feeble attempts to avoid the year zero issue have failed. I think that the focus in this conversation is good, and I hope we can reach a clear resolution for all six CF defined calendars. Thanks everyone so far, for your research and attention to detail. I support the ISO 8601 "expanded representation" approach for the interpretation of year numbering. ISO 8601 deals not specifically with calendar definitions, but rather with how to construct string representations. This is relevant for the reference time in the CF units string. As reported by @peterkuma, this representation puts year numbers on a mathematically normal integer time axis that includes negative years and year zero. @martinjuckes, I would prefer to stay with the familiar year-month-day string syntax, and avoid the new explicit ISO 8601 form that adds new designators such as "Y" and "YB". I think it will be sufficient to simply add explicit documentation for the proper CF treatment of negative and zero year numbers in the units string. I favor adding constraints for two of the CF calendars. This is a different way of avoiding the year zero issue, specifically an attempt to keep most common applications "safe" from crossover problems. The calendar named "Gregorian" should be restricted to only dates from 1582 October 15 forward. This would apply to both the reference date and all encoded dates. This should be fully compatible with existing data sets that have paid any attention to stated best practices for many years now. Likewise, the calendar named "Julian" should be restricted to only 0001 January 1 forward. As a result, the need to clarify negative and zero years would be reduced to only the remaining four CF calendars: 360_day, 365_day, 366_day, and proleptic_gregorian. |
Dear @Dave-Allured et al. We must have had a similar discussion some time ago - it feels familiar! While I appreciate wanting to avoid the Julian-Gregorian transition, I don't think we should disallow the default/ The Gregorian calendar is undefined before 1582. Possibly we could redefine However, in view of this point of Dave's, I'd like to change my proposal for new calendar names to:
For years>0, both of these calendars are the same as the default/ For For Best wishes Jonathan |
Dear @Dave-Allured , @JonathanGregory , I agree with Jonathan's point that someone may want to encode real world data from the 16th century (e.g. weather records from 16th century diaries) .. and so we should maintain the existing support for using the actual mixed Julian/Gregorian calendar. For I agree with Dave's recommendation to avoid the new ISO 8601-2 explicit form for dates ( There is currently a special meaning attached to reference dates of the form |
That troublesome usage of Section 7.4 also includes the only explicit treatment of the year zero concept in the entire document. |
Dear @martinjuckes and @Dave-Allured
Yes. I agree.
I do too.
Yes. Since CF generally supports udunits formats for units, we ought at least to allow what udunits does for time, but I haven't found out in the units documentation what formats it accepts. I think it will allow Y[-M[-D [h[:m[:s]]]]], where Y can be a large positive or negative number or zero. (NB udunits itself only handles the real-world calendar, but we use its format for the others.) I agree with @Dave-Allured that it's OK to continue to allow but deprecate the special use of year 0 in the real-world calendar.
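The timestamp shape mentioned above, Y[-M[-D [h[:m[:s]]]]] with a possibly zero or negative Y, can be sketched with a regular expression. This is only an illustration of that shape, not the UDUNITS grammar: the pattern, the function name `parse_reference` and the defaults for omitted fields are assumptions, and time zone suffixes are not handled.

```python
import re

# Assumed pattern for Y[-M[-D [h[:m[:s]]]]]; a "T" separator is also accepted.
_REF = re.compile(
    r"^\s*(?P<year>-?\d+)"
    r"(?:-(?P<month>\d{1,2})"
    r"(?:-(?P<day>\d{1,2})"
    r"(?:[ T]+(?P<hour>\d{1,2})"
    r"(?::(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2}(?:\.\d*)?))?)?)?)?)?\s*$"
)

def parse_reference(text):
    """Split a reference timestamp into numeric fields (hypothetical helper)."""
    m = _REF.match(text)
    if m is None:
        raise ValueError(f"unrecognised reference timestamp: {text!r}")
    g = m.groupdict()
    return {
        "year": int(g["year"]),            # may be zero or negative
        "month": int(g["month"] or 1),     # omitted fields default to the
        "day": int(g["day"] or 1),         # start of the enclosing period
        "hour": int(g["hour"] or 0),
        "minute": int(g["minute"] or 0),
        "second": float(g["second"] or 0),
    }

print(parse_reference("-4712-01-01 12:00"))
```

The sketch only splits the string into numbers. Whether year 0 or -4712 is then counted with or without a year zero is exactly the interpretation question under discussion, and the parser deliberately leaves it open.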
I think that years<1 should be deprecated in these calendars, but not disallowed, because of backward compatibility. What do @martinjuckes and others think? Above (#298 (comment)) I have made proposals for new calendars, going before year 1, and asked whether Jonathan |
agree with @JonathanGregory on deprecating (rather than disallowing) years < 1 in I'm not sure about the proposal to redefine Concerning what udunits2 supports: the command line tool treats The library does accept arbitrary years and ISO basic format. This means that |
Dear @martinjuckes
Yes, it would apply to the year in the reference timestamp, and I think it would also mean deprecating any attempt to decode or encode a time before year 1. The CF checker would be able to detect such years in the time coordinate, and should give a warning about it, because their meaning would be unreliable.
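A check of the kind described above could look roughly like the following. It is a sketch, not the CF checker's actual code: the function names are hypothetical, only the reference year in the units string is inspected (decoded coordinate values are not), and the set of calendars to flag is passed in as a parameter because it is a policy choice rather than something this sketch decides.

```python
import re

# Assumed shape of a CF time units string: "<unit> since <year>...".
_UNITS = re.compile(r"^\s*\w+\s+since\s+(?P<year>-?\d+)")

def reference_year(units):
    """Extract the signed reference year from a units string."""
    m = _UNITS.match(units)
    if m is None:
        raise ValueError(f"not a time units string: {units!r}")
    return int(m.group("year"))

def check_reference_year(units, calendar, flagged_calendars):
    """Return a warning message if the reference year is < 1 in a flagged calendar.

    flagged_calendars is a hypothetical, caller-supplied set of calendar names.
    """
    year = reference_year(units)
    if calendar in flagged_calendars and year < 1:
        return (f"reference year {year} in calendar '{calendar}' is less than 1; "
                "its interpretation may be unreliable")
    return None

print(check_reference_year("days since -4712-01-01 12:00", "standard", {"standard"}))
```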
My suggestion would be to make
Alternatively we could define what it means if you supply a date consisting of more than eight digits and no delimiters. But that would imply a requirement on software to support our interpretation. Maybe we could deprecate it instead of disallowing it. Jonathan |
👍 for this, and in particular for the last sentence. |
I think it would help to confine this discussion to the requested issue, which is zero and negative years in currently defined calendars. A refinement of the "Gregorian" label is a good topic, but I should not have injected it into this discussion. Also a full discussion of alternate date formats in the reference string is complicated. Can we please defer that to a future issue, when needed? I wholeheartedly support new calendar names that are explicit and mathematically well-defined. It is relevant to mention those ideas here. However, can we also put off their resolution to new issues, as needed? Let's see if we have some consensus so far on the following, acknowledging some previous agreement above.
Agreed so far? |
Dear @Dave-Allured Thanks for the summary. Yes, I agree with all those bullet points. I would like to add
Jonathan |
@JonathanGregory, I agree with your addition. That was my intention, I just did not fold that into the wording correctly. |
I have not much to add to this discussion, my earlier comment was only to express my support for the suggestion to make the calendar names/terms clearer and more self-evident. For what it is worth, @Dave-Allured's summary and @JonathanGregory's addition look good to me. |
I am coming around to favoring a partial ISO 8601-2:2019 approach as described above by @peterkuma. Both ways of either counting or not counting year 0 could be supported with some minimal extension of the reference date notation, as initially suggested above by @martinjuckes. I have a suggested notation that I would like to hold for later. Let's continue to focus on the primary question of year numbers in the traditional CF format, without any new notation. By ISO 8601-2:2019, and if we agree, year zero and negative years are included. Now the interpretation for @JonathanGregory, you proposed that years before 1 should be deprecated for |
Hello @Dave-Allured
I also agree on the suggested approach to supporting zero and negative reference years with explicit specifications, and deprecating them in some cases. Also tend to favour allowing negative years in the |
Only the supposition that it might be not well-defined what year zero means. If the consensus is that year 0 is a normal year in the proleptic Gregorian calendar, I think that's good. We can allow zero and negative years for this calendar. The |
I believe that ISO8601 is as good a definition as we have for the datetime stamp, the Gregorian calendar, and the Proleptic Gregorian calendar. ISO8601 is explicit in the inclusion of year
CF is a good example of mutual agreement between partners. An expanded year representation [±YYYYY] is available, again by mutual agreement, and it must be prefixed with a + or − sign. |
@marqh, I am proposing only one idea from ISO8601, the mapping of zero and negative year numbers as you just showed. Year 0 = 1 BC, etc. I am not proposing a full adoption of an ISO8601 format. ISO8601 uses fixed length numbers, whereas the CF date/time stamp allows variable length numbers with delimiters. Also, CF does not use the plus sign. The delimited system is robust and has served us well for a long time. The CF delimited system accommodates ISO8601 fixed width formats when the standard delimiters including the "T" separator are used. E.g., YYYY-MM-DDTHH:MM:SS is correct under both systems. |
Yes, I agree, Martin's wording is good -- it is clear and succinct. A couple of minor comments:
|
Hello, A lot of suggestions have been made on this issue since the last update to the associated pull request (#315). I'm not sure if consensus has been reached, as the conversation has paused, but is it possible for someone to synthesise the suggestions made here during April 2021 so that the PR can be updated? (pinging @Dave-Allured for extra visibility as the PR owner). Many thanks, |
Dear @davidhassell et al. I will produce some text synthesising the comments made since the current pull request was written. Since then, the preamble on calendars has been modified as a result of the agreement of issue 313 on leap seconds. As a result, I suggest that the new sentence from the pull request of this issue should go in a different place. Below I reproduce the new text from the working draft, because I think it's useful in spelling out what "calendar" means in CF. I have inserted the extra sentence in bold where I'd suggest putting it. Cheers Jonathan 4.4.1 CalendarA date/time is the set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer. A time coordinate value represents a date/time. In order to calculate a time coordinate value from a date/time, or the reverse, one must know the When a time coordinate value is calculated from a date/time, or the reverse, it is assumed that the coordinate value increases by exactly 60 seconds from the start of any minute (identified by year, month, day, hour, minute, all being integers) to the start of the next minute, with no leap seconds, in all CF calendars. This assumption has various consequences when real-world date/times from calendars which do contain leap seconds (such as UTC) are stored in time coordinate variables:
It is recommended that the calendar be specified by the [... to be continued] |
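The assumption in the draft text above, that every minute is exactly 60 seconds long in all CF calendars, can be illustrated with Python's standard `datetime`, which also has no leap seconds. This is a sketch for illustration only; the function name is hypothetical. A leap second was inserted in UTC at the end of 31 December 2016, so the real elapsed time across that minute was 61 seconds, while the leap-second-free arithmetic gives 60.

```python
from datetime import datetime

def cf_seconds_since(reference, when):
    """Elapsed seconds with every minute counted as exactly 60 s (no leap seconds)."""
    return (when - reference).total_seconds()

start_of_minute = datetime(2016, 12, 31, 23, 59, 0)
next_minute = datetime(2017, 1, 1, 0, 0, 0)

elapsed = cf_seconds_since(start_of_minute, next_minute)
print(elapsed)
```

A date/time such as 23:59:60 cannot be constructed with `datetime` at all, which mirrors the consequence that such instants have no distinct time coordinate value under this assumption.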
Dear all I have drafted a new version of the affected parts of the text of Sect 4.4, taking account of the comments made since the pull request was revised, mostly as suggested but not quite, as follows:
I think the reference to Best wishes Jonathan 4.4 Time coordinateVariables representing time must always explicitly include the
The acceptable units for time are listed in the udunits.dat file. The most commonly used of these strings (and their abbreviations) includes ... 4.4.1 CalendarA date/time is the set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer. A time coordinate value represents a date/time. In order to calculate a time coordinate value from a date/time, or the reverse, one must know the units attribute of the time coordinate variable (containing the time unit of the coordinate values and the reference date/time) and the calendar. The choice of calendar defines the set of dates (year-month-day combinations) which are permitted, and therefore it specifies the number of days between the times of ... The values currently defined for
The ... Replace the paragraph in 7.4 beginning "The COARDS standard" with For compatibility with the COARDS standard, a climatological time coordinate in the default Modify conformance document section 4.4 recommendations:
Add to conformance document section 4.4.1 recommendations:
|
Dear @davidhassell Yes, you're right, this does cover #319. I hadn't thought of that, but it could be convenient if no-one objects to merging them. I didn't redefine the Jonathan |
It is fine by me.
Which I think works well! |
@JonathanGregory This is all very good, and merging the two seems logical, especially as #319 is an offshoot from this issue. A couple of minor comments:
|
Hello Lars, I like your suggestions. With regards Udunits, the time units are distributed across multiple XML files. For example, "The acceptable units for time are listed in the Udunits database [UDUNITS]." where [UDUNITS] is the existing reference in the Bibliography section, which provides a version-independent link (http://www.unidata.ucar.edu/software/udunits/) to the Udunits home page. The online viewable database for a particular Udunits version is easily findable from the given link (the latest is https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2.html#Database). Incidentally, what is the correct way to write Udunits? CF uses "Udunits", but Unidata documentation seems to favour "UDUNITS". Mentioning @semmerson here in case I've missed something. |
Dear @larsbarring and @davidhassell Taking Lars's points in reverse.
In this proposed text, I have been more explicit about the problems with the calendar and the recommendations to avoid them. I have moved the Gregorian leap-year rules from The existing text also says, "For time coordinates that do cross the discontinuity the Cheers Jonathan Current text: The mixed Gregorian/Julian calendar used by Udunits is explained in the following excerpt from the udunits(3) man page:
Due to problems caused by the discontinuity in the default mixed Gregorian/Julian calendar, we strongly recommend that this calendar should only be used when the time coordinate does not cross the discontinuity. For time coordinates that do cross the discontinuity the |
Hi David,
Incidentally, what is the correct way to write Udunits? CF uses "Udunits",
but Unidata documentation seems to favour "UDUNITS".
UDUNITS
|
Thanks, Jonathan. Seems clear and close now. With these changes we'll be able to close both 298 and 319! Real progress. |
Dear Jonathan, |
If everyone is content with the proposed text, I will make a new pull request for this issue. In the pull request, I will add @Dave-Allured to the CF authors in recognition of his raising the issue and the work he has done on it (unless he would prefer not). Jonathan |
Thanks @JonathanGregory - I'm happy with your latest proposed text; and thanks @semmerson for putting us right :) |
Thanks, @JonathanGregory. Your PR looks good to me. |
@zklaus has made the following comment on the PR
|
I realise that we agreed to delete the existing excerpt from the UDUNITS that describes the I agree with the point @zklaus makes about the deprecated units. Would the following be OK:
I propose to delete this text: "UDUNITS includes the following definitions for years: a Although this PR can eliminate |
I have made the above changes in #331 |
Thanks, Klaus and Jonathan - This alteration (bd7498d) looks good to me. |
Thanks, David. If @zklaus and others are also content with the new version, I suppose it can be accepted three weeks from three days ago, which makes 23rd July |
There have been no further comments for three weeks and sufficient support has been expressed, so this change is therefore accepted according to the rules. I have merged #331. Thanks to all contributors to the discussion, especially @peterkuma, who raised the issue, and @Dave-Allured, who worked on the text and has been added to the list of authors of the CF convention. |
I have encountered an issue with using a negative year in the units attribute of time variables in NetCDF files. The interpretation in cftime (Python) is that year zero does not exist, while in other software such as Panoply the interpretation is that year zero exists. This affects how the time variable is read and displayed, and effectively causes a one-year difference between the different implementations. The CF Conventions (Section 4.4. Time Coordinate) do not explicitly state how negative years should be treated, except for stating that year 0 has a special meaning. In contrast, ISO 8601 seems to be more on the side of including year 0.
In particular, this issue comes up when using Julian date in NetCDF files, which has a reference time of 1 January 4713 BCE, 12:00 UTC. As of now it is impossible to use it in NetCDF files and get consistent results in Python (through the netCDF4 package) and Panoply.
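The size of the discrepancy can be sketched with the standard integer formula for the Julian Day Number of a proleptic Julian calendar date. The function names and the `has_year_zero` switch are hypothetical and stand in for the two interpretations; they do not reproduce the code of cftime or Panoply. The formula is used here only for years later than about -4800.

```python
def julian_calendar_to_jdn(year, month, day):
    """Julian Day Number (at noon) of a proleptic Julian calendar date.

    year uses astronomical numbering, i.e. year 0 exists and equals 1 BCE.
    """
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def reference_jdn(written_year, month, day, has_year_zero):
    """JDN of a reference date under one of the two year-numbering readings."""
    if has_year_zero or written_year > 0:
        astro = written_year
    elif written_year == 0:
        raise ValueError("year 0 is not valid when year zero does not exist")
    else:
        # Without a year zero, a written -N means N BCE, astronomical 1 - N.
        astro = written_year + 1
    return julian_calendar_to_jdn(astro, month, day)

with_zero = reference_jdn(-4712, 1, 1, has_year_zero=True)
without_zero = reference_jdn(-4712, 1, 1, has_year_zero=False)
print(with_zero, without_zero, without_zero - with_zero)
```

For a reference year written as -4712, the two readings place the reference 366 days apart, because astronomical year -4712 is a leap year in the Julian calendar. Every decoded time shifts by the same amount.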
I suppose there are multiple possible solutions to the problem. Either all implementations start using the same method of counting negative years (and it would be helpful if the CF Conventions make this unambiguous), or there would have to be information about the year numbering convention included in the NetCDF file, such as a new attribute or an indicator included in the units or calendar attributes.

Related issue in cftime: #200.