NetCDF global attributes vs data variable local attributes #3325

bjlittle · 2019-06-05T15:03:01Z

Update Mon 3rd July:
this effort is now handled in its own project
please see there for existing task breakdown + progress

At the moment iris takes a rational but naive approach to dealing with the local attributes of a NetCDF variable (a variable that becomes a cube) and the global attributes of the NetCDF file that the said variable comes from.

That is, the resultant cube.attributes will be a combination of both the local and global attributes, where the local attributes will take precedence, and overwrite, common global attributes.
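As a rough illustration of the merge described above, here is a minimal sketch using plain dictionaries in place of the file's attributes. The attribute names and values are made up, and this is not the actual iris loader code:

```python
# Hypothetical attributes, standing in for what is read from a NetCDF file.
global_attrs = {"Conventions": "CF-1.7", "comment": "file-level comment"}
local_attrs = {"comment": "variable-level comment", "my_flag": 1}

# Local attributes take precedence over global ones with the same name,
# so the global "comment" is lost in the combined dictionary.
combined = {**global_attrs, **local_attrs}

print(combined["comment"])  # variable-level comment
```

Once merged, there is no way to tell from `combined` alone which entries were global and which were local.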

From the inception of iris, and in the absence of use cases, this seemed like a reasonable thing to do. However, such an approach prevents the local and global attribute metadata from being preserved separately. This is a major issue for many users, who need to preserve all attribute metadata.

We need to resolve this issue in iris now, once and for all 😄

Note that, if a solution to this issue was implemented, then it would most likely be a breaking change - caution is needed here.

This is somewhat tangentially related to #2352


bjlittle · 2019-06-05T15:03:18Z

Ping @zklaus 👍

zklaus · 2019-06-05T15:09:44Z

A non-breaking way to implement that may be to introduce two new class members, global_attributes and local_attributes (or var_attributes), each responsible for the respective attributes, with a property replacing the current attributes. That property can mimic the current behaviour for reading. For writing, it would overwrite an existing entry in whichever dictionary already holds it, and otherwise default to the local one.

bjlittle · 2019-06-06T18:36:32Z

@zklaus I was thinking along the same lines. In summary:

cube.local_attributes is a mutable dictionary for defining the attributes that are local to the associated NetCDF data variable of the cube when it's written to a NetCDF file
cube.global_attributes is a mutable dictionary for defining the attributes that are global to the associated NetCDF file that the data variable of the cube is written to
cube.attributes is a (stateless) combination of the cube.global_attributes and cube.local_attributes, akin to now, where the cube.local_attributes have priority over the cube.global_attributes

So for the case where there is a common attribute shared between cube.global_attributes and cube.local_attributes, the local value is shown in the cube.attributes. When saving such a cube to NetCDF, both values of the common attribute are preserved (hoo-rah), i.e. the local one on the NetCDF data variable of the cube, and the global one in the global scope of the NetCDF file.

To ensure that there is non-breaking behaviour here, I think that I'm right in saying that if a user writes to the cube.attributes then this state is captured in the cube.global_attributes. Note that cube.attributes is stateless, in the sense that it is simply derived on the fly at run-time from both cube.local_attributes and cube.global_attributes, with local having priority over global. This means that if a user wants to associate an attribute with the NetCDF data variable of the cube, then they must explicitly add the attribute to the cube.local_attributes, and not the cube.attributes.
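To make the proposal above concrete, here is a minimal sketch of how it could look. `CubeSketch` and `CombinedAttributes` are hypothetical names invented for illustration, not part of iris, and the deletion behaviour is an assumption that the proposal does not cover:

```python
from collections.abc import MutableMapping


class CombinedAttributes(MutableMapping):
    """Stateless view over two dicts (hypothetical helper, not iris API)."""

    def __init__(self, global_attributes, local_attributes):
        self._global = global_attributes
        self._local = local_attributes

    def __getitem__(self, key):
        # Reads: local has priority over global.
        if key in self._local:
            return self._local[key]
        return self._global[key]

    def __setitem__(self, key, value):
        # Writes through the combined view are captured as global.
        self._global[key] = value

    def __delitem__(self, key):
        # Assumption: deleting removes the key from both dicts.
        if key not in self._local and key not in self._global:
            raise KeyError(key)
        self._local.pop(key, None)
        self._global.pop(key, None)

    def __iter__(self):
        return iter({**self._global, **self._local})

    def __len__(self):
        return len(set(self._global) | set(self._local))


class CubeSketch:
    """Toy stand-in for a cube; it only models the attribute handling."""

    def __init__(self, global_attributes=None, local_attributes=None):
        self.global_attributes = dict(global_attributes or {})
        self.local_attributes = dict(local_attributes or {})

    @property
    def attributes(self):
        # Derived on the fly; holds no state of its own.
        return CombinedAttributes(self.global_attributes, self.local_attributes)


cube = CubeSketch(
    global_attributes={"comment": "global"},
    local_attributes={"comment": "local"},
)
cube.attributes["history"] = "created"  # lands in cube.global_attributes
print(cube.attributes["comment"])  # local
```

One wrinkle the sketch exposes: writing through `cube.attributes` to a key that also exists in `cube.local_attributes` updates the global value, but a subsequent read still returns the local one.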

Hmmm.... this makes sense to me. Thoughts?

zklaus · 2019-06-11T14:41:43Z

Exactly what I was thinking!

jonseddon · 2019-08-01T10:08:12Z

We bumped into this ticket while looking at a related project. Can I check what would happen when you save a cubelist rather than a cube? If cubes in the cubelist had different values of the same global attribute, how would this be saved? If this saved netCDF file was loaded back into Iris would we get the same cubelist and if we didn't, would this matter?

bjlittle · 2019-08-01T10:50:38Z

@jonseddon Good question...

Clearly, if there is a global attribute conflict across Cubes in the CubeList, then it wouldn't be possible to save any such conflicted attribute to the netCDF file.

However, it raises the question of whether a CubeList should also have state for overriding attributes on save. Again, careful consideration is required here to understand the overall behaviour and whether that's appropriate. In particular, consideration is also required for loading from multiple different netCDF files...

To be honest, I'd opt to separate concerns here. I'd see the debate about CubeList having attributes state as an extension to this proposal - but there should certainly be clarity for the behaviour when there is a conflict for global attributes of Cubes in a CubeList as it stands here.
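For the conflict case mentioned above, a rough sketch of how conflicting global attributes could be detected before a save. `split_global_attributes` is a hypothetical helper, not an iris function, and it assumes plain comparable values (array-valued attributes would need a different equality check):

```python
def split_global_attributes(per_cube_globals):
    """Split global attributes into those all cubes agree on and those in conflict.

    per_cube_globals: list of dicts, one per cube (hypothetical input shape).
    Returns (shared, conflicts): a dict of agreed values and a set of names.
    """
    shared = {}
    conflicts = set()
    for attrs in per_cube_globals:
        for name, value in attrs.items():
            if name in conflicts:
                continue  # already known to disagree
            if name in shared and shared[name] != value:
                # Two cubes disagree: this name cannot be saved as a file global.
                del shared[name]
                conflicts.add(name)
            else:
                shared[name] = value
    return shared, conflicts


shared, conflicts = split_global_attributes(
    [
        {"institution": "A", "title": "run 1"},
        {"institution": "A", "title": "run 2"},
    ]
)
print(shared)     # {'institution': 'A'}
print(conflicts)  # {'title'}
```

What should then happen to the names in `conflicts` (drop them, demote them to variable attributes, or raise an error) is exactly the behaviour that still needs to be decided.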

ehogan · 2019-08-01T15:02:14Z

I had a question about this :) Does it make sense to add global attributes and variable attributes, which I would argue are netCDF-specific concepts, to the cube, which is meant to be format-agnostic?

zklaus · 2019-08-01T15:13:02Z

Re cubelists: The question is certainly a good one. Note that right now there is no guarantee that loading a single cube from a netcdf file and saving it again will give you the same file. For example,

if there is a global attribute comment and a local attribute comment, the local attribute will essentially overwrite the global one and end up as the only comment attribute in the final file in the global section
a global attribute that is one of the special local attributes in iris will end up in the variable section
a local attribute that is not recognized as special will end up in the global section.
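The three cases above can be reproduced with a toy model of the round trip. This is only a sketch: `SPECIAL_LOCAL_NAMES` is a made-up stand-in for whatever list iris actually uses (the thread does not give it), and `load` and `save` are hypothetical functions operating on plain dictionaries, not on files:

```python
# Hypothetical set of attribute names that the saver always writes on the
# variable. The real list used by iris is not stated in this thread.
SPECIAL_LOCAL_NAMES = {"special_attr"}


def load(file_globals, var_locals):
    # Load: merge into one dictionary, local wins on a name clash.
    return {**file_globals, **var_locals}


def save(cube_attributes):
    # Save: special names go on the variable, everything else goes global.
    var_locals = {
        k: v for k, v in cube_attributes.items() if k in SPECIAL_LOCAL_NAMES
    }
    file_globals = {
        k: v for k, v in cube_attributes.items() if k not in SPECIAL_LOCAL_NAMES
    }
    return file_globals, var_locals


original_globals = {"comment": "global comment", "special_attr": 1}
original_locals = {"comment": "local comment", "my_note": "x"}

new_globals, new_locals = save(load(original_globals, original_locals))

print(new_globals)  # {'comment': 'local comment', 'my_note': 'x'}
print(new_locals)   # {'special_attr': 1}
```

In this model the global comment is lost, the local comment and `my_note` move to the global section, and `special_attr` moves from the global section to the variable.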

Seeing as it seems to me that a lot of (most?) data has one variable per file (notably of course all CMIP and CORDEX data), I am not sure I would be worried so much about consistency in cubelist storing, at least until we have better consistency in cube storing. Though it is certainly a good idea to keep this in mind so as not to make unnecessary, outright contradictory decisions.

Re format agnosticism: That is certainly a nice goal, but maybe it needs better definition? It seems that attributes in and of themselves are unsupportable in, e.g., GRIB and derived formats. Surely we don't want to abandon them completely. So, is there a format that has attribute support, is supported by iris, and could not be made to work with this model?

bjlittle · 2019-08-01T20:27:37Z

@ehogan From a purely idealistic perspective, I'd agree with you. A Cube should be format agnostic. However, in reality, that's not really the case.

For me it's an intention at best, rather than a hard and fast rule. Consider the special way that we handle PP STASH and NetCDF var_name. These file-format specifics have crept into the way that we deal with cubes and coordinates, along with other CF-isms that may only make sense for NetCDF. The reason this has happened is that there is tangible utility or benefit behind it, so it's a common-sense compromise in my opinion. However, we do try hard not to dilute our attempts to be as agnostic as possible.

I don't know if this helps answer your question...

pp-mo · 2019-08-02T09:25:04Z

Consider the special way that we handle PP STASH and NetCDF var_name

Actually we should have something in GRIB space too, but we don't.
Just plugged a suggestion here : SciTools/iris-grib#153

bjlittle added Type: Enhancement Release: Major Status: Decision Required Type: Infrastructure labels Jun 5, 2019

bjlittle assigned bjlittle and unassigned bjlittle Jun 6, 2019

bjlittle added Release: Minor and removed Status: Decision Required Release: Major labels Jun 11, 2019

bjlittle added this to the v2.3.0 milestone Jun 11, 2019

bjlittle added the Experience: High label Aug 1, 2019

bjlittle modified the milestones: v2.3.0, v3.1.0 Nov 13, 2019

bjlittle added the Sprint: Refine me label Nov 13, 2019

trexfeathers modified the milestones: Candidate for next release, v3.7 Jun 28, 2023

trexfeathers added the Dragon 🐉 https://github.com/orgs/SciTools/projects/19?pane=info label Jul 10, 2023

trexfeathers added this to 🐉 Dragon Taming Jul 10, 2023

trexfeathers assigned lbdreyer Jul 10, 2023

trexfeathers moved this to 🚧 In Development in 🐉 Dragon Taming Jul 10, 2023

trexfeathers unassigned lbdreyer Aug 3, 2023

trexfeathers modified the milestones: v3.7, v3.8 Aug 16, 2023

stephenworsley added this to 🐙Iris v3.8.0 Sep 28, 2023

stephenworsley moved this to 🆕 New - potential tasks in 🐙Iris v3.8.0 Sep 28, 2023

stephenworsley moved this from 🆕 New - potential tasks to Candidate for next sprint in 🐙Iris v3.8.0 Oct 5, 2023

stephenworsley moved this from Candidate for next sprint to 📋 Backlog in 🐙Iris v3.8.0 Oct 5, 2023

stephenworsley assigned trexfeathers Oct 5, 2023

stephenworsley moved this from 📋 Backlog to 👀 In review in 🐙Iris v3.8.0 Oct 12, 2023

trexfeathers modified the milestone: v3.8 Nov 14, 2023

trexfeathers assigned ESadek-MO and unassigned trexfeathers Nov 21, 2023

ESadek-MO closed this as completed in #5152 Nov 21, 2023

github-project-automation bot moved this from ⚔ In Development to 💰 Finished in 🐉 Dragon Taming Nov 21, 2023

github-project-automation bot moved this from 👀 In review to 🏁 Done in 🐙Iris v3.8.0 Nov 21, 2023

github-project-automation bot moved this from 🆕 New to ✅ Done in ESMValTool Nov 21, 2023

github-project-automation bot moved this from 📚 Backlog to 🏁 Done in 🌍 ESMValTool Surgery (Discussion Topics) Nov 21, 2023

github-project-automation bot moved this from 📋 Backlog to 🏁 Done in 🦊 Iris v3.7.0 Nov 21, 2023

github-project-automation bot moved this from To Do to Done in Iris v3.2.0 Nov 21, 2023

github-project-automation bot moved this to Done in 🚴 Peloton Nov 21, 2023

scitools-ci bot removed this from 🚴 Peloton Dec 15, 2023

scitools-ci bot added this to 🚴 Peloton Dec 15, 2023

scitools-ci bot removed this from 🚴 Peloton Dec 20, 2023
