Proper metadata for CICE grid files #7
Comments
I applaud this. |
It would be good to get @flicj191's opinion about adding a CRS string, and what would be appropriate |
Though I'm not sure of the details here, I'd say you'd want to keep CRS information so you can map, transform and work with other data etc. Is the CF standard for the lat/lon coordinate system WGS84, and would that be assumed here? Also, when adding a CRS string you'd use a spatial library to get the full CRS definition, and you can refer to it using a short name or maybe an EPSG code, e.g. 4326. In other files I've seen, it's mainly been the name of the CRS, with a Python spatial library used if anything needs to be done to it. |
Some of the data available through cj50 / the NCI THREDDS server is marked with 4326, so I would assume that's correct. I was surprised because I thought the tripole grid might mess with this, but apparently not. I guess the SRS was added to make the OMIP output a bit more tidy, but we'll need to double check it. |
@aidanheerdegen Do you know how to do this? i.e. is there a package we should use to add the CF metadata, or do we add it "by hand" as attributes? |
I do not. @bschroeter may have opinions, he wrote this: https://github.com/AusClimateService/axiom And @flicj191 is our resident GIS expert, so is probably the best placed to comment on what is required for our data to be usable in geospatial applications. |
If I understand correctly, it's the grid mapping that is required, and so as per 5.8 or 5.9 here: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/ch05s06.html there's a dummy scalar int var "crs" where you put the datum details (or otherwise a named var where to put further params for an actual coordinate system). Geospatial folks would wish that simply the full WKT was always stored rather than being atomized into parameters, but I think that's too much to wish for given that PROJ strings ruled the waves until PROJ library version 6 established a new norm (you need the full specification, and certain assumptions from short forms are deprecated or at least frowned upon).
There are multiple pairs of lon,lat in the grid, and so I guess you specify the "grid_mapping" attribute on any variable that uses those, and it can stand in for the details of any lon, lat pairing. Can someone show me a file that uses the grid? I will try adding the attributes as I understand them and test in various geo contexts.
And, I must admit I'm not much of a CF-adherent, more of a "how can I get geospatial to interpret this ..."-adherent, and I've added a few PRs to GDAL itself to enable some workarounds, but that can be pretty slow and maybe too conservative. So, apologies if this is noise and I'm just complicating it, but fwiw I wrote a hand-crafted request that they add similar grid mapping to the daily 0.25 Reynolds OISST (they replied and said it would be considered for future updates): https://github.com/mdsumner/fixoisst
Also, this is my little patch to GDAL to allow configuring "assume the thing has a longlat crs ...", but it's a bit too conservative and misses important cases (and who will even find out about it anyway ... but it's useful to me now). This for me is one of those "who can I even talk to about this ..." topics, so I appreciate being brought in and maybe I can help a little.
🙏 (it's also an issue before we get to the part about how a geospatial raster can be derived from this non-regular grid, but that's relatively easy and quite separate to the main topic here, I think). |
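A minimal sketch of the layout described above, written as plain Python dictionaries rather than with a netCDF library so it runs anywhere. The variable names (`tarea`, `uarea`, `tlon`, ...) are illustrative assumptions, and choosing the CF `latitude_longitude` grid mapping with WGS84 ellipsoid parameters is also an assumption, not something the thread has settled on. The helper `missing_grid_mappings` is hypothetical and only checks that every `grid_mapping` attribute points at a variable that exists.

```python
def build_grid_attrs():
    """Return {variable_name: attributes} for a hypothetical CICE-style grid file."""
    return {
        # Dummy scalar integer variable: it holds no data, only the datum details.
        "crs": {
            "grid_mapping_name": "latitude_longitude",  # assumed choice
            "semi_major_axis": 6378137.0,               # WGS84 ellipsoid
            "inverse_flattening": 298.257223563,        # WGS84 ellipsoid
            "longitude_of_prime_meridian": 0.0,
        },
        # Data variables refer to the dummy variable by name via "grid_mapping".
        "tarea": {"units": "m2", "coordinates": "tlon tlat", "grid_mapping": "crs"},
        "uarea": {"units": "m2", "coordinates": "ulon ulat", "grid_mapping": "crs"},
    }


def missing_grid_mappings(variables):
    """List variables whose grid_mapping names a variable that is not in the file."""
    return sorted(
        name
        for name, attrs in variables.items()
        if "grid_mapping" in attrs and attrs["grid_mapping"] not in variables
    )
```

The same `crs` variable serves both the T-point and U-point variables here, which matches the idea that one grid mapping can stand in for any lon, lat pairing.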
Thanks Mike! There are two things we needed to add:
There is nothing there to capture the tripole ... maybe that is ok.
Will having multiple variables identified as lat/lon be annoying for analysis? Most of our output variables are at the T points but others are at the U points (I guess we need to make sure we are encoding this correctly in the output - it is sort of in the cell_measures attribute from OM2).
The actual values are propagated in the output files, but not in a particularly useful way, as a land(-ish)-mask has been applied (i.e. it doesn't save the values for the cells where no calculations were done).
This seems similar to the ESMValTool CMORise step to add the CF metadata and standardise the units, although they don't seem too worried about the projection information (I guess it's basically always the same in our use case). |
appreciate the answers, and questions - I'll get back to this at a later time just can't do it atm |
I realised that using radians for longitude and latitude is not CF-compliant, so that doesn't help our cause. We could:
Edit: or we could just label them using standard_name='latitude'/'longitude' as a best-effort compliance rather than a strict compliance |
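For illustration, one way to get strictly CF-style coordinates from radian inputs is to convert the values to degrees and attach the CF `standard_name` and `units` attributes. This is only a sketch under that assumption; the function name `to_cf_degrees` is hypothetical and nothing here is taken from the CICE or ACCESS-OM2 tooling.

```python
import math

# CF attributes for geographic coordinates expressed in degrees.
CF_COORD_ATTRS = {
    "latitude": {"standard_name": "latitude", "units": "degrees_north"},
    "longitude": {"standard_name": "longitude", "units": "degrees_east"},
}


def to_cf_degrees(values_rad, axis):
    """Convert radians to degrees and return (values, attrs) for the given axis.

    axis must be "latitude" or "longitude".
    """
    if axis not in CF_COORD_ATTRS:
        raise ValueError(f"unknown axis: {axis!r}")
    values_deg = [math.degrees(v) for v in values_rad]
    # Copy the attributes so callers can modify them without side effects.
    return values_deg, dict(CF_COORD_ATTRS[axis])
```

The best-effort alternative mentioned in the edit would keep the radian values and set only `standard_name`, leaving `units` non-compliant.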
I haven't quite had a chance to get back to this, will have a closer look soon |
I think you're right about the crs dummy var, that's enough as far as I'm concerned. I see that the relationship is already there from a GDAL perspective though, because it's recorded in the "coordinates" attribute. GDAL picks up TLON and TLAT or ULON and ULAT appropriately as the geolocation arrays:
(I'm using the fileServer because I can't get on gadi rn). So I don't know how important the grid mapping is in comparison, but if your changes are adding CF-compliance I think that's sufficient. Using the warp API with those sources will automatically resolve to a 0,360 longlat regular grid, or to a target spec (more important for downstream issues is the appropriateness or otherwise of what that means for the quantities when rotation from the native grid is involved - I'd like this to get attention in GDAL itself, or at least have warnings issued). At this stage I feel like I'm mostly just following along as you go here, but a chat in person would be good at some point. We can learn a lot from the experiences here. |
That is mighty interesting. As far as I can see, rioxarray, for example, has not picked up any georeferencing. It looks like maybe it has guessed? To quote the link:
There are multiple ways to add the grid projection, but I'll try using the cf preferred method. |
that's a good point, this is TLON and TLAT as data and without that dummy "crs" attribute it doesn't have it stored anywhere (i.e. what if it's degrees_east/west spherical?). I'll dig into this. There are other contexts where the NetCDF driver doesn't helpfully assume that it's a longlat crs for the regular grid derived from lon/lat 1D arrays (basic OISST netcdf files do not, for example). |
I tried setting the crs like this:
But I'm not sure it helped; rioxarray doesn't do anything clever with it. GDAL doesn't complain, but are these corners correct?
I am still confused about the utility of adding a crs in this case though; the geolocation array is the source of truth. We don't expect a program to be able to locate the grid based only on the crs. |
in GDAL context this is "ungeoreferenced", only the potential is there for creating a traditional raster from it because of the references to those geolocation arrays. (I had meant to tell this part of the story a bit more ... it's why from my perspective it's only the CF compliance that really matters for your purposes, because without actually regridding or turning this into a point data set there's not much here for a straight read into geospatial.) For geospatial it's already in good shape (for the next step). To materialize a regular grid or write a virtual file we need the warper:
that will then be understood by rioxarray or QGIS etc:
rioxarray and other cubey-friends do this warping task when resolving Sentinel and Landsat from different crs, but they don't yet apply that resource to climate grids like these or xarray generally. The low-res result as a "geospatial raster":
library(terra)
(r <- rast("~/regular0_360.vrt"))
class : SpatRaster
dimensions : 180, 360, 1 (nrow, ncol, nlyr)
resolution : 1, 1 (x, y)
extent : 0, 360, -90, 90 (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326)
source : regular0_360.vrt
name : regular0_360 |
Yeah ok. So the important bit is specifying the coordinates attribute for each variable, so gdal can auto-detect those? And then we should also have the CRS for completeness, so it's known 'how' the values of those coordinates are defined. This seems a bit counter-intuitive ... the CRS is not for the model grid or the netcdf dimensions, it's for the coordinates (which are just variables), however the CRS is defined as a global attribute for the netcdf file. |
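To make the role of the `coordinates` attribute concrete, here is a small sketch that resolves each data variable to its longitude/latitude pair by following that attribute and checking the `units` of the variables it names. It is not GDAL's implementation, only an illustration of the lookup; the variable names (`aice`, `uvel`) and the helper names are assumptions, and only a subset of the unit spellings CF accepts is listed.

```python
# Subset of the unit strings CF accepts for longitude and latitude.
LON_UNITS = {"degrees_east", "degree_east", "degrees_E", "degree_E"}
LAT_UNITS = {"degrees_north", "degree_north", "degrees_N", "degree_N"}


def example_variables():
    """Hypothetical attributes for T-point and U-point output variables."""
    return {
        "TLON": {"units": "degrees_east"},
        "TLAT": {"units": "degrees_north"},
        "ULON": {"units": "degrees_east"},
        "ULAT": {"units": "degrees_north"},
        "aice": {"coordinates": "TLON TLAT time"},  # T-point variable
        "uvel": {"coordinates": "ULON ULAT time"},  # U-point variable
    }


def geolocation_pairs(variables):
    """Map each data variable to the (lon, lat) names in its coordinates attribute."""
    pairs = {}
    for name, attrs in variables.items():
        lon = lat = None
        for ref in attrs.get("coordinates", "").split():
            units = variables.get(ref, {}).get("units")
            if units in LON_UNITS:
                lon = ref
            elif units in LAT_UNITS:
                lat = ref
        # Only report variables where both halves of the pair were found.
        if lon is not None and lat is not None:
            pairs[name] = (lon, lat)
    return pairs
```

With the example attributes, T-point and U-point variables resolve to different pairs, which is why the T/U encoding needs to be right in the output files.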
it is counter-intuitive indeed, there's no geospatial standard for it (sadly, because the software available is pretty good). I think of it as a raster but the coordinates are also data, it's not just "points", but nothing in trad geo really handles it (though, weirdly, modern cube tooling has gone ballistic on it because of all those utm crs grids). Closest to a standard is probably mdal, that at least is well supported in QGIS. (what I'm not clear on yet is if the QGIS map will automatically resolve these Cosima grids, it might - and if not it won't take much, a thought I just had). |
Hi Mike. Thanks for all your help on this. Do you know how to set the SRS for the geolocation array? I set grid_mapping for the variables, and the crs_wkt for that grid mapping ... but they haven't affected the geolocation array! See below: the Coordinate System is set, but the Geolocation SRS is set differently! (See lines in Bold)
! gdalinfo NETCDF:1deg/grid.nc:uarea
Driver: netCDF/Network Common Data Format
crs#grid_mapping_name=tripolar_latitude_longitude |
I'll need to delve into a netcdf creation and investigation in GDAL itself, but - meta-question - why do you want a CRS that's using radians? I'll delve into this, and try setting the crs in different ways - |
The CICE model uses input data that is in radians. The lon/lat included in the history output is in degrees, but it has a land mask applied, and having missing values doesn't play nice with analysis tools. So having a file with the CRS set to load the grid from could be handy but maybe we don't need it. i.e. the GDAL defaults just work (as you've shown).
This is the ncdump output
I've separately also tried setting the SRS for the 'GEOLOCATION' metadata using GDAL, and saving to netcdf, and it looked like this:
But setting GDAL_SRS as a global attribute in the netcdf file didn't seem to work the other way, the GDAL_SRS attribute wasn't read in to make the SRS for the Geolocation array. |
I think there's a real problem here in gdal, I'll take my time to explore before I report 🙏 |
Thank you - it looks a bit odd but we are well into corner case territory here! I started looking at the GDAL source but didn't dig too far. There is this set of messages when I ran gdalinfo --config CPL_DEBUG ON :
|
I think it's a bug here, hardcoded WGS84 SRS: https://github.com/OSGeo/gdal/blob/35c434527aab4acb86af66480db00d4fec4b5400/frmts/netcdf/netcdfdataset.cpp#L4805 I've got a bit more to do to demonstrate it properly though ... |
It's now fixed. This led me to trying a few things, and realizing some stuff. One part is that some grid resolving tasks are not done purely by setting the crs, so a rotated pole can't be expressed as crs wkt (iiuc); when that works it's because the gdal netcdf driver knows to look for the grid mapping params. This means I need to review a few things and summarize with a few examples. (Also, we can't have geolocation arrays that are projected. Not that anyone ever would, but I'd kind of thought that was possible in principle; if you try that it triggers "more normal" registration logic, not the warper.) Probably not helpful, but I'm hoping to illustrate a few cases, in case it triggers discussions 🙏 |
Thank you!
Yes. If we could describe the tripole using a crs_wkt then we skip the whole geolocation array for netcdf and life would be easier!
This inability to have projected geolocation arrays is slightly counter-intuitive. But I guess there just isn't a use case for it.
Useful as always :). Drop past next time you are at IMAS if you like. |
This is a fun one, apparently two projections in one - I wonder if tripolar is somehow a longlat coord array expression of this? https://polar.ncep.noaa.gov/global/about/ https://lists.osgeo.org/pipermail/gdal-dev/2024-April/058827.html |
it's not obvious, there's still stretchy in the south when it's Mercator ... but maybe that's applied after the fact. In the north it's not obvious either; I thought it could be transverse mercator - but maybe it's a rotated pole? Time for me to look at the grid design/history, and maybe that's been pointed to above or in related discussions? I'll see if these wild speculations do apply to HYCOM. I'm throwing some guesses around, because if the two parts of the grid were expressible in trad GIS regular form, it would give a fairly straightforward way to deal with them (in auto-reprojecting software at least). But, even if this kind of scheme is the original design, it seems like there have been some tweaks along the way that can't be parameterized. (We can still wrap up user-friendly approaches though, as per #328). |
That looks really similar to the ACCESS-OM grids. In the Mercator part, sometimes the grids have non-uniform latitude spacing (i.e. maybe in HYCOM south of 60S?) which makes it hard I think? Also, I've never found a PROJ / crs code for the two north poles? |
Metadata implemented via We might demonstrate a use case: |
CICE grid.nc files have historically had poor CF-compliant metadata. Case in point, this 1 degree CICE grid /g/data/vk83/experiments/inputs/access-om2/ice/grids/global.1deg/2020.05.30/grid.nc (same file as /g/data/ik11/inputs/access-om2/input_20200530/cice_1deg/grid.nc). cf_xarray is completely unable to intuit any CF coordinate information.
While work is being done to update the grids the metadata should be improved.