Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images #3128

Merged
merged 42 commits into from
Sep 29, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 20, 2024

This PR implements the to_dataarray method for GMT_IMAGE.

Similar to #2398 but for images.

Need to wait for #3446.

@seisman seisman added the enhancement Improving an existing feature label Mar 20, 2024
Minimal xarray.DataArray output with data and coordinates, no metadata yet.
pygmt/clib/session.py Outdated Show resolved Hide resolved
@seisman
Copy link
Member Author

seisman commented Apr 14, 2024

xref GenericMappingTools/gmt#3299

pygmt/datatypes/image.py Outdated Show resolved Hide resolved
Comment on lines +104 to +105
name=header.name,
attrs=header.data_attrs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image name is currently hardcoded to z, is that ok for an RGB image?

@property
def name(self) -> str:
"""
Name of the grid.
"""
return "z"

The attrs fields might need some work. I'm getting 'actual_range': array([ 1.79769313e+308, -1.79769313e+308])} when loading the @earth_day_01d image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree they make no sense, but they're consistent with the behavior in GMT.

gmt grdinfo @earth_day_01d
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Title: Grid imported via GDAL
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Command:
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Remark:
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Pixel node registration used [Geographic grid]
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Grid file format: gd = Import/export through GDAL
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: x_min: -180 x_max: 180 x_inc: 1 name: x n_columns: 360
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: y_min: -90 y_max: 90 y_inc: 1 name: y n_rows: 180
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: v_min: 1.79769313486e+308 v_max: -1.79769313486e+308 name: z
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: scale_factor: 1 add_offset: 0
/Users/seisman/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Default CPT:
+proj=longlat +R=6378137 +no_defs

The GMT's image support was likely added by Joaquim so that you may ping him for more information.

weiji14 added 4 commits June 20, 2024 21:36
Reorder the dimensions to follow Channel, Height, Width (CHW) convention. Also added doctest checking output DataArray object and the image's x and y coordinates.
Get the registration and gtype info from the grid header and apply it to the GMT accessor attributes.
Trying to match some of the doctests in _GMT_GRID.
Remove hardcoded attribute that was only meant for NetCDF files, so that GeoTIFF files won't have it.
pygmt/datatypes/header.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/datatypes/image.py Outdated Show resolved Hide resolved
@seisman seisman changed the title WIP: Wrap GMT's standard data type GMT_IMAGE for images WIP: Implemention GMT_IMAGE.to_dataarray method Jul 27, 2024
@seisman seisman marked this pull request as draft September 20, 2024 15:19
@seisman seisman changed the base branch from main to ctypesarray September 20, 2024 16:02
Base automatically changed from ctypesarray to main September 24, 2024 02:10
@seisman seisman changed the title GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images Sep 28, 2024
@seisman
Copy link
Member Author

seisman commented Sep 28, 2024

I think this PR is almost ready for review. Here are the known limitations/issues:

  • Currently, only 3-band images are supported. [We can work on 1-band and 4-band images in a separate PR to make it easier to review]
  • As mentioned in GMT_IMAGE: Implement the GMT_IMAGE.to_dataarray method for 3-band images #3128 (comment), the name is hardcoded to z and the actual_range doesn't work. [For the hardcoded name z, I'm not sure if we can do better; for actual_range, I think we can remove it because (1) actual_range makes no sense for 3-band RGB images; (2) actual_range is the standard attribution defined by CF-1.7 convention, but it's only for netCDF files, not for images; (3) the xarray.DataArray returned by rioxarray.open_rasterio doesn't have the actual_range attribution].
  • For the y coordinates, both GMT API and rioxarray return a descending y array, but in _GMT_GRID.to_dataarray() method, we flip the y coordinate to make it ascending. Do we want to do the same thing for images?

@seisman seisman added the needs review This PR has higher priority and needs review. label Sep 28, 2024
@seisman seisman marked this pull request as ready for review September 28, 2024 14:31
Comment on lines 148 to 150
title:
history:
description:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't add these attributes if they're empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These come from the header no? For GeoTIFF files, they don't make sense since GMT is not parsing the TIFF tags, but the title/history/description might show up for 3-band NetCDF files (if those can be passed into this GMT_IMAGE struct).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, title/history/description are recommended attributes of CF-1.7. So, for netCDF files, I guess we should always show them even if they're empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those are only needed for CF-1.7, then maybe we can indent the 3 attrs["..."] lines here to be inside the if self.type in {GridFormat.NX} block?

if self.type in {
GridFormat.NB,
GridFormat.NS,
GridFormat.NI,
GridFormat.NF,
GridFormat.ND,
}: # Only set the 'Conventions' attribute for netCDF.
attrs["Conventions"] = "CF-1.7"
attrs["title"] = self.title.decode()
attrs["history"] = self.command.decode()
attrs["description"] = self.remark.decode()

(2) actual_range is the standard attribution defined by CF-1.7 convention, but it's only for netCDF files, not for images;

Same here, we can put attrs["actual_range"] under the if-block above:

attrs["actual_range"] = np.array([self.z_min, self.z_max])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done at 62f0ce0.

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Currently, only 3-band images are supported. [We can work on 1-band and 4-band images in a separate PR to make it easier to review]

Yep, just do 3-band images here for now.

  • For the y coordinates, both GMT API and rioxarray return a descending y array, but in _GMT_GRID.to_dataarray() method, we flip the y coordinate to make it ascending. Do we want to do the same thing for images?

I'd prefer descending y to match that the output of rioxarray.open_rasterio. In #2398 (comment), we kept the ascending y for GMT_GRID for pragmatic reasons, namely to match the output of xr.load_datarray so that the unit tests won't break. But I think the convention for most xarray.DataArray objects is to have descending y and ascending x.

Comment on lines 172 to 173
>>> da.gmt.registration, da.gmt.gtype
(1, 0)
Copy link
Member

@weiji14 weiji14 Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gtype for earth_day_01d should be Geographic (1), not Cartesian (0).

Suggested change
>>> da.gmt.registration, da.gmt.gtype
(1, 0)
>>> da.gmt.registration, da.gmt.gtype
(1, 1)

according to:

$ gmt grdinfo @earth_day_01d
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Title: Grid imported via GDAL
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Command: 
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Remark: 
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Pixel node registration used [Geographic grid]
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Grid file format: gd = Import/export through GDAL
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: x_min: -180 x_max: 180 x_inc: 1 name: x n_columns: 360
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: y_min: -90 y_max: 90 y_inc: 1 name: y n_rows: 180
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: v_min: 1.79769313486e+308 v_max: -1.79769313486e+308 name: z
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: scale_factor: 1 add_offset: 0
~/.gmt/server/earth/earth_day/earth_day_01d_p.tif: Default CPT: 

We either need to modify the logic here:

gtype = 1 if dims[0] == "lat" and dims[1] == "lon" else 0

Or I can manually override the gtype in the load_blue_marble function which I'm trying to finish up at #2235.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/GenericMappingTools/gmt/blob/7809736ba32d87a4a96b15444419eb176c6a35f3/src/gmt_grdio.c#L1360

This is the function (again, not a public API function) that GMT uses to determine the grid type. The logic here is not that complicated to duplicate in Python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How easy would it be to make that C function public actually?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How easy would it be to make that C function public actually?

It doesn't seem too difficult. But we still need to implement our own functions if we want to be compatible with GMT 6.4-6.5.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it'll be some time before GMT 6.6 comes out. I just want the gtype determination to be consistent between GMT and PyGMT, and the best case is to bind to the GMT C function. But we can also reimplement it in PyGMT for now (in a separate PR), maybe in header.py?

Copy link
Member Author

@seisman seisman Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility is to wrap the "hidden", private GMT_GRID_HEADER_HIDDEN structure https://github.com/GenericMappingTools/gmt/blob/7809736ba32d87a4a96b15444419eb176c6a35f3/src/gmt_hidden.h#L151. The structure has many members but we can only wrap the first few members. The members that are most useful to us are: grdtype (Cartesian or geographic), BC (boundary condition), varname (the actual variable name so that we don't have to hardcode the grid name to z), and cpt (the default CPT for this grid).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In f715aee, I've updated the logic for determining the grid/image gtype based on ProjRefPROJ4.

The logic of codes come from: https://github.com/GenericMappingTools/gmt/blob/7809736ba32d87a4a96b15444419eb176c6a35f3/src/gmt_grdio.c#L3778

Comment on lines 148 to 150
title:
history:
description:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If those are only needed for CF-1.7, then maybe we can indent the 3 attrs["..."] lines here to be inside the if self.type in {GridFormat.NX} block?

if self.type in {
GridFormat.NB,
GridFormat.NS,
GridFormat.NI,
GridFormat.NF,
GridFormat.ND,
}: # Only set the 'Conventions' attribute for netCDF.
attrs["Conventions"] = "CF-1.7"
attrs["title"] = self.title.decode()
attrs["history"] = self.command.decode()
attrs["description"] = self.remark.decode()

(2) actual_range is the standard attribution defined by CF-1.7 convention, but it's only for netCDF files, not for images;

Same here, we can put attrs["actual_range"] under the if-block above:

attrs["actual_range"] = np.array([self.z_min, self.z_max])

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful, thanks again @seisman! Excited to have direct 3-band read support in PyGMT without rioxarray! There's still a few tiny details that could be improved (around the attrs/metadata), but this looks really good already as an initial implementation. 🚀

@seisman seisman removed the needs review This PR has higher priority and needs review. label Sep 29, 2024
@seisman seisman merged commit 6606845 into main Sep 29, 2024
20 of 21 checks passed
@seisman seisman deleted the datatypes/gmtimage branch September 29, 2024 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants