**BREAKING** pygmt.grdcut: Refactor to store output in virtualfiles for grids #3115

Open
wants to merge 98 commits into
base: main
Changes from 83 commits
bcf43f0
Wrap GMT's standard data type GMT_IMAGE for images
seisman Mar 18, 2024
a052a1a
Initial implementation of to_dataarray method for _GMT_IMAGE class
weiji14 Mar 20, 2024
59d523c
pygmt.grdcut: Support both grid and image output
seisman Apr 16, 2024
56a6d65
Merge branch 'main' into datatypes/gmtimage
seisman Apr 17, 2024
3315324
Merge branch 'main' into gmtimage
seisman Apr 19, 2024
cea3374
Fix
seisman Apr 19, 2024
80d9837
Refactor
seisman Apr 19, 2024
22fba56
fix
seisman Apr 19, 2024
f71e79c
Merge branch 'main' into datatypes/gmtimage
weiji14 Jun 18, 2024
4cce4a2
Small typo fixes and add output type-hint for to_dataarray
weiji14 Jun 18, 2024
e02b650
Fix mypy error using np.array([0, 1, 2]) instead of np.arange
weiji14 Jun 18, 2024
f3d4b1f
Parse name and data_attrs from grid/image header
weiji14 Jun 18, 2024
4390136
Transpose array to (band, y, x) order and add doctest for to_dataarray
weiji14 Jun 20, 2024
5f25669
Set registration and gtype from header
weiji14 Jun 20, 2024
a3c6c14
Print basic shape and padding info in _GMT_IMAGE doctest
weiji14 Jun 20, 2024
5888e10
Only set Conventions = CF-1.7 attribute for NetCDF grid type
weiji14 Jun 20, 2024
798e658
Merge branch 'main' into datatypes/gmtimage
weiji14 Jun 20, 2024
3dbf2f2
Remove rioxarray import
weiji14 Jun 20, 2024
3a24ebd
Apply suggestions from code review
seisman Jun 20, 2024
4eee7e6
Merge branch 'main' into gmtimage
seisman Jun 20, 2024
5e390d4
Address reviewer's comments
seisman Jun 20, 2024
003383d
Fix GMT_OUT
seisman Jun 21, 2024
606ac7e
Merge branch 'main' into gmtimage
seisman Jun 21, 2024
c6cdcc8
Merge branch 'main' into gmtimage
seisman Jul 7, 2024
377941a
Revert changes for _GMT_IMAGE
seisman Jul 7, 2024
20617f5
Use rioxarray.open_rasterio for loading images
seisman Jul 7, 2024
a998718
Check if rioxarray is installed
seisman Jul 7, 2024
86cab44
Improve grdcut
seisman Jul 7, 2024
6031bab
Fix typos in grdcut
seisman Jul 7, 2024
eb0af2d
Add tests for grdcut images
seisman Jul 7, 2024
7f6ca7d
Fix one failing test
seisman Jul 7, 2024
21b194a
Fix open_rasterio
seisman Jul 7, 2024
e7eaf5c
Fix open_rasterio
seisman Jul 7, 2024
e3c8569
Make sure the image is loaded
seisman Jul 7, 2024
1c8312c
Update pygmt/clib/session.py
seisman Jul 7, 2024
3913430
Use rioxarray.open_rasterio in a context manager
seisman Jul 8, 2024
812a225
Merge branch 'main' into gmtimage
seisman Jul 8, 2024
90bd29e
Merge remote-tracking branch 'origin/gmtimage' into gmtimage
seisman Jul 8, 2024
ab77187
Fix mypy errors
seisman Jul 8, 2024
6f3e474
Move grdcut image tests to a separate test file
seisman Jul 8, 2024
5b07dd9
Fix copy & paste errors
seisman Jul 8, 2024
31272ab
Run codspeed benchmark for test_grdcut_image_dataarray
seisman Jul 8, 2024
6b860bf
Merge branch 'main' into datatypes/gmtimage
seisman Jul 27, 2024
5a09329
Merge branch 'main' into gmtimage
seisman Aug 5, 2024
279595b
Add the raster_kind function to determine the raster kind
seisman Aug 5, 2024
7def4b5
Simplify the grdcut function
seisman Aug 5, 2024
be175d8
Merge branch 'main' into gmtimage
seisman Sep 19, 2024
0bf9368
Merge branch 'main' into datatypes/gmtimage
seisman Sep 19, 2024
7d437be
Use enum for grid ids
seisman Sep 19, 2024
268e34e
Fix the band. Starting from 1
seisman Sep 19, 2024
86765e1
Refactor the tests for images
seisman Sep 19, 2024
86f3ffa
In np.reshape, a is a position-only parameter
seisman Sep 20, 2024
cc28247
Improve tests
seisman Sep 20, 2024
1e2c973
Fix one failing doctest due to xarray changes
seisman Sep 20, 2024
734dc28
The np.reshape's newshape parameter is deprecated
seisman Sep 20, 2024
919dc00
Define grid IDs using IntEnum instead of Enum
seisman Sep 20, 2024
b1eacf1
Pass the new shape as a positional parameter
seisman Sep 20, 2024
aa4fdc9
Fix failing tests
seisman Sep 20, 2024
c87a3ec
One more fix
seisman Sep 20, 2024
a20d8a2
One more fix
seisman Sep 20, 2024
926427b
Simplify a doctest
seisman Sep 20, 2024
c73328e
Improve the tests
seisman Sep 20, 2024
2825eae
Merge branch 'datatypes/gmtimage' into gmtimage
seisman Sep 20, 2024
bf9275c
Remove the workaround for images
seisman Sep 20, 2024
fb97daa
Convert ctypes array to numpy array using np.ctypeslib.as_array
seisman Sep 20, 2024
15b8d53
Fix the incorrect value due to floating number conversion in sphinter…
seisman Sep 20, 2024
8433e78
Merge branch 'ctypesarray' into datatypes/gmtimage
seisman Sep 20, 2024
3e3a6f3
Update the to_dataarray method to match the codes in GMT_GRID
seisman Sep 20, 2024
12ef40a
image data should has uint8 dtype
seisman Sep 20, 2024
f64fbb8
Further improve the tests
seisman Sep 21, 2024
e9cb0a5
Merge branch 'datatypes/gmtimage' into gmtimage
seisman Sep 21, 2024
4f2ae48
Merge branch 'main' into datatypes/gmtimage
seisman Sep 24, 2024
d49afed
Add a note that currently only 3-band images are supported
seisman Sep 24, 2024
a97d0b3
Apply suggestions from code review
seisman Sep 28, 2024
f70bec0
Merge branch 'main' into datatypes/gmtimage
seisman Sep 28, 2024
2fd13fb
Remove the old GMTGridID enums from pygmt/datatypes/header.py
seisman Sep 28, 2024
9972ba1
A minor fix
seisman Sep 28, 2024
ac6b7c3
Merge branch 'datatypes/gmtimage' into gmtimage
seisman Sep 28, 2024
7c32d41
Merge branch 'main' into gmtimage
seisman Sep 29, 2024
9ec00be
Let _raster_kind return grid by default
seisman Sep 29, 2024
f3a2f8e
Simplify the grdcut image tests
seisman Sep 29, 2024
3c12e2b
Add one more test for file in & file out
seisman Sep 29, 2024
f852b0d
Fix typos
seisman Sep 29, 2024
5f7683c
Merge branch 'main' into gmtimage
seisman Sep 30, 2024
bb1a0b0
Use the new load_blue_marble function
seisman Sep 30, 2024
584b5af
Drop the spatial_ref coord
seisman Sep 30, 2024
b19ba00
Merge branch 'main' into gmtimage
seisman Nov 22, 2024
cceb929
Merge branch 'main' into gmtimage
seisman Nov 25, 2024
fac51f1
Update _raster_kind
seisman Nov 25, 2024
517ee52
Revert "Update _raster_kind"
seisman Dec 9, 2024
d5ec308
Merge branch 'main' into gmtimage
seisman Dec 9, 2024
7ca58fb
Remove the _raster_kind function
seisman Dec 9, 2024
66580d4
Add the 'kind' parameter
seisman Dec 9, 2024
194ee90
Minor update
seisman Dec 9, 2024
473428c
Avoid keep the file open
seisman Dec 9, 2024
84b2ed4
Remove one unnecessary pytest.skipif marker
seisman Dec 9, 2024
7968096
Add it back because we still need rioxarray
seisman Dec 9, 2024
90d5210
Merge branch 'main' into gmtimage
seisman Dec 12, 2024
33 changes: 33 additions & 0 deletions pygmt/clib/session.py
@@ -1942,6 +1942,7 @@ def virtualfile_out(
                "grid": ("GMT_IS_GRID", "GMT_IS_SURFACE"),
                "image": ("GMT_IS_IMAGE", "GMT_IS_SURFACE"),
            }[kind]
            # For unknown reasons, 'GMT_OUT' crashes for 'image' kind.
            direction = "GMT_OUT|GMT_IS_REFERENCE" if kind == "image" else "GMT_OUT"
            with self.open_virtualfile(family, geometry, direction, None) as vfile:
                yield vfile
@@ -2311,3 +2312,35 @@ def extract_region(self) -> np.ndarray:
        if status != 0:
            raise GMTCLibError("Failed to extract region from current figure.")
        return region


def _raster_kind(raster: str) -> Literal["grid", "image"]:
    """
    Determine the raster kind.

    Examples
    --------
    >>> _raster_kind("@earth_relief_01d")
    'grid'
    >>> _raster_kind("@static_earth_relief.nc")
    'grid'
    >>> _raster_kind("@earth_day_01d")
    'image'
    >>> _raster_kind("@hotspots.txt")
    'grid'
    """
    # The logic here is because: an image can be read into a grid container, but a grid
    # can't be read into an image container. So, try to read the file as an image first.
    # If fails, try to read it as a grid.

@seisman (Member Author), Sep 29, 2024:

It turns out the logic here is not 100% correct.

Member:

  • A grid can be read into an image container, as long as another libgdal package (maybe libgdal-netcdf or libgdal-hdf5) is installed. xref:

The first try-except block tries reading into a GMT_IMAGE struct, returning "image" if there are 3 bands and "grid" if there is only 1 band. Even if grids can be read into GMT_IMAGE, we will still call it a grid if it has 1 band, right?

Member Author:

Trying to refresh my memory about what's happening:

  • Reading a grid into GMT_GRID: Works!
  • Reading an image into GMT_GRID: Only the first band is read.
  • Reading a grid into GMT_IMAGE: Depends on whether libgdal-hdf5 is installed. The grid is read as a single-band image if it is installed; otherwise errors are raised.
  • Reading an image into GMT_IMAGE: works.
import pygmt
pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])

If libgdal-hdf5 is not installed, _raster_kind first tries to read it as an image, which fails and reports the following errors:

ERROR 4: `/home/runner/.gmt/server/earth/earth_relief/earth_relief_01d_p.grd' not recognized as being in a supported file format. It could have been recognized by driver HDF5, but plugin gdal_HDF5.so is not available in your installation. You may install it with 'conda install -c conda-forge libgdal-hdf5'
Error: ession [ERROR]: Unable to open earth_relief_01d_p.grd.
Error: ession [ERROR]: ERROR reading image with gdalread.
pygmt-session (gmtapi_import_image): Could not read from file [earth_relief_01d_p.grd]
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)

Then it tries to read it as a grid, which succeeds, but the error messages above are still shown. I believe we need to find a way to suppress these errors or find an alternative way to determine the data kind.
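
The four cases listed above, combined with the image-first order of `_raster_kind`, can be traced with a small stand-alone sketch. It does not call GMT: `ReadError`, `classify`, and the two reader callables are hypothetical stand-ins for `GMTCLibError` and the `lib.read_data` calls, and the band counts are supplied by hand.

```python
from typing import Callable, Literal


class ReadError(Exception):
    """Hypothetical stand-in for GMTCLibError."""


def classify(
    read_image_bands: Callable[[], int],
    read_grid: Callable[[], None],
) -> Literal["grid", "image"]:
    """Mimic the image-first probing order with injected readers."""
    try:
        # Three bands are treated as an image; any other count as a grid.
        return "image" if read_image_bands() == 3 else "grid"
    except ReadError:
        pass  # Image read failed; fall through to the grid attempt.
    try:
        read_grid()
    except ReadError:
        pass  # Both reads failed; still fall back to "grid".
    return "grid"


def fail() -> int:
    """Reader that always fails, e.g. when a GDAL plugin is missing."""
    raise ReadError("cannot read as image")


# An RGB image: the image read succeeds with three bands.
rgb_kind = classify(lambda: 3, lambda: None)
# A grid with the HDF5 plugin installed: read as a single-band image.
grid_with_plugin = classify(lambda: 1, lambda: None)
# A grid without the plugin: the image read fails, the grid read succeeds.
grid_without_plugin = classify(fail, lambda: None)
```

In the third case the real code still prints the GDAL errors before succeeding, which is the problem described above; the sketch only reproduces the return values.
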

    with Session() as lib:
        try:
            img = lib.read_data(infile=raster, kind="image", mode="GMT_CONTAINER_ONLY")
            return "image" if img.contents.header.contents.n_bands == 3 else "grid"
        except GMTCLibError:
            pass
        try:
            _ = lib.read_data(infile=raster, kind="grid", mode="GMT_CONTAINER_ONLY")
            return "grid"
        except GMTCLibError:
            pass

Member:

Just double-checking that there's not a function in GMT C already to determine grid/image type? I know there's GMT_Inquire_VirtualFile for virtualfiles, but is there not one for just files? I saw this GMT_Get_Info, but unsure if it'll be usable here.

Member Author:

https://github.com/GenericMappingTools/gmt/blob/7809736ba32d87a4a96b15444419eb176c6a35f3/src/gmt_grdio.c#L3377

gmt_raster_type is the function that GMT uses to determine the raster type, but it's not a public API function.

Member:

Oo, magic byte parsing 😆 If that function becomes public someday, maybe we'll use it.
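
For readers unfamiliar with the idea, magic-byte parsing means inspecting the first few bytes of a file for a known signature. The sketch below is not `gmt_raster_type` and makes no claim about which formats GMT checks; `sniff_format`, `sniff_file`, and the signature table are illustrative assumptions, using the published signatures for classic netCDF, HDF5, TIFF, PNG, and JPEG.

```python
from typing import Optional

# Illustrative signature table (an assumption, not GMT's list).
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),  # HDF5, also the container for netCDF-4
    (b"CDF\x01", "netcdf-classic"),
    (b"CDF\x02", "netcdf-64bit-offset"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"II*\x00", "tiff"),  # little-endian TIFF
    (b"MM\x00*", "tiff"),  # big-endian TIFF
]


def sniff_format(header: bytes) -> Optional[str]:
    """Return a format name if the header starts with a known signature."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first eight bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(8))
```

A format name alone does not settle the grid/image question (a TIFF file can hold a single-band grid, for example), so a check like this would only be a first step.
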

    return "grid"  # Fallback to "grid" and let GMT determine the type.
39 changes: 25 additions & 14 deletions pygmt/src/grdcut.py
@@ -4,21 +4,21 @@

 import xarray as xr
 from pygmt.clib import Session
+from pygmt.clib.session import _raster_kind
+from pygmt.exceptions import GMTInvalidInput
 from pygmt.helpers import (
-    GMTTempFile,
     build_arg_list,
+    data_kind,
     fmt_docstring,
     kwargs_to_strings,
     use_alias,
 )
-from pygmt.io import load_dataarray

 __doctest_skip__ = ["grdcut"]


 @fmt_docstring
 @use_alias(
-    G="outgrid",
     R="region",
     J="projection",
     N="extend",
@@ -28,9 +28,9 @@
     f="coltypes",
 )
 @kwargs_to_strings(R="sequence")
-def grdcut(grid, **kwargs) -> xr.DataArray | None:
+def grdcut(grid, outgrid: str | None = None, **kwargs) -> xr.DataArray | None:
     r"""
-    Extract subregion from a grid.
+    Extract subregion from a grid or image.

     Produce a new ``outgrid`` file which is a subregion of ``grid``. The
     subregion is specified with ``region``; the specified range must not exceed
@@ -100,13 +100,24 @@
     >>> # 12° E to 15° E and a latitude range of 21° N to 24° N
     >>> new_grid = pygmt.grdcut(grid=grid, region=[12, 15, 21, 24])
     """
-    with GMTTempFile(suffix=".nc") as tmpfile:
-        with Session() as lib:
-            with lib.virtualfile_in(check_kind="raster", data=grid) as vingrd:
-                if (outgrid := kwargs.get("G")) is None:
-                    kwargs["G"] = outgrid = tmpfile.name  # output to tmpfile
-                lib.call_module(
-                    module="grdcut", args=build_arg_list(kwargs, infile=vingrd)
-                )
+    # Determine the output data kind based on the input data kind.
+    inkind = data_kind(grid)
+    match inkind:
+        case "image" | "grid":
+            outkind = inkind
+        case "file":
+            outkind = _raster_kind(grid)
+        case "_":
+            msg = f"Unsupported data type {type(grid)}."
+            raise GMTInvalidInput(msg)

Codecov / codecov/patch: Check warning on line 112 in pygmt/src/grdcut.py. Added lines #L111 - L112 were not covered by tests.

Member Author:

Lines 117-118 should be caught by the test_grdcut_fails test.


-        return load_dataarray(outgrid) if outgrid == tmpfile.name else None
+    with Session() as lib:
+        with (
+            lib.virtualfile_in(check_kind="raster", data=grid) as vingrd,
+            lib.virtualfile_out(kind=outkind, fname=outgrid) as voutgrd,
+        ):
+            kwargs["G"] = voutgrd
+            lib.call_module(module="grdcut", args=build_arg_list(kwargs, infile=vingrd))
+            return lib.virtualfile_to_raster(
+                vfname=voutgrd, kind=outkind, outgrid=outgrid
+            )
85 changes: 85 additions & 0 deletions pygmt/tests/test_grdcut_image.py
@@ -0,0 +1,85 @@
"""
Test pygmt.grdcut on images.
"""

from pathlib import Path

import numpy as np
import pytest
import xarray as xr
from pygmt import grdcut, which
from pygmt.helpers import GMTTempFile

try:
    import rioxarray

    _HAS_RIOXARRAY = True
except ImportError:
    _HAS_RIOXARRAY = False


@pytest.fixture(scope="module", name="region")
def fixture_region():
    """
    Set the data region.
    """
    return [-53, -49, -20, -17]


@pytest.fixture(scope="module", name="expected_image")
def fixture_expected_image():
    """
    Load the expected grdcut image result.
    """
    return xr.DataArray(
        data=np.array(
            [
                [[90, 93, 95, 90], [91, 90, 91, 91], [91, 90, 89, 90]],
                [[87, 88, 88, 89], [88, 87, 86, 85], [90, 90, 89, 88]],
                [[48, 49, 49, 45], [49, 48, 47, 45], [48, 47, 48, 46]],
            ],
            dtype=np.uint8,
        ),
        coords={
            "band": [1, 2, 3],
            "x": [-52.5, -51.5, -50.5, -49.5],
            "y": [-17.5, -18.5, -19.5],
        },
        dims=["band", "y", "x"],
        attrs={
            "scale_factor": 1.0,
            "add_offset": 0.0,
        },
    )


@pytest.mark.benchmark
def test_grdcut_image_file(region, expected_image):
    """
    Test grdcut on an input image file.
    """
    result = grdcut("@earth_day_01d_p", region=region)
    xr.testing.assert_allclose(a=result, b=expected_image)


@pytest.mark.skipif(not _HAS_RIOXARRAY, reason="rioxarray is not installed")
def test_grdcut_image_dataarray(region, expected_image):
    """
    Test grdcut on an input xarray.DataArray object.
    """
    raster = rioxarray.open_rasterio(which("@earth_day_01d", download="a")).load()
    result = grdcut(raster, region=region)
    xr.testing.assert_allclose(a=result, b=expected_image)


def test_grdcut_image_file_in_file_out(region, expected_image):
    """
    Test grdcut on an input image file and outputs to another image file.
    """
    with GMTTempFile(suffix=".tif") as tmp:
        result = grdcut("@earth_day_01d_p", region=region, outgrid=tmp.name)
        assert result is None
        assert Path(tmp.name).stat().st_size > 0
        if _HAS_RIOXARRAY:
            raster = rioxarray.open_rasterio(which("@earth_day_01d", download="a")).load()
            xr.testing.assert_allclose(a=raster, b=expected_image)
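
The `try`/`except ImportError` guard at the top of this test module is one way to detect an optional dependency. As an alternative sketch, assuming only the standard library, `importlib.util.find_spec` can report whether a top-level module is available without importing it; `has_module` is a hypothetical helper name, not something in PyGMT.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# A standard-library module is always found.
HAS_JSON = has_module("json")
# A made-up name is reported as unavailable instead of raising.
HAS_MISSING = has_module("module_that_should_not_exist_12345")
```

The test functions would still need to import `rioxarray` before using it, so the original guard, which both imports the module and records its availability, remains the shorter option here.
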