Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xyz2grd crashes python with large grids #3086

Open
MarkWieczorek opened this issue Mar 4, 2024 · 11 comments
Open

xyz2grd crashes python with large grids #3086

MarkWieczorek opened this issue Mar 4, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@MarkWieczorek
Copy link
Contributor

MarkWieczorek commented Mar 4, 2024

Description of the problem

I am trying to create a grd file using pygmt's xyz2grd with a numpy array. The numpy array is large (64 pixels per degree corresponding to 11520 x 23040). Running the xyz2grd command kills the python kernel and gives the following result

python3.12(11590,0x1e10f1c40) malloc: *** error for object 0x3ff142fe20000000: pointer being freed was not allocated
python3.12(11590,0x1e10f1c40) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    11590 abort      ipython

I have the same problem on macOS and on Fedora linux (though on fedora, I don't think a error is reported, it just crashes).

Minimal Complete Verifiable Example

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.empty((nlat, nlon))
spacing = 1/64

grd = pygmt.xyz2grd(array, spacing=spacing, region=[0, 360, -90, 90], registration='p')

Full error message

python3.12(11590,0x1e10f1c40) malloc: *** error for object 0x3ff142fe20000000: pointer being freed was not allocated
python3.12(11590,0x1e10f1c40) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    11590 abort      ipython

System information

PyGMT information:
  version: v0.11.0
System information:
  python: 3.12.1 | packaged by conda-forge | (main, Dec 23 2023, 08:01:35) [Clang 16.0.6 ]
  executable: /Users/lunokhod/miniconda3/envs/py312/bin/python
  machine: macOS-14.3-arm64-arm-64bit
Dependency information:
  numpy: 1.26.3
  pandas: 2.1.4
  xarray: 2023.12.0
  netCDF4: 1.6.5
  packaging: 23.2
  contextily: None
  geopandas: None
  ipython: None
  rioxarray: None
  ghostscript: 10.02.1
GMT library information:
  binary version: 6.5.0
  cores: 10
  grid layout: rows
  image layout:
  library path: /Users/lunokhod/miniconda3/envs/py312/lib/libgmt.dylib
  padding: 2
  plugin dir: /Users/lunokhod/miniconda3/envs/py312/lib/gmt/plugins
  share dir: /Users/lunokhod/miniconda3/envs/py312/share/gmt
  version: 6.5.0
@MarkWieczorek MarkWieczorek added the bug Something isn't working label Mar 4, 2024
@seisman seisman added the upstream Bug or missing feature of upstream core GMT label Mar 4, 2024
@seisman
Copy link
Member

seisman commented Mar 4, 2024

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Here is a working version:

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.empty((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='p')
print(grid)

@MarkWieczorek
Copy link
Contributor Author

For info, I was able to read the original binary .img fine with GMT. I'm thus not sure if it's possible to reproduce this problem with GMT given that the input I used with the pygmt version was a numpy array.

@MarkWieczorek
Copy link
Contributor Author

Actually, the above example doesn't work. If you replace np.empty() with np.ones(), you will see that there are lots of nans in the grid that is created:

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.ones((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='p')
print(grid)

which gives as output

<xarray.DataArray 'z' (y: 11520, x: 23040)>
array([[ 1., nan,  1., ..., nan,  1., nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [ 1., nan,  1., ..., nan,  1., nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [ 1., nan,  1., ..., nan,  1., nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * x        (x) float64 0.007812 0.02344 0.03906 0.05469 ... 360.0 360.0 360.0
  * y        (y) float64 -89.99 -89.98 -89.96 -89.95 ... 89.95 89.96 89.98 89.99
Attributes:
    long_name:     z
    actual_range:  [1. 1.]

The same thing occurs with np.zeros() and real data. In fact, the nans also occur with np.empty()

@seisman
Copy link
Member

seisman commented Mar 5, 2024

I think it's most likely because the grid should be gridline-registration instead of pixel-registration.

import numpy as np
import pygmt

nlat = 180 * 64
nlon = 360 * 64
array = np.ones((nlat, nlon))
spacing = 1/64

x = np.arange(0, 360, spacing)
y = np.arange(-90, 90, spacing)
xx, yy = np.meshgrid(x, y)

grid = pygmt.xyz2grd(x=xx.flatten(), y=yy.flatten(), z=array.flatten(), spacing=spacing, region=[0, 360, -90, 90], registration='g', coltypes="g")
print(grid)

the output is:

<xarray.DataArray 'z' (lat: 11521, lon: 23041)>
array([[ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       ...,
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [ 1.,  1.,  1., ...,  1.,  1.,  1.],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
  * lon      (lon) float64 0.0 0.01562 0.03125 0.04688 ... 360.0 360.0 360.0
  * lat      (lat) float64 -90.0 -89.98 -89.97 -89.95 ... 89.95 89.97 89.98 90.0
Attributes:
    long_name:     z
    actual_range:  [1. 1.]

the last row is still NaN, because lat=90 or -90 is missing data.

@MarkWieczorek
Copy link
Contributor Author

The grid is supposed to be pixel registered. What I am trying to do is read a large pixel registered image into pygmt, and then convert it to gridline using grdconvert. This works fine using the GMT command line tools, but doesn't work using pygmt.

@yvonnefroehlich
Copy link
Member

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Hm. I am not sure, but sofar I understood xyz2grd in the way that your input data have to be tabular data, i.e., three columns longitude, latitude, and a z-value. And they should be converted to a GMT-ready grid. But an array in the form array = np.empty((nlat, nlon)) passed to data is not in that format. So you have to use .flatten, as done in the later examples.

@seisman
Copy link
Member

seisman commented Mar 6, 2024

I can confirm the issue. It's likely an upstream GMT bug in memory management.

Hm. I am not sure, but sofar I understood xyz2grd in the way that your input data have to be tabular data, i.e., three columns longitude, latitude, and a z-value. And they should be converted to a GMT-ready grid. But an array in the form array = np.empty((nlat, nlon)) passed to data is not in that format. So you have to use .flatten, as done in the later examples.

I think you're right. xyz2grd should not accept a 2-d matrix like the one in the original post.

@seisman
Copy link
Member

seisman commented Mar 6, 2024

Also see a related feature request at #2852.

@seisman seisman removed the upstream Bug or missing feature of upstream core GMT label Mar 6, 2024
@MarkWieczorek
Copy link
Contributor Author

The documentation states that you can use 2d arrays. If this is incorrect, could we update the documentation?

@seisman
Copy link
Member

seisman commented Mar 6, 2024

it's a little misleading. The 2-d array should has three columns for x, y and z.

@weiji14
Copy link
Member

weiji14 commented Mar 22, 2024

Maybe we can add a check, so that an error or warning is raised when a 2D array with >3 columns is passed in to the data parameter? Code will need to be inserted somewhere here:

pygmt/pygmt/src/xyz2grd.py

Lines 149 to 157 in dd8e0cd

"""
if kwargs.get("I") is None or kwargs.get("R") is None:
raise GMTInvalidInput("Both 'region' and 'spacing' must be specified.")
with GMTTempFile(suffix=".nc") as tmpfile:
with Session() as lib:
with lib.virtualfile_in(
check_kind="vector", data=data, x=x, y=y, z=z, required_z=True
) as vintbl:

@weiji14 weiji14 added the triage Unsure where this issue fits label Mar 22, 2024
@seisman seisman removed the triage Unsure where this issue fits label Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants