-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly reserve the grid data dtype by converting ctypes array to numpy array with np.ctypeslib.as_array #3446
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,4 @@ def test_sphinterpolate_no_outgrid(mars): | |
npt.assert_allclose(temp_grid.max(), 14628.144) | ||
npt.assert_allclose(temp_grid.min(), -6908.1987) | ||
npt.assert_allclose(temp_grid.median(), 118.96849) | ||
npt.assert_allclose(temp_grid.mean(), 272.60578) | ||
npt.assert_allclose(temp_grid.mean(), 272.60593) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The grid mean value is updated from To verify which value is correct, I first generate the output grid using the following GMT CLI:
then get the mean value by:
With option 1, the ctypes array was first converted to a Python list and then converted to a numpy array by One thing to note is that, in GMT, the mean value is weighted by spherical area for geographic grids, so it's different from the one reported by xarray.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand this sentence a bit more. So we can't assign ownership of the array data from GMT to numpy/xarray, and need to make a copy to ensure that exiting the GMT session doesn't crash the program? It doesn't look like we needed to explicitly make a copy before, so why is it needed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the arrays allocated by GMT will be freed when exiting the GMT session. This can be verified by the example below. In the GMT session, we can correctly access the
grid.x
array, but outside the session,grid.x
reports a "NULL pointer access" error.As explained in the top post, in the previous version, we first access the ctypes array via the list slice syntax (i.e.,
ctypes_array[:120]
) and then convert the list slice to a numpy array bynp.reshape
. The list slice doesn't make a new copy of the array memory, but the list->ndarray conversion changes the data dtype fromfloat32
(the default dtype for GMT grid) tofloat64
(the default dtype for numpy). I think in this conversion, numpy makes a new copy of the array:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a test script:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this part makes sense, so we were implicitly doing a copy of the array before, whereas now it's more explicit via a
.copy()
call.Thanks for showing the "NULL pointer access" exception. We probably don't want the pointer to remain active after the session is closed since that would be a memory leak, but is there a way for GMT to pass ownership of the array data to NumPy before exiting the session, so that the memory is then managed by Python (and its garbage collector)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the API function
GMT_Set_AllocMode
may help, but I just tried it locally and it doesn't work as I expected.Maybe another upstream bug or something else that needs to debug the C codes.Edit: The API documentation says:
The API function only tells GMT to not free the memory allocated by Python and can't do the reverse.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From this post https://stackoverflow.com/questions/28760273/python-ctypes-and-garbage-collection and some answers from ChatGPT, it seems even if it's possible to tell GMT to not free the memory, the Python side is still responsible to free the memory explicitly. Otherwise, there may be memory leaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so we either need to:
GMT_Set_AllocMode
so that memory is still managed by PythonMaybe something to think about longer term if we want to avoid copies 🙂