clib.conversion._to_numpy: Add tests for pyarrow.array with pyarrow string types #3608

Merged
merged 7 commits into from
Nov 15, 2024

Conversation

seisman
Member

@seisman seisman commented Nov 10, 2024

Description of proposed changes

This PR adds tests for PyArrow string types: string/utf8/large_string/large_utf8/string_view (xref: https://arrow.apache.org/docs/python/api/datatypes.html).

None of them is converted to np.str_ by default (NumPy returns an object array instead), so we need to map them explicitly.

In [1]: import pyarrow as pa

In [2]: x = pa.array(["abc", "defg", "12345"], type=pa.string())

In [3]: x.type
Out[3]: DataType(string)

In [4]: str(x.type)
Out[4]: 'string'

In [6]: import numpy as np

In [7]: np.ascontiguousarray(x)
Out[7]: array(['abc', 'defg', '12345'], dtype=object)

@seisman seisman added maintenance Boring but important stuff for the core devs needs review This PR has higher priority and needs review. labels Nov 11, 2024
@seisman seisman added this to the 0.14.0 milestone Nov 11, 2024
@seisman seisman marked this pull request as ready for review November 11, 2024 09:15
@seisman seisman force-pushed the to_numpy/pyarrow_string branch from 8bfac8e to f10175f Compare November 11, 2024 09:48
@michaelgrund michaelgrund added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Nov 12, 2024
@seisman
Member Author

seisman commented Nov 13, 2024

I plan to cherry-pick f10175f into a separate PR to have an entry in the "Enhancement" category.

@seisman seisman requested a review from weiji14 November 14, 2024 04:37
@@ -173,7 +178,7 @@ def _to_numpy(data: Any) -> np.ndarray:
         # we can remove the workaround in PyGMT v0.17.0.
         array = np.ascontiguousarray(data.astype(float))
     else:
-        vec_dtype = str(getattr(data, "dtype", ""))
+        vec_dtype = str(getattr(data, "dtype", getattr(data, "type", "")))
Member Author


pyarrow array only has .type, not .dtype.
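
The nested `getattr` fallback in the changed line can be tried in isolation. The sketch below is a stand-alone illustration: `type_name` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for a NumPy/pandas-like object (with `.dtype`) and a PyArrow-like object (with only `.type`).

```python
from types import SimpleNamespace


def type_name(data):
    """Return str(data.dtype) if present, else str(data.type), else ''."""
    # Same lookup order as the changed line: .dtype first, then .type.
    return str(getattr(data, "dtype", getattr(data, "type", "")))


# Stand-ins (assumptions for illustration only, not real library objects).
numpy_like = SimpleNamespace(dtype="float64")  # has .dtype
arrow_like = SimpleNamespace(type="string")    # has only .type
plain_list = [1, 2, 3]                         # has neither attribute

print(repr(type_name(numpy_like)))
print(repr(type_name(arrow_like)))
print(repr(type_name(plain_list)))
```

Objects with neither attribute yield an empty string, so a later dictionary lookup on the result simply misses.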

@seisman seisman added skip-changelog Skip adding Pull Request to changelog and removed final review call This PR requires final review and approval from a second reviewer labels Nov 15, 2024
@seisman seisman merged commit c07f1b6 into main Nov 15, 2024
18 of 20 checks passed
@seisman seisman deleted the to_numpy/pyarrow_string branch November 15, 2024 00:14
Labels
maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog

3 participants