Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**Breaking**: data_kind: Now 'matrix' represents a 2-D numpy array and unrecognized data types fall back to 'vectors' #3351

Merged
merged 21 commits into from
Oct 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
0c82b3c
data_kind: Refactor the if-else statements into if-return statements
seisman Oct 3, 2024
808755d
data_kind: Now 'matrix' represents a 2-D numpy array and unrecognizd …
seisman Oct 3, 2024
0eb4f8f
Make 'data' a required parameter
seisman Oct 3, 2024
9891b2c
Fix x2sys_cross as pd.DataFrame is 'vectors' kind now
seisman Oct 3, 2024
a9d094c
Fix legend as now 'vectors' doesn't mean data is None
seisman Oct 3, 2024
3d8be4d
Add docstrings for stringio
seisman Oct 3, 2024
7c104a9
Merge branch 'data_kind/return' into refactor/data_kind
seisman Oct 4, 2024
5790923
Merge branch 'main' into refactor/data_kind
seisman Oct 7, 2024
6954c5d
Fix docstrings
seisman Oct 7, 2024
9300ca3
Merge branch 'main' into refactor/data_kind
seisman Oct 7, 2024
2701a4a
Merge branch 'main' into refactor/data_kind
seisman Oct 8, 2024
7fcf57f
Merge branch 'main' into refactor/data_kind
seisman Oct 10, 2024
991f688
Merge branch 'main' into refactor/data_kind
seisman Oct 11, 2024
51569c8
Update pygmt/clib/session.py
seisman Oct 13, 2024
4a4f192
2-D array-like that implements '__array_interface__' is matrix
seisman Oct 14, 2024
2a6e788
Merge branch 'main' into refactor/data_kind
seisman Oct 14, 2024
2b054b6
Merge branch 'main' into refactor/data_kind
seisman Oct 15, 2024
cfa32ed
Add a test for passing string dtype matrix
seisman Oct 15, 2024
ef6e6aa
Fix a bug when passing a 2-D matrix to virtualfile_from_vectors
seisman Oct 15, 2024
423e5dc
Should transpose the 2-D matrix before passing to virtualfile_from_ve…
seisman Oct 15, 2024
81e57f7
Merge branch 'main' into refactor/data_kind
seisman Oct 16, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
37 changes: 19 additions & 18 deletions pygmt/clib/session.py
Original file line number Diff line number Diff line change
Expand Up @@ -1787,10 +1787,7 @@ def virtualfile_in( # noqa: PLR0912
"grid": self.virtualfile_from_grid,
"image": tempfile_from_image,
"stringio": self.virtualfile_from_stringio,
# Note: virtualfile_from_matrix is not used because a matrix can be
# converted to vectors instead, and using vectors allows for better
# handling of string type inputs (e.g. for datetime data types)
"matrix": self.virtualfile_from_vectors,
"matrix": self.virtualfile_from_matrix,
"vectors": self.virtualfile_from_vectors,
}[kind]

Expand All @@ -1807,29 +1804,33 @@ def virtualfile_in( # noqa: PLR0912
warnings.warn(message=msg, category=RuntimeWarning, stacklevel=2)
_data = (data,) if not isinstance(data, pathlib.PurePath) else (str(data),)
elif kind == "vectors":
_data = [x, y]
if z is not None:
_data.append(z)
if extra_arrays:
_data.extend(extra_arrays)
elif kind == "matrix": # turn 2-D arrays into list of vectors
if hasattr(data, "items") and not hasattr(data, "to_frame"):
if data is None:
# data is None, so data must be given via x/y/z.
_data = [x, y]
if z is not None:
_data.append(z)
if extra_arrays:
_data.extend(extra_arrays)
elif hasattr(data, "items") and not hasattr(data, "to_frame"):
# pandas.DataFrame or xarray.Dataset types.
# pandas.Series will be handled below like a 1-D numpy.ndarray.
_data = [array for _, array in data.items()]
elif hasattr(data, "ndim") and data.ndim == 2 and data.dtype.kind in "iuf":
# Just use virtualfile_from_matrix for 2-D numpy.ndarray
# which are signed integer (i), unsigned integer (u) or
# floating point (f) types
_virtualfile_from = self.virtualfile_from_matrix
_data = (data,)
else:
# Python list, tuple, numpy.ndarray, and pandas.Series types
_data = np.atleast_2d(np.asanyarray(data).T)
elif kind == "matrix":
# GMT can only accept a 2-D matrix which are signed integer (i), unsigned
# integer (u) or floating point (f) types. For other data types, we need to
# use virtualfile_from_vectors instead, which turns the matrix into a list
# of vectors and allows for better handling of non-integer/float type inputs
# (e.g. for string or datetime data types).
_data = (data,)
if data.dtype.kind not in "iuf":
_virtualfile_from = self.virtualfile_from_vectors
Comment on lines +1828 to +1829
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for these lines?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a test for it. Before adding more tests, I'm wondering if we should split the big "test_clib_virtualfiles.py" file (with more than 500 lines) into separate smaller test files, i.e., one test file for each Session method.

  • test_clib_virtualfile_in
  • test_clib_virtualfile_from_vectors
  • test_clib_virtualfile_from_matrix
  • test_clib_open_virtualfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've split it before in #2784, so yes, ok to split it again 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've split it before in #2784, so yes, ok to split it again 😆

Done in #3512.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test in cfa32ed to cover this line.

_data = data.T

# Finally create the virtualfile from the data, to be passed into GMT
file_context = _virtualfile_from(*_data)

return file_context

def virtualfile_from_data(
Expand Down
37 changes: 22 additions & 15 deletions pygmt/helpers/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -207,8 +207,12 @@ def data_kind(
- ``"grid"``: a :class:`xarray.DataArray` object that is not 3-D
- ``"image"``: a 3-D :class:`xarray.DataArray` object
- ``"stringio"``: a :class:`io.StringIO` object
- ``"matrix"``: anything else that is not ``None``
- ``"vectors"``: ``data`` is ``None`` and ``required=True``
- ``"matrix"``: a 2-D array-like object that implements ``__array_interface__``
(e.g., :class:`numpy.ndarray`)
- ``"vectors"``: ``data`` is ``None`` and ``required=True``, or any unrecognized
data. Common data types include, a :class:`pandas.DataFrame` object, a dictionary
with array-like values, a 1-D/3-D :class:`numpy.ndarray` object, or array-like
objects.

Parameters
----------
Expand Down Expand Up @@ -268,27 +272,27 @@ def data_kind(

The "matrix"`` kind:

>>> data_kind(data=np.arange(10)) # 1-D numpy.ndarray
'matrix'
>>> data_kind(data=np.arange(10).reshape((5, 2))) # 2-D numpy.ndarray
'matrix'

The "vectors" kind:

>>> data_kind(data=np.arange(10)) # 1-D numpy.ndarray
'vectors'
>>> data_kind(data=np.arange(60).reshape((3, 4, 5))) # 3-D numpy.ndarray
'matrix'
'vectors'
>>> data_kind(xr.DataArray(np.arange(12), name="x").to_dataset()) # xarray.Dataset
'matrix'
'vectors'
>>> data_kind(data=[1, 2, 3]) # 1-D sequence
'matrix'
'vectors'
>>> data_kind(data=[[1, 2, 3], [4, 5, 6]]) # sequence of sequences
'matrix'
'vectors'
>>> data_kind(data={"x": [1, 2, 3], "y": [4, 5, 6]}) # dictionary
'matrix'
'vectors'
>>> data_kind(data=pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})) # pd.DataFrame
'matrix'
'vectors'
>>> data_kind(data=pd.Series([1, 2, 3], name="x")) # pd.Series
'matrix'

The "vectors" kind:

'vectors'
>>> data_kind(data=None)
'vectors'
"""
Expand All @@ -312,7 +316,10 @@ def data_kind(
# geopandas.GeoDataFrame or shapely.geometry).
# Reference: https://gist.github.com/sgillies/2217756
kind = "geojson"
case x if x is not None: # Any not-None is considered as a matrix.
case x if hasattr(x, "__array_interface__") and data.ndim == 2:
# 2-D Array-like objects that implements ``__array_interface__`` (e.g.,
# numpy.ndarray).
# Reference: https://numpy.org/doc/stable/reference/arrays.interface.html
kind = "matrix"
case _: # Fall back to "vectors" if data is None and required=True.
kind = "vectors"
Expand Down
2 changes: 1 addition & 1 deletion pygmt/src/legend.py
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ def legend(
kwargs["F"] = box

kind = data_kind(spec)
if kind not in {"vectors", "file", "stringio"}: # kind="vectors" means spec is None
if spec is not None and kind not in {"file", "stringio"}:
raise GMTInvalidInput(f"Unrecognized data type: {type(spec)}")
if kind == "file" and is_nonstr_iter(spec):
raise GMTInvalidInput("Only one legend specification file is allowed.")
Expand Down
2 changes: 1 addition & 1 deletion pygmt/src/x2sys_cross.py
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,7 @@ def x2sys_cross(
match data_kind(track):
case "file":
file_contexts.append(contextlib.nullcontext(track))
case "matrix":
case "vectors":
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.DataFrame now is "vectors" kind.

# find suffix (-E) of trackfiles used (e.g. xyz, csv, etc) from
# $X2SYS_HOME/TAGNAME/TAGNAME.tag file
tagfile = Path(
Expand Down
28 changes: 27 additions & 1 deletion pygmt/tests/test_clib_virtualfile_in.py
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,11 @@
import pandas as pd
import pytest
import xarray as xr
from packaging.version import Version
from pygmt import clib
from pygmt.clib import __gmt_version__
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import GMTTempFile
from pygmt.helpers import GMTTempFile, data_kind

POINTS_DATA = Path(__file__).parent / "data" / "points.txt"

Expand Down Expand Up @@ -101,3 +103,27 @@ def test_virtualfile_in_fail_non_valid_data(data):
z=data[:, 2],
data=data,
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is added to address #3351 (comment).

This new test passes a string dtype numpy array into GMT C API, which contains longitude/latitude strings. The data kind is "matrix", but since the data dtype is not in iuf, PyGMT will use virtualfile_from_vectors rather than virtualfile_from_matrix to pass the data into GMT C API. Ideally, we should check that the virtualfile_from_vectors is called once and virtualfile_from_matrix is not called, but I find it technically complicated with unittest.mock, so I'll leave it untested.

The function is marked as xfail with GMT 6.5 due to a newly found upstream bug at GenericMappingTools/gmt#8600.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, and thanks for adding the test!


@pytest.mark.xfail(
condition=Version(__gmt_version__) <= Version("6.5.0"),
reason="Upstream bug fixed in https://github.com/GenericMappingTools/gmt/pull/8600",
)
def test_virtualfile_in_matrix_string_dtype():
"""
Pass a string dtype matrix should work and the matrix should be passed via a series
of vectors.
"""
data = np.array([["11:30W", "30:30S"], ["12:30W", "30:00S"]])
assert data_kind(data) == "matrix" # data is recognized as "matrix" kind
assert data.dtype.type == np.str_
assert data.dtype.kind not in "iuf" # dtype is not in numeric dtypes

with clib.Session() as lib:
with lib.virtualfile_in(data=data) as vintbl:
with GMTTempFile() as outfile:
lib.call_module("info", [vintbl, "-C", f"->{outfile.name}"])
output = outfile.read(keep_tabs=False)
assert output == "347.5 348.5 -30.5 -30\n"
# Should check that lib.virtualfile_from_vectors is called once,
# not lib.virtualfile_from_matrix, but it's technically complicated.
Loading