Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pyarrow-based pandas ExtensionArray geometry classes: #2

Merged
merged 20 commits into from
Nov 12, 2019

Conversation

jonmmease
Copy link
Collaborator

@jonmmease jonmmease commented Nov 4, 2019

Adds a collection of pandas ExtensionArray classes for representing homogenous arrays of geometric objects. The following extension arrays are added (shapely analog included in parentheses):

  • MultiPoint2dArray (shapely MultiPoint)
  • Line2dArray (shapely LineString)
  • MultiLine2dArray (shapely MultiLineString)
  • Ring2dArray (shapely Ring)
  • Polygon2dArray (shapely Polygon)
  • MultiPolygon2dArray (shapely MultiPolygon)

These classes are implemented as wrappers around pyarrow ListArray objects.

 - MultiPoint2dArray
 - Line2dArray
 - MultiLine2dArray
 - Ring2dArray
 - Polygon2dArray
 - MultiPolygon2dArray
for shapely/geopandas conversion methods
Copy link
Member

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/multipolygon2d.py Outdated Show resolved Hide resolved
spatialpandas/geometry/multipolygon2d.py Outdated Show resolved Hide resolved
spatialpandas/geometry/polygon2d.py Outdated Show resolved Hide resolved
spatialpandas/geometry/polygon2d.py Outdated Show resolved Hide resolved
spatialpandas/geometry/polygon2d.py Outdated Show resolved Hide resolved
Copy link

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Added a bunch of comments (only a partial review, but since I might not have time to do the rest before friday, already posting them)

spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Show resolved Hide resolved
spatialpandas/geometry/base.py Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/__init__.py Outdated Show resolved Hide resolved
spatialpandas/geometry/line2d.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
spatialpandas/geometry/base.py Outdated Show resolved Hide resolved
@jonmmease
Copy link
Collaborator Author

Thanks for taking a look @jorisvandenbossche, all of your comments make sense and I'll soon update the PR accordingly.

Are there (vague) plans to support 3D at some point?
I would personally consider dropping the "2d" in all names

I was thinking it may make sense to eventually include 1D and 3D geometric objects / utilities in the library. @jbednar, what do you think about the possibility of non-2D support and of having "2d" in the current class names?

@jbednar
Copy link
Member

jbednar commented Nov 6, 2019

I only currently need 2D shapes out of this library, but historically, I've always eventually regretted when I've committed an entire library to be 2D only. So I think being explicit for the portions of the library that are specifically 2D is good.

@jorisvandenbossche
Copy link

If it's only about 2D/3D (I don't know the use cases of 1D?), it's also an option to use the pattern of WKT, which uses "POINT (1 2)" for 2D and "POINT 3D (& 2 3)" for 2D (so no indication is 2D, add 3D for 3D)

@jonmmease
Copy link
Collaborator Author

Having "2d" be the default and explicitly calling out 3d/1d in the future would be fine with me as a plan. For 1d, I was thinking it might be useful to be able to represent and manipulate 1D points and intervals using the same approach as 2D. e.g. union, intersection, containment, and spacial join would all be meaningful, though I don't have a particular domain usecase in mind at the moment.

@jbednar
Copy link
Member

jbednar commented Nov 6, 2019

I'll leave that up to you, @jonmmease; whatever seems more natural to you!

@xhochy
Copy link

xhochy commented Nov 8, 2019

The base class looks nice but seems like there is a bit of overlap with fletcher. @jonmmease Would you be open to depending on fletcher if we provide a base implementation of an ExtensionArray based on pyarrow.Array in addition to the pyarrow.ChunkedArray one? I guess that both would include a lot of shared code. If so, I would start to make a base implementation that looks so nice that one actually wants to depend on it.

@jonmmease
Copy link
Collaborator Author

Would you be open to depending on fletcher if we provide a base implementation of an ExtensionArray based on pyarrow.Array in addition to the pyarrow.ChunkedArray one?

Yes, definitely open to it. I think these are the main things that we would need to be able to customize/override.

  • Custom validation logic for the pyarrow Array that is created from the constructor inputs.
  • Customize the Dtype string representation (e.g multipoint[float64] rather than something like list[list[float64]]).
  • Ability to wrap individual array elements in a custom class when accessed through __getitem__.

Thanks for taking the time to think about this @xhochy!

"1d" and "3d" will be used in the future to disambiguate in the future
if needed.
@jonmmease
Copy link
Collaborator Author

Going to merge now. At any point in the future, we should be able to update the base geometry classes to subclass a suitable fletcher base class without changing the API.

@jonmmease jonmmease merged commit 838b38d into master Nov 12, 2019
@jonmmease jonmmease deleted the extension_arrays branch November 12, 2019 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants