DISCUSSION: allow external libraries to define a custom Block #17144
#17143 is an example issue of a small change. Based on my first experiments, implementing the GeometryBlock seems feasible. The repr (with the above PR), (re)indexing, slicing, accessing elements, and some operations are already working, although it is of course possible that these were just the easy parts and that the can of worms only opens now, when trying to fix the remaining problems :-) |
I'm generally supportive of making pandas easier to extend like this,
primarily since it'll force us to clean up some internal things like you
discovered in #17143.
(of course for 2.0 this is a whole other issue).
I think this deserves more than a parenthetical :) We want to avoid
introducing new APIs that will break with pandas 2, as your GeometryBlock
would (I think). That said, I think it's worthwhile, even if the internals
of geopandas will need to be updated for pandas 2.
On Tue, Aug 1, 2017 at 9:59 AM, Joris Van den Bossche wrote:
#17143 is an example issue of a small change.
Based on my first experiments, it seems that implementing the
GeometryBlock is somehow feasible. The repr (with above PR), (re)indexing,
slicing, accessing elements, some operations, .. are already working,
although it is of course possible that this were just the easy parts and
that the can of worms opens only now trying to fix the remaining problems
:-)
|
I think it is perfectly reasonable to assume that the current GeometryBlock that would be included in geopandas will only work for pandas 1.x, and that we will have to rework it for pandas 2.x. Maybe the main requirement for pandas 2.x in this regard would be not to support such blocks, but at least to have a similar (hopefully cleaner) mechanism to let external libraries extend pandas. |
Yes, I was going to suggest that, but I don't want to put more work on Wes and others' plate :) I wouldn't really consider this a hard requirement for the initial pandas 2, but at some point it would be good to have. |
I think we'll be able to make user-defined types much simpler. For example, a Latitude-Longitude type could be embedded in
As an aside, it seems more and more likely that the optimal route for pandas2 will be a separate codebase, while factoring out reusable components of pandas 0.x that do not need to have knowledge of the low level internals. |
The GeoPandas case is a bit more complex than storing structs. We need to store (and track) pointers to an external library, GEOS. This is the library that backs essentially every geospatial system, including Postgres' PostGIS. Currently our array-like-geometry object tracks references so that we can free the GEOS pointers at the appropriate time. Is handling pointers to external libraries within scope for Pandas 2? This is a bit atypical. |
It looks like the set of recognized |
Can this be closed now that we have the extension array interface, and through that an |
Yes, I think an extension block is no longer necessary. |
And hooray that they are no longer necessary! The extension array interface is much better |
I am opening this issue because I want (to try) to pursue adding a custom GeometryBlock in GeoPandas (joint work with Matthew Rocklin in geopandas/geopandas#467; ultra-short motivation: we want to store integers (pointers to C objects) in a column, but box them to shapely Python objects when the user interacts with the column (repr, accessing an element, ..)).
I am of course free to try this :-), but I wanted to raise it here because it has some consequences. With the "allow external libraries" in the issue title, I mean the following:
I don't think we plan many internal refactorings for pandas 0.x / 1.x, so in that regard the Block API should/could remain rather stable (of course for 2.0 this is a whole other issue).
So this issue can serve as a general discussion of this (if people have input or feedback) and as a reference for when changes are made in pandas for this.
cc @pandas-dev/pandas-core