Include non-numpy array in DataFrame #17100

Closed
mrocklin opened this issue Jul 27, 2017 · 3 comments
Labels
Enhancement Internals Related to non-user accessible pandas implementation

Comments

@mrocklin
Contributor

@jorisvandenbossche and I have been looking at Cythonizing some of GeoPandas. This has resulted in an object that holds onto a numpy array of pointers to C-level Geometry objects. We would like to include this numpy-like object as a column in a Pandas DataFrame. However, we would still like to hold onto the object itself, and not have these pointers just join an integer block in the block manager. Keeping the object is useful for things like garbage collection on the C side, odd indexing rules, etc.

My intuition says that putting a numpy-like object into a Pandas dataframe without it being coerced into part of a numpy array is probably not feasible with present-day Pandas, but I thought I'd check first just in case. We have backup plans if this isn't feasible, so it's not a big deal either way.
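A minimal sketch of the garbage-collection concern, under stated assumptions: `GeometryPointerArray` and `FREED` are hypothetical names, the integers are fake addresses, and the "free" step just records values instead of calling into C. It uses plain `np.asarray` coercion as a stand-in for the pointers being merged into an integer block: once only the raw integers are kept, nothing keeps the owning object alive, so its cleanup can run while the addresses are still stored.

```python
import weakref

import numpy as np

# Hypothetical record of "freed" C objects, standing in for real C-level cleanup.
FREED = []


class GeometryPointerArray:
    """Hypothetical owner of an int64 array of pointers to C-level geometries."""

    def __init__(self, pointers):
        self._pointers = np.asarray(pointers, dtype=np.int64)
        # When this wrapper is garbage collected, "free" the geometries it owns.
        weakref.finalize(self, FREED.extend, self._pointers.tolist())

    def __len__(self):
        return len(self._pointers)

    def __array__(self, dtype=None, copy=None):
        # Expose the raw pointer values so NumPy treats this object as array-like.
        arr = self._pointers if dtype is None else self._pointers.astype(dtype)
        return arr.copy() if copy else arr


geoms = GeometryPointerArray([0x1000, 0x2000, 0x3000])
coerced = np.asarray(geoms)  # plain int64 ndarray; the wrapper itself is not retained
print(type(coerced).__name__, coerced.dtype)

del geoms                # the last reference to the owning object goes away...
print(FREED)             # ...so its finalizer has already "freed" the geometries,
print(coerced.tolist())  # while the coerced array still holds the now-dangling addresses
```

This is why the thread asks for the object itself to be stored as the column: only the owner knows when the C-level objects may be released.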

@jbrockmendel
Member

AFAICT this would require implementing a new Block type and patching BlockManager to recognize it. I've been toying with something similar but have shied away from it as being "too internal".

@gfyoung where does this sort of thing lie on a scale of "What's The Worst That Could Happen" to "You're Gonna Have A Bad Time"?

@gfyoung gfyoung added the Internals Related to non-user accessible pandas implementation label Jul 28, 2017
@gfyoung
Member

gfyoung commented Jul 28, 2017

@jbrockmendel: Working with internals is not going to be a cakewalk. That being said, you're not developing in production, so "what's the worst that could happen?" 😄

I wouldn't worry about it being "too internal" - as long as you can surface it in the end with the desired behavior, that's all that counts.

@jorisvandenbossche
Member

Let's close this in favor of #17144 (allow external Blocks), as I think that is the only option to include non-numpy arrays in a DataFrame (and the one we are using now in geopandas).

@jorisvandenbossche jorisvandenbossche added this to the No action milestone Aug 18, 2017

4 participants