Feature request: Support for FixedSizeList #4014
Comments
I have been thinking about this, and it is something that might fit in the scope of polars eventually. It is a lot of work with currently not much benefit over the default list type. Eventually I'd like geotypes under the polars umbrella, but I first want to mature the default use case and not fight a battle on two fronts.
Can Structs be used instead of FixedSizeLists? For 2-3 data points, I'm wondering whether the list properties are relevant.
On our side that should work if we were to implement geo types.
My goal is to be compliant with the GeoArrow specification, which is still in development. At this point, the spec defines a nested list format where the inner array is a
My preference is to not use a polars Today,
To clarify, are you referring to Rust structs or Arrow structs? Early on in GeoArrow discussions, an Arrow Struct format was proposed, but it was decided against because it is nearly identical to the physical layout of the nested list approach, while lacking the easier logical API of the nested lists.
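For 2D points, both candidate layouts keep coordinates in flat float64 buffers with no offsets buffer, so point i is located by arithmetic alone. Below is a minimal pure-Python sketch. Plain lists stand in for Arrow child buffers, and the helper names are hypothetical. It compares the interleaved FixedSizeList<f64>[2] layout with a Struct<x: f64, y: f64> layout:

```python
from typing import List, Tuple

# FixedSizeList<f64>[2]: one child buffer with interleaved x0, y0, x1, y1, ...
def fsl_point(values: List[float], i: int) -> Tuple[float, float]:
    # Point i starts at index 2 * i; no offsets buffer is needed.
    return values[2 * i], values[2 * i + 1]

# Struct<x: f64, y: f64>: one child buffer per field (x0, x1, ... and y0, y1, ...).
def struct_point(xs: List[float], ys: List[float], i: int) -> Tuple[float, float]:
    # Point i sits at index i in each field's buffer.
    return xs[i], ys[i]

interleaved = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
xs, ys = [0.0, 2.0, 4.0], [1.0, 3.0, 5.0]
print(fsl_point(interleaved, 1))   # (2.0, 3.0)
print(struct_point(xs, ys, 1))     # (2.0, 3.0)
```

The difference between the two is interleaved versus separated coordinates. Both give constant-time, zero-copy access to a single coordinate.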
I'm sympathetic to the extra dev overhead of new data types. I wonder whether it would be possible to add some sort of minimal "container" data type that just wraps Arrow arrays but doesn't have full polars support otherwise. In the current approach of
Not sure I understand. For a geometry column of type
There are more requests for FixedSizeList + extension types so that we can deal with tensor types. I want to add FixedSizeList as a minimal type: one that can be put into a DataFrame and supports minimal aggregations and take functionality. That should allow third parties to work with more of the Arrow spec + polars.
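The "take" functionality mentioned here is a gather by row index. For a FixedSizeList column it needs no offsets buffer. The sketch below is minimal and makes several assumptions: a plain Python list stands in for the child values buffer, validity bitmaps are ignored, and `fsl_take` is a hypothetical helper rather than a polars API.

```python
from typing import List, Sequence

def fsl_take(values: Sequence[float], size: int, indices: Sequence[int]) -> List[float]:
    """Gather rows of a FixedSizeList<_>[size] column, given its flat child values.

    Row i occupies values[i * size:(i + 1) * size], so each row is located by
    arithmetic alone. The result is again a flat child buffer with the same size.
    """
    out: List[float] = []
    for i in indices:
        start = i * size
        out.extend(values[start:start + size])
    return out

# Three 2D points stored as FixedSizeList<f64>[2]: (0, 1), (2, 3), (4, 5)
values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(fsl_take(values, 2, [2, 0]))  # [4.0, 5.0, 0.0, 1.0]
```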
As a heads up: the GeoArrow community is reconsidering using a struct type instead of
@ritchie46, is there any news on supporting FixedSizeList in Polars? What would be involved in adding support?
It would need a PR similar to this one: #5122. I would accept such a PR. It's just a few hours of work.
I'd absolutely love to be able to use tensor types within Polars! (I'm currently using xarray, which is awesome but uses Pandas + Dask.)
Along with @stuartlynn, I've been working on https://github.com/kylebarron/geopolars to extend polars with support for geospatial data, much like GeoPandas extends Pandas (see also polars issues #1830, #3208).
With Arrow, the whole ecosystem benefits when a common memory layout is used. There's been a lot of work in https://github.com/geopandas/geo-arrow-spec to define common ways to store vector geospatial data (points, lines, polygons, etc) in Arrow memory. Right now, two alternate layouts are defined in the spec:
- geoarrow.wkb: use a Binary column where geometries are stored in Well-Known Binary format. WKB is common in the geo-world, but it is a less performant storage format: coordinates can't be accessed with zero copy, and parsing is O(n).
- List and FixedSizeList (spec). This is more performant because any coordinate can be accessed in O(1) time and zero-copy access is possible. For example:
  - geoarrow.point: FixedSizeList<f64>[2]
  - geoarrow.linestring: List<FixedSizeList<f64>[2]>
  - geoarrow.multipolygon: List<List<List<FixedSizeList<f64>[2]>>>
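To make the contrast concrete, here is a minimal pure-Python sketch. The helper names are hypothetical, and a list of bytes objects and a flat float list stand in for the Arrow Binary and FixedSizeList buffers. A 2D WKB point is a 1-byte byte-order flag (1 = little endian), a uint32 geometry type (1 = Point), and two float64 values. Reading its coordinates therefore means decoding a header first. In a FixedSizeList<f64>[2] column, point i sits at a fixed position in the child buffer.

```python
import struct
from typing import List, Tuple

def wkb_point(x: float, y: float) -> bytes:
    # Byte order 1 = little endian, geometry type 1 = Point, then x and y.
    return struct.pack("<BIdd", 1, 1, x, y)

def read_wkb_point(buf: bytes) -> Tuple[float, float]:
    # The header must be decoded before the coordinates can be read.
    fmt = "<" if buf[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(fmt + "I", buf, 1)
    if geom_type != 1:
        raise ValueError(f"expected a Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(fmt + "dd", buf, 5)
    return x, y

def read_fsl_point(values: List[float], i: int) -> Tuple[float, float]:
    # FixedSizeList<f64>[2]: point i is values[2 * i] and values[2 * i + 1].
    return values[2 * i], values[2 * i + 1]

# The same three points stored both ways.
points = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
wkb_column = [wkb_point(x, y) for x, y in points]
fsl_values = [c for p in points for c in p]

print(len(wkb_column[0]))             # 21
print(read_wkb_point(wkb_column[2]))  # (4.0, 5.0)
print(read_fsl_point(fsl_values, 2))  # (4.0, 5.0)
```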
Therefore, to support the current version of the geo-arrow-spec, FixedSizeList would be a necessary data type. Arrow2 supports FixedSizeList. Beyond that, I don't know the polars codebase well enough to know how much work it would be to add and support FixedSizeList. Would it be possible to reuse existing List support for FixedSizeList?

Thoughts? I would be open to submitting a PR for this as well.
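One way to think about reusing List code is that a FixedSizeList<_>[n] column is physically a List column whose offsets are implicit: row i spans n * i to n * (i + 1). The sketch below materializes those offsets and then walks a geoarrow.linestring column (List<FixedSizeList<f64>[2]>). It is minimal: plain Python lists stand in for Arrow buffers, and the helper names are hypothetical.

```python
from typing import List, Tuple

def fsl_implicit_offsets(length: int, size: int) -> List[int]:
    # Offsets a variable-size List would need to describe the same rows.
    return [i * size for i in range(length + 1)]

def list_row(values: List[float], offsets: List[int], i: int) -> List[float]:
    # Generic List access: row i spans offsets[i] up to offsets[i + 1].
    return values[offsets[i]:offsets[i + 1]]

def linestring_vertices(
    coords: List[float], geom_offsets: List[int], i: int
) -> List[Tuple[float, float]]:
    # List<FixedSizeList<f64>[2]>: the outer offsets count vertices, and
    # vertex v sits at coords[2 * v] and coords[2 * v + 1].
    start, end = geom_offsets[i], geom_offsets[i + 1]
    return [(coords[2 * v], coords[2 * v + 1]) for v in range(start, end)]

# Child buffer of FixedSizeList<f64>[2] holding four vertices.
coords = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
point_offsets = fsl_implicit_offsets(len(coords) // 2, 2)
print(point_offsets)                       # [0, 2, 4, 6, 8]
print(list_row(coords, point_offsets, 1))  # [1.0, 1.0]

# Two linestrings: vertices 0-1 and vertices 2-3.
geom_offsets = [0, 2, 4]
print(linestring_vertices(coords, geom_offsets, 1))  # [(2.0, 2.0), (3.0, 3.0)]
```

Materializing the offsets costs an extra buffer of length n + 1. A native FixedSizeList type avoids that overhead.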
Appendix
Current Behavior
When trying to load this Arrow file (cities-geoarrow.arrow.zip), with schema:
into Polars using

table = pyarrow.feather.read_table(path); polars.from_arrow(table)

it errors with:

Example files:
- Point geometries in a fixed_size_list<xy: double not null>[2] column
- MultiPolygon geometries in a column: