Python bindings: check for Arrow PyCapsule Interface in ogr.Layer.WritePyArrow
#9132
rouault added a commit to rouault/gdal that referenced this issue on Jan 24, 2024:
…w_c_stream__ or __arrow_c_array__ interfaces fixes OSGeo#9132
Implemented per #9133.
dshean pushed a commit to dshean/gdal that referenced this issue on Jan 27, 2024:
…w_c_stream__ or __arrow_c_array__ interfaces fixes OSGeo#9132
Expected behavior and actual behavior.
This is a corollary of #9043, which added support for the Arrow PyCapsule Interface for reading from a layer. This ticket is a feature request for writing from objects that expose the PyCapsule Interface.
The current implementation of `ogr.Layer.WritePyArrow` uses pyarrow-specific APIs, including the `to_batches` method (gdal/swig/include/python/ogr_python.i, line 652 in 5f9ffa3) and later the `_export_to_c` methods (gdal/swig/include/python/ogr_python.i, lines 666 and 642 in 5f9ffa3).
With the PyCapsule Interface, any Arrow-based table or record batch would be supported: not just pyarrow (v14 or higher) but also geoarrow-c, geoarrow-rust, and potentially more in the future, like geopandas (ref pandas-dev/pandas#56587 for the pandas implementation).

Table constructs like `pyarrow.Table` include an `__arrow_c_stream__()` method, and RecordBatch constructs like `pyarrow.RecordBatch` include an `__arrow_c_array__` method that returns a struct column.

The exact changes to `WritePyArrow` would be:

- Check for `__arrow_c_stream__` on the input value and call `pyarrow.table()` on the input. If `__arrow_c_stream__` does not exist but `__arrow_c_array__` does exist, then call `pyarrow.record_batch()` on the input.
- Call `__arrow_c_stream__` to access the underlying stream pointer, which has a reference to the schema and an iterator for the batches. Then pass those pointers directly into `self.CreateFieldFromArrowSchema` and `self.WriteArrowBatch`.
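A minimal sketch of the dunder-method check described above, for illustration only. The helper names `classify_arrow_input` and `to_pyarrow` are hypothetical and are not part of GDAL. The sketch assumes pyarrow 14 or higher, where `pyarrow.table()` and `pyarrow.record_batch()` accept objects exposing the PyCapsule Interface. It covers only the pyarrow-based conversion, not the direct use of the stream pointer.

```python
def classify_arrow_input(obj):
    """Report which Arrow PyCapsule method the input exposes.

    Hypothetical helper: returns "stream" for table-like inputs and
    "array" for record-batch-like inputs. The stream method is checked
    first, matching the order proposed in this issue.
    """
    if hasattr(obj, "__arrow_c_stream__"):
        return "stream"
    if hasattr(obj, "__arrow_c_array__"):
        return "array"
    raise TypeError(
        "input exposes neither __arrow_c_stream__ nor __arrow_c_array__"
    )


def to_pyarrow(obj):
    """Convert a PyCapsule-compatible object to a pyarrow object.

    Hypothetical helper. pyarrow is imported lazily so that the check
    above can be used without pyarrow installed.
    """
    import pyarrow as pa  # assumed to be version 14 or higher

    if classify_arrow_input(obj) == "stream":
        return pa.table(obj)
    return pa.record_batch(obj)
```

With this shape, the result of `to_pyarrow` could presumably be handed to the existing pyarrow-based code path in `WritePyArrow` unchanged.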
Steps to reproduce the problem.
Feature request, not a bug.
Operating system
Feature request, not a bug.
GDAL version and provenance
Feature request, not a bug.