Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow interface (RFC 86) returns features based on bounding box overlap instead of intersection when using spatial filter #8347

Closed
brendan-ward opened this issue Sep 6, 2023 · 4 comments
Assignees

Comments

@brendan-ward
Copy link

Expected behavior and actual behavior.

Given a geometry spatial filter or a bounding box spatial filter, the Arrow interface returns more features than intersect the geometry for some drivers (GPKG, FlatGeobuf), whereas the regular interface returns the expected features. It appears that the Arrow interface uses the bounding boxes of the geometries in the data source instead of their actual geometries.

Other drivers tested (Shapefile, GeoJSON) produce expected results when using both the regular and Arrow interfaces.

For example when querying NaturalEarth countries (WGS84 coordinates), a point located in the middle of Canada returns only a single record for Canada when using the regular interface, whereas it returns Canada, Russia, and the USA when using the Arrow interface (bounding boxes for Russia and USA wrap around the anti-meridian).

First observed in pyogrio #285

Steps to reproduce the problem.

Using a test GPKG file created from NaturalEarth countries (1:110m)

from osgeo import ogr

path = "/tmp/test.gpkg"
driver = ogr.GetDriverByName("GPKG")

dataSource = driver.Open(path, 0)
layer = dataSource.GetLayer()

# point located in Canada
layer.SetSpatialFilter(ogr.CreateGeometryFromWkt("Point (-105 55)"))
# or 
# layer.SetSpatialFilterRect(-105, 54, -104, 55)

iso_a3 = []
for feature in layer:
    iso_a3.append(feature.GetField("iso_a3"))

print(f"Using regular interface: {iso_a3}")


stream = layer.GetArrowStreamAsPyArrow()

iso_a3_arrow = []
for batch in stream:
    iso_a3_arrow.extend(batch.field("iso_a3").tolist())

print(f"Using arrow interface: {iso_a3_arrow}")

Outputs:

Using regular interface: ['CAN']
Using arrow interface: ['RUS', 'USA', 'CAN']

Operating system

MacOS 12.6.5 (M1)

GDAL version and provenance

Reproduced using both:

  • 3.7.1 installed via Homebrew
  • gdal python package (3.7.1.1) installed via pip
@jratike80
Copy link
Collaborator

There is a ready made geopackage in the Geoserver sources, i believe it is good for testing https://github.com/geoserver/geoserver/blob/main/data/release/data/ne/natural_earth.gpkg.

This returns just 'CAN' but I do not know what kind of filter my SQL is setting.

ogrinfo -sql "select iso_a3 from countries where st_intersects(geom, st_geomfromtext('point (-105 55)'))" natural_earth.gpkg

@rouault rouault self-assigned this Sep 6, 2023
rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023

Verified

This commit was signed with the committer’s verified signature.
florianduros Florian Duros
…not just bbox intersection) (fixes OSGeo#8347)
@rouault
Copy link
Member

rouault commented Sep 6, 2023

Fix in #8354 (will be in 3.8 only)

@rouault
Copy link
Member

rouault commented Sep 6, 2023

Before fix:

$ bench_ogr_batch natural_earth.parquet  -spat -105 54 -104 55 -v
3 features/rows selected
$ bench_ogr_batch natural_earth.gpkg countries  -spat -105 54 -104 55 -v
3 features/rows selected

After:

$ bench_ogr_batch natural_earth.parquet  -spat -105 54 -104 55 -v
1 features/rows selected
$ bench_ogr_batch natural_earth.gpkg countries  -spat -105 54 -104 55 -v
1 features/rows selected

rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
@brendan-ward
Copy link
Author

Thanks for the quick fix @rouault ! 🎉

rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
rouault added a commit to rouault/gdal that referenced this issue Sep 6, 2023
@rouault rouault closed this as completed in 75e7c54 Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants