BUG: (sometimes) error in read on .gpkg when using a bbox filter #149
Comments
Is this consistently reproducible for the same GPKG? If this is reproducible, perhaps characteristics of the failing record may help us here. What version of GDAL is failing? I had interpreted "dynamic" to mean that the actual set of records within the database may be changing when running the feature count or iterating over features; I assume that is not the case here? It is possible that bounding boxes of a given geometry intersect that of the spatial filter and yet the actual geometry does not intersect the filter box. But I'm surprised we haven't seen errors on this before, though maybe spatial filter has not yet been used extensively in practice for pyogrio (I rarely use it) and not sufficiently covered in our test cases for this sort of case. We do need an accurate count in advance in order to set up the arrays to receive the data. We currently set the |
Yes. As I mentioned above, the problem for the case I encountered is
In the meantime, I've also reproduced it in the pyogrio tests: if I enlarge the bbox in the test cases that use one so that another country's MBR intersects the bbox while its actual geometry lies outside it, the same issue surfaces.
I'm running GDAL 3.5.1.
Yes, understandable... but that's not the case here.
I'm looking into it as we speak to write a fix and I saw indeed that the code relies a lot on the count being available beforehand. If the count isn't known beforehand:
So, to minimize the impact I'm keeping the count for now... |
Thanks for looking further into this! It sounds like count via |
Indeed, that's the way I implemented the fix. I also added an explicit check so if more rows would be available than the "prediction" by count |
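To make the approach discussed above concrete, here is a minimal sketch in plain Python, not pyogrio's actual Cython code. It assumes the count is only an upper bound: the output is preallocated to that size, features are read until the source is exhausted, and the result is trimmed to the number actually read. The names `read_with_upper_bound` and `predicted_count` are hypothetical, and an ordinary iterator stands in for repeated calls to OGR_L_GetNextFeature.

```python
def read_with_upper_bound(features, predicted_count):
    """Read features into a preallocated buffer sized by an upper-bound count.

    `features` is any iterator; it stands in for the layer being read.
    `predicted_count` plays the role of the count reported up front.
    """
    buffer = [None] * predicted_count  # preallocate, as the arrays are in the real code
    n_read = 0
    for feature in features:
        if n_read >= predicted_count:
            # Explicit check: the source yielded more rows than predicted.
            raise RuntimeError(
                f"more features available than the predicted count ({predicted_count})"
            )
        buffer[n_read] = feature
        n_read += 1
    # Fewer rows than predicted is fine: trim the unused tail.
    return buffer[:n_read]


# The count predicted 3 rows, but only 2 pass the exact filter.
rows = read_with_upper_bound(iter(["feature_a", "feature_b"]), predicted_count=3)
print(rows)
```

Treating the count as an upper bound keeps the preallocation benefit while no longer failing when fewer features are returned than were counted.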
I had a look at the time it takes to execute OGR_L_GetFeatureCount. Also useful to note: validate_feature_range apparently runs another OGR_L_GetFeatureCount every time. Logically, the cost depends on the file format you are reading from and on whether a spatial index is available. I tested it on 3 different but similar files, each with 550,000 polygons and about 300 MB in size (including attributes), where about 1000 rows were selected using the bbox:
Conclusion?
|
Thanks for the additional investigation and good catch on the performance impact of getting feature count twice when there are filters applied. I think we should consolidate the use of |
When reading a .gpkg with a bbox filter, some read operations trigger the following error:
pyogrio.errors.FeatureError: Could not read feature at index ???
After investigating a bit it seems that for .gpkg files OGR_L_GetFeatureCount returns the number of results doing a "coarse" bbox comparison (probably it just uses the spatial index), but when fetching the results with OGR_L_GetNextFeature it does a more thorough check (probably checks for an intersection).
Because the result of OGR_L_GetFeatureCount is used to determine the range of the loop in _io.pyx::get_features(), this results in the error above.
For some other file types (.geojson, shapefile) this doesn't seem to give an issue at first sight.
The GDAL documentation for OGR_L_GetFeatureCount also hints that its result is not 100% reliable:
For dynamic databases the count may not be exact.
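The mismatch described above can be illustrated with a small, self-contained sketch. It is not GDAL's actual logic, only an illustration of how a coarse count based on bounding boxes (MBRs) can exceed the number of geometries that truly intersect the filter. Geometries are modelled as simple lists of points, and all function names are hypothetical.

```python
def mbr(points):
    """Minimum bounding rectangle of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def point_in_box(point, box):
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def coarse_count(geometries, bbox):
    """Count geometries whose MBR overlaps the bbox (index-style check)."""
    return sum(boxes_overlap(mbr(g), bbox) for g in geometries)


def exact_matches(geometries, bbox):
    """Keep only geometries with at least one point inside the bbox."""
    return [g for g in geometries if any(point_in_box(p, bbox) for p in g)]


geometries = [
    [(0, 0), (10, 10)],  # MBR covers the bbox, but neither point lies inside it
    [(7, 2)],            # a point that really is inside the bbox
]
bbox = (6, 1, 9, 4)

print(coarse_count(geometries, bbox))        # 2
print(len(exact_matches(geometries, bbox)))  # 1
```

A loop that runs `coarse_count` times but receives only the exact matches will run out of features early, which matches the error reported here.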