read osm.pbf #269
Thanks for reporting this! I am able to reproduce the error locally. One way to sidestep this is to use the …

The problem appears to be that GDAL is returning -1 for the number of features in each layer, even when we tell it to give us the count the slow way (by iterating over all features) when the driver doesn't support a fast count (this driver does not). We use the feature count in various places to allocate arrays that we then populate while iterating over features.

The only idea I'm coming up with is that if we get back -1 from GDAL, we do our own loop over all records first to get the count, use that to allocate arrays, and then iterate over all the records again. That seems pretty inefficient, but it is maybe no different from what GDAL would normally do to get a count of records.
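For illustration only, here is a minimal sketch of that fallback-count idea using the osgeo (GDAL/OGR) Python bindings; the path and layer name are placeholders, and it glosses over the OSM driver's interleaved-reading behavior:

from osgeo import ogr

def count_features(path, layer_name):
    # Open the data source read-only and grab the requested layer
    ds = ogr.Open(path)
    layer = ds.GetLayerByName(layer_name)
    # The OSM driver reports -1 here because a fast feature count is disabled
    count = layer.GetFeatureCount()
    if count < 0:
        # Fall back to our own pass over all records to get the count
        count = sum(1 for _ in layer)
        # Rewind so a second pass can populate the pre-allocated arrays
        layer.ResetReading()
    return count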
Related StackOverflow post from 9(!) years ago. Looking at the GDAL source code, feature count is specifically disabled for this driver, probably because of the performance impact of iterating over all features in the OSM format.
Never mind. The following does the trick. 😅

import geopandas as gpd
url = "https://download.geofabrik.de/europe/liechtenstein-latest.osm.pbf"
gdf = gpd.read_file(url, engine="pyogrio", layer="lines", use_arrow=True)
@brendan-ward I tried to load this file:

import pyogrio
url = "http://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf"
pgdf = pyogrio.read_dataframe(url, use_arrow=True, layer="multipolygons")

but this returns an empty dataframe (checked with pgdf.info()). Doing the same with …
the output returns a filled dataframe. Is this related to the original issue?
Based on a lead found by @brendan-ward in #272 I gave the following snippet a try, and it seems to work, even though it crashes on my laptop without the LIMIT clause because I don't have enough memory to load this entire file:

import pyogrio
url = "http://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf"
pgdf = pyogrio.read_dataframe(url, use_arrow=True, sql="SELECT * FROM multipolygons LIMIT 100")
@CaptainInler I believe I have a fix for this now in #271; it uses a similar approach. However, because it has to do two passes over a lot of records in the OSM file, not all of which are in the layer you read, it is slow to read from a remote URL. I added some additional recommendations around this as part of #271, but in short, I highly recommend you download the file first.
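As a rough illustration of that recommendation, here is a small sketch that downloads the file to a local path before reading it; the URL, layer, and local filename are just examples:

import urllib.request
import pyogrio

url = "http://download.geofabrik.de/europe/germany/baden-wuerttemberg-latest.osm.pbf"
local_path = "baden-wuerttemberg-latest.osm.pbf"

# Download once to disk so the repeated read passes hit the local file, not the network
urllib.request.urlretrieve(url, local_path)
pgdf = pyogrio.read_dataframe(local_path, layer="multipolygons")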
I tried this
returned this error:
Since the traceback so kindly asks to open an issue, I could not resist doing so... 😃