OSM Planet - Examples and geospatial features current status #678
Comments
Thank you for these questions and queries, Lorenz. I will reply to them one comment at a time. Concerning your first query ("contains envelope"): This was realized using #413, which is based on an old version of QLever, where values used to be implemented very inefficiently as strings. In the meantime, QLever's handling of values has been completely refactored and is now much more efficient, see #648 and #650 (and some preparatory PRs). In a nutshell, values that fit into an 8-byte integer (that is, most values) are now represented directly in their ID instead of as strings like before. Also note that since #638, QLever supports @joka921 How much effort do you think it would take to adapt #413 to the current master? I would expect that much of the magic that we needed for #413 is not needed anymore because the values are now efficient out of the box. @LorenzBuehmann Is this a feature that you need urgently or were you just curious?
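To illustrate what "represented directly in their ID" can mean, here is a minimal sketch of folding a signed integer into a 64-bit ID. The layout (a 4-bit datatype tag above a 60-bit payload), the tag value, and the function names are assumptions made for this example and are not taken from the QLever code.

```python
# Hypothetical layout: 4-bit datatype tag in the top bits, 60-bit payload below.
TAG_BITS = 4
PAYLOAD_BITS = 64 - TAG_BITS
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1
TAG_INT = 1  # hypothetical tag value meaning "signed integer"


def encode_int(value: int) -> int:
    """Fold a signed integer into a 64-bit ID (two's complement in the payload)."""
    lowest = -(1 << (PAYLOAD_BITS - 1))
    highest = (1 << (PAYLOAD_BITS - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError("value does not fit into the payload bits")
    return (TAG_INT << PAYLOAD_BITS) | (value & PAYLOAD_MASK)


def decode_int(id_: int) -> int:
    """Recover the signed integer from an ID produced by encode_int."""
    if id_ >> PAYLOAD_BITS != TAG_INT:
        raise ValueError("ID does not carry the integer tag")
    payload = id_ & PAYLOAD_MASK
    if payload >> (PAYLOAD_BITS - 1):  # sign bit set: undo two's complement
        payload -= 1 << PAYLOAD_BITS
    return payload
```

With such a scheme, comparing or filtering values needs no string lookup, because the value can be read off the ID itself.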
Concerning your second query ("the 16 states of Germany, selected via their ISO3166-2 code"): In your query, the quotation mark in the REGEX was not escaped. If you escape it, the query works: https://qlever.cs.uni-freiburg.de/osm-planet/C76hnG We are aware that according to the SPARQL standard, the REGEX should actually be |
Concerning your third query ("all castles in Switzerland"): As explained in the osm2rdf paper, the The query rewriting currently happens in the QLever UI and is imperfect because it uses regexes instead of a proper SPARQL parser. It didn't work for your query because of the many nested braces. I fixed that bug and your query now works: https://qlever.cs.uni-freiburg.de/osm-planet/CULQMR . I have encountered that bug many times before, but never had the nerve to fix it. Now I did thanks to your request :-) |
Concerning your fourth query ("all highway road segments"): `osm2rdf` has various options to control which of these triples are produced: https://github.com/ad-freiburg/osm2rdf/blob/master/src/config/Config.cpp#L86-L110 . In the instances that currently run on https://qlever.cs.uni-freiburg.de, these triples have not been added. Do you need them or were you just curious?
Concerning your fifth query ("all 16 states of Germany" on OSM Planet): I have changed the name of this example query :-) |
Good morning/afternoon? (according to your 2am responses ...) @hannahbast First of all, many thanks for the helpful comments and bug fixes already. My questions have been mostly out of curiosity, as I'm playing around with OSM data and GeoSPARQL stuff in my current work. That's why I read the paper and then also tried the examples registered in the QLever UI. So no need to hurry or change your schedule, I'll take what I get when you have the time. Note that none of the queries reported here were created by me. Maybe it would be worth omitting those from the UI then, at least the fourth query, where the data isn't loaded and the query thus always returns an empty result; that might be confusing for users otherwise. Nice to hear that you're still in the process of improving and speeding things up, e.g. the literal value handling part. And I totally agree that WKT parsing and indexing should already happen at loading and indexing time. At least that's how most spatial indexes work; currently you can't make use of one unless you load the WKT literals into some spatial-index-like structure, e.g. ST Tree, R tree and similar. In the long term, I think you could try to support a larger part of geospatial features, especially beyond containment checks. But be careful, it's a huge topic (personally, I'd start with just the basic topological and non-topological simple functions). Minor comment: please keep in mind that neither |
@LorenzBuehmann Thanks + yes, I will clean up the example queries, they should indeed all work. Where is the Swiss castle query from, that's not from us, or is it? We included And yes, GeoSPARQL is a bottomless barrel. Our strategy will be to cover the basic stuff and to be fast at those things for which other engines are slow or which they don't manage at all. For example, PostgreSQL+PostGIS is very slow with contains queries. The only fast engine for that is Overpass, but that cannot be combined with anything else. Another example: we have started working on the efficient visualization of large result sets. It always bothered me when map UIs break down when you have more than a few thousand results to show. Here are all 17M trees in OpenStreetMap: https://qlever.cs.uni-freiburg.de/osm-planet/giNBGB . If you click on Map View++, you will get a nice view on all zoom levels, and it reacts pretty fast given the large result set.
Sure it is: "All castles in Switzerland"
Is it? In my very limited world this is supposed to be a fast geospatial database. Do you have some example and numbers in mind? Do you refer to point in polygon queries or even something beyond? Yep, know what you mean - UI becomes very sluggish. What is your approach? Looks like you merge points into clusters and make use of some heatmap feature of Leaflet? |
Thanks for reminding me about the "All castles in Switzerland" query :-) It came from one of our users and since I found it interesting and challenging, I added it. So in this sense, it is both not from us (but from that user) and from us (because I added it as an example query, which I forgot in the meantime). Thanks for the blog post. It speaks about a spatial join between a set of 9M parking violations and a set of 150 neighborhoods of Philadelphia. That is no big challenge: the neighborhoods are few and have a rather simple shape, so a simple R-tree will do the job. For each shape, get the parking violations (which are just points) in the bounding box of the shape, and for each of them compute whether they are really contained. That is a matter of seconds even without any parallelization. And since this algorithm is trivially parallelizable, you get a speedup of k if you use k cores and even more with special hardware like a GPU. For OSM Planet, you have 1.5 billion geometric objects, some of them very complex, like the border of a country. For Germany alone, you have 118M objects and a lot of complex borders. For example, consider the relatively simple query "all post boxes in Baden-Württemberg", which QLever can handle easily: https://qlever.cs.uni-freiburg.de/osm-germany/zss8C5 . The result is 11,270 objects. PostgreSQL+PostGIS will do at least this many comparisons to the complex border of Baden-Württemberg (and some more comparisons for objects close to the border). We have tried it and it takes forever. Parallelization or GPUs will not solve this problem, it's simply too much computation. And that is a simple query. Much harder ones are "all buildings in Baden-Württemberg": https://qlever.cs.uni-freiburg.de/osm-germany/sGlYuZ (very many comparisons against a complex shape) or "all level-6 administrative regions in Baden-Württemberg": https://qlever.cs.uni-freiburg.de/osm-germany/PPx8OQ (many expensive comparisons between complex shapes).
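The filter-then-refine approach described for the parking-violation join can be sketched as follows. This is only an illustration under simplifying assumptions: a linear scan stands in for the R-tree lookup, shapes are simple polygons without holes, and all function names are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def bounding_box(polygon: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Exact containment test via ray casting (simple polygon, no holes)."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def points_in_shape(points: List[Point], polygon: List[Point]) -> List[Point]:
    """Filter step: cheap bounding-box test. Refine step: exact test."""
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    candidates = [p for p in points
                  if min_x <= p[0] <= max_x and min_y <= p[1] <= max_y]
    return [p for p in candidates if point_in_polygon(p, polygon)]
```

The cost of the refine step grows with the number of candidates times the number of polygon edges, which is why a border with many vertices makes each exact comparison expensive.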
Hi there, I am facing the same problem. If the "contains envelope" query was implemented before the refactoring, how can I run it on a recent QLever? Thank you for your help. |
@siwenyang Can you give an example of a query you wonder how to ask? As I wrote, there is now |
Hi there, here is another issue I opened, which contains the query code: #844 Thank you for your help! |
@hannahbast as far as I understand, it is not about using the materialized Thanks in advance |
@LorenzBuehmann I think we should distinguish between two types of "contained in arbitrary given geometry" queries, and I have a question about both:
Hi @hannahbast . Well, for me any client asking maybe only for an externally defined part of the whole world would be the most natural use case. Rectangle based lookup might sound weird at a first glance, but even Leaflet has the concept of tiles and might just want to render on demand. As far as I know, most tools I've been working with use an R-Tree as index structure, and then the envelope of whatever geometry I use for querying is used to get the candidates from the R-Tree and in a secondary step only on those retrieved objects intersection, containment etc. is computed to get the correct result. |
@hannahbast |
I tried to do an updated recap of this long thread:
Now, I'm currently trying to check whether a
Looking at the paper, if I understand correctly, you are referring to Anyway, in general, has any predicate/function been implemented so far that would allow bbox spatial queries? My specific use case would be to fetch only the elements that have to be rendered in the portion of a map that is visible on screen, and using the latitudes and longitudes at the borders of the screen is the most natural lookup method. More generally, filtering objects by a bbox is the most typical way to query OSM data, for example through other query engines like Overpass or through map tiles. |
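For readers unfamiliar with the use case, a viewport lookup boils down to the following. This is a client-side sketch only and says nothing about which predicates QLever offers; the function names are made up, the viewport is assumed not to cross the antimeridian, and coordinates are written in WKT order (longitude first).

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_to_wkt(min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> str:
    """Express the viewport as a closed WKT polygon ring."""
    corners = [(min_lon, min_lat), (max_lon, min_lat),
               (max_lon, max_lat), (min_lon, max_lat),
               (min_lon, min_lat)]  # repeat the first corner to close the ring
    return "POLYGON((" + ",".join(f"{lon} {lat}" for lon, lat in corners) + "))"


def in_viewport(lon: float, lat: float, bbox: BBox) -> bool:
    """Check whether a point lies inside the viewport (borders included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
```

The WKT form is what one would hand to an engine that accepts an arbitrary query geometry, while the four numbers alone are enough for a plain range comparison on coordinates.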
I think we need to keep track of Sophox vs QLever OSM mapping. The current Sophox mapping (How OSM data is stored). |
@nyurik Can you briefly explain how you get the latest data? We currently fetch And I agree, we should keep track of how we map data to RDF. And ideally use the same mapping. |
All the code is in https://github.com/Sophox/sophox/tree/main/osm2rdf -- that tool subscribes to minutely updates, and generates either TTL files (from full dump) or SPARQL insert statements (from updates) |
@nyurik Thanks! One quick question: how do you keep track of:
I am assuming that your code also works if it's not constantly running. BTW, I wasn't aware until now that your converter is also called |
I had it named osm2rdf from the start, and it was part of the https://github.com/Sophox/sophox/tree/main/osm2rdf repo - I think I wrote most of it 6 years ago :) I store the sequence number in the index itself - https://github.com/Sophox/sophox/blob/main/osm2rdf/RdfUpdateHandler.py#L93 - that sequence value is what the osmosis code and its Python wrapper use to manage updates. Sadly, as far as I know, there is no well-defined algorithm to convert a timestamp to a sequence number (it should be possible though). A better path forward IMO would be to work together on the Rust version of osm2rdf, which already handles most of the rapid TTL/SPARQL generation and just needs to gain a way to determine which update files (first daily, followed by hourly / minutely) to get |
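As a small illustration of how a stored sequence number identifies an update file: OSM replication directories are commonly laid out with the sequence number zero-padded to nine digits and split into three path segments. The helper below assumes that layout; its name and defaults are made up for this example, and it is not part of either osm2rdf tool.

```python
def replication_path(sequence: int, extension: str = "osc.gz") -> str:
    """Map a replication sequence number to its relative file path.

    Assumes the usual 'AAA/BBB/CCC.<extension>' layout, where AAABBBCCC
    is the sequence number zero-padded to nine digits.
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must have at most nine digits")
    padded = f"{sequence:09d}"
    return f"{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.{extension}"
```

For example, `replication_path(4123456)` yields `004/123/456.osc.gz`, and passing `"state.txt"` as the extension yields the path of the matching state file, which records the timestamp belonging to that sequence number.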
@nyurik Working together sounds good. One focus of our tool was getting the full geometry for each OSM object (currently as a WKT literal, but this could also be in any other format, for example, WKB). Which is non-trivial to make efficient because you need to store all the point locations (very many) and then gather the ones you need for each object. On https://sophox.org/ only the centroids of each object are available (via the |
full geometry is fairly simple to do in the |
Updating this thread with the news from #1331 (reply in thread) :
It's more a general question, but is the geospatial support still deployed on the public QLever instance? And did you change the OSM data model somehow? I'm asking because I tried some queries from the examples, e.g.
Query
which fails with
Query:
fails with
Query
doesn't really run according to the UI; I suspect an error that isn't even shown in the UI.
Query
leads to an empty result. Debugging the query, it looks like there is no triple matching
?way osmway:node ?m .
, i.e. no way has such a node assigned.
Misc
"All 16 states of Germany" shows a query that isn't restricted to Germany, I guess.