Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize queries using info from OSMF servers #4279

Closed
pnorman opened this issue Jan 9, 2021 · 3 comments
Closed

Optimize queries using info from OSMF servers #4279

pnorman opened this issue Jan 9, 2021 · 3 comments

Comments

@pnorman
Copy link
Collaborator

pnorman commented Jan 9, 2021

In openstreetmap/operations#495 I got a sysadmin to sample the queries, allowing for an analysis of what's slow.

Grouped by the first lines of the query, this identifies the slow queries

sum prefix
00:03:16.497222 SELECT ST_AsBinary("way") AS geom,"feature","is_building","name","religion","way_pixels" FROM (SELECT\n way, name, religion, way_pixels, is_building,\n COALESCE(aeroway, amenity, wetland, power, landuse, leisure, man_made, "natural", shop, tourism, highway, railway) AS feature
00:02:39.464784 SELECT ST_AsBinary("way") AS geom,"bridge","int_intermittent","int_tunnel","waterway" FROM (SELECT\n way,\n waterway,
00:01:01.774667 SELECT ST_AsBinary("way") AS geom,"feature","link" FROM (SELECT\n way,\n (CASE WHEN feature IN ('highway_motorway_link', 'highway_trunk_link', 'highway_primary_link', 'highway_secondary_link', 'highway_tertiary_link') THEN substr(feature, 0, length(feature)-4) ELSE feature END) AS feature,
00:00:44.244365 SELECT ST_AsBinary("way") AS geom,"access","bicycle","construction","feature","horse","int_surface","link","tracktype" FROM (SELECT\n way,\n (CASE WHEN feature IN ('highway_motorway_link', 'highway_trunk_link', 'highway_primary_link', 'highway_secondary_link', 'highway_tertiary_link') THEN substr(feature, 0, length(feature)-4) ELSE feature END) AS feature,
00:00:43.933502 SELECT ST_AsBinary("way") AS geom FROM (SELECT\n way\n FROM planet_osm_line
00:00:36.795382 SELECT ST_AsBinary("way") AS geom,"access","feature","generator:source","height","iata","icao","is_building","location","name","shop","sport","way_pixels" FROM (SELECT\n *\n FROM
00:00:30.663109 SELECT ST_AsBinary("way") AS geom,"access","feature","height","iata","icao","name","way_pixels" FROM (SELECT\n *\n FROM

The math to prove it is complicated, but the higher the sum of the durations is, the more time the render server spent on those queries.

The first query is the landcover layer. Breaking down this specific query by zoom its

zoom sum
z13 2:42
z14 0:27
z15 0:05

The second layer is the water-lines layer

@imagico
Copy link
Collaborator

imagico commented Jan 9, 2021

Seems logical - the number of waterways mapped has increased massively over the past years, waterway=stream is now among the top ten most frequent tags.

However what these numbers don't tell is where there is inefficiency in the way we organize the layers. There is likely room for improvement here. The layers we currently have are:

layer zoom classes
water-lines-casing z13+ stream, ditch, drain
water-lines-low-zoom z8 - z11 river
water-lines z12+ all

While what we render (since #3467 - which was and still is fairly controversial) is:

river: z8+
canal: z12+
stream/ditch/drain: z14+

What i wonder about the numbers in addition is two things:

  • these are for z13+ obviously - is the pre-rendering at z0-z12 a non-issue ressources wise or would a similar analysis for that be prudent?
  • these are the time spent on database queries. Does anyone have any ideas how that compares to the time for actual rendering? That might be more relevant for the lower zoom levels where there is a lot more data that needs to be drawn. Still it would be important to know.

@pnorman
Copy link
Collaborator Author

pnorman commented Jan 9, 2021

  • these are for z13+ obviously - is the pre-rendering at z0-z12 a non-issue ressources wise or would a similar analysis for that be prudent?

On a busy server it's only running for a small portion of the month. It's also something that can be tested more easily since it doesn't depend on traffic patterns in the same way.

  • these are the time spent on database queries. Does anyone have any ideas how that compares to the time for actual rendering? That might be more relevant for the lower zoom levels where there is a lot more data that needs to be drawn. Still it would be important to know.

Yes, it'd be nice to know the complete picture but it's difficult to get details on how long is spent in Mapnik per layer without instrumenting it which is a lot more difficult than a sample-based approach.

@pnorman
Copy link
Collaborator Author

pnorman commented Jul 8, 2022

Closing as the information is stale, some of it has been addressed, and it's worth getting new data after #4431

@pnorman pnorman closed this as completed Jul 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants