Work-in-progress implementation of new flex backend

osm2pgsql-dev · Jan 11, 2020 · 8b14c67
1 parent 1268f10
commit 8b14c67
Showing 30 changed files with 4,546 additions and 2 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -8,7 +8,7 @@ addons:
   # Here we install only packages that are the same for all builds on ubuntu.
   apt:
     packages: ['python3-psycopg2', 'libexpat1-dev', 'libpq-dev', 'libbz2-dev',
-               'libproj-dev', 'libluajit-5.1-dev',
+               'libproj-dev', 'libluajit-5.1-dev', 'lua-messagepack',
                'libboost-dev', 'libboost-system-dev', 'libboost-filesystem-dev']
 
 # env: T="...."     //  please set an unique test id (T="..")

diff --git a/flex-config/README.md b/flex-config/README.md
@@ -0,0 +1,150 @@
+# Flex Backend Configuration
+
+The "Flex" backend is configured through a Lua file which defines the structure
+of the output tables and is used to map OSM data to the data format to be used
+in the database. This way you have a lot of control over what the data should
+look like in the database.
+
+## Lua config file
+
+All configuration is done through the `osm2pgsql` object in Lua. It has the
+following fields:
+
+* `osm2pgsql.version`: The version of osm2pgsql as a string.
+* `osm2pgsql.srid`: The SRID set on the command line (with `-l`, `-m`, or `-E`).
+* `osm2pgsql.mode`: Either `create` or `append` depending on the command line options.
+* `osm2pgsql.stage`: Either 1 or 2 (first or second stage of processing the
+  data). See below.
+* `osm2pgsql.userdata`: A Lua table where you can store your own data, for
+  instance to pass information from the first to the second stage.
+
+The following functions are defined:
+
+* `osm2pgsql.define_node_table(name, columns)`: Define a node table with the
+  specified name and columns.
+* `osm2pgsql.define_way_table(name, columns)`: Define a way table with the
+  specified name and columns.
+* `osm2pgsql.define_relation_table(name, columns)`: Define a relation table
+  with the specified name and columns.
+* `osm2pgsql.define_area_table(name, columns)`: Define an area table
+  with the specified name and columns.
+* `osm2pgsql.define_table(data)`: Define a table. This gives you more control
+  over the id column(s) than the functions above.
+* `osm2pgsql.mark(type, id)`: Mark the OSM object of the specified type ('w'
+  or 'r') with the specified id. The OSM object will trigger a call to the
+  processing function again in the second stage.
+* `osm2pgsql.get_bbox()`: Get the bounding box of the current node or way. Only
+  works inside the `osm2pgsql.process_node()` and `osm2pgsql.process_way()`
+  functions.
+
+You are expected to define one or more of the following functions:
+
+* `osm2pgsql.process_node(data)`: Called for each node.
+* `osm2pgsql.process_way(data)`: Called for each way.
+* `osm2pgsql.process_relation(data)`: Called for each relation.
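+
+A minimal sketch of a complete config file could look like this. The table
+name `pois` and the choice to skip untagged nodes are only examples. The
+geometry is not passed to `add_row()`; osm2pgsql fills it in.
+
+```lua
+local tables = {}
+
+-- Node table: osm2pgsql adds a "node_id" column automatically.
+tables.pois = osm2pgsql.define_node_table("pois", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'point' },
+})
+
+function osm2pgsql.process_node(data)
+    -- Skip nodes without any tags.
+    if next(data.tags) == nil then
+        return
+    end
+    tables.pois:add_row({ tags = data.tags })
+end
+```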
+
+Any fields starting with an underscore (`_`) are reserved for internal use
+of osm2pgsql and must not be accessed in any way.
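+
+As a rough sketch, the two-stage processing can be used to handle the member
+ways of route relations separately. The route handling here is only an
+example; `advanced.lua` contains a more complete version.
+
+```lua
+function osm2pgsql.process_relation(data)
+    if data.tags.type ~= 'route' then
+        return
+    end
+    for _, member in ipairs(data.members) do
+        if member.type == 'w' then
+            -- process_way() will be called for this way again in stage 2
+            osm2pgsql.mark('w', member.ref)
+        end
+    end
+end
+
+function osm2pgsql.process_way(data)
+    if osm2pgsql.stage == 2 then
+        -- Ways marked in stage 1 are processed again here.
+        print("way " .. data.id .. " is a route member")
+        return
+    end
+    -- ... normal stage 1 processing of the way ...
+end
+```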
+
+### Defining a table
+
+You have to define one or more tables where your data should end up. This
+is done with the `osm2pgsql.define_table()` function or one of the slightly
+more convenient functions `osm2pgsql.define_(node|way|relation|area)_table()`.
+
+Each table is either a *node table*, *way table*, *relation table*, or *area
+table*. This means that the data for that table comes primarily from a node,
+way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object
+id will be stored in the table so that later updates to those OSM objects (or
+deletions) will be properly reflected in the tables. Area tables are special:
+they can contain data derived from ways or from relations. Way ids are
+stored as is and relation ids as negative numbers. (You can define
+tables that don't have any ids, but those tables will never be updated by
+osm2pgsql.)
+
+If you are using the `osm2pgsql.define_(node|way|relation|area)_table()`
+convenience functions, osm2pgsql will automatically create an id column named
+`(node|way|relation|area)_id`, respectively. If you want more control over
+the id column(s), use the `osm2pgsql.define_table()` function.
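+
+As a sketch, the following two definitions create equivalent way tables that
+differ only in the name of their id column. The table names are only examples.
+
+```lua
+local tables = {}
+
+-- Convenience function: the id column is named "way_id".
+tables.roads = osm2pgsql.define_way_table("roads", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'linestring' },
+})
+
+-- Full control: the id column is named "osm_id".
+tables.roads_osm_id = osm2pgsql.define_table{
+    name = "roads_osm_id",
+    ids = { type = 'way', id_column = 'osm_id' },
+    columns = {
+        { column = 'tags', type = 'hstore' },
+        { column = 'geom', type = 'linestring' },
+    }
+}
+```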
+
+Most tables will have a geometry column. (Currently each table can have at
+most one geometry column.) The possible geometry column types depend on the
+type of the input data. For node tables you are pretty much restricted to
+point geometries, but relation tables, for instance, offer a variety of
+options.
+
+Supported geometry types:
+* `geometry`: Any kind of geometry. Also used for area tables that should hold
+  both polygon and multipolygon geometries.
+* `point`: Point geometry, usually created from nodes.
+* `linestring`: Linestring geometry, usually created from ways.
+* `polygon`: Polygon geometry for area tables, created from ways or relations.
+* `multipoint`: Currently not used.
+* `multilinestring`: Created from (possibly split up) ways.
+* `multipolygon`: For area tables, created from ways or relations.
+
+All you have to do is define the geometry type you want; osm2pgsql will
+create the right geometry from the OSM data and fill it
+in.
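+
+For example, here is a sketch of an area table whose `geometry` column holds
+geometries built from closed ways as well as from multipolygon relations.
+Selecting buildings is an arbitrary choice.
+
+```lua
+local tables = {}
+
+-- Area table with an "area_id" column: way ids are stored as is,
+-- relation ids as negative numbers.
+tables.areas = osm2pgsql.define_area_table("areas", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'geometry' },
+})
+
+function osm2pgsql.process_way(data)
+    if data.is_closed and data.tags.building then
+        tables.areas:add_row({ tags = data.tags })
+    end
+end
+
+function osm2pgsql.process_relation(data)
+    if data.tags.type == 'multipolygon' and data.tags.building then
+        tables.areas:add_row({ tags = data.tags })
+    end
+end
+```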
+
+In addition to id and geometry columns, each table can have any number of
+"normal" columns using any type supported by PostgreSQL. Some types are
+recognized by osm2pgsql, which adds special support for them (see the list
+below). You can use any other SQL type as well, but then you have to make sure
+you are creating the right text format for these columns.
+
+Available column types:
+* `text`: Text string
+* `boolean`: Interprets values `"true"` and `"yes"` as `true` and everything
+  else as `false`
+* `int2`, `smallint`: 16-bit signed integer
+* `int4`, `int`, `integer`: 32-bit signed integer
+* `int8`, `bigint`: 64-bit signed integer
+* `real`: A real number
+* `hstore`: Can be created automatically from a Lua table
+* `json` and `jsonb`: Not supported yet
+* `direction`: Interprets values `"true"`, `"yes"`, and `"1"` as `1`, `"-1"` as
+  `-1`, and everything else as `0`. Useful for `oneway` tags etc.
+* `area`: The area of the (polygon) geometry.
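+
+Here is a sketch of a way table that uses some of these types. The tag values
+are passed as they are, and osm2pgsql converts them as described in the list
+above. The choice of tags and columns is only an example.
+
+```lua
+local tables = {}
+
+tables.roads = osm2pgsql.define_way_table("roads", {
+    { column = 'name',   type = 'text' },
+    { column = 'oneway', type = 'direction' }, -- "yes"/"true"/"1" -> 1, "-1" -> -1, other values -> 0
+    { column = 'lit',    type = 'boolean' },   -- "yes"/"true" -> true, other values -> false
+    { column = 'tags',   type = 'hstore' },    -- the whole Lua tags table
+    { column = 'geom',   type = 'linestring' },
+})
+
+function osm2pgsql.process_way(data)
+    if not data.tags.highway then
+        return
+    end
+    tables.roads:add_row({
+        name   = data.tags.name,
+        oneway = data.tags.oneway,
+        lit    = data.tags.lit,
+        tags   = data.tags,
+    })
+end
+```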
+
+
+## Command line options
+
+Use the command line option `-O flex` or `--output=flex` to enable the flex
+backend and the `-S|--style` option to set the Lua config file.
+
+The following command line options have a somewhat different meaning when
+using the flex backend:
+
+* `-p|--prefix`: The table names you are setting in your Lua config files
+  will *not* get this prefix.
+* `-S|--style`: Use this to specify the Lua config file. Without it, osm2pgsql
+  will not work, because it will try to read the default style file which
+  the flex backend doesn't understand.
+* `-G|--multi-geometry`: Not used. Set the geometry column type of the output
+  table to the type you want instead, for instance `polygon` vs. `multipolygon`.
+
+The following command line options are ignored by `osm2pgsql` when using the
+flex backend, because they don't make sense in that context:
+
+* `-k|--hstore`
+* `-j|--hstore-all`
+* `-z|--hstore-column`
+* `--hstore-match-only`
+* `--hstore-add-index`
+* `-K|--keep-coastlines` (Coastline tags are not handled specially in the
+  flex backend.)
+* `--tag-transform-script` (Set the Lua config file with the `-S|--style`
+  option.)
+
+## Example config files
+
+This directory contains example config files for the flex backend. All config
+files contain comments as documentation.
+
+If you are learning about the flex backend, read the config files in the
+following order (from the easiest to understand to the more complex ones):
+
+1. [simple.lua](simple.lua)
+2. [multipolygons.lua](multipolygons.lua)
+3. [advanced.lua](advanced.lua)
+4. [highway-shields.lua](highway-shields.lua)
+5. [unitable.lua](unitable.lua)
+
diff --git a/flex-config/advanced.lua b/flex-config/advanced.lua
@@ -0,0 +1,166 @@
+
+-- Read and understand simple.lua and multipolygons.lua first, before you try
+-- to understand this file.
+
+inspect = require('inspect')
+
+print("osm2pgsql version: " .. osm2pgsql.version)
+
+-- Are we running in "create" or "append" mode?
+print("osm2pgsql mode: " .. osm2pgsql.mode)
+
+-- Which stage in the data processing is this?
+print("osm2pgsql stage: " .. osm2pgsql.stage)
+
+-- Uncomment the following line to see the userdata (but, careful, it might be
+--                                                   a lot of data)
+-- print("osm2pgsql userdata: " .. inspect(osm2pgsql.userdata))
+
+tables = {}
+
+tables.pois = osm2pgsql.define_node_table("pois", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'point' },
+})
+
+tables.ways = osm2pgsql.define_way_table("ways", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'linestring' },
+})
+
+-- Using the define_table function allows some more control over the id columns
+-- than the more convenient define_(node|way|relation|area)_table functions.
+-- In this case we are setting the name of the id column to "osm_id".
+tables.polygons = osm2pgsql.define_table{
+    name = "polygons",
+    ids = { type = 'area', id_column = 'osm_id' },
+    columns = {
+        { column = 'tags', type = 'hstore' },
+        { column = 'geom', type = 'geometry' },
+    }
+}
+
+-- A table for all route relations
+tables.routes = osm2pgsql.define_relation_table("routes", {
+    { column = 'tags', type = 'hstore' },
+    { column = 'geom', type = 'multilinestring' },
+})
+
+-- A table for all individual members of route relations
+-- (Note that this script doesn't handle ways in multiple relations correctly.)
+tables.route_members = osm2pgsql.define_table{
+    name = "route_members",
+    ids = { type = 'way', id_column = 'way_id' },
+    columns = {
+        { column = 'rel_id', type = 'int8' }, -- not a specially handled id column
+        { column = 'tags',   type = 'hstore' }, -- tags from member way
+        { column = 'role',   type = 'text' }, -- role in the relation
+        { column = 'rtags',  type = 'hstore' }, -- tags from relation
+        { column = 'geom',   type = 'linestring' },
+    }
+}
+
+function is_empty(some_table)
+    return next(some_table) == nil
+end
+
+function clean_tags(tags)
+    tags.odbl = nil
+    tags.created_by = nil
+    tags.source = nil
+    tags["source:ref"] = nil
+    tags["source:name"] = nil
+end
+
+function osm2pgsql.process_node(data)
+    clean_tags(data.tags)
+    if is_empty(data.tags) then
+        return
+    end
+
+    tables.pois:add_row({
+        tags = data.tags
+    })
+end
+
+function osm2pgsql.process_way(data)
+--    print(inspect(data))
+
+    clean_tags(data.tags)
+    if is_empty(data.tags) then
+        return
+    end
+
+    -- osm2pgsql.stage: either 1 or 2 for first or second pass through the data
+    if osm2pgsql.stage == 2 then
+        local row = {
+            rel_id = 0,
+            tags = data.tags,
+            role = '',
+            rtags = {},
+        }
+        local member_data = osm2pgsql.userdata.w2r[data.id]
+        if member_data then
+            row.rel_id = member_data.rel_id
+            row.role = member_data.role
+            row.rtags = osm2pgsql.userdata.route_tags[row.rel_id]
+        end
+        -- print(inspect(row))
+        tables.route_members:add_row(row)
+        return
+    end
+
+    if data.is_closed then
+        tables.polygons:add_row({
+            tags = data.tags
+        })
+    else
+        tables.ways:add_row({
+            tags = data.tags
+        })
+    end
+end
+
+function osm2pgsql.process_relation(data)
+--    print(inspect(data))
+
+    clean_tags(data.tags)
+    if is_empty(data.tags) then
+        return
+    end
+
+    if data.tags.type == 'multipolygon' or data.tags.type == 'boundary' then
+        tables.polygons:add_row({
+            tags = data.tags
+        })
+    elseif data.tags.type == 'route' and data.tags.route == 'hiking' then
+        tables.routes:add_row({
+            tags = data.tags
+        })
+
+        if not osm2pgsql.userdata.route_tags then
+            osm2pgsql.userdata.route_tags = {}
+        end
+
+        if not osm2pgsql.userdata.w2r then
+            osm2pgsql.userdata.w2r = {}
+        end
+
+        osm2pgsql.userdata.route_tags[data.id] = data.tags
+
+        -- Go through all the members...
+        for _, member in ipairs(data.members) do
+            if member.type == 'w' then
+                -- Mark the member way as "interesting", the "process_way"
+                -- callback will be triggered again in the second stage
+                osm2pgsql.mark('w', member.ref)
+                -- print("mark way id " .. member.ref)
+                osm2pgsql.userdata.w2r[member.ref] = {
+                    rel_id = data.id,
+                    role = member.role,
+                }
+            end
+        end
+    end
+end
+