
Data mapping

Nicolas Colomer edited this page May 2, 2013 · 1 revision

OpenStreetMap data is organized in a relational model built from three data primitives: nodes, ways and relations. These objects reference each other by osmid, an identifier that is unique within each entity type. Because it is relational, this model fits well in an RDBMS (commonly PostgreSQL + PostGIS) and can be exported as files.

Although XML is the reference format, OpenStreetMap data is also distributed in more compact formats such as PBF (a binary format based on Protocol Buffers) or BZ2 (bzip2-compressed XML). Such files are easy to find online (see the Get OSM data section of the Quick-start).

Osmosis can read both XML and PBF files: it deserializes the data into Java objects that plugins then process. Here, the elasticsearch-osmosis-plugin converts these Java objects into their JSON equivalent before inserting them into elasticsearch, using each entity's osmid as the document id.
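As a rough illustration of that conversion, the sketch below turns a node into a JSON body shaped like the node documents listed in section 2, and keeps the osmid apart as the document id. The OsmNode record, the NodeToJson class and its methods are hypothetical names (Java 16+); this is not the plugin's or Osmosis' code, and a real implementation would more likely rely on a JSON library.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch: converting a node into the JSON body stored in elasticsearch.
public class NodeToJson {

    // Minimal node representation (assumption, not the Osmosis Node class).
    record OsmNode(long id, double lat, double lon, Map<String, String> tags) {}

    // Escape the characters that must be escaped inside a JSON string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Coordinates are written in [lon, lat] order; tags are sorted for stable output.
    static String toJson(OsmNode node) {
        String point = "[" + node.lon() + "," + node.lat() + "]";
        StringJoiner tags = new StringJoiner(",", "{", "}");
        new TreeMap<>(node.tags()).forEach((k, v) -> tags.add(quote(k) + ":" + quote(v)));
        return "{\"centroid\":" + point
                + ",\"shape\":{\"type\":\"point\",\"coordinates\":" + point + "}"
                + ",\"tags\":" + tags + "}";
    }

    public static void main(String[] args) {
        OsmNode node = new OsmNode(497017646L, 48.6757054, 2.3794174, Map.of());
        // The osmid becomes the document id; it is not part of the document body.
        System.out.println(node.id() + " -> " + toJson(node));
    }
}
```

For node 497017646 this produces the same body as the first node document in section 2. The [longitude, latitude] order is the one elasticsearch expects for geo arrays, as in GeoJSON.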

1. OSM file example

The following examples and explanations are based on this sample.osm file:

<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="OpenStreetMap server" copyright="OpenStreetMap and contributors" 
	attribution="http://www.openstreetmap.org/copyright" license="http://opendatacommons.org/licenses/odbl/1-0/">
  <node id="497017646" version="2" changeset="12638179" visible="true" timestamp="2012-08-06T19:59:42Z"
	lat="48.6757054" lon="2.3794174" user="inoskyh" uid="785001"/>
  <node id="497017647" version="2" changeset="12638179" visible="true" timestamp="2012-08-06T19:59:42Z"
	lat="48.6755698" lon="2.3795879" user="inoskyh" uid="785001"/>
  <node id="343866517" version="10" changeset="12638179" visible="true" timestamp="2012-08-06T19:59:42Z"
	lat="48.6752788" lon="2.3799338" user="inoskyh" uid="785001"/>
  <way id="40849832" visible="true" timestamp="2012-08-06T19:59:43Z" version="3" changeset="12638179" 
	user="inoskyh" uid="785001">
    <nd ref="497017646"/>
    <nd ref="497017647"/>
    <nd ref="343866517"/>
    <tag k="highway" v="residential"/>
    <tag k="name" v="Avenue Marc Sangnier"/>
  </way>
</osm>
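To make the osmid-based linking concrete, the sketch below loads a trimmed copy of the sample (metadata attributes dropped) with the JDK's DOM parser and resolves each way's nd references to node coordinates. It is only an illustration: Osmosis uses its own readers, and OsmSample and resolveWays are hypothetical names (Java 15+ for the text block).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: resolves way -> node references of the sample file by osmid.
public class OsmSample {
    static final String SAMPLE = """
        <osm version="0.6">
          <node id="497017646" lat="48.6757054" lon="2.3794174"/>
          <node id="497017647" lat="48.6755698" lon="2.3795879"/>
          <node id="343866517" lat="48.6752788" lon="2.3799338"/>
          <way id="40849832">
            <nd ref="497017646"/>
            <nd ref="497017647"/>
            <nd ref="343866517"/>
            <tag k="highway" v="residential"/>
            <tag k="name" v="Avenue Marc Sangnier"/>
          </way>
        </osm>
        """;

    // Returns the [lon, lat] coordinates of each way, keyed by way osmid.
    static Map<Long, List<double[]>> resolveWays(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Index node coordinates by osmid (ids are only unique within an entity type).
        Map<Long, double[]> nodes = new HashMap<>();
        NodeList nodeElems = doc.getElementsByTagName("node");
        for (int i = 0; i < nodeElems.getLength(); i++) {
            Element n = (Element) nodeElems.item(i);
            nodes.put(Long.parseLong(n.getAttribute("id")), new double[] {
                    Double.parseDouble(n.getAttribute("lon")),
                    Double.parseDouble(n.getAttribute("lat"))});
        }
        // Follow each <nd ref="..."/> to the node it points to, keeping way order.
        Map<Long, List<double[]>> ways = new HashMap<>();
        NodeList wayElems = doc.getElementsByTagName("way");
        for (int i = 0; i < wayElems.getLength(); i++) {
            Element w = (Element) wayElems.item(i);
            List<double[]> coords = new ArrayList<>();
            NodeList refs = w.getElementsByTagName("nd");
            for (int j = 0; j < refs.getLength(); j++) {
                long ref = Long.parseLong(((Element) refs.item(j)).getAttribute("ref"));
                double[] c = nodes.get(ref);
                if (c == null) {
                    throw new IllegalStateException("way " + w.getAttribute("id") + " references missing node " + ref);
                }
                coords.add(c);
            }
            ways.put(Long.parseLong(w.getAttribute("id")), coords);
        }
        return ways;
    }

    public static void main(String[] args) throws Exception {
        for (Map.Entry<Long, List<double[]>> e : resolveWays(SAMPLE).entrySet()) {
            StringBuilder line = new StringBuilder("way " + e.getKey() + ":");
            for (double[] c : e.getValue()) {
                line.append(" [").append(c[0]).append(',').append(c[1]).append(']');
            }
            System.out.println(line);
        }
    }
}
```

A way only stores references, so its geometry cannot be built until the referenced nodes are known; this is why a missing node is treated as an error here.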

2. Document model

The elasticsearch-osmosis-plugin converts OSM entities and inserts them into an elasticsearch index, storing each OSM entity type in its own elasticsearch type (node, way).

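The produced JSON documents share common fields, so that geo queries can run across multiple types:

  • The centroid field holds the entity's center as a [longitude, latitude] geo_point (for a node, its own position)
  • The shape field holds the entity's geometry as a geo_shape (a point for a node, a linestring for a way)
  • The tags field holds the entity's OSM tags as key/value pairs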

In addition, each way document contains the following fields:

  • The lengthKm field holds the length of the way (or its perimeter if the way is closed), in kilometers
  • The areaKm2 field holds the area enclosed by the way, in square kilometers (0 if the way is not closed)
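The two way-only fields can be sketched under an assumed planar convention: segment lengths are measured in degrees directly on the [longitude, latitude] plane and converted with a fixed factor of about 111.195 km per degree (mean Earth radius 6371.0087714 km). With that assumption the sketch agrees with the lengthKm value shown for way 40849832 below, which suggests, but does not prove, the plugin's convention; areaKm2 simply applies the shoelace formula with the same convention and has not been checked against a closed way. WayMeasures and its methods are hypothetical names.

```java
// Hypothetical sketch of lengthKm / areaKm2 under an ASSUMED planar lon/lat convention.
public class WayMeasures {
    // Assumption: km per degree of arc for a mean Earth radius of 6371.0087714 km.
    static final double DEG_TO_KM = 6371.0087714 * Math.PI / 180.0;

    // coords are [lon, lat] pairs in way order; a closed way repeats its first node.
    static boolean isClosed(double[][] coords) {
        if (coords.length < 4) return false;
        double[] first = coords[0], last = coords[coords.length - 1];
        return first[0] == last[0] && first[1] == last[1];
    }

    // Length of an open way, or perimeter of a closed one (the closing segment is
    // already in the list because the first node is repeated at the end).
    static double lengthKm(double[][] coords) {
        double deg = 0;
        for (int i = 1; i < coords.length; i++) {
            deg += Math.hypot(coords[i][0] - coords[i - 1][0], coords[i][1] - coords[i - 1][1]);
        }
        return deg * DEG_TO_KM;
    }

    // Shoelace formula on the lon/lat plane; 0 for open ways, as documented above.
    static double areaKm2(double[][] coords) {
        if (!isClosed(coords)) return 0;
        double twiceArea = 0;
        for (int i = 1; i < coords.length; i++) {
            twiceArea += coords[i - 1][0] * coords[i][1] - coords[i][0] * coords[i - 1][1];
        }
        return Math.abs(twiceArea) / 2 * DEG_TO_KM * DEG_TO_KM;
    }

    public static void main(String[] args) {
        double[][] avenue = {{2.3794174, 48.6757054}, {2.3795879, 48.6755698}, {2.3799338, 48.6752788}};
        System.out.println("lengthKm = " + lengthKm(avenue) + ", areaKm2 = " + areaKm2(avenue));
    }
}
```

Note that a degree-based planar distance treats a degree of longitude as long as a degree of latitude, which overestimates east-west distances away from the equator; a haversine or projected computation would be more accurate where that matters.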

Given the sample.osm extract above, you can expect the following documents:

  • All nodes are stored in the node type, with their osmid as the elasticsearch document id:
{"centroid":[2.3794174,48.6757054],"shape":{"type":"point","coordinates":[2.3794174,48.6757054]},"tags":{}}
{"centroid":[2.3795879,48.6755698],"shape":{"type":"point","coordinates":[2.3795879,48.6755698]},"tags":{}}
{"centroid":[2.3799338,48.6752788],"shape":{"type":"point","coordinates":[2.3799338,48.6752788]},"tags":{}}
  • All ways are stored in the way type, with their osmid as the elasticsearch document id:
{
  "centroid": [2.379676881568899,48.67549366663964],
  "lengthKm": 0.07448669438396566,
  "areaKm2": 0,
  "shape": {
    "type": "linestring",
    "coordinates": [[2.3794174,48.6757054],[2.3795879,48.6755698],[2.3799338,48.6752788]]
  },
  "tags": {
    "highway": "residential",
    "name": "Avenue Marc Sangnier"
  }
}
  • Relations and bounds (not present in this example) are ignored, as they are not implemented yet.
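The centroid of a way can be sketched as a length-weighted average of its segment midpoints, computed directly on [longitude, latitude] pairs. This planar convention is an assumption and LineCentroid is a hypothetical name, but applied to the three nodes of way 40849832 it agrees with the centroid value shown above; for a node, the centroid is just the point itself.

```java
// Hypothetical sketch: length-weighted centroid of a linestring on the lon/lat plane.
public class LineCentroid {
    // coords are [lon, lat] pairs; each segment contributes its midpoint, weighted by its length.
    static double[] centroid(double[][] coords) {
        double sumX = 0, sumY = 0, totalLength = 0;
        for (int i = 1; i < coords.length; i++) {
            double[] a = coords[i - 1], b = coords[i];
            double len = Math.hypot(b[0] - a[0], b[1] - a[1]);
            sumX += len * (a[0] + b[0]) / 2;
            sumY += len * (a[1] + b[1]) / 2;
            totalLength += len;
        }
        // Single point or zero-length way: fall back to the first coordinate.
        if (totalLength == 0) return new double[] {coords[0][0], coords[0][1]};
        return new double[] {sumX / totalLength, sumY / totalLength};
    }

    public static void main(String[] args) {
        double[][] avenue = {{2.3794174, 48.6757054}, {2.3795879, 48.6755698}, {2.3799338, 48.6752788}};
        double[] c = centroid(avenue);
        System.out.println("centroid = [" + c[0] + "," + c[1] + "]");
    }
}
```

Weighting by length keeps a cluster of closely spaced nodes from pulling the centroid toward one end of the way, which a plain average of node coordinates would do.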

3. Index mapping

The following mapping is applied by default to all types of the index. You can override it if needed (see the Usage section).

{
  "_all": {"enabled": false},
  "dynamic_templates": [
    {
      "tags_exceptions": {
        "path_match": "tags.*",
        "match": "(name.*)",
        "match_pattern": "regex",
        "mapping": {
          "store": "no",
          "type": "multi_field",
          "fields": {
            "{name}": {"type": "string", "index": "not_analyzed"},
            "analyzed": {"type": "string", "index": "analyzed"}
          }
        }
      }
    },
    {
      "tags_default": {
        "path_match": "tags.*",
        "mapping": {"index": "not_analyzed", "store": "no"}
      }
    }
  ],
  "properties": {
    "centroid": {"type": "geo_point"},
    "shape": {"type": "geo_shape"}
  }
}
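To make the two dynamic templates easier to read, the sketch below mimics how one would be picked for a new tag field. TagTemplates and its methods are hypothetical, not elasticsearch code, and two simplifications are assumptions: path_match "tags.*" is checked as a plain prefix test, and the regex match is taken to require a full match of the field name (as Java's String.matches does). Templates are tried in order and the first matching one applies. In the multi_field mapping, the {name} placeholder stands for the field's own name, so that sub-field is the default (not_analyzed) one and the analyzed copy is exposed as <field>.analyzed.

```java
import java.util.List;

// Hypothetical helper, not elasticsearch code: which dynamic template a tag field would get.
public class TagTemplates {

    // Both templates use path_match "tags.*" (simplified here to a prefix test).
    static boolean underTags(String fullPath) {
        return fullPath.startsWith("tags.");
    }

    // Templates are tried in declaration order; the first matching one wins.
    static String templateFor(String fullPath) {
        if (!underTags(fullPath)) return null; // not covered by the dynamic templates
        String name = fullPath.substring(fullPath.lastIndexOf('.') + 1);
        // Assumption: a "regex" match must cover the whole field name.
        if (name.matches("(name.*)")) return "tags_exceptions";
        return "tags_default";
    }

    // Indexed fields produced for a tag, with their "index" setting.
    static List<String> indexedFields(String fullPath) {
        String template = templateFor(fullPath);
        if ("tags_exceptions".equals(template)) {
            // multi_field: the "{name}" sub-field is the default one, plus an "analyzed" copy.
            return List.of(fullPath + " (not_analyzed)", fullPath + ".analyzed (analyzed)");
        }
        if ("tags_default".equals(template)) return List.of(fullPath + " (not_analyzed)");
        return List.of();
    }

    public static void main(String[] args) {
        for (String path : List.of("tags.name", "tags.name:fr", "tags.old_name", "tags.highway")) {
            System.out.println(path + " -> " + templateFor(path) + " " + indexedFields(path));
        }
    }
}
```

Under these assumptions, tags.name and tags.name:fr get both an exact-value field and a full-text field, while tags.old_name falls through to tags_default because the field name must begin with name for the regex to match as a whole.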