addressextract parses an OpenStreetMap PBF file and exports all addresses, i.e. objects with addr:*
tags on them. It also reassembles admin and postcode boundaries and extends the address information
with those.
A typical address looks like this:
{
    "city": "Paderborn",
    "geomcity": "Paderborn",
    "geomcounty": "Kreis Paderborn",
    "geompostcode": "33098",
    "geomsuburb": "Paderborn",
    "housenumber": "2a",
    "housename": "foobar",
    "id": "25906061",
    "lat": "51.717383",
    "lon": "8.752651",
    "postcode": "33098",
    "source": "node",
    "street": "Marienplatz"
}
All the geom* tags originate from the boundaries, be it a postcode or an admin boundary.
The id and source fields identify the originating OSM object: source says whether it is a node or a way, and id is its OSM id.
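Because every address carries both its tagged value and the boundary-derived value, the two can be compared. The following is a minimal sketch, not part of the tool itself: it assumes the output has a top-level "addresses" list (as the jq examples in this document suggest), and the function name postcode_mismatches is hypothetical.

```python
import json


def load_addresses(path):
    """Read the extractor output and return its list of address objects.

    Assumes a top-level "addresses" key, as used by the jq examples.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["addresses"]


def postcode_mismatches(addresses):
    """Return addresses whose tagged postcode differs from the boundary one."""
    mismatches = []
    for addr in addresses:
        tagged = addr.get("postcode")
        derived = addr.get("geompostcode")
        # Compare only when both the tag and the boundary value are present.
        if tagged and derived and tagged != derived:
            mismatches.append(addr)
    return mismatches
```

A call such as postcode_mismatches(load_addresses("addresses.json")) would then list candidates worth checking by hand. A mismatch is not necessarily an error in the address; the boundary itself may be wrong.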
Additionally, you can run the extractor with "-e" so that it adds an "errors" field describing problems with the address. It could look like this:
"errors": [
    "No housenumber",
    "No postcode",
    "No addr:street or addr:place"
]
addressextract -i mylocal.pbf >addresses.json
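If the file was written with "-e", the "errors" lists can be tallied to see which problems are most common. This is only a sketch under assumptions: it expects the list of address objects found under the top-level "addresses" key, and count_errors is a hypothetical helper name, not something the tool ships.

```python
from collections import Counter


def count_errors(addresses):
    """Count how often each error message occurs across all addresses."""
    counts = Counter()
    for addr in addresses:
        # Addresses without problems are assumed to have no "errors" field.
        counts.update(addr.get("errors", []))
    return counts
```

Calling count_errors(...).most_common() returns the messages sorted by frequency, which gives a quick overview of the data quality in an extract.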
To filter or extract addresses from the resulting JSON, you can use the addrfilter tool:
addrfilter -p 33330 -i owl.json -c
This exports all addresses with postcode 33330 and dumps them as CSV. An alternative is to use jq:
jq -r '.addresses[] | [ .id, .source, .postcode, .city, .street, .housenumber ] | @csv' owl.json
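The same kind of export can be sketched in Python for cases where further processing is needed. This is an illustration only: addresses_to_csv is a hypothetical name, the column list mirrors the jq command above, and the filter is assumed to match the tagged "postcode" field rather than "geompostcode".

```python
import csv
import io

# Same columns as in the jq example.
FIELDS = ["id", "source", "postcode", "city", "street", "housenumber"]


def addresses_to_csv(addresses, postcode=None):
    """Render addresses as CSV text, optionally keeping one postcode only."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for addr in addresses:
        if postcode is not None and addr.get("postcode") != postcode:
            continue
        # Missing fields become empty columns.
        writer.writerow([addr.get(field, "") for field in FIELDS])
    return out.getvalue()
```

Note that the csv module quotes a field only when necessary (for example when it contains a comma), whereas jq's @csv quotes every string, so the two outputs differ in quoting but parse to the same values.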
apt-get install build-essential cmake-data cmake libboost-all-dev \
libspatialindex-dev libgdal-dev libbz2-dev libexpat1-dev
git clone
cd addressextract
git clone
git clone
git clone
cmake .
Setting up Solr on Debian Buster:
sudo apt-get install solr-jetty jq
sudo cp -rav searchasyoutype/solr36conf/* /etc/solr/
sudo service jetty9 restart
addressextract -i <mylittlepbf> | jq .addresses >addresses.json
searchasyoutype/pushtosolr http://localhost:8080/solr addresses.json
Now you can query Solr for results:
curl 'http://localhost:8080/solr/address/?q=postcode:33330%20street:Heidestra%C3%9Fe'
Typical query time is less than 50 ms and the result looks like this:
{
    "responseHeader": {
        "status": 0,
        "QTime": 13,
        "params": {
            "q": "postcode:33330 street:Heidestraße"
        }
    },
    "response": {
        "numFound": 26262,
        "start": 0,
        "docs": [
            {
                "city": "Gütersloh",
                "geomcity": "Gütersloh",
                "geompostcode": "33330",
                "housenumber": "1c",
                "id": "7380458133",
                "postcode": "33330",
                "street": "Heidestraße"
            },
            [ ... ]
        ]
    }
}
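For use from a program rather than curl, the query URL has to be percent-encoded and the documents pulled out of the response. The sketch below makes several assumptions: the endpoint path and JSON response shape are taken from the example above, and build_query_url, extract_docs and fetch_docs are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, **fields):
    """Build a query URL like the curl example from field=value pairs."""
    # Join "field:value" terms with spaces, then percent-encode the result.
    query = " ".join(f"{name}:{value}" for name, value in fields.items())
    encoded = urllib.parse.quote(query, safe=":")
    return f"{base_url}/address/?q={encoded}"


def extract_docs(body):
    """Return the matching address documents from a parsed response."""
    return body["response"]["docs"]


def fetch_docs(url):
    """Run the query; assumes the server answers with JSON as shown above."""
    with urllib.request.urlopen(url) as resp:
        return extract_docs(json.load(resp))
```

For example, build_query_url("http://localhost:8080/solr", postcode="33330", street="Heidestraße") yields a URL that can be passed to fetch_docs. Values containing spaces or Solr query syntax characters would need additional escaping, which this sketch does not attempt.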
- House numbers with spaces or ",", e.g. "4a,b" or "5a,5b"
  33602,Bielefeld,LOOM, Bahnhofstraße,28,52.025702,8.531427,node,5202495577,33602,Bielefeld,,