Create new endpoint for geometries of cities #1121

Open
Joxit opened this issue Apr 9, 2018 · 6 comments

Comments

Joxit (Member) commented Apr 9, 2018

Hi,

A new endpoint could be interesting:
an endpoint that returns the geometry of a city given its Who's On First ID.
With it, we could highlight cities (as the Who's On First Spelunker and Google do).
This could be another microservice (like pip-service, or a new one) and should only serve localities/localadmins (because regions/countries would be too large).
The response should also be filtered: do not send the raw WOF data; keep only properties such as the geometry and bbox.

Example request and response, with properties:

GET /v1/place/geom?id=101751119 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{
  "id": 101751119,
  "type": "Feature",
  "properties": {
    "geom:area": 0.012915,
    "geom:area_square_m": 105073572.071737,
    "geom:bbox": "2.22407844723,48.8155766678,2.46976095835,48.9021619073",
    "geom:latitude": 48.856626,
    "geom:longitude": 2.342883,
    "src:geom": "mz",
    "wof:country": "FR",
    "wof:geomhash": "bcec769e4dc183e582c6cd5dbfd4de9a",
    "wof:id": 101751119,
    "wof:lang": [
      "fre"
    ],
    "wof:name": "Paris",
    "wof:placetype": "locality"
  },
  "bbox": [
    2.224078447233478,
    48.81557666781795,
    2.469760958345428,
    48.90216190732557
  ],
  "geometry": {"coordinates": [[], []], "type": "MultiPolygon"}
}
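A minimal Python sketch of the response filtering proposed above, assuming the raw WOF record has already been loaded as a GeoJSON dict. The function name `to_geom_feature` and the property whitelist are hypothetical, chosen only to reproduce the shape of the example response; they are not part of Pelias or WOF.

```python
from typing import Any, Dict, Optional

# Placetypes the proposal would serve; regions/countries are rejected as too large.
ALLOWED_PLACETYPES = {"locality", "localadmin"}

# Hypothetical whitelist: geometry-related properties plus a few identifying ones.
KEEP_PREFIXES = ("geom:", "src:geom")
KEEP_KEYS = {"wof:country", "wof:geomhash", "wof:id", "wof:lang", "wof:name", "wof:placetype"}


def to_geom_feature(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a trimmed GeoJSON Feature, or None if the placetype is not served."""
    props = record.get("properties", {})
    if props.get("wof:placetype") not in ALLOWED_PLACETYPES:
        return None
    # Drop every property that is neither whitelisted nor geometry-related.
    kept = {k: v for k, v in props.items()
            if k in KEEP_KEYS or k.startswith(KEEP_PREFIXES)}
    return {
        "id": record.get("id", props.get("wof:id")),
        "type": "Feature",
        "properties": kept,
        "bbox": record.get("bbox"),
        "geometry": record.get("geometry"),
    }
```

Anything else in the raw record (hierarchy, names in other languages, concordances) is dropped, and a country record yields `None`, which an endpoint could map to an error response.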
orangejulius (Member) commented:

Yes, we definitely want this. We've been talking about it for a long time. The entire point of the /v1/place endpoint is to return more details, such as geometry, but we never got around to implementing it.

The venicegeo team has been working on importing geometries into Elasticsearch as part of WOF records. I don't know whether they've done a full planet import; if not, they would not yet have had to deal with the 100MB New Zealand polygon. I like your idea of not returning countries, since those geometries get quite large. We could also create simplified geometries for display in the larger cases.

missinglink (Member) commented Apr 9, 2018

It's an interesting idea and a few people have asked for it.

In the past, we've stayed away from providing information for display; visual polygons weren't considered part of the 'core geocoding experience'.
On the other hand, it kind of makes sense: we have all the data anyway, so the feature wouldn't be super difficult to build.

An important consideration is that the API would need to be capable of performing Douglas-Peucker-style simplification on the polygons; some are very large (hundreds of MB) and wouldn't be suitable for display on mobile devices.

The simplification algorithm is CPU-bound and would be a heavy burden on the machines serving it.
An alternative solution would be to pre-simplify the geometries and store them in some sort of cache.
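A sketch of Douglas-Peucker simplification for a single line or ring, assuming planar coordinates and a tolerance in the same units (degrees for WOF data). This is a plain-Python illustration of the algorithm named above, not Pelias code; run at import time, something like it could produce the pre-simplified geometries to cache.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Planar distance from p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate chord (e.g. a closed ring): fall back to point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Keep only points farther than `tolerance` from the chord of their segment."""
    if len(points) < 3:
        return list(points)
    # Find the intermediate point farthest from the chord joining the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perpendicular_distance(points[i], points[0], points[-1])
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= tolerance:
        # All intermediate points are within tolerance: keep just the endpoints.
        return [points[0], points[-1]]
    # Split at the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the shared split point once
```

This recursive version can hit Python's recursion limit on the very large polygons mentioned, and it can collapse a polygon ring below four points; an iterative version or an established geometry library would be a safer choice for production use.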

I personally feel that, when these things are considered, another technology such as vector tiles is better suited to this job.
Vector tiles would further expand on this API by adding a 'tiled' strategy, allowing a Leaflet-style map to render tessellated polygons as they load.
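To make the 'tiled' strategy concrete, here is a sketch of how a client might work out which standard XYZ (Web Mercator) tiles cover a place's bbox, so that it only requests those tiles. The tiling scheme and the function names are assumptions for illustration; the discussion does not specify a tile service.

```python
import math
from typing import Iterator, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) XYZ tile containing a WGS84 point (Web Mercator tiling)."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so lon = 180 and latitudes beyond Web Mercator's ~85.05 degree limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                   zoom: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (zoom, x, y) for every tile intersecting the bbox (no antimeridian handling)."""
    # Tile y grows southwards, so the north-west corner gives the smallest x and y.
    x_min, y_min = lonlat_to_tile(min_lon, max_lat, zoom)
    x_max, y_max = lonlat_to_tile(max_lon, min_lat, zoom)
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield zoom, x, y
```

For example, `list(tiles_for_bbox(2.2241, 48.8156, 2.4698, 48.9022, 10))` lists the zoom-10 tiles covering the Paris bbox from the example response; a map would then fetch only those tiles rather than one large polygon.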

We'd need to think a little more about which service features were going to be available and what the scope of the project would be.

Joxit (Member, issue author) commented Apr 9, 2018

@orangejulius Oh yes, venicegeo is doing it in a simple way. It may be a good idea to integrate it into Elasticsearch instead of keeping geometries only on the filesystem, as pip-service does.
Countries are large and would cost CPU and bandwidth (Elasticsearch to pelias-api to client), which would slow down requests; that's why I think we can drop them.

@missinglink That's right, this is not really related to geocoding, but it is a cool feature (and can provide a wow effect).
I don't know whether the API or the WOF importer (as venicegeo does) should simplify the polygons. I think we can store the simplified version directly, because the raw geometry will never be sent (that is not the primary goal).

Joxit (Member, issue author) commented Apr 12, 2018

I thought of something else: if the data are stored in Elasticsearch, remember that geometries should not be sent for search/autocomplete queries like this one: /v1/search?text=Paris&sources=wof.
I don't know whether Elasticsearch can filter the documents before sending a response. If that's not possible, the API will have to do it, which will slow down queries.

orangejulius (Member) commented Apr 12, 2018

Yes @Joxit that's a very good point. The overhead of sending the geometry JSON to the API with each request would be huge.

Fortunately, it looks like it's possible to return only certain fields. It seems to work differently in Elasticsearch 5 vs 2, though, so we should watch out for that.
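A sketch of building Elasticsearch request bodies that use source filtering, assuming the geometry is stored in a `_source` field named `geometry` and the bbox in `bbox` (both field names hypothetical, as is the id passed to the `ids` query). Per the Elasticsearch 2.x reference, the filter keys are `include`/`exclude`, while 5.x documents `includes`/`excludes`, which is one version difference to watch for.

```python
from typing import Any, Dict, List, Sequence


def source_filter(es_major: int, includes: Sequence[str] = (),
                  excludes: Sequence[str] = ()) -> Dict[str, List[str]]:
    """Build a `_source` filtering clause using the key names of the given ES major version."""
    # Elasticsearch 2.x documents `include`/`exclude`; 5.x documents `includes`/`excludes`.
    inc_key, exc_key = ("includes", "excludes") if es_major >= 5 else ("include", "exclude")
    clause: Dict[str, List[str]] = {}
    if includes:
        clause[inc_key] = list(includes)
    if excludes:
        clause[exc_key] = list(excludes)
    return clause


def search_body(query: Dict[str, Any], es_major: int) -> Dict[str, Any]:
    """search/autocomplete: return everything except the (hypothetical) `geometry` field."""
    return {"query": query, "_source": source_filter(es_major, excludes=["geometry"])}


def place_geom_body(doc_id: str, es_major: int) -> Dict[str, Any]:
    """Geometry lookup: fetch one document by id and return only its bbox and geometry."""
    return {
        "query": {"ids": {"values": [doc_id]}},
        "_source": source_filter(es_major, includes=["bbox", "geometry"]),
    }
```

With the filter in the request, Elasticsearch strips the geometry before the response is sent, so the API never has to receive or discard it for ordinary search and autocomplete queries.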

Joxit (Member, issue author) commented Sep 22, 2018

That's cool, there is also a source exclude option 😄

Here is a preview of this issue using the whosonfirst-data/whosonfirst-data repository.

orangejulius added a commit to pelias/schema that referenced this issue Oct 23, 2018
Background
==========

I always thought that it was important to use the `store` parameter to
specify whether a field should be stored, in addition to indexing, and
the default was to not store a field for later retrieval.

It turns out this isn't true, and that all fields are [copied to the
_source](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-store.html)
field by default.

Setting `"store": "yes"` is only needed if, in addition to getting a
field back as part of the `_source` (which contains _every_ field
in the document), we wanted to be able to return a single field. Pelias
doesn't currently do this; we always ask Elasticsearch for the entire
`_source` field.

In addition, Elasticsearch has a [source filtering](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-source-filtering.html)
feature, so if we ever wanted to return only some of `_source` (which
might someday be the case with something like
pelias/api#1121), the only reason we would
want to bother with `"store": "yes"` is if the size of the `_source`
field was so prohibitive we didn't even want Elasticsearch to fetch all
of it from disk. That might be a concern some day, but not today.

Changes
==========

This PR removes all `"store": "yes"` parameters for all of our different
fields. In my testing of the Portland, Oregon Docker project, which has
about 1.8 million documents, this change reduces the disk space usage
from 551MB to 492MB, or about 10%!
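A sketch of the mechanical change this commit describes, assuming the mapping is held as nested Python dicts (the actual change edited the pelias/schema mapping files directly; `strip_store` is a hypothetical helper). Keys directly under `properties` or a multi-field `fields` block are field names, so a field that happens to be called `store` is kept; everywhere else `store` is a mapping parameter and is dropped.

```python
from typing import Any


def strip_store(node: Any, in_properties: bool = False) -> Any:
    """Return a copy of a mapping with every `store` parameter removed, at any depth."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "store" and not in_properties:
                continue  # a mapping parameter, not a field name: drop it
            # Children of `properties`/`fields` are keyed by field name.
            child_is_field_map = key in ("properties", "fields") and not in_properties
            out[key] = strip_store(value, in_properties=child_is_field_map)
        return out
    if isinstance(node, list):
        return [strip_store(v) for v in node]
    return node
```

Because every field is already copied into `_source` by default, removing the parameter loses nothing Pelias reads back; it only stops the per-field copies from being written to disk.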
orangejulius added a commit to pelias/schema that referenced this issue Oct 23, 2018
orangejulius added a commit to pelias/schema that referenced this issue Oct 23, 2018
orangejulius added a commit to pelias/schema that referenced this issue Oct 23, 2018
orangejulius added a commit to pelias/schema that referenced this issue Oct 23, 2018
Background
==========

I always thought that it was important to use the `store` parameter to
specify whether a field should be stored, in addition to indexing, and
the default was to not store a field for later retrieval.

It turns out this isn't true, and that all fields are [copied to the _source](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/mapping-store.html)
field by default.

Setting `"store": "yes"` is only needed if, in addition to getting a
field back as part of the `_source` (which contains _every_ field
in the document), we wanted to be able to return a single field. Pelias
doesn't currently do this; we always ask Elasticsearch for the entire
`_source` field.

In addition, Elasticsearch has a [source filtering](https://www.elastic.co/guide/en/elasticsearch/reference/2.4/search-request-source-filtering.html)
feature, so if we ever wanted to return only some of `_source` (which
might someday be the case with something like
pelias/api#1121), the only reason we would
want to bother with `"store": "yes"` is if the size of the `_source`
field was so prohibitive we didn't even want Elasticsearch to fetch all
of it from disk. That might be a concern some day, but not today.

Changes
==========

This PR removes all `"store": "yes"` parameters for all of our fields.

Effectively, we were storing a lot of fields on disk twice, which was
wasting space.

In my testing of the Portland, Oregon Docker project, which has about
1.8 million documents, this change reduces the disk space usage from
551MB to 492MB, or about 10%!

_Sidenote:_ If there are other fields we _do_ want to keep out of the
`_source` field,
[`_source.exclude`](https://github.com/pelias/schema/blob/master/mappings/document.js#L158-L159) in our document mapping is how we can do it.

After this change, I'm now pretty confident we are doing the right thing
for all our fields when it comes to storing and analyzers, so this
closes #99
JWileczek pushed a commit to JWileczek/schema that referenced this issue Oct 26, 2018
orangejulius added a commit to pelias/schema that referenced this issue Nov 2, 2018
orangejulius added a commit to pelias/schema that referenced this issue Nov 3, 2018
orangejulius added a commit to pelias/schema that referenced this issue Nov 3, 2018

3 participants