venue popularity #493
Thank you @missinglink, I am willing to test this. How did you end up with these scores? Are they rather arbitrary?
Yep, it requires reindexing.
Once you've done that you can pull the new image from dockerhub by executing
Finally you can reimport the openstreetmap data with
The query logic should already be in place to take advantage of updated
The scores are totally arbitrary, I just made them up :)
It will also be possible for users to modify the config on their own Pelias setup by editing the file directly, or, using docker, by bind-mounting their own config over the one in the image.
Let me know how you get on
@bboure you may find this useful: pelias/docker#103
Thank you @missinglink
If it's possible to get the area of the object here, it would be nice to increase the score accordingly.
For example, a hospital covering a large area should have a higher score than a point or a smaller hospital.
We could also do this with the type of the object, e.g. `way` versus `node`?
Woah, this PR got real big real fast. I really liked the simple initial version, although I still think this is an important thing to add, and really can't believe we haven't done it till now :) I'm a little hesitant with the many many additions that came in later commits, although I also see how they can be useful. I've been trying to think through ways this PR could cause problems or regressions. I can't come up with much, but I bet it's possible for a generically named venue (something like "Market") to drown out other more specific results. It might be worth testing this PR on either a continent or planet scale before merging, to see if we can identify any cases like that, or if there's any unexpected behavior.
One thing that came later was only setting I think it's a great feature to add, a couple of hesitations on my part:
Overall I think it's a really nice feature to have and will make the product more professional feeling, we just need to be careful not to upset the balance as a result.
We have bounding boxes for
In the case of a school I think you're right, a big school is probably more important than a small school, however the larger school could be mapped as a
The same is also true of monuments, a tower is physically small but could be a more popular tourist attraction than an old football stadium which is much larger.
Total edits could be interesting, it shows at least that the place is popular with mappers and so it's probably important enough to get correct, although I don't know what sort of score we would assign based on how many edits were made. Thoughts?
I agree on that. A popular place will have many edits in OSM.
I am not sure about the size/area. For example, Manneken Pis is very small, but also very famous 😄
The number of translations (name.*) might also be a sign that the place is famous internationally? (similar/synonym to
I have been playing a bit with the current implementation and it gives me pretty accurate results on a city-scale map (Barcelona). It can probably improve though.
It might also be worthwhile to have a mechanism to allow for a proportional score.
🎉 Super exciting progress! For reference, we added collision_rank in v1.7.0 of Tilezen to solve a similar "same but different" sort problem.
Heya, nice to see a bunch of interest in this PR! I've been thinking about this some more over the weekend and I think we all agree that the concept of In order to deal with that subjectivity, I offer this explanation of
Some advantages of adopting a consistent popularity score across all layers:
Thoughts/feels?
Couple notes on elasticsearch scoring based on global popularity values:
I think the latter bullet point will be easier to achieve and more consistent when the values are more consistent.
I saw Sarah Hoffmann from the Nominatim Project last week and she said their popularity scoring is solely based on Wikipedia. They compute the 'internal inbound link count' in Wikipedia for each OSM place with a concordance and use that value (i.e. 'wiki page rank'). She said they were pretty happy with the results, the dump is available for download, it's about 6 years old but still pretty relevant. They plan to update the file this year.
After this change, would it make sense to add a way in the api to fetch the top
Hey,
How about having a customizable scoring system? For example, if you are building a transportation system, you might want to boost train/bus stations higher, while still showing other results. Which leads me to think that it would also be nice to have a query-time booster by layer or category on the API as well.
This would give a higher score to documents with the given categories, but would still show other results.
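One way to picture the query-time booster suggested above is an Elasticsearch `function_score` wrapper around whatever query the API already builds. This is only a sketch: `boostByCategory` is a hypothetical helper, not part of the Pelias API, and the `category` field name, the category values and the weight are assumptions chosen for illustration.

```javascript
// Hypothetical helper: wraps an existing Elasticsearch query so that documents
// in the given categories receive a higher score.
// Documents outside those categories are still returned, just not boosted.
function boostByCategory(baseQuery, categories, weight = 2) {
  return {
    function_score: {
      query: baseQuery,
      functions: [
        // `category` is an assumed field name; adjust to the real document schema.
        { filter: { terms: { category: categories } }, weight }
      ],
      score_mode: 'sum',
      boost_mode: 'multiply'
    }
  };
}

// Usage: boost rail and bus venues for a transit-oriented deployment.
const boosted = boostByCategory(
  { match: { 'name.default': 'central station' } }, // assumed base query
  ['transport:rail', 'transport:bus'],               // illustrative category values
  3
);
console.log(JSON.stringify(boosted, null, 2));
```

Because the boost lives in a scoring function rather than a filter, it reorders results without hiding anything, which matches the "still show other results" requirement.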
We (Geocode Earth) are currently looking at this issue again and hope to merge some code which allows for improved venue scoring soon. Related: #385
aerodrome: {
  international: { _score: 10000 },
  regional: { _score: 5000 }
},
fwiw I found using the small/medium/large categories from https://ourairports.com/data/airports.csv to be more useful than "international" - there are some small and medium international airports that don't deserve such a boost.
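A size-based variant of the airport scores could look roughly like the sketch below. It assumes the `type` column of that CSV uses values such as `small_airport`, `medium_airport` and `large_airport`; the score numbers and the `airportScore` helper are hypothetical and not taken from this PR.

```javascript
// Hypothetical scores keyed by airport size rather than by "international".
// The keys are assumed values of the `type` column in airports.csv.
const AIRPORT_SCORES = {
  large_airport: 10000,
  medium_airport: 5000,
  small_airport: 1000
};

// Returns the score for a given airport type, or 0 for unknown types.
function airportScore(type) {
  return Object.prototype.hasOwnProperty.call(AIRPORT_SCORES, type)
    ? AIRPORT_SCORES[type]
    : 0;
}

console.log(airportScore('medium_airport'));
```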
// transportation
aerodrome: {
  international: { _score: 10000 },
  regional: { _score: 5000 }
also maybe want a downweight on aerodrome:type=military https://www.openstreetmap.org/node/369160593 ?
  regional: { _score: 5000 }
},
iata: {
  _score: 5000,
how much will this hurt a query for "CVS" that doesn't want the airport?
supermarket: { _score: 2000 },
civic: { _score: 2000 },
government: { _score: 2000 },
hospital: { _score: 2000 },
add historic here for Wrigley field? https://www.openstreetmap.org/relation/1407988
},

// transportation
aerodrome: {
Another tag that might make sense is `aeroway:aerodrome`. It looks like it's on most of the major international airports like Phoenix and Miami.
  _score: 5000,
  none: { _score: -5000 }
},
railway: {
Looks like another related tag is `public_transport=station`:
While the more common tag railway=station is used for all railway stations (i.e. including cargo), the public_transport=station tag is used only on the stations interesting for passenger transport.
This would hopefully help boost some more public transit stops like Park & Market, which currently does not do very well in our tests.
architect: { _score: 5000 },
heritage: { _score: 5000 },
'heritage:operator': { _score: 2000 },
historic: {
How about adding `historic=heritage` for La Sagrada Familia?
As described in #537, the default set in #493, where all venues that have a calculated popularity below `0` are not imported, is a bit strict. This adds a config flag, `imports.openstreetmap.removeDisusedVenues` that controls whether or not that behavior is activated. In addition, when enabled, a `warning` is displayed for each removed record.
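The flag-controlled behavior described above might be sketched as follows. `shouldRemoveVenue` is a hypothetical helper, not the importer's actual code; the config path follows the flag name quoted above, and treating a missing flag as "disabled" is an assumption.

```javascript
// Hypothetical helper: decides whether a venue is dropped at import time.
// A venue is removed only when the flag is enabled AND its popularity is below 0.
function shouldRemoveVenue(popularity, config, warn = console.warn) {
  // Assumption: an absent flag means the removal behavior is off.
  const enabled = config?.imports?.openstreetmap?.removeDisusedVenues === true;
  if (!enabled || popularity >= 0) {
    return false;
  }
  // One warning per removed record, as described above.
  warn(`removing venue with negative popularity: ${popularity}`);
  return true;
}

// Usage with the flag switched on.
const exampleConfig = { imports: { openstreetmap: { removeDisusedVenues: true } } };
console.log(shouldRemoveVenue(-5000, exampleConfig));
```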
This feature has been requested in pelias/pelias#171
This PR adds the ability to increase a document `popularity` score based on which tags it has.
I'd love to see some suggestions from the community on what they would like to see regarding 'importance' scoring in OSM.
Please suggest additional tags and scoring methodologies!
I will tune the final numbers in testing; these numbers are normalized using `log1p` before the scoring is applied in elasticsearch.
We should only really use these numbers as a 'tie-breaker' for multiple venues with the same name, e.g. `Eiffel Tower`.
I also added some functionality to detect `abandoned` and `disused` places and give them negative popularity.
In the case where a document has negative popularity, it is discarded.
resolves pelias/pelias#171
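As a rough illustration of the idea in this description, the sketch below sums scores from a tag mapping and shows how `log1p` compresses the raw values. It is not the PR's implementation: the mapping layout (a `_score` for tag presence plus per-value entries) is inferred from the review hunks, the `disused` penalty is a made-up number, and `popularity` and `normalize` are hypothetical names.

```javascript
// Assumed layout: tag key -> optional `_score` when the key is present,
// plus optional per-value entries that each carry their own `_score`.
const mapping = {
  aerodrome: {
    international: { _score: 10000 },
    regional: { _score: 5000 }
  },
  iata: {
    _score: 5000,
    none: { _score: -5000 }
  },
  disused: { _score: -10000 } // hypothetical penalty, not from the PR
};

const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Sums every score that applies to the given OSM tags.
function popularity(tags, table = mapping) {
  let total = 0;
  for (const [key, value] of Object.entries(tags)) {
    if (!has(table, key)) continue;
    const entry = table[key];
    if (typeof entry._score === 'number') total += entry._score;
    if (has(entry, value) && typeof entry[value]._score === 'number') {
      total += entry[value]._score;
    }
  }
  return total;
}

// log1p(x) = ln(1 + x): large raw scores are compressed, so the value
// behaves more like a tie-breaker than a dominant ranking signal.
function normalize(score) {
  return Math.log1p(Math.max(0, score));
}

const raw = popularity({ aerodrome: 'international', iata: 'JFK' });
console.log(raw, normalize(raw));
```

A document whose raw total comes out negative, for example one tagged only as `disused` in this sketch, would be the kind of record the description says is discarded.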