Synonyms for French addresses #301
This all sounds good. I am always nervous about changing the tokenizers because it changes the system so much. But yes; in this case, I think this change would be for the best. To make this change safely we will need to run a build and test to ensure we haven't missed any edge cases. I would suggest we start a new, clean branch, and from there make the minimal number of changes required to improve |
On that note, I'm going to split out just the change to tokenize on whitespace from #291, so that we can test that change individually as well. |
This would also be a good opportunity to start putting together a French test suite. |
Okay, I suppose you'll want to make the change to the tokenizer yourself? Please tell me if I can help with anything. |
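For context, the tokenizer change under discussion can be sketched outside Elasticsearch with a plain regex. This is a rough illustration of the behavior being proposed, not the actual peliasStreetTokenizer implementation:

```python
import re

def tokenize(text, split_on_hyphen=False):
    """Rough simulation of a whitespace tokenizer; optionally also split on '-'."""
    pattern = r"[\s\-]+" if split_on_hyphen else r"\s+"
    return [t for t in re.split(pattern, text.lower()) if t]

# Splitting only on whitespace keeps the hyphenated token intact:
print(tokenize("Boulevard Saint-Martin"))
# -> ['boulevard', 'saint-martin']

# Also splitting on '-' lets a query like 'saint martin' match 'saint-martin':
print(tokenize("Boulevard Saint-Martin", split_on_hyphen=True))
# -> ['boulevard', 'saint', 'martin']
```

In Elasticsearch this would correspond to adjusting the tokenizer's split characters, which is why a full build and test run is needed before merging.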
@missinglink, following up on this topic, this might be a game-changer for the adoption of Pelias in the French community :). @orangejulius any update on the split of #291? @adefarge is available to work on the project for a few weeks; are you open to getting organized so that we take advantage of the momentum to safely make and test those schema changes? If yes, please let us know how :) Cheers! |
@adefarge, we are looking primarily for French acceptance tests; we need your local knowledge :) Of course we will not say no to unit tests. @loicortola yes, if you want to do a video/phone chat sometime soon to organize, that would be great. Feel free also to join the community call later this week. I have now split out just the whitespace tokenizing change from #291 in #307, and we can follow up with more parts of that PR. I'd also very much like to investigate tokenizing on |
That's cool :D |
Yes, very interested in this, apologies if I haven't been giving it the attention it deserves. Let's set it as an agenda item for the community call :) |
Just to summarize, we need to:
We will do that on one of our servers in the next couple of weeks. |
@adefarge that would be great. it's only important that the build includes france. so if you have a europe or even global build already, that's fine too |
Good point, but the full planet build we have is the one we use in production which is a big machine. |
Relative performance comparison should be ok to get us to the point where we can merge to master :) |
I added a new test suite here pelias/acceptance-tests#477. Here is the result with a build based on pelias/schema:master with data from france.
And here is the result with a build based on pelias/schema:portland_synonyms.
The execution times vary a lot but are roughly identical. Both builds are available on:
Both have data from France, NYC and London (except for polylines which is only for France as the import script doesn't work for multiple osm files). Feel free to test them as you like. We will keep them online for one or two weeks. |
This is great. Against the current geocode.earth build, the first two tests fail: it looks like the OSM and OA records are not returned in the same order as in your build (this sort of thing often happens, as the order of two equally-scoring documents is determined randomly depending on the build). http://pelias.github.io/compare/#/v1/search%3Ftext=20%20Boulevard%20Saint-Martin,%20Paris This is actually great, and I think it exposes two separate bugs: |
After looking at the source code, the dedupe logic should already handle capitalization differences. This is the result on our build:
Apparently there is 1 result from OSM and 2 from OA, but there is only one deduplication, between the result from OSM and one of the results from OA. In any case, this should not make the tests fail. |
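A minimal sketch of the case-insensitive dedupe behavior described above (hypothetical field names and key function; the real Pelias dedupe compares more fields than this):

```python
def dedupe_key(record):
    # Compare name case-insensitively, as the dedupe logic is said to do.
    return record.get("name", "").lower()

def dedupe(records):
    """Keep only the first record for each case-insensitive key."""
    seen, out = set(), []
    for r in records:
        k = dedupe_key(r)
        if k not in seen:
            seen.add(k)
            out.append(r)
    return out

results = [
    {"source": "osm", "name": "Boulevard Saint-Martin"},
    {"source": "oa", "name": "boulevard saint-martin"},
    {"source": "oa", "name": "Boulevard Saint-Martin"},
]
# All three names differ only in capitalization, so they collapse to one:
print(len(dedupe(results)))
# -> 1
```

If the real dedupe only collapses two of the three, the key must be comparing something beyond the lowercased name, which would explain the behavior observed in the build.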
Hello there, here is a link with the comparison of the geocode queries on our two versions: https://pelias.jawg.io/. |
Wow very nice. I see you are using your own tiles too :) |
Thanks, yes, I changed the links to use our tiles. :) If you need tiles for pelias demos, we're here and it's free 😉 |
Hi @missinglink @orangejulius |
Hey @adefarge. We're going to kick off a full planet build this week with these changes. Stay tuned :) |
Hi everyone! However, we've found that there's an Elasticsearch query parameter called … Once #310 is merged, a change to tokenize on hyphens as well would hopefully be straightforward. |
Connects pelias/schema#301 Connects pelias/api#1268 Connects pelias/api#1279
Connects pelias/schema#301 Connects pelias/schema#375 Connects pelias/schema#65
Fixed by pelias/schema#453 Connects pelias/schema#301
French addresses are relevant here: pelias/schema#301
This should be fixed with #453. In fact only

{
  "parser": "libpostal",
  "parsed_text": {
    "street": "boulevard saint",
    "city": "martin"
  }
}

The result on autocomplete is

{
  "parser": "pelias",
  "parsed_text": {
    "subject": "boulevard",
    "locality": "boulevard",
    "region": "saint martin",
    "admin": "saint martin"
  }
}
So I close this 🎉 |
related: openvenues/libpostal#499 |
Hi,
French addresses don't work when using the search feature with abbreviations. For example:
- boulevard saint-martin works
- bd saint-martin doesn't work
- boulevard saint martin doesn't work either
Ideally, we would like to find the address with the input bd st martin.
I see two problems here:
- Synonyms in custom_street.txt don't work because the peliasStreetTokenizer doesn't split on whitespace. But I see you are fixing this in Portland synonyms #291
- saint martin doesn't match saint-martin because of the hyphen. This can be solved by adding - to the name and street tokenizer.
A more complete list of localized synonyms is also needed, but that is less of a problem because we can just add our own in our local install of Pelias.
What do you think? |
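The two problems, whitespace-aware synonyms and hyphen splitting, can be sketched together in a few lines of Python. The synonym table below is hypothetical, in the spirit of custom_street.txt entries; the real mechanism is Elasticsearch analysis, not this code:

```python
import re

# Hypothetical French abbreviation synonyms (illustrative only).
SYNONYMS = {"bd": "boulevard", "st": "saint"}

def analyze(text):
    """Split on whitespace AND hyphens, then expand abbreviations."""
    tokens = [t for t in re.split(r"[\s\-]+", text.lower()) if t]
    return [SYNONYMS.get(t, t) for t in tokens]

# With both changes, the abbreviated query matches the indexed street name:
print(analyze("bd st martin"))
# -> ['boulevard', 'saint', 'martin']
print(analyze("boulevard saint-martin"))
# -> ['boulevard', 'saint', 'martin']
```

Note the caveat this illustrates: "st" expands to "saint" in a French context but to "street" in an English one, which is exactly why localized synonym lists are needed rather than a single global table.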