Yeah, this is a really good idea. I can't remember if we've discussed it in GitHub issues before, but we should even consider expanding it and having a "strict" and a "loose" subfield for most of our fields. One case where that would help: housenumbers with separating characters, like Via del Ponticello 38/2, Trieste, Italy (there's no issue for this yet AFAIK).
I'm sure there's more, right?
Using Elasticsearch subfields is pretty critical for this. We've known about it for a long time, and IIRC it's fairly efficient compared to adding an entire new field.
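A minimal sketch of the "strict"/"loose" idea, written as plain JavaScript objects in the shape of an Elasticsearch mapping. The helper names (`withStrictLoose`, `addStrictLooseToAll`) and the analyzer names `hypotheticalStrict`, `hypotheticalLoose` and `hypotheticalStreetAnalyzer` are placeholders invented for illustration; only the `fields` multi-field mechanism itself is standard Elasticsearch.

```javascript
// Hypothetical helper: add "strict" and "loose" multi-fields (subfields) to an
// existing field mapping without mutating the original object.
// The analyzer names are placeholders, not analyzers defined in pelias/schema.
function withStrictLoose(baseMapping, strictAnalyzer, looseAnalyzer) {
  return {
    ...baseMapping,
    fields: {
      ...(baseMapping.fields || {}),
      strict: { type: 'text', analyzer: strictAnalyzer },
      loose: { type: 'text', analyzer: looseAnalyzer }
    }
  };
}

// Apply the same pair of subfields to every field in a properties object.
function addStrictLooseToAll(properties, strictAnalyzer, looseAnalyzer) {
  const result = {};
  for (const [name, mapping] of Object.entries(properties)) {
    result[name] = withStrictLoose(mapping, strictAnalyzer, looseAnalyzer);
  }
  return result;
}

// Example input: "hypotheticalStreetAnalyzer" is a made-up name.
const props = {
  street: { type: 'text', analyzer: 'hypotheticalStreetAnalyzer' },
  number: { type: 'text', analyzer: 'peliasHousenumber' }
};

const mapped = addStrictLooseToAll(props, 'hypotheticalStrict', 'hypotheticalLoose');
console.log(JSON.stringify(mapped.number, null, 2));
```

Because the subfields sit under the parent's `fields` key, the parent field is still indexed and queried exactly as before; the subfields are reachable as e.g. `number.strict` and `number.loose`.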
The `peliasHousenumber` analyzer strips non-numeric tokens. As discussed in pelias/pelias#810, this is somewhat unintuitive but actually works very well. The analyzer is defined in `schema/settings.js`, lines 124 to 128 (at commit 41bd2d1).
The issue with this is that the original housenumber (including alpha characters) is lost to the document, meaning we can't do later fine-grained sorting on it.
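To make the loss concrete, here is a rough approximation of the two kinds of analysis in plain JavaScript. It assumes `peliasHousenumber` behaves like a char filter that replaces every non-digit with a space followed by tokenization, and that a "raw" analyzer only lowercases and trims. Both functions (`approxHousenumberTokens`, `approxRawTokens`) are hypothetical stand-ins, not the real Elasticsearch analyzers, whose definitions may differ.

```javascript
// Rough approximations of two analyzers, for illustration only.
// Assumption: peliasHousenumber behaves roughly like a pattern_replace char
// filter that turns every non-digit into a space, followed by whitespace
// tokenization. The real definition in schema/settings.js may differ.
function approxHousenumberTokens(input) {
  return input
    .replace(/[^0-9]/g, ' ') // letters, slashes, hyphens become spaces
    .split(/\s+/)
    .filter((token) => token.length > 0);
}

// Assumption: a "raw" analyzer that only lowercases and trims, keeping the
// whole housenumber as a single token (keyword-style).
function approxRawTokens(input) {
  const token = input.trim().toLowerCase();
  return token.length > 0 ? [token] : [];
}

for (const housenumber of ['10', '10a', '10B', '38/2']) {
  console.log(
    housenumber,
    approxHousenumberTokens(housenumber),
    approxRawTokens(housenumber)
  );
}
```

Under these assumptions `10`, `10a` and `10B` all reduce to the single token `10`, so the index can no longer tell them apart or order them, while the raw tokens (`10`, `10a`, `10b`) stay distinct. `38/2` becomes `38` and `2` on the numeric side but stays `38/2` on the raw side.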
As a workaround we're using the `phrase.default` field to get access to those tokens. The disadvantage of `phrase.default` is that it will contain tokens from both the street and the housenumber, potentially producing undesirable matches. For non-address queries it will also contain additional tokens.
In this issue I would like to float the idea of having a 'subfield' of `address_parts.number`, called something like `address_parts.number.raw`, and using a different analyzer on it, such as `peliasUnit` (which doesn't strip the alpha chars). This would remain backwards compatible while also adding an additional field, `address_parts.number.raw`, which contains both alpha and numeric tokens.
The benefit would be that we can then target this 'raw' field directly in our queries to do unit number sorting, etc.
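A minimal sketch of what the proposal could look like, assuming the current `address_parts.number` mapping is a `text` field analyzed with `peliasHousenumber` (the real mapping may carry more options). The query builder `housenumberQuery` and the boost value of 2 are hypothetical; it shows one possible approach, ranking exact alphanumeric matches higher with a `should` clause, rather than an explicit sort.

```javascript
// Hypothetical sketch of the proposed mapping change. Assumption: the current
// address_parts.number mapping is a "text" field analyzed with
// peliasHousenumber; the real mapping in pelias/schema may have more options.
const numberMapping = {
  type: 'text',
  analyzer: 'peliasHousenumber',
  // Multi-field: the same value is indexed again as address_parts.number.raw,
  // this time with an analyzer that keeps the alpha characters.
  fields: {
    raw: { type: 'text', analyzer: 'peliasUnit' }
  }
};

// Hypothetical query fragment: require a match on the existing numeric field,
// then add a "should" clause on the raw subfield so that an exact
// alphanumeric match (e.g. "10a") scores above its siblings ("10", "10b").
function housenumberQuery(housenumber) {
  return {
    bool: {
      must: [
        { match: { 'address_parts.number': { query: housenumber } } }
      ],
      should: [
        { match: { 'address_parts.number.raw': { query: housenumber, boost: 2 } } }
      ]
    }
  };
}

console.log(JSON.stringify(housenumberQuery('10a'), null, 2));
```

Existing queries against `address_parts.number` are untouched; only queries that opt in reference `address_parts.number.raw`.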
The only minor disadvantage would be that the new field would increase the index size on-disk, although I expect this to be insubstantial (<~1%).
Also, if we're not going to use it then there's no sense in adding it.
cc @orangejulius @ianthetechie @Joxit