-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Grammar support for Russian way names #102
Conversation
Thanks @yuryleb. As mentioned before, we need to do some more thinking here. Blockers seem to be:
|
28e2300
to
3eb40ee
Compare
@freenerd, @1ec5, @oxidase, finally I solved to add grammatical case labels in new I don't think Transifex or any other localization service will allow arbitrary keywords expansion. Actually So let's meet the first route instructions with proper grammatical cases at least in Russian 🎉 BTW other OSRMTI-related projects (osrm-text-instructions.swift, osrm-text-instructions.java, etc?) should also be updated to expect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I like the general approach. It looks like there's a to-do or two in here, so I'll hold off on a final review until you're ready.
|
||
Many languages - all Slavic (Russian, Ukrainian, Polish, Bulgarian, etc), Finnic (Finnish, Estonian) and others - have [grammatical case feature](https://en.wikipedia.org/wiki/Grammatical_case) that could be supported in OSRM Text Instructions too. | ||
Originally street names are being inserted into instructions as they're in OSM map - in [nominative case](https://en.wikipedia.org/wiki/Nominative_case). | ||
To be grammatically correct, street names should be changed according to target language rules and instruction context before insertion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the various cases always regular and easy to determine from the nominative case? I noticed a few features have been tagged name:dative
in OpenStreetMap – would it be desirable or necessary for OSRM to pass such tags along to OSRM Text Instructions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO it's not deal of OSM - it's impossible to store all case variants in all languages for each name. Perhaps CLDR will be the better place (if will 😉 )
|
||
- Russian regular expressions are based on [Garmin Russian TTS voices update](https://github.com/yuryleb/garmin-russian-tts-voices) project; see [file with regular expressions to apply to source text before pronouncing by TTS](https://github.com/yuryleb/garmin-russian-tts-voices/blob/master/src/Pycckuu__Milena%202.10/RULESET.TXT). | ||
- There is another grammar-supporting module - [jquery.i18n](https://github.com/wikimedia/jquery.i18n) - but unfortunately it has very poor implementation in part of grammatical case applying and is supposed to work with single words only. | ||
- Actually it would be great to get street names also in target language not from default OSM `name` only - there are several multi-lingual countries supporting several `name:<lang>` names for streets. But this the subject to address to [OSRM engine](https://github.com/Project-OSRM/osrm-backend) first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good idea – please file an issue in osrm-backend.
``` | ||
- All such JSON files should be registered in common [languages.js](languages.js) | ||
- Instruction text formatter ([index.js](index.js) in this module) should: | ||
- check `{way_name}` and `{rotary_name}` variables for optional grammar case after colon: `{way_name:accusative}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this syntax – it leaves open the possibility of additional transformations like capitalization, which could replace the "capitalizeFirstLetter" meta property currently set on some localizations.
|
||
### Notes | ||
|
||
- Russian regular expressions are based on [Garmin Russian TTS voices update](https://github.com/yuryleb/garmin-russian-tts-voices) project; see [file with regular expressions to apply to source text before pronouncing by TTS](https://github.com/yuryleb/garmin-russian-tts-voices/blob/master/src/Pycckuu__Milena%202.10/RULESET.TXT). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for dedicating this rule set to the public domain!
var rules = grammars[language][version][grammar]; | ||
if (rules) { | ||
// Pass original name to rules' regular expressions enclosed with spaces for simplier parsing | ||
var n = ' ' + name + ' '; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's currently possible for clients to manipulate a step's name before compilation. For example, the Mapbox Directions API currently inserts SSML markup around numbers within the name so that speech synthesizers pronounce them more casually. I think that could potentially interfere with this feature. That would be a good argument for implementing #52, so that clients can reliably manipulate the name after grammaticalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, if clients will not damage {way_name:*}
vars - for example, "Restore names highlighting" PR adds HTML tags around these vars and all works as on snapshot above.
But if Mapbox Directions inserts SSML tags right instead {way_name}
, this looks wrong - for example, Nuance uses \tn=address\
tags in its Vocalizer voices to specify start of address text specially to pronounce numbers and known street abbreviations as address but doesn't change original address/name text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But actually it's no problem, even if Mapbox will catch and replace {way_name}
and skip {way_name:accusative}
etc - this allows further "grammarization" to work 😉 SSML tags are actually important only for English addresses, Russian & many others have no special requirements for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m referring to a JavaScript port of mapbox/mapbox-navigation-ios#552 for the Directions API’s (not yet documented) voice_instructions
parameter. Essentially we’re working around a problem where Amazon Polly assumes an address is abbreviated, whereas OSM never abbreviates road names. The workaround is to wrap only numbers in <say-as interpret-as="address">
tags, since we still want numbers to be optimized for speech even if the rest of the words can’t be treated as part of an address. For the JavaScript port, we’re changing the value of way_name
itself, so that by the time compile()
runs, way_name
contains SSML code. 😬
I’ll be the first to admit it’s a hack, but I wanted to point this out because clients may be inclined to do similar things to the value of way_name
unless we advise against doing so in this project’s documentation. If this PR forces us to implement #52, that’d be a wonderful outcome in my opinion. 😉
/cc @allierowan
index.js
Outdated
// Pass original name to rules' regular expressions enclosed with spaces for simplier parsing | ||
var n = ' ' + name + ' '; | ||
rules.forEach(function(rule) { | ||
var re = new RegExp(rule[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these rules be case insensitive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually no (and no for g
option for all replaces not only for the first match), at least for Russian OSM street names this is unnecessary - there are many working street names validators those keep them compliant to Russian street names agreement. And also all current grammar regular expressions take into account potentially capital first word letter.
But for general case it looks useful to add some meta
block into grammar JSON to specify required regular expression options explicitly.
['Третье Транспортное кольцо', 'genitive', 'Третьего Транспортного кольца'], | ||
['Третье Транспортное кольцо', 'prepositional', 'Третьем Транспортном кольце'] | ||
] | ||
// TODO add your language grammar tests to call grammarize() and check result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what the code below does, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the place to add other language's test strings to check other language's grammar when it will be.
Thanks @yuryleb! I’m going to merge this PR now, but feel free to iterate on the feature in separate PRs if you have any additional changes to make. |
New feature for OSRM Text Instructions - grammatical cases support for way names. This is quite important for many languages (Russian, Ukrainian, Polish, Bulgarian, Finnish, Estonian and many others) - otherwise translated strings with inserted way names (in nominative case as in OSM) look grammatically incorrect in most cases.
This feature extends {way_name} and {rotary_name} in instructions with optional grammar case label and so requires update of other applications used OSRM Text Instructions data (Ruby #26 and Swift #53).
Unfortunately this proposal is incompatible with Transifex (see #101) and so can't be easily merged. This PR is supposed mostly for proposed changes review in details and to illustrate this problem for public.