Project-OSRM · 1ec5 · Oct 4, 2017 · Apr 6, 2017 · Oct 4, 2017 · 1ec5
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,6 +1,10 @@
 # Change Log
 All notable changes to this project will be documented in this file. For change log formatting, see http://keepachangelog.com/
 
+## master
+
+- Added grammatical case support for Russian way names. [#102](https://github.com/Project-OSRM/osrm-text-instructions/pull/102)
+
 ## 0.7.1 2017-09-26
 
 - Added Castilian Spanish localization. [#163](https://github.com/Project-OSRM/osrm-text-instructions/pull/163)
@@ -73,7 +77,7 @@ All notable changes to this project will be documented in this file. For change
 
 ## 0.1.0 2016-11-17
 
-- Improve chinese translation
+- Improve Chinese translation
 - Standardize capitalizeFirstLetter meta key
 - Change instructions object customization to options.hooks.tokenizedInstruction
 

diff --git a/Grammar.md b/Grammar.md
@@ -0,0 +1,53 @@
+## Grammar support
+
+Many languages, among them most Slavic languages (Russian, Ukrainian, Polish, etc.), the Finnic languages (Finnish, Estonian) and others, have [grammatical cases](https://en.wikipedia.org/wiki/Grammatical_case), which OSRM Text Instructions can support too.
+By default, street names are inserted into instructions exactly as they appear in OSM data, that is, in the [nominative case](https://en.wikipedia.org/wiki/Nominative_case).
+To be grammatically correct, a street name should be inflected according to the target language's rules and the instruction's context before insertion.
+
+Applying grammatical cases is not a simple or obvious task in general, because natural languages are complex.
+It is hard enough that, for example, none of the known native Russian navigation systems speak street names in their spoken route instructions at all.
+
+Fortunately, street names use a restricted lexicon and follow naming rules, so the task can be solved relatively easily for this particular case.
+
+### Implementation details
+
+A fairly universal and simple solution is to transform street names with a prepared set of regular expressions grouped by the required grammatical case.
+The required grammatical case is specified directly in the instruction's substitution variables:
+
+- The `{way_name}` and `{rotary_name}` variables in translated instructions should be suffixed with the required grammatical case name after a colon, for example `{way_name:accusative}`
+- The [languages/grammar](languages/grammar/) folder should contain a language-specific JSON file with regular expressions for each required grammatical case:
+```json
+{
+    "v5": {
+        "accusative": [
+            ["^ (\\S+)ая-(\\S+)ая [Уу]лица ", " $1ую-$2ую улицу "],
+            ["^ (\\S+)ая [Уу]лица ", " $1ую улицу "],
+            ...
+```
+- All such JSON files should be registered in the common [languages.js](languages.js)
+- The instruction text formatter ([index.js](index.js) in this module) should:
+  - check the `{way_name}` and `{rotary_name}` variables for an optional grammatical case after a colon: `{way_name:accusative}`
+  - find the appropriate block of regular expressions for the target language and the specified grammatical case
+  - call the standard [string replace with a regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace) for each expression in the block, passing the result of each call to the next; before the first call, the original street name is wrapped in spaces to make matching words within names a bit simpler.
+- String replacement with regular expressions is available in almost every other programming language, so this should not be a problem for other code that uses only the OSRM Text Instructions data.
+- If no regular expression matches the source name (for example, a name from a foreign country), the original name is returned unchanged. This is also the expected behavior of the standard [string replace with a regular expression](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace). The same behavior is expected when the grammar JSON file, or the grammatical case inside it, is missing.
+
+### Example
+
+The Russian street name _"Большая Монетная улица"_ from St Petersburg (roughly _Big Monetary Street_) appears as follows in instructions after processing with the [Russian grammar rules](languages/grammar/ru.json):
+- _"Turn left onto `{way_name}`"_ => `ru`:_"Поверните налево на `{way_name:accusative}`"_ => _"Поверните налево на Большую Монетную улицу"_
+- _"Continue onto `{way_name}`"_ => `ru`:_"Продолжите движение по `{way_name:dative}`"_ => _"Продолжите движение по Большой Монетной улице"_
+- _"Make a U-turn onto `{way_name}` at the end of the road"_ => `ru`:_"Развернитесь в конце `{way_name:genitive}`"_ => _"Развернитесь в конце Большой Монетной улицы"_
+- _"Make a U-turn onto `{way_name}`"_ => `ru`:_"Развернитесь на `{way_name:prepositional}`"_ => _"Развернитесь на Большой Монетной улице"_
+
+### Design goals
+
+- __Cross platform__ - uses the same data-driven approach as OSRM Text Instructions
+- __Test suite__ - has a [prepared test](test/grammar_tests.js) that checks the available expressions automatically, plus an easily extensible pattern for testing language-specific names
+- __Customization__ - can be extended to other languages simply by adding new blocks of regular expressions to the [grammar support](languages/grammar/) folder and adding the necessary grammatical case labels to `{way_name}` and other variables in the translated instructions
+
+### Notes
+
+- The Russian regular expressions are based on the [Garmin Russian TTS voices update](https://github.com/yuryleb/garmin-russian-tts-voices) project; see its [file of regular expressions applied to source text before TTS pronunciation](https://github.com/yuryleb/garmin-russian-tts-voices/blob/master/src/Pycckuu__Milena%202.10/RULESET.TXT).
+- Another module with grammar support, [jquery.i18n](https://github.com/wikimedia/jquery.i18n), exists, but its handling of grammatical cases is very limited and is meant to work with single words only.
+- It would also be great to get street names in the target language rather than only from the default OSM `name` tag, since several multilingual countries provide multiple `name:<lang>` tags for streets. However, this needs to be addressed in the [OSRM engine](https://github.com/Project-OSRM/osrm-backend) first.
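Review note: the rule-chaining steps described under "Implementation details" in Grammar.md can be sketched in isolation as below. This is a minimal sketch, not the module's code. The `grammars` object and its four rules are hypothetical: they are written only to cover the "Большая Монетная улица" example and are not copied from `languages/grammar/ru.json`. The `meta.regExpFlags` key is assumed from how `index.js` reads it in this PR.

```javascript
// Hypothetical grammar data: grammars[language][version][case] is an ordered
// list of [pattern, replacement] pairs. Illustrative rules only, NOT from ru.json.
var grammars = {
    ru: {
        meta: {regExpFlags: ''},
        v5: {
            accusative: [['^ (\\S+)ая (\\S+)ая [Уу]лица $', ' $1ую $2ую улицу ']],
            dative: [['^ (\\S+)ая (\\S+)ая [Уу]лица $', ' $1ой $2ой улице ']],
            genitive: [['^ (\\S+)ая (\\S+)ая [Уу]лица $', ' $1ой $2ой улицы ']],
            prepositional: [['^ (\\S+)ая (\\S+)ая [Уу]лица $', ' $1ой $2ой улице ']]
        }
    }
};

function grammarize(language, version, name, grammarCase) {
    var grammar = grammars[language];
    var rules = grammar && grammar[version] && grammar[version][grammarCase];
    if (!name || !rules) {
        // Missing grammar file or missing case: return the name unchanged
        return name;
    }

    var flags = (grammar.meta && grammar.meta.regExpFlags) || '';
    // Wrap the name in spaces so rules can anchor on word boundaries
    var padded = ' ' + name + ' ';
    rules.forEach(function(rule) {
        // Each rule receives the output of the previous one
        padded = padded.replace(new RegExp(rule[0], flags), rule[1]);
    });

    return padded.trim();
}

// Usage
console.log(grammarize('ru', 'v5', 'Большая Монетная улица', 'accusative')); // Большую Монетную улицу
console.log(grammarize('ru', 'v5', 'Main Street', 'accusative')); // Main Street
```

Because `String.prototype.replace` returns its input when nothing matches, a foreign name such as "Main Street" passes through every rule unchanged, which is the fallback Grammar.md describes.
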
diff --git a/Readme.md b/Readme.md
@@ -8,6 +8,10 @@ OSRM Text Instructions transforms [OSRM](http://www.project-osrm.org/) route res
 
 OSRM Text Instructions has been translated into [several languages](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/translations/). Please help us add support for the languages you speak [using Transifex](https://www.transifex.com/project-osrm/osrm-text-instructions/).
 
+OSRM Text Instructions can support [grammatical cases](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/Grammar.md) for street names in [some languages](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/grammar/).
+
+Customization of grammatical cases and other translated strings after [Transifex](https://www.transifex.com/project-osrm/osrm-text-instructions/) is handled by [override scripts](https://github.com/Project-OSRM/osrm-text-instructions/tree/master/languages/overrides/).
+
 [![NPM](https://nodei.co/npm/osrm-text-instructions.png)](https://npmjs.org/package/osrm-text-instructions/)
 
 ### Design goals

diff --git a/index.js b/index.js
@@ -1,5 +1,6 @@
 var languages = require('./languages');
 var instructions = languages.instructions;
+var grammars = languages.grammars;
 
 module.exports = function(version, _options) {
     var opts = {};
@@ -104,7 +105,6 @@ module.exports = function(version, _options) {
             switch (type) {
             case 'use lane':
                 laneInstruction = instructions[language][version].constants.lanes[this.laneConfig(step)];
-
                 if (!laneInstruction) {
                     // If the lane combination is not found, default to continue straight
                     instructionObject = instructions[language][version]['use lane'].no_lanes;
@@ -199,10 +199,37 @@ module.exports = function(version, _options) {
 
             return this.tokenize(language, instruction, replaceTokens);
         },
+        grammarize: function(language, name, grammar) {
+            // Apply grammar rules, if any, to the way/rotary name
+            if (name && grammar && grammars && grammars[language] && grammars[language][version]) {
+                var rules = grammars[language][version][grammar];
+                if (rules) {
+                    // Pass the original name to the rules' regular expressions wrapped in spaces for simpler parsing
+                    var n = ' ' + name + ' ';
+                    var flags = (grammars[language].meta && grammars[language].meta.regExpFlags) || '';
+                    rules.forEach(function(rule) {
+                        var re = new RegExp(rule[0], flags);
+                        n = n.replace(re, rule[1]);
+                    });
+
+                    return n.trim();
+                }
+            }
+
+            return name;
+        },
         tokenize: function(language, instruction, tokens) {
-            var output =  Object.keys(tokens).reduce(function(memo, token) {
-                return memo.replace('{' + token + '}', tokens[token]);
-            }, instruction)
+            // Keep this function's context for use in the inline function below (no arrow functions in ES5)
+            var that = this;
+            var output = instruction.replace(/\{(\w+):?(\w+)?\}/g, function(token, tag, grammar) {
+                var name = tokens[tag];
+                if (typeof name !== 'undefined') {
+                    return that.grammarize(language, name, grammar);
+                }
+
+                // Return unknown token unchanged
+                return token;
+            })
             .replace(/ {2}/g, ' '); // remove excess spaces
 
             if (instructions[language].meta.capitalizeFirstLetter) {

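Review note: the new token pattern in `tokenize` can be exercised on its own to see how it splits a token into a tag and an optional grammatical case. The sketch below reuses the regular expression from this hunk; the standalone `tokenize` signature and the `tagCase` stub are hypothetical and exist only for illustration. `tagCase` stands in for `grammarize` and simply appends the requested case in brackets.

```javascript
// Hypothetical stand-in for grammarize: makes the captured case visible
function tagCase(name, grammarCase) {
    return grammarCase ? name + '[' + grammarCase + ']' : name;
}

// Standalone variant of tokenize using the same pattern as the hunk above
function tokenize(instruction, tokens, applyGrammar) {
    return instruction.replace(/\{(\w+):?(\w+)?\}/g, function(token, tag, grammarCase) {
        var value = tokens[tag];
        if (typeof value !== 'undefined') {
            // grammarCase is undefined when the token has no ":case" suffix
            return applyGrammar(value, grammarCase);
        }

        // Unknown tokens are left in place
        return token;
    }).replace(/ {2}/g, ' '); // collapse double spaces, as in index.js
}

// Usage
console.log(tokenize(
    'Turn {modifier} onto {way_name:accusative}',
    {modifier: 'left', way_name: 'Main St'},
    tagCase
)); // Turn left onto Main St[accusative]
```

Unlike the removed `reduce` over `Object.keys(tokens)`, the pattern carries the `g` flag, so every occurrence of a token in the instruction is replaced rather than only the first one.
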
diff --git a/languages.js b/languages.js
@@ -1,4 +1,4 @@
-// Load all language files excplicitely to allow integration
+// Load all language files explicitly to allow integration
 // with bundling tools like webpack and browserify
 var instructionsDe = require('./languages/translations/de.json');
 var instructionsEn = require('./languages/translations/en.json');
@@ -19,6 +19,8 @@ var instructionsUk = require('./languages/translations/uk.json');
 var instructionsVi = require('./languages/translations/vi.json');
 var instructionsZhHans = require('./languages/translations/zh-Hans.json');
 
+// Load all grammar files
+var grammarRu = require('./languages/grammar/ru.json');
 
 // Create a list of supported codes
 var instructions = {
@@ -42,7 +44,13 @@ var instructions = {
     'zh-Hans': instructionsZhHans
 };
 
+// Create a list of supported grammars
+var grammars = {
+    'ru': grammarRu
+};
+
 module.exports = {
     supportedCodes: Object.keys(instructions),
-    instructions: instructions
+    instructions: instructions,
+    grammars: grammars
 };