Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial French grammar rules to insert articles to way names #252

Merged
merged 16 commits into from
Jul 6, 2018

Conversation

yuryleb
Copy link
Contributor

@yuryleb yuryleb commented Jun 19, 2018

Issue

Initial French grammar rules prototype to insert articles to way names (#251).

Currently only Rue status street name is supported:
french articles

Tasklist

  • Add other status street names handling (?)
  • Add changelog entry
  • Test with osrm-frontend
  • Review

Requirements / Relations

New languages/overrides/fr.js script adds article option to most way_name keywords during translations export from Transifex, languages/grammar/fr.json contains regular expressions (currently one) to process street names, changed languages.js loads and registers that French expressions file. test/grammar_test.js is extended with test for French.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 19, 2018

The next step:
french elisions

Copy link

@Penegal Penegal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammatical notes, but nothing grave.

["^ All[ée]e ", " l’allée "],
["^ (L['’])?Autoroute ", " l’autoroute "],
["^ Avenue ", " l’avenue "],
["^ Boulevard ", " la boulevard "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We say « le boulevard ».

["^ (L['’])?Autoroute ", " l’autoroute "],
["^ Avenue ", " l’avenue "],
["^ Boulevard ", " la boulevard "],
["^ Chemin ", " la chemin "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We say « le chemin ».

["^ Cours ", " la cours "],
["^ Entr[ée]e ", " l’entrée "],
["^ Impasse ", " l’impasse "],
["^ Parvis ", " la parvis "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark here.

["^ Entr[ée]e ", " l’entrée "],
["^ Impasse ", " l’impasse "],
["^ Parvis ", " la parvis "],
["^ Passage ", " la passage "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

["^ Parvis ", " la parvis "],
["^ Passage ", " la passage "],
["^ Place ", " la place "],
["^ Petit[\\- ]Pont ", " la petit-pont "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

["^ Petit[\\- ]Pont ", " de la petit-pont "],
["^ Pont ", " de la pont "],
["^ Promenade ", " de la promenade "],
["^ Quai ", " de la quai "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be more precise, it would be "du quai"

["^ Route ", " de la route "],
["^ Rue ", " de la rue "],
["^ Sortie ", " de la sortie "],
["^ Souterrain ", " de la souterrain "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"du souterrain"

["^ Rue ", " de la rue "],
["^ Sortie ", " de la sortie "],
["^ Souterrain ", " de la souterrain "],
["^ Square ", " de la square "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"du square"

["^ Sortie ", " de la sortie "],
["^ Souterrain ", " de la souterrain "],
["^ Square ", " de la square "],
["^ Tunnel ", " de la tunnel "],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"du tunnel"

["^ Tunnel ", " de la tunnel "],
["^ Voie ", " de la voie "],

["^ ([AÂÀEÈÉÊËHIÎÏOÔUÙÛÜYŸÆŒ])", " d’$1"],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is no real rule for elision before an h, so it can't be easily done apart with a list of all concerned words. I think you can remove it from this list and the other one. If you want to do it, here is a list of all h-beginning words which don't provoke an elision: https://fr.wikipedia.org/wiki/H_aspir%C3%A9

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list looks too big to be inserted into this humble expressions array 😕 So it's really simpler just to exclude H from there

Copy link

@benjamintd benjamintd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

There are a few instances of masculine vs. feminine nouns to correct that @Penegal already noted, otherwise that's good.

@Penegal is that test case correct? I'm not sure whether the capital letter should go or not:

  • ['La Rochelle', 'elision', 'de la Rochelle']
    If it is I think that's a test case that we should add.

@Penegal
Copy link

Penegal commented Jun 20, 2018

@benjamintd: I would say the correct string would be de La Rochelle: La is an article which is not subject to elision, so it would have to stay, and, as it is part of the name, it should keep its capital first letter.

This would have been different if the name was, for instance, Les Marches, as the elision would form des Marches, without capital since de has none, but, since the La article is not subject to this kind of elision, it should keep its capital first letter.

@Penegal
Copy link

Penegal commented Jun 20, 2018

I also notice that grammar rules should be applied on two other cases:

  1. when a name is placed after at: for instance, the correct translation for You have arrived at Les Marches is Vous êtes arrivé aux Marches, with elision, and not Vous êtes arrivé à Les Marches;
  2. when a name is placed after onto: the correct translation of Merge left onto Avenue des Champs-Élysées is S’insérer à gauche sur l’avenue des Champs-Élysées, not S’insérer à gauche sur Avenue des Champs-Élysées: the second case have a faulty capital and misses the required article la, which is elided in l’ in this case.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 20, 2018

@Penegal, I need more detailed cases for replacing à to aux in Vous êtes arrivés à {waypoint_name} strings 😉 Or just aux should be used everywhere with {waypoint_name}?

@Penegal
Copy link

Penegal commented Jun 20, 2018

@yuryleb: à le is to be replaced by au, and à les by aux. This is not related to the first letter, as with l’, but to the first word; it is to be applied even if le or les is in the variable instead of the OSRM string, and is case insensitive: à Le is also to be replaced by au, and à Les by aux. You can see an example in my last comment.

"name": "Tourner {modifier} pour rester {way_name}",
"destination": "Tourner {modifier} en direction de {destination}",
"exit": "Tourner {modifier} sur {way_name}"
"name": "Tourner {modifier} pour rester {way_name:article}",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct translation would be Tourner {modifier} pour rester sur {way_name:article}.

},
"uturn": {
"default": "Faire demi-tour à la fin de la route",
"name": "Faire demi-tour à la fin de la route {way_name}",
"destination": "Faire demi-tour à la fin de la route en direction de {destination}"
"name": "Faire demi-tour à la fin de la route {way_name:article}",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct translation would be Faire demi-tour à la fin {way_name:elision}.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: Faire demi-tour à la fin de {way_name:article}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Faire demi-tour à la fin de le boulevard Saint-Germain not Faire demi-tour à la fin du boulevard Saint-Germain?

Copy link

@Penegal Penegal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the name of the classes should be changed for a more descriptive version? Both classes elide, but one for an article, le/la/les, and the other for a preposition, de/des/du; naming the second case preposition could do it.

There could also be a redundancy in the computed instructions using the current rotary strings, but this should be tested, as I don't know enough real-world examples of roundabout names to be sure of it.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 20, 2018

Example of rotary:
french rotary

@Penegal
Copy link

Penegal commented Jun 21, 2018

Here is what I expected: the correct second instruction would be Prendre le rond-point de la place Charles de Gaulle. It should change according to the name of the roundabout; if it is named like Rond-point x, then the correct sentence would be Prendre le rond-point x, but, if it is another name, it should use the new elision rules: Place x would give Prendre le rond-point de la place x.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 21, 2018

@Penegal, now Transifex's recently changed uturn/name phrase becomes Faire demi-tour à la fin de la route {way_name:article} not Faire demi-tour à la fin {way_name:preposition} as early, that's no de/du/des preposition will be inserted before {way_name} after grammar processing - is it correct or override script should be also updated?

@Penegal
Copy link

Penegal commented Jun 21, 2018

@yuryleb: you should go with Faire demi-tour à la fin {way_name:preposition}, it's the correct sentence.

@Penegal
Copy link

Penegal commented Jun 21, 2018

@yuryleb: I just checked, and, according to an orthographic reform, rond-point can also be written rondpoint.

Also note that I recently updated some Transifex strings; in case of conflict, you can drop the new ones. Just warn me to allow me to restore them.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 21, 2018

Yes, I can support both but what should be in final text - rondpoint or rond-point?

And actually I could also merge your changes from Transifex if you completed - anyway I download them every time I check how override script works 😉

@Penegal
Copy link

Penegal commented Jun 22, 2018

The main orthograph is rond-point; the other one is controversial and most people would see it as an error.

If you can merge the Transifex changes with the one you did, perfect! I have a backup of the first ones, if needed, and I finished my edits, so you can go and merge.

@Penegal
Copy link

Penegal commented Jun 22, 2018

I noticed you added way types in your regex list; there could also be le sentier, which is something like a feminine form of sente.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 22, 2018

It seems all is done. @Penegal, @benjamintd, do you have anything to add?

@Penegal
Copy link

Penegal commented Jun 25, 2018

@yuryleb: de/du/des/le/la/l’ should not use a bold typeface unless they are part of the name/destination, as, in this case, they are technically not part of it.

Edit: also, village has 2 L.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 25, 2018

@Penegal, village has two L in expressions, it was just misprint in commit comment.

Formatting with bold typeface is part of external formatToken() implementation inside OSRM sample frontend (see https://github.com/Project-OSRM/osrm-frontend/blob/gh-pages/src/itinerary_builder.js#L11). Perhaps I could extend my last formatToken() fix with your de/du/des/le/la/l’ formatting proposal.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 25, 2018

@Penegal, so an expression to exclude appended articles/prepositions should look like ^(à )|(au )|(aux )|(d’)|(de )|(des )|(du )|(l’)|(la )|(le )|(les ) (case-sensitive to distinct appended words from existing in name)?

This expression will help to format as below:

  • la cour d’Honneur
  • de la cour d’Honneur
  • le rond-point de la cour d’Honneur

Is it enough or all added words should be also non-bold to keep only cour d’Honneur bolded?

@Penegal
Copy link

Penegal commented Jun 26, 2018

@yuryleb: I would say the correct typography would be de la cour d’Honneur.

@yuryleb
Copy link
Contributor Author

yuryleb commented Jun 26, 2018

@Penegal, please look how this is now implemented.

@Penegal
Copy link

Penegal commented Jun 28, 2018

@yuryleb: very good work. I'm OK with this.

@Penegal
Copy link

Penegal commented Jul 5, 2018

Is there a problem preventing an approving review?

Copy link

@benjamintd benjamintd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @yuryleb and @Penegal.

I left a small comment to understand one regexp better, but that can be left as-is or fixed in another PR.

Looks good to merge!

["^ Traverse ", " la traverse "],
["^ Tunnel ", " le tunnel "],
["^ Viaduc ", " le viaduc "],
["^ Villa(ge)? ", " le villa$1 "],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we looking to match other than Village? Unfortunately Villa is feminine in French (la villa)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I add other rules for Villa then

Copy link
Member

@1ec5 1ec5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thank you all for your attention to detail!

@1ec5 1ec5 merged commit 4e2c051 into Project-OSRM:master Jul 6, 2018
@yuryleb yuryleb deleted the fr-grammar-proto branch July 6, 2018 05:24
@Penegal
Copy link

Penegal commented Jul 6, 2018

How often is the code of map.project-osrm.org updated? Just wanted to see it live… 😁

@1ec5
Copy link
Member

1ec5 commented Jul 9, 2018

map.project-osrm.org is powered by OSRM-frontend. That repository will need to be updated, but first I have to publish a new version of OSRMTI.

1ec5 added a commit that referenced this pull request Jul 19, 2018
- Updated French localization with articles and prepositions insertion using grammar rules. #252
- Added a Burmese localization. #247
- Added a Finnish localization. #239
- Added a Korean localization. #243
- Updated translations in Simplified Chinese. #233 #248
- Updated Russian grammar with 'chord' status street name. #245
- Added Russian church names abbreviations. #237
- Corrected Lithuanian streets abbreviations. #238
1ec5 added a commit that referenced this pull request Jul 19, 2018
- Updated French localization with articles and prepositions insertion using grammar rules. #252
- Added a Burmese localization. #247
- Added a Finnish localization. #239
- Added a Korean localization. #243
- Updated translations in Simplified Chinese. #233 #248
- Updated Russian grammar with 'chord' status street name. #245
- Added Russian church names abbreviations. #237
- Corrected Lithuanian streets abbreviations. #238
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants