Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Replace or upgrade Zend_Translate #6221

Closed
chillu opened this issue Oct 25, 2016 · 11 comments
Closed

RFC Replace or upgrade Zend_Translate #6221

chillu opened this issue Oct 25, 2016 · 11 comments

Comments

@chillu
Copy link
Member

chillu commented Oct 25, 2016

Split out from RFC Replace Zend_Locale with new i18n backend

Introduction

SilverStripe uses Zend Framework v1 for message translations (i18n::_t()) through Zend_Translate. This dependency is out of date, Zend has declared EOL in September 2016. Our goal is to either upgrade or replace the library.

We've written a custom YAML adapter, since Zend_Translate doesn't support the format natively. The Zend_Translate API is used both indirectly in i18n::_t(), and directly through i18n:: register_translator(<Zend_Translate>).

Solution Requirements

Note: Some of these requirements might be fulfilled by the existing i18n class.

  • Include translations by module (i18n:: include_by_locale())
  • Translation fallbacks from full locale to lang (e.g. de_AT => de)
  • Translation fallbacks to i18n::$default_locale
  • Caching of parsed translation files (e.g. YAML)
  • Lookup performance: i18n::_t() is called dozens of times on each CMS render
  • Default message fallback defined on _t() call
  • Sprintf-style string interpolation (e.g. _t('There are %s monkeys', [5]))
  • Translator cascade for multiple format support (not really used in practice, was an effort to smoothly upgrade from PHP to YAML based translations)
  • Format compatibility with Transifex translation service
  • Message "groups" in translation files through dot notation (e.g. LeftAndMain.SAVE)
  • Rails YAML format support (or rewriting our existing RailsYaml.php adapter). If we switch to a different format, we also need to fix text collection

Options

1. Symfony Translate

Currently at version 3.1. Seems to have native support for string placeholders. Initial impression is that the symfony localisation is slightly more polished and feature-rich than zend. Documentation http://symfony.com/doc/current/translation.html

Dependencies: (400k)

  • symfony/translation

2. CakePHP

Currently at version 3.3. This library is similar to zend framework in that it provides a wrapper around php-intl extension. However, internally message translation is handled via the aua.intl extension (https://github.com/auraphp/Aura.Intl). If necessary we could extract the message translation directly from aura.intl, and rely directly on php-intl for string and number formatting. Documentation http://book.cakephp.org/3.0/en/core-libraries/internationalization-and-localization.html

Dependencies: (600k)

  • php-intl
  • cakephp/chronos
  • cakephp/i18n
  • cakephp/core
  • cakephp/utility
  • aura/intl
  • aura/installer-default

3. Zf2-i18n

Currently at version 2.7, which is also the current version for this component in the current zend-framework 3.0. The good news is that this implies the 2.x branch of this module will be supported within the immediate future. This is probably the least disruptive upgrade as zendframework/zend-i18n replaces Zend_Translate API.

The API has seen a major rewrite between 1.x and 2.x (before vs. after), so it's unclear how much rewrite effort is required on our side (e.g. for the custom Rails YAML adapter). It's likely limited since we don't use a lot of its API surface.

Documentation at https://docs.zendframework.com/tutorials/i18n/ and https://framework.zend.com/manual/2.4/en/modules/zend.i18n.translating.html

Dependencies: (1.5mb)

  • php-intl
  • zendframework/zend-i18n
  • zendframework/zend-eventmanager
  • zendframework/zend-loader
  • zendframework/zend-view
  • zendframework/zend-stdlib

4. Write our own i18n wrapper around php-intl

Take the i18n class as a base and replicate the message loading and locale fallback management via a new SS API. Use php-intl for the actual message formatting (placeholders, plurals). This would essentially replicate the API surface existing for Zend/Symfony/CakePHP libraries. It also means we're likely just writing our own YAML loader, leaving devs who rely on other formats in SS3 through Zend_Translate stranded in SS4.

Notes

  • Currently we have a large investment in transifex as a database source for managing and releasing new translations (see https://www.transifex.com/silverstripe/silverstripe-framework/). Any future solution will be required to support the existing translation structure.
  • The php-intl MessageFormatter API is pretty minimal (single-message only), so we need a library solution on top of it either way
  • We've written a custom RailsYAML Zend adapter for Zend (so would need to port this to ZF2)
  • With the increased frontend-based views in SS4 and beyond, more translations need to move from PHP (YAML format) into JS (JSON format). Since there's hundreds of translation hours contained in Transifex, we shouldn't just duplicate untranslated messages from YAML into JSON. We could duplicate YAML into JSON as part of the translation download from Transifex, or just use JSON on both sides (Symony has a JSONFileLoader, Transifex supports it as well)
  • Comments on individual translations are very helpful to provide context (when exposed to translators on transifex). We've lost that ability when migrating from PHP to YAML formats.
  • We should look into better pluralisation, some countries have plurals based on count
  • Symfony Translation supports quite a few formats
  • Multi-line YAML has always been a pain so we'll need to test that specifically
  • Symfony Translation uses the same YAML parser dependency we already need for other SS work (e.g. Config YAML)
  • Symfony Translation has message domains which are the eqivalent of our nested YAML formats. It looks like the YamlFileLoader can import nested keys.

Related

@chillu chillu added this to the CMS 4.0.0-alpha4 milestone Oct 25, 2016
@sminnee sminnee modified the milestones: CMS 4.0.0-beta1, CMS 4.0.0-alpha4 Oct 31, 2016
@sminnee sminnee modified the milestones: CMS 4.0.0-alpha5, CMS 4.0.0-beta1 Jan 11, 2017
@tractorcow tractorcow self-assigned this Jan 17, 2017
@tractorcow
Copy link
Contributor

@tractorcow
Copy link
Contributor

Added fixes for cms at https://github.com/open-sausages/silverstripe-cms/tree/pulls/4.0/i18n-symfony

framework tests are more or less passing finally, after discovering a big hole in my initial proof of concept. :)

@sminnee
Copy link
Member

sminnee commented Jan 24, 2017

A couple of notes after a discussion with @tractorcow:

  • Skipping the default values in a _t() call should be treated as a mistake and throw a warning. In the docs / wherever else appropriate, we should indicate that for reusable modules we recommend the convention of providing a UK English string as the default value.
  • If the default value is missed, fix the current bug where textcollector doesn't pick up the string. Strings without default values should still be available for translation.
  • The context option can be dropped as it's API bloat.
  • Pluralisation support can be added as a separate pluralise method. Pluralised translations will sent to Symfony as a pipe-delimited string, and if you attempt to use these strings with _t() instead of pluralise() you'll get a warning.

@tractorcow
Copy link
Contributor

Implemented PR at #6558

@chillu
Copy link
Member Author

chillu commented Jan 24, 2017

A bit more discussion with Damian:

Text collection is only one workflow to manage translations in your SilverStripe code. More advanced teams might skip this step in favour of managing translations separately. I've worked on enterprisey projects where each UI message was collected in a wiki by BAs before the development started, and it's easy to see how that would apply to translations as well (managed in dedicated external tools). This approach might not be in line with our Agile values, but that doesn't mean we should prohibit it from a SilverStripe API perspective.

With this in mind, it doesn't make sense to require a default $string argument - since it forces developers to duplicate the defaults from their canonical version managed directly in the master language file (en.yml). Incidentally, we already do that in core for JavaScript translations because we don't have a JavaScript-compatible text collector. So we should throw a warning when you try to run text collection without a default, but the _t() call shouldn't require it.

The $context argument is only relevant because of our core text collection practices. Transifex supports context through YAML comments, but we can't pass on that information because the Symfony YAML Dumper doesn't support it. This is an issue in the internals of SilverStripe core, and can be worked around either by using another dumper, or doing a second string pattern matching pass on the generated YAML.

I don't see "context" as an advanced use case, it's a practice that should become more rather than less frequent in core. Translators are looking at strings in transifex completely out of context, so rely on these annotations e.g. to identify if something is a noun or verb, a short button label or a message window with lots of room for longer translations.

That being said, the $context argument is only present because we need it for text collection. If you skip this step, you can add comments directly to YAML. I'm starting to think that we should just move text collection into a module, and stop using it in core - rather than fixing it up for JavaScript, and dealing with its inherent quirks. This would mean every new _t() call would require a manual edit of a YAML file - an acceptable practice that's common in other frameworks (e.g. vanilla Rails). There's wider implications with removing text collection from core though (e.g. how we deal with computed translation keys in i18nEntityCollector). I don't want to block this card to sort this out first, so my recommendation is to keep $context.

On a related note, I'm recommending to keep JavaScript translations in the YAML master files instead of JSON (see "Notes" in RFC description here). This works around the issue that JSON can't store comments, hence we can't pass on context through to transifex there.

@tractorcow
Copy link
Contributor

tractorcow commented Jan 25, 2017

I've updated based on feedback and requested changes. I've deviated slightly from my expected plan slightly, which I'll describe below.

Skipping the default values in a _t() call should be treated as a mistake and throw a warning. In the docs / wherever else appropriate, we should indicate that for reusable modules we recommend the convention of providing a UK English string as the default value.

I've made this a warning, but it's possible to turn this warning off.

If the default value is missed, fix the current bug where textcollector doesn't pick up the string. Strings without default values should still be available for translation.

It was troublesome to resolve duplicate strings where there are many non-empty versions of a string across modules, so I've opted to continue not collecting empty strings. Based on discussion with @chillu, we feel that the i18n system is presented as the standard localisation system we expect to be used within silverstripe, but the i18nTextCollector is specific to the format we use for our own core and release process (based on yml / transifex). With that in mind, if someone had a custom localisation format (and wrote their own localisation reader / writer) then they could also write their own text collector.

It being a warning is probably sufficient for our needs.

The context option can be dropped as it's API bloat.

After discussion with @chillu I've restored this, but solidified the API slightly. The "normalised localisation" data for any key can now have a "comment" key, as well as a "default" key. This normalised data is available to be read from / written to the Reader / Writer interfaces. However, our standard YmlReader / YmlWriter is currently discarding this key intentionally. However, it opens the door for custom localisation formats, such as those listed on https://docs.transifex.com/formats/introduction, which may support this metadata.

Pluralisation support can be added as a separate pluralise method. Pluralised translations will sent to Symfony as a pipe-delimited string, and if you attempt to use these strings with _t() instead of pluralise() you'll get a warning.

Instead of doing this, I've simply rolled pluralisation into the _t method. I've used string-inspection to detect if we are attempting to localise (contains '{count}' and a single '|' character). This means we get text-collection for plurals for free (which I'd otherwise have to do another day).

@tractorcow
Copy link
Contributor

Updated at #6558

@wernerkrauss
Copy link
Contributor

I often see strings like "This {item} saved sucessfully", where "This" can be translated - depeding on the item - as "Dieser", "Diese" or "Dieses" (male, female, thing) in German. Are there concepts available to address this issue?

@tractorcow
Copy link
Contributor

tractorcow commented Jan 25, 2017

We could use pluralisation, which allows us to present record-specific grammar.

1 item saved successfully? where {pluralised} saved successfully is the root localisation string, and you could localise the {pluralised} on a per-record type basis, e.g. 1 page|{count} pages

However the obvious downside is the generic usage of numbers, rather than the "this" or "the" grammar which might be more appropriate in certain contexts.

How do you feel about this?

@wernerkrauss
Copy link
Contributor

this/the is according to the saved /deleted etc. DataObject, so i guess the DataObject should know if it's male/female/thing?

E.g. CMSMain.NEWPAGE: "Neue {pagetype}" could be "Neue FooSeite" but "Neuer Kalender", depending on the (german) name of my page type.

@chillu
Copy link
Member Author

chillu commented Jan 30, 2017

@wernerkrauss Yeah the {pluralised} only works when you have a predetermined amount of variations (CMSMain.NEWPAGE_FooSeite vs CMSMain.NEWPAGE_Kalender). We could check if a specific translation exists for this composite key, and fall back to CMSMain.NEWPAGE? I think trying to resolve prenoun rules across all languages in a PHP API will end in tears. Unless you have an example of a library doing that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants