Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC Replace Zend_Locale with new i18n backend #6194

Closed
15 tasks done
tractorcow opened this issue Oct 17, 2016 · 24 comments
Closed
15 tasks done

RFC Replace Zend_Locale with new i18n backend #6194

tractorcow opened this issue Oct 17, 2016 · 24 comments

Comments

@tractorcow
Copy link
Contributor

tractorcow commented Oct 17, 2016

See #5359
dev list post: https://groups.google.com/forum/#!topic/silverstripe-dev/BVgFrxb7ISw

Introduction

Currently SilverStripe framework relies on Zend framework v1 for several core localisation features. This includes the below Zend components:

  • Zend_Locale
    • Text direction
    • Geographic information
    • Numeric formatting
    • Date formatting
  • Zend_Currency
    • Currency formatting
  • Zend_Date
    • Date formatting

The libraries these features are built on top of are several years out of date and require upgrading.

Exclusions

Note that the following other zend components remain a dependency of other parts of the CMS and are not covered by this RFC, even if these may be affected by the outcome of the decided direction.

Solution Requirements:

The chosen solution will need to either provide the below functionality, or have added on top of the core library to meet the necessary functionality:

  • Support for mi_NZ locale
  • Date formatting
  • Numeric formatting
  • Currency formatting

PHP INTL Dependency

Many modern i18n Solutions require the php intl library. Currently SilverStripe does not, although evading this as a requirement in future releases will limit our options.

Symfony currently provides a polyfill for intl https://github.com/symfony/polyfill.

Our options are:

  1. Choose a solution that does rely on php-intl and require it as a core dependency
  2. Choose a solution that does rely on php-intl, but rely on a polyfill
  3. Choose a solution that does not rely on php-intl
  4. Require php-intl to be installed to support i18n, but allow i18n to be disabled.

Investigation of symfony's polyfill implementation seems extremely incomplete, or implemented at a very shallow level only. E.g. stubs containing MethodNotImplementedException and hard-coded support for en locale only. Supporting or offering an incomplete solution in my mind is worse than no solution at all.

I recommend option 4, as it allows us to build a full i18n backend that will operate with all necessary features, but provides room for the community to develop a custom backend (or backends) to support environments without php-intl.

Example i18n Options:

Given that there are a huge number of possible i18n solution libraries, below is a non-exhaustive list of examples of solutions that could be considered.

1. Write i18n wrapper over php-intl

If we rely on php-intl, we could simplify the current i18n class into a wrapper over this extension. This solution would be the most work and be the hardest to maintain, but would allow us to minimise our dependencies.

Separate components would need to be built for string / number manipulation, as well as message translations, with the bulk of the work in implementing a translation mechanism which is compatible with our current yml translation system.

Dependencies:

  • php-intl

2. Zf2-i18n

Currently at version 2.7, which is also the current version for this component in the current zend-framework 3.0. The good news is that this implies the 2.x branch of this module will be supported within the immediate future.

This is probably the least disruptive upgrade as zendframework/zend-i18n replaces Zend_Translate API. This provides standard formatters for Currency, Date, Number, and Pluralisation.

Documentation at:

Dependencies: (1.5mb)

  • php-intl
  • zendframework/zend-i18n
  • zendframework/zend-eventmanager
  • zendframework/zend-loader
  • zendframework/zend-view
  • zendframework/zend-stdlib

4. Cakephp intl

Mainly a message formatting library, but has some php-intl wrappers around date and number formatting. Documentation http://book.cakephp.org/3.0/en/core-libraries/internationalization-and-localization.html

Dependencies: (600k)

  • php-intl
  • cakephp/chronos
  • cakephp/i18n
  • cakephp/core
  • cakephp/utility
  • aura/intl
  • aura/installer-default

Conclusion

Based on the solution we choose to pursue, it appears that there may be as many as two separate components; One for message localisation, and another for formatting of strings / date / time / numbers.

The desired response to this RFC should be in the form of either suggestion of an unnamed solution, or support for one of the prior named solutions.

The desired outcome of this RFC will be a mutually agreed upon technical direction, and a next step to explore. This will include our desired response to php-intl as a requirement.

Related

Tasks

  • Upgrade i18n locale / country / language transformation and localisation to use php-intl
  • Standardise FormField::setSubmittedValue() for user-facing localised input
  • Upgrade DateField
  • Upgrade TimeField
  • Upgrade DatetimeField
  • Upgrade Member
  • Upgrade MemberDatetimeOptionsetField
  • Upgrade NumericField
  • Upgrade DBDate
  • Upgrade DBTime
  • Upgrade DBDateTime
  • Upgrade DMoney
  • Upgrade MoneyField
  • Test locally
  • Raise PRs API Substitute Zend_Locale / Zend_Date with php-intl #6607

Notes:

  • Investigate rtl / ltr language support
@chillu
Copy link
Member

chillu commented Oct 18, 2016

I think within the "Date formatting" requirements, we need to list a couple of sub-requirements:

  • ISO/CU format support (Zend default, php-intl IntlDateFormatter default, different from PHP's date() and DateTime::format()). Used for Member.Locale preferences (and corresponding localised output). For example, two-digit months in ICU is MM, in date() it's m. So we can't easily fall back to date() here, unless we create a conversion layer similar to DateField::convert_iso_to_jquery_format)
  • Localised day and month names, both long and short (preconfigured so authors can select their language). Could use date_format()+set_locale())
  • Localised country/language lookups (dropdown in CMS author profile - we could extract that information from CLDR into our YAML files) We currently hardcode this to English via i18n::$all_locales
  • Preferred date format for locale (long/short). Might get away with that and just getting authors to choose from a few custom formats. See Zend_Locale_Format::getDateFormat() and Member->getCMSFields()
  • Script direction for a given locale (see i18n::get_script_direction())

For "Number formatting":

  • Default number formatting (thousands/decimal separators) based on current locale (used in NumericField with Zend_Locale_Format::getNumber(). We could solve this through number_format() and localeconf() instead.

And for message translation, any new solution we choose should also support the following new feature:

  • Define count-based plurals (some languages have more than one plural form), see parseMessage()

Write i18n wrapper over php-intl

I don't think there's much value in a full wrapper (the API surface is huge) - it's an extension supported by the PHP project itself, so creating an abstraction layer in order to minimise risk of third party deprecations seems overkill. In the short term, we should pass through existing i18n methods to php-intl, but deprecate those in favour of using php-intl directly (for date/number/currency formats).

But by adding another half a dozen methods to our existing i18n API (with some refactoring and dependency injection), we should be able to cover off the use cases listed above. The problem is removing reliance on ISO date formats - we'd have to investigate how feasible (non-ambiguous) a format conversion is.

Require php-intl to be installed to support i18n, but allow i18n to be disabled.

According to Google Analytics on silverstripe.org, about 30% of our visitors have a browser locale other than en-us or en-gb. In terms of developer directory (and hence a likely content author location), it's about 45% countries where English isn't the official language. So I'd expect a significant part of SilverStripe installs to require internationalisation (at least a third).

Requiring php-intl will raise the bar for getting SilverStripe installed (e.g. check some SO posts).

Other projects requiring php-intl: CakePHP, Magento

Projects not requiring it: Drupal, Typo3, Concrete5, ModX, Moodle, OctoberCMS, Yii, Wordpress, Laravel (unsurprisingly since it's just a framework without UI)

Transifex Dependency

We could deprecate the YAML translation format (and provide a conversion script) if it limits our choices too much. Transifex can deal with lots of other formats: http://docs.transifex.com/formats/, incl. XLIFF and PHP.

We'll also need to check if the locale list in i18n::$all_locales (and the lang/ file names) is the same as the ones used in the php-intl API or any other i18n libs, and if you can add potentially custom locales like mi_NZ there (discussion)

@chillu
Copy link
Member

chillu commented Oct 18, 2016

BTW, Drupal originally supported date() only, and had added a magic switch between date() and ISO date formats in Drupal 8: https://www.drupal.org/node/2276183. As far as I can tell, they've decided to revert this back to date() only before the Drupal 8 final release: drupal/drupal@949e7fe

This is a bit different to our situation though: We're already using ISO formats, so the challenge is the other way around.

@tractorcow
Copy link
Contributor Author

tractorcow commented Oct 18, 2016

ISO/CU format support (Zend default, php-intl IntlDateFormatter default, different from PHP's date() and DateTime::format()). Used for Member.Locale preferences (and corresponding localised output). For example, two-digit months in ICU is MM, in date() it's m. So we can't easily fall back to date() here, unless we create a conversion layer similar to DateField::convert_iso_to_jquery_format)

Yes zend used RFC 2822 format date, which I had originally documented, but I decided to keep an open mind about format specificity. :) I'd be happy to add this back into the requirements.

Based on what you've stated above, even allowing i18n to be disabled with a lack of a core dependency could be a pretty huge cost. Perhaps it's worth looking at an i18n custom option that negated the need to use php-intl (or at least approximated the functionality adequately in its absence).

This is most definitely the highest maintenance of the options.

@tractorcow
Copy link
Contributor Author

Preferred date format for locale (long/short). Might get away with that and just getting authors to choose from a few custom formats. See Zend_Locale_Format::getDateFormat() and Member->getCMSFields()

The problem is that there may not be an author in certain cases (e.g. anonymous front-end users). The i18n system should really be able to infer standard formats without requiring human interaction.

@bummzack
Copy link
Contributor

Is php intl really such a big hurdle?
AFAIK it is bundled with PHP since 5.3 and I've checked some of the hosting providers I use and didn't find an instance where it's not installed. I don't think php intl falls into the category of an "exotic" extension, does it?

Do you collect information about the installed PHP extensions with the SilverStripe installer? If yes, it might be useful to see what the coverage of php intl is like?

As for the preferred solution: I'd stick with Zend. Reasoning: Easier migration and it seems to cover the requirements.

@sminnee
Copy link
Member

sminnee commented Oct 19, 2016

It would be good to understand whether php-intl is provided by default in the current stable versions of:

  • Debian
  • centos
  • ubuntu
  • php.net windows download
  • xampp
  • Mamp
  • homebrew

@sminnee
Copy link
Member

sminnee commented Oct 19, 2016

I was initially nervous about requiring php-intl as a dependency for all users but if half of users are likely to benefit from l10n, making it optional adds substantial cost, and php-intl is widespread amongst current installs, maybe it's okay to require.

@sminnee
Copy link
Member

sminnee commented Oct 19, 2016

So,

  • Debian and Ubuntu require sudo apt-get install php5-intl, then it works
  • Homebrew requires brew install php56-intl

I'm assuming that CentOS is similar.

@chillu
Copy link
Member

chillu commented Oct 20, 2016

The problem is that there may not be an author in certain cases (e.g. anonymous front-end users). The i18n system should really be able to infer standard formats without requiring human interaction.

It does (through i18n::$date_format, but if an author is logged in then LeftAndMain will overwrite this configuration with their preference.

I was initially nervous about requiring php-intl as a dependency for all users but if half of users are likely to benefit from l10n, making it optional adds substantial cost, and php-intl is widespread amongst current installs, maybe it's okay to require.

I think for self-hosting we should be fine. For managed hosting, it's a bit harder to find out about support. I've looked at a few common hosting providers. Basically anybody claiming to host Magento needs to have php-intl, which should make it pretty common already. Also anyone allowing SSH access and/or CPanel should have it.

So we'll be trailblazers with this decision, but it seems doable. Worth putting a vote to the community? Phrased as "if we require php-intl, we can spend more resources on other features"

@sminnee
Copy link
Member

sminnee commented Oct 20, 2016

OK so on the basis of all that my recommendation is that we require php-intl, and refactor our current Zend code just to use the php-intl library directly, at least for date/time/currency/number stuff.

I don't have a clear idea of what we would need between php-intl and transifex to to our string replacements, but I think that the current family of _t() helpers should continue working more or less without changes, as these are peppered throughout everyone's code.

@sminnee
Copy link
Member

sminnee commented Oct 20, 2016

@chillu I've drafted a voting ticket that we can post on twitter and silverstripe-dev: #6207

Any objections to the content, anyone?

@gregsmirnov
Copy link

gregsmirnov commented Oct 21, 2016

There should be a way to customise localised data anyway, Zend 1 libraries were great for customisation.
For example when more countries in Europe have changed to Euro currency in 2004, CLDR database was outdated. Or bugs in ICU/CLDR, e.g. http://unicode.org/cldr/trac/ticket/9747

There is new development in Symfony: Simplify access to CLDR/ICU data

@chillu
Copy link
Member

chillu commented Oct 24, 2016

I've updated this RFC to limit discussion scope to date/number localisation, and leave the related message translation discussion to it's own RFC: #6221

@sminnee
Copy link
Member

sminnee commented Oct 24, 2016

So it sounds like symfony/translation might be a better bet than ZF2?

@sminnee
Copy link
Member

sminnee commented Oct 27, 2016

OK the results of the vote on #6207 are that we shouldn't have too much of a problem adding php-intl as a requirement for ss4.

@sminnee sminnee modified the milestones: CMS 4.0.0-beta1, CMS 4.0.0-alpha4 Oct 31, 2016
@craigh
Copy link

craigh commented Jan 3, 2017

I'm wondering how you solved the text direction (LTR or RTL) issue. I don't see a solution in pnp-intl or symfony.

@tractorcow
Copy link
Contributor Author

@craigh I've added a note; Since it's not an explicit acceptance criteria at this stage (although I'd like it to be) I'll raise it with @sminnee when we come to writing the stories for implementation.

@sminnee sminnee modified the milestones: CMS 4.0.0-alpha5, CMS 4.0.0-beta1 Jan 11, 2017
@tractorcow
Copy link
Contributor Author

I'm wondering how you solved the text direction (LTR or RTL) issue. I don't see a solution in pnp-intl or symfony.

@craigh I just coded a list of RTL locales in a config var in i18n. It's not perfect, but the list is short enough we can just list them.

@craigh
Copy link

craigh commented Jan 25, 2017

👍

@dhensby
Copy link
Contributor

dhensby commented Jan 26, 2017

can this be closed?

@tractorcow
Copy link
Contributor Author

We will close this RFC with the above being merged.

@tractorcow
Copy link
Contributor Author

PR at #6607

@chillu
Copy link
Member

chillu commented Feb 15, 2017

Created a follow up issue with #6626

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants