Unicode Isolation

Staś Małolepszy edited this page Oct 25, 2018 · 3 revisions

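The FluentBundle constructor takes the useIsolating option, which controls whether placeables are wrapped in the Unicode FSI (First Strong Isolate, U+2068) and PDI (Pop Directional Isolate, U+2069) isolation marks. The option is set to true by default. This article explains why isolation is needed, why it is on by default, and what to consider before turning it off.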
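To make the wrapping concrete, here is a minimal sketch of what isolating a placeable amounts to. The helper name wrapPlaceable is hypothetical and this is not fluent.js's own implementation; it only illustrates the effect of the option on a single interpolated value.

```javascript
// Hypothetical helper, not part of fluent.js: mimics wrapping a placeable
// value in the Unicode isolation marks.
const FSI = "\u2068"; // FIRST STRONG ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

function wrapPlaceable(value, useIsolating = true) {
  const text = String(value);
  return useIsolating ? `${FSI}${text}${PDI}` : text;
}

const url = "https://www.mozilla.org/privacy/";
const isolated = wrapPlaceable(url);
const plain = wrapPlaceable(url, false);

// The marks are invisible; they only add two code points to the string.
console.log(isolated.length - plain.length); // 2
console.log(isolated.codePointAt(0).toString(16)); // "2068"
```

Because the marks are part of the formatted string, code that compares formatted output against an expected value has to account for them.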

Why is isolation needed?

Most of the time, the Unicode BiDi Algorithm handles bidirectional text very well. In some rare cases, however, it needs a little help. This can happen when a run of RTL text is embedded in a larger run of LTR text, or the other way round, because some characters take their directionality from the characters around them.

Each character has an implicit bidirectional type. The bidirectional types left-to-right and right-to-left are called strong types, and characters of those types are called strong directional characters. The bidirectional types associated with numbers are called weak types, and characters of those types are called weak directional characters. With the exception of the directional formatting characters, the remaining bidirectional types and characters are called neutral. The algorithm uses the implicit bidirectional types of the characters in a text to arrive at a reasonable display ordering for text.

If the embedded text starts or ends with a weak or neutral character (such as /, : or !), the algorithm needs help to determine whether that character should inherit the directionality of the embedded text or the directionality of the text around the embedding.

What does it look like?

Assume the following Fluent message which receives a URL as an argument from the app:

# In English
privacy-more = Visit {$PRIVACY_URL} to learn more.
# In Arabic (courtesy of Google Translate)
privacy-more = تفضل بزيارة {$PRIVACY_URL} لمعرفة المزيد.

Let the URL end with a forward slash, like so:

{"PRIVACY_URL": "https://www.mozilla.org/privacy/"}

Without isolation, the rendered result shows the trailing slash on the wrong side of the URL:

تفضل بزيارة https://www.mozilla.org/privacy/ لمعرفة المزيد. ❌

With isolation, the forward slash is displayed correctly:

تفضل بزيارة ⁨https://www.mozilla.org/privacy/⁩ لمعرفة المزيد. ✅

[Image: reference rendering of the two examples above, without and with isolation.]

Why is it on by default?

useIsolating is set to true by default for two reasons:

  1. The Unicode Consortium recommends that isolation be used as the default for all future inline bidirectional text embeddings. From UAX #9:

    [T]he use of the directional isolates instead of embeddings is encouraged in new documents – once target platforms are known to support them.

    The W3C recommends the same practice in situations when markup cannot be used.

  2. For HTML, where markup is available, both the Unicode Consortium and the W3C recommend using the <bdi> element or the dir="auto" attribute. However, fluent.js can't always rely on these being available. Translations formatted by Fluent might be displayed to the user in environments other than HTML, even if they're still using JS. Think: Node.js, alert() or React, which would require the use of dangerouslySetInnerHTML to allow arbitrary HTML markup in translations.

Can I turn it off?

If useIsolating is set to false, there's a risk that some translations won't render correctly in bidi scenarios. For RTL languages (Arabic, Hebrew, Persian and more), this happens when interpolated variables are LTR and start or end with weak characters (URLs, book or website titles, citations, Wi-Fi network names, add-on names, etc.). For LTR languages, this happens when the interpolated variables are RTL and have leading or trailing weak characters (names and titles of all sorts are the most common use case here).
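When auditing which arguments are at risk, a rough check for values that start or end with a character that is not strongly directional can help. The sketch below is only a heuristic under a loud assumption: it treats "not a letter" as a stand-in for "not strong", because JavaScript regular expressions do not expose the Unicode Bidi_Class property. The function name hasNonStrongEdge is hypothetical.

```javascript
// Rough heuristic, assumed for illustration only: treat any character that is
// not a letter as "not strongly directional". This is NOT the Unicode
// Bidi_Class property and will misclassify some characters.
const LETTER = /\p{L}/u;

function hasNonStrongEdge(value) {
  const chars = Array.from(String(value)); // split by code point
  if (chars.length === 0) return false;
  const first = chars[0];
  const last = chars[chars.length - 1];
  return !LETTER.test(first) || !LETTER.test(last);
}

console.log(hasNonStrongEdge("https://www.mozilla.org/privacy/")); // true (trailing slash)
console.log(hasNonStrongEdge("Mozilla")); // false
console.log(hasNonStrongEdge("Firefox 64")); // true (trailing digit)
```

A check like this can flag candidates for review, but it is no substitute for leaving isolation on or for testing the strings in an actual RTL layout.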

If all translations are formatted on the server side and inserted into HTML templates, turning useIsolating off and using markup instead might be a good alternative:

privacy-more = Visit <bdi>{$PRIVACY_URL}</bdi> to learn more.

This has the same effect, but it requires a bit more work. Ideally, every en-US message that uses interpolation would be reviewed to decide whether it needs <bdi>. A group comment within the Fluent file could also be used to tell localizers that they can:

  1. Insert <bdi> around interpolations, or
  2. Add dir="auto" to elements already found in the translations, e.g. <span class="bold">{$featuredBreach}</span>.

Group comments start with two hashes (##) and Pontoon shows them next to every message in the group, i.e. until the next group comment.
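The server-side alternative described above can be sketched as follows. This is a simplified stand-in, not fluent.js: formatToHtml is a hypothetical function that substitutes {$NAME} placeables the way a bundle created with useIsolating set to false would leave them unwrapped, and escapeHtml is an assumed helper. The escaping step is an addition of this sketch; since the translation itself now carries markup, the interpolated value should be escaped before it is inserted.

```javascript
// Hypothetical server-side helpers; names are illustrative only.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Stand-in for formatting a message without isolation marks:
// replaces each {$NAME} placeable with the escaped argument value.
function formatToHtml(pattern, args) {
  return pattern.replace(/\{\$(\w+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(args, name)
      ? escapeHtml(args[name])
      : match // leave unknown placeables untouched
  );
}

const pattern = "Visit <bdi>{$PRIVACY_URL}</bdi> to learn more.";
const html = formatToHtml(pattern, {
  PRIVACY_URL: "https://www.mozilla.org/privacy/",
});

console.log(html);
// Visit <bdi>https://www.mozilla.org/privacy/</bdi> to learn more.
```

Here the <bdi> element in the translation does the isolating, so no FSI or PDI characters end up in the output.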