From 87d0463aaca0431357ae181636013401eb2de302 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 09:29:17 -0700 Subject: [PATCH 01/21] [DESIGN] Bidi usability Addresses #746. DO NOT REVIEW YET --- exploration/bidi-usability.md | 90 +++++++++++++++++++++++++++++++++++ 1 file changed, 90 insertions(+) create mode 100644 exploration/bidi-usability.md diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md new file mode 100644 index 000000000..261c8be7f --- /dev/null +++ b/exploration/bidi-usability.md @@ -0,0 +1,90 @@ +# Bidi Usability + +Status: **Proposed** + +
+Metadata
+
+- Contributors: @aphillips
+- First proposed: 2024-03-27
+- Pull Requests: #000
+ +## Objective + +_What is this proposal trying to achieve?_ + +The MessageFormat v2 syntax uses whitespace as a required delimiter +as well as permitting the use of whitespace to make _messages_ easier to read. +In addition, a _message_ can include bidirectional text in identifiers and literal values. + +MessageFormat's syntax also uses a variety of "sigils" and markers to form the structure of a _message_. +These sigils are ASCII punctuation characters that have neutral directionality. +This means that the inclusion of right-to-left ("RTL") identifiers or literals in a _message_ +can result in the syntax looking "scrambled" or, in extreme cases, appearing to have a different meaning +due to [spillover](https://www.w3.org/TR/i18n-glossary/#dfn-spillover-effects). + +To prevent spillover effects and to allow users (particularly RTL language users) +to author _messages_ in a straightforward way, we want to allow the syntax to include appropriate +bidirectional support and to recommend to tool and translation technology implementers +mechanisms to make _messages_ that include RTL characters easy to work with +without introducing spoofing or "Trojan Source" attack vectors. + +## Background + +_What context is helpful to understand this proposal?_ + +If you are unfamiliar with bidirectional or right-to-left text, there is a basic introduction +[here](https://www.w3.org/International/articles/inline-bidi-markup/uba-basics). + +## Use-Cases + +_What use-cases do we see? Ideally, quote concrete examples._ + +## Requirements + +_What properties does the solution have to manifest to enable the use-cases above?_ + +To prevent RTL _literals_ from having spillover effects with surrounding syntax, +it should be possible to bidi isolate a _quoted_ or _unquoted_ _literal_. + +To prevent _patterns_ from having spillover effects with other parts of a _message_, +particularly with _keys_ in a _variant_, +it should be possible to bidi isolate a _quoted-pattern_. 
+ +To prevent _placeholders_ or _expressions_ from having spillover effects with other parts of a _message_ +it should be possible to bidi isolate the contents of an _expression_. + +To prevent RTL identifiers from having spillover effects with other parts of an _expression_, +it should be possible to include "local effect" bidi controls following an _identifier_, +_name_, +_option value_, +or _literal_. +These controls must not be included into the _identifier_, _name_, _option value_, or _literal_, +that is, it must be possible to distinguish these characters from the value in question. + +## Constraints + +_What prior decisions and existing conditions limit the possible design?_ + +Users cannot be expected to create or manage bidirectional controls or +marks in _messages_, since the characters are invisible and can be difficult +to manage. +Tools (such as resource editors or translation editors) +and other implementations of MessageFormat 2 serialization are strongly +encouraged to provide paired isolates around any right-to-left +syntax as described in this design so that _messages_ display appropriately as plain text. + +## Proposed Design + +_Describe the proposed solution. 
Consider syntax, formatting, errors, registry, tooling, interchange._ + +## Alternatives Considered + +_What other solutions are available?_ +_How do they compare against the requirements?_ +_What other properties they have?_ From 5a752ec34ef1992b054e109bafe665866ff062e7 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 09:34:23 -0700 Subject: [PATCH 02/21] Update bidi-usability.md --- exploration/bidi-usability.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 261c8be7f..bdc502104 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -52,10 +52,20 @@ _What properties does the solution have to manifest to enable the use-cases abov To prevent RTL _literals_ from having spillover effects with surrounding syntax, it should be possible to bidi isolate a _quoted_ or _unquoted_ _literal_. +>``` +> .local $title = {|البحرين مصر الكويت!|} +> .local $egypt = {مصر :string} +>``` + To prevent _patterns_ from having spillover effects with other parts of a _message_, particularly with _keys_ in a _variant_, it should be possible to bidi isolate a _quoted-pattern_. +>``` +> .match {$foo :string} +> isolate {{البحرين مصر الكويت!}} +>``` + To prevent _placeholders_ or _expressions_ from having spillover effects with other parts of a _message_ it should be possible to bidi isolate the contents of an _expression_. 
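The isolation requirements introduced in the two patches above can be illustrated with a small sketch of what a serializing tool might do. This is a hedged illustration, not part of the patch series and not an API of any MessageFormat 2 implementation. The helper names `contains_rtl`, `isolate_if_rtl`, and `isolates_balanced` are hypothetical. Placing the controls outside the literal's own delimiters, and defaulting to FSI as the opening control, are assumptions made for the example.

```python
import unicodedata

# Paired isolate controls defined by the Unicode Bidirectional Algorithm (UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST-STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

# Bidi classes of strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text: str) -> bool:
    """Return True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def isolate_if_rtl(token: str, opener: str = FSI) -> str:
    """Wrap a serialized literal in a paired isolate, but only if it contains RTL text.

    Assumption for this sketch: the controls sit outside the token, so they
    never become part of the literal's value.
    """
    if opener not in (LRI, RLI, FSI):
        raise ValueError("opener must be LRI, RLI, or FSI")
    if not contains_rtl(token):
        return token
    return f"{opener}{token}{PDI}"


def isolates_balanced(text: str) -> bool:
    """Check that every isolate opener has a matching PDI, in order."""
    depth = 0
    for ch in text:
        if ch in (LRI, RLI, FSI):
            depth += 1
        elif ch == PDI:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

A tool could call `isolate_if_rtl` on each quoted or unquoted literal while serializing and run `isolates_balanced` over the finished _message_ to catch stray or unpaired controls, which is one of the patterns associated with spoofing.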
From 280d520d7835911da289da749307b5f7c1b9315f Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 12:13:49 -0700 Subject: [PATCH 03/21] Add more examples and some use cases --- exploration/bidi-usability.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index bdc502104..876c17667 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -45,6 +45,27 @@ If you are unfamiliar with bidirectional or right-to-left text, there is a basic _What use-cases do we see? Ideally, quote concrete examples._ +Presentation of keys can change if values are not isolated: +``` +.match {$م2صر :string}{$num :integer} +م2صر 0 {{The {$م2صر} is actually the first key}} +م2صر * {{This one appears okay}} +``` + +Presentation in an expression can change if values are not isolated or restore LTR order: +> In the following example, we use the same string with a number inserted into the middle of +> the string to make the bidi effects visible. +> The numbers correspond to: +> 1. operand +> 2. function +> 3. option name +> 4. option value + +``` +You have {$م1صر :م2صر م3صر=م4صر} <- no controls +You have {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} <- LRM after each RTL token +``` + ## Requirements _What properties does the solution have to manifest to enable the use-cases above?_ @@ -69,6 +90,10 @@ it should be possible to bidi isolate a _quoted-pattern_. To prevent _placeholders_ or _expressions_ from having spillover effects with other parts of a _message_ it should be possible to bidi isolate the contents of an _expression_. +>``` +> You can find it in {$مصر}. +>``` + To prevent RTL identifiers from having spillover effects with other parts of an _expression_, it should be possible to include "local effect" bidi controls following an _identifier_, _name_, @@ -77,6 +102,10 @@ or _literal_. 
These controls must not be included into the _identifier_, _name_, _option value_, or _literal_, that is, it must be possible to distinguish these characters from the value in question. +>``` +> You can use {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} +>``` + ## Constraints _What prior decisions and existing conditions limit the possible design?_ From d98dd7198f437fd671fac1d5937a1d80b2f06b0e Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 12:31:42 -0700 Subject: [PATCH 04/21] Update bidi-usability.md --- exploration/bidi-usability.md | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 876c17667..1094fe8f4 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -87,8 +87,9 @@ it should be possible to bidi isolate a _quoted-pattern_. > isolate {{البحرين مصر الكويت!}} >``` -To prevent _placeholders_ or _expressions_ from having spillover effects with other parts of a _message_ -it should be possible to bidi isolate the contents of an _expression_. +To prevent _markup_, _placeholders_, or _expressions_ from having spillover effects +with other parts of a _message_ +it should be possible to bidi isolate the contents of a _markup_ or an _expression_. >``` > You can find it in {$مصر}. @@ -122,6 +123,18 @@ syntax as described in this design so that _messages_ display appropriately as p _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ +Permit isolating bidi controls to be used on the **outside** of the following: +- unquoted literals +- quoted literals +- quoted patterns + +Permit the use of LRM or RLM controls immediately following: +- name (note that this includes _identifiers_ as well as names of + _functions_, _variables_, and _unquoted_ literals + +> The one tricky part with `name` is whether we permit it between the `namespace` and `name` +> part of an `identifier`. 
+ ## Alternatives Considered _What other solutions are available?_ From d6e3b387675bcc9594b77c252499298acba5b514 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 14:00:46 -0700 Subject: [PATCH 05/21] Add ABNF changes and alternative designs --- exploration/bidi-usability.md | 94 +++++++++++++++++++++++++++++++++++ 1 file changed, 94 insertions(+) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 1094fe8f4..61a149785 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -41,6 +41,52 @@ _What context is helpful to understand this proposal?_ If you are unfamiliar with bidirectional or right-to-left text, there is a basic introduction [here](https://www.w3.org/International/articles/inline-bidi-markup/uba-basics). +MessageFormat _message_ strings are created and edited primarily by humans. +The original _message_ is often written by a software developer or user experience designer. +Translators need to work with the target-language versions of each _message_. +Like many templating or domain-specific languages, MFv2 uses neutrally-directional symbols +to form portions of the syntax. +When the _message_ contains right-to-left (RTL) translations or uses values that are RTL, +the plain-text of the message and the Unicode Bidirectional Algorithm (UBA, UAX#9) +interact in ways that make the _message_ unintelligible or difficult to parse visually. + +Machines do not have a problem parsing _messages_ that contain RTL characters, +but users need to be able to discern what a _message_ does, +what _variant_ will be selected, +or what a _placeholder_ will evaluate into. + +In addition, it is possible to construct messages that use bidi characters to spoof +users into believing that a _message_ does something different than what it actually does. + +The current syntax does not permit bidi controls in _name_ tokens, +_unquoted_ literal values, +or in the whitespace portions of a _message_. 
+ +Permitting the **isolate** controls and the standalone strongly-directional markers +would enable tools, including translation tools, and users who speak RTL languages +to format a _message_ so that it's plain-text representation and its function +are unambiguous. + +The isolate controls are paired invisible control characters inserted around a portion of a string. +The start of an isolate sequence is one of: +- U+2066 LEFT-TO-RIGHT ISOLATE (LRI) +- U+2067 RIGHT-TO-LEFT ISOALTE (RLI) +- U+2068 FIRST-STRONG ISOLATE (FSI) + +The end of an isolate sequence is U+2069 POP DIRECTIONAL ISOLATE (PDI). + +The characters inside an isolate sequence have the initial string (paragraph) direction +corresponding to the starting control (LTR for LRI, RTL for RLI, auto for FSI). +The isolate sequence is **isolated** from surrounding text. +This means that the surrounding text treats it as-if the sequence were a single neutral character. + +> [!NOTE] +> One of the side-effects of using `{`/`}` and `{{`/`}}` to delimit _expressions_ +> and _patterns_ is that these paired enclosing punctuations provide a measure of +> isolation in UBA. +> This is an additional reason not to change over to quote marks (which are not enclosing) +> around patterns. + ## Use-Cases _What use-cases do we see? Ideally, quote concrete examples._ @@ -128,6 +174,22 @@ Permit isolating bidi controls to be used on the **outside** of the following: - quoted literals - quoted patterns +This would change the ABNF as follows: +```abnf +literal = ( open-isolate (quoted / unquoted) close-isolate) + / (quoted / unquoted) +quoted-pattern = ( open-isolate "{{" pattern "}}" close-isolate) + / ("{{" pattern "}}") + +open-isolate = %x2066-2068 +close-isolate = %x2069 +``` + +> [!IMPORTANT] +> The isolating controls go on the **_outside_** of the various _literal_ and _pattern_ +> productions because characters on the **_inside_** of these are part of the normal text. 
+> We need to allow users to include bidi controls in the output of MFv2. + Permit the use of LRM or RLM controls immediately following: - name (note that this includes _identifiers_ as well as names of _functions_, _variables_, and _unquoted_ literals @@ -135,8 +197,40 @@ Permit the use of LRM or RLM controls immediately following: > The one tricky part with `name` is whether we permit it between the `namespace` and `name` > part of an `identifier`. +This would change the ABNF as follows: +```abnf +namespace = name-start *name-char ; same as name but lacks bidi close +name = name-start *name-char [%x200E-200F] +``` + +> [!NOTE] +> Ideally we do not want RLM/LRM to be part of the `name` or part of any +> production that consumes `name` (such as `variable`, `reserved-keyword`, or `unquoted`). +> This is complicated to do in ABNF because each of these tokens is followed either by +> whitespace or by some closing marker such as `}`. +> The workaround in #763 is to permit these characters _before_ or _after_ whitespace +> using the various whitespace productions. +> This works at the cost of allowing spurious markers. + ## Alternatives Considered _What other solutions are available?_ _How do they compare against the requirements?_ _What other properties they have?_ + +### Nothing +We could do nothing. + +A likely outcome of doing nothing is that RTL users would insert bidi controls into +_messages_ in an attempt to make the _pattern_ and/or _placeholders_ to display correctly. +These controls would become part of the output of the _message_, +showing up inappropriately at runtime. +Because these characters are invisible, users might be very frustrated trying to manage +the results or debug what is wrong with their messages. + +By contrast, if users insert too many or the wrong controls using the recommended design, +the _message_ would still be functional and would emit no undesired characters. 
+ +### Deeper Syntax Changes +We could alter the syntax to make it more "bidi robust", +such as by using strongly directional instead of neutrals. From b3298c235307d4592d30c8566aea5ad2434375f0 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 14:26:33 -0700 Subject: [PATCH 06/21] Update exploration/bidi-usability.md Co-authored-by: Mark Davis --- exploration/bidi-usability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 61a149785..4c02ae624 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -64,7 +64,7 @@ or in the whitespace portions of a _message_. Permitting the **isolate** controls and the standalone strongly-directional markers would enable tools, including translation tools, and users who speak RTL languages -to format a _message_ so that it's plain-text representation and its function +to format a _message_ so that its plain-text representation and its function are unambiguous. The isolate controls are paired invisible control characters inserted around a portion of a string. From 1086487781afbe62db74faef09563a090869dc10 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Wed, 27 Mar 2024 14:26:42 -0700 Subject: [PATCH 07/21] Update exploration/bidi-usability.md Co-authored-by: Mark Davis --- exploration/bidi-usability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 4c02ae624..53682d7da 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -126,7 +126,7 @@ it should be possible to bidi isolate a _quoted_ or _unquoted_ _literal_. To prevent _patterns_ from having spillover effects with other parts of a _message_, particularly with _keys_ in a _variant_, -it should be possible to bidi isolate a _quoted-pattern_. +it should be possible to bidi-isolate a _quoted-pattern_. 
>``` > .match {$foo :string} From 83e9d0f469d6021f8886a154c56ffd419d0851fd Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Thu, 28 Mar 2024 07:22:49 -0700 Subject: [PATCH 08/21] Update exploration/bidi-usability.md Co-authored-by: Tim Chevalier --- exploration/bidi-usability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 53682d7da..f4b8e9be7 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -70,7 +70,7 @@ are unambiguous. The isolate controls are paired invisible control characters inserted around a portion of a string. The start of an isolate sequence is one of: - U+2066 LEFT-TO-RIGHT ISOLATE (LRI) -- U+2067 RIGHT-TO-LEFT ISOALTE (RLI) +- U+2067 RIGHT-TO-LEFT ISOLATE (RLI) - U+2068 FIRST-STRONG ISOLATE (FSI) The end of an isolate sequence is U+2069 POP DIRECTIONAL ISOLATE (PDI). From 0f52131dc7a3fb2e5c20080f3c43d013ef72bb38 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Thu, 28 Mar 2024 07:55:29 -0700 Subject: [PATCH 09/21] Add additional user stories, clean up MF2 mentions, add ALM --- exploration/bidi-usability.md | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index f4b8e9be7..1f17d6512 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -18,7 +18,7 @@ Status: **Proposed** _What is this proposal trying to achieve?_ -The MessageFormat v2 syntax uses whitespace as a required delimiter +The MessageFormat 2 syntax uses whitespace as a required delimiter as well as permitting the use of whitespace to make _messages_ easier to read. In addition, a _message_ can include bidirectional text in identifiers and literal values. @@ -44,7 +44,7 @@ If you are unfamiliar with bidirectional or right-to-left text, there is a basic MessageFormat _message_ strings are created and edited primarily by humans. 
The original _message_ is often written by a software developer or user experience designer. Translators need to work with the target-language versions of each _message_. -Like many templating or domain-specific languages, MFv2 uses neutrally-directional symbols +Like many templating or domain-specific languages, MF2 uses neutrally-directional symbols to form portions of the syntax. When the _message_ contains right-to-left (RTL) translations or uses values that are RTL, the plain-text of the message and the Unicode Bidirectional Algorithm (UBA, UAX#9) @@ -91,14 +91,14 @@ This means that the surrounding text treats it as-if the sequence were a single _What use-cases do we see? Ideally, quote concrete examples._ -Presentation of keys can change if values are not isolated: +1. Presentation of keys can change if values are not isolated: ``` .match {$م2صر :string}{$num :integer} م2صر 0 {{The {$م2صر} is actually the first key}} م2صر * {{This one appears okay}} ``` -Presentation in an expression can change if values are not isolated or restore LTR order: +2. Presentation in an expression can change if values are not isolated or restore LTR order: > In the following example, we use the same string with a number inserted into the middle of > the string to make the bidi effects visible. > The numbers correspond to: @@ -112,6 +112,14 @@ You have {$م1صر :م2صر م3صر=م4صر} <- no controls You have {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} <- LRM after each RTL token ``` +3. As a developer or translator, I want to make RTL literal or names appear correctly + in my plain-text editing environment. + I don't want to have to manage a lot of paired controls, when I can get the right effect using + strongly directional mark characters (LRM, RLM, ALM) + +4. As a translation tool or MF2 implementation, I want to automatically generate normalized + _messages_ that display correctly in RTL languages or containing RTL substrings with minimal user intervention. 
+ ## Requirements _What properties does the solution have to manifest to enable the use-cases above?_ @@ -188,9 +196,9 @@ close-isolate = %x2069 > [!IMPORTANT] > The isolating controls go on the **_outside_** of the various _literal_ and _pattern_ > productions because characters on the **_inside_** of these are part of the normal text. -> We need to allow users to include bidi controls in the output of MFv2. +> We need to allow users to include bidi controls in the output of MF2. -Permit the use of LRM or RLM controls immediately following: +Permit the use of LRM, RLM, or ALM controls immediately following: - name (note that this includes _identifiers_ as well as names of _functions_, _variables_, and _unquoted_ literals @@ -200,11 +208,11 @@ Permit the use of LRM or RLM controls immediately following: This would change the ABNF as follows: ```abnf namespace = name-start *name-char ; same as name but lacks bidi close -name = name-start *name-char [%x200E-200F] +name = name-start *name-char [%x200E-200F / %x061C] ``` > [!NOTE] -> Ideally we do not want RLM/LRM to be part of the `name` or part of any +> Ideally we do not want RLM/LRM/ALM to be part of the `name` or part of any > production that consumes `name` (such as `variable`, `reserved-keyword`, or `unquoted`). > This is complicated to do in ABNF because each of these tokens is followed either by > whitespace or by some closing marker such as `}`. 
From 308fc059a8f983ee4b3c370790d2417ac238f4a2 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 11:58:22 -0700 Subject: [PATCH 10/21] Update exploration/bidi-usability.md Co-authored-by: Tim Chevalier --- exploration/bidi-usability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 1f17d6512..e3c6fd263 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -53,7 +53,7 @@ interact in ways that make the _message_ unintelligible or difficult to parse vi Machines do not have a problem parsing _messages_ that contain RTL characters, but users need to be able to discern what a _message_ does, what _variant_ will be selected, -or what a _placeholder_ will evaluate into. +or what a _placeholder_ will evaluate to. In addition, it is possible to construct messages that use bidi characters to spoof users into believing that a _message_ does something different than what it actually does. From 125a7ae0f05a83dd8408515884e2b782e80f0cdc Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 12:06:33 -0700 Subject: [PATCH 11/21] Update exploration/bidi-usability.md Co-authored-by: Tim Chevalier --- exploration/bidi-usability.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index e3c6fd263..d444e0038 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -212,8 +212,9 @@ name = name-start *name-char [%x200E-200F / %x061C] ``` > [!NOTE] -> Ideally we do not want RLM/LRM/ALM to be part of the `name` or part of any -> production that consumes `name` (such as `variable`, `reserved-keyword`, or `unquoted`). +> Ideally we do not want RLM/LRM/ALM to be part of the parsed +> `name`, `variable`, `reserved-keyword`, `unquoted`, or any other term +> defined in terms of `name`. 
> This is complicated to do in ABNF because each of these tokens is followed either by > whitespace or by some closing marker such as `}`. > The workaround in #763 is to permit these characters _before_ or _after_ whitespace From 239f9ed547cad5a7c8c691b4d017b0a7ba48614f Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 12:08:06 -0700 Subject: [PATCH 12/21] Update exploration/bidi-usability.md Co-authored-by: Tim Chevalier --- exploration/bidi-usability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index d444e0038..8b0d4bf42 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -231,7 +231,7 @@ _What other properties they have?_ We could do nothing. A likely outcome of doing nothing is that RTL users would insert bidi controls into -_messages_ in an attempt to make the _pattern_ and/or _placeholders_ to display correctly. +_messages_ in an attempt to make the _pattern_ and/or _placeholders_ display correctly. These controls would become part of the output of the _message_, showing up inappropriately at runtime. 
Because these characters are invisible, users might be very frustrated trying to manage From 4cf35cf5c4ef027fccc84802f3e53097793f49b6 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 12:24:24 -0700 Subject: [PATCH 13/21] Address comments - replace the ambiguous term `value` with unambiguous terms (note that the term value remains for cases where we mean value) - add @eemeli's alternative considered --- exploration/bidi-usability.md | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 8b0d4bf42..66fc1bbf8 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -20,7 +20,7 @@ _What is this proposal trying to achieve?_ The MessageFormat 2 syntax uses whitespace as a required delimiter as well as permitting the use of whitespace to make _messages_ easier to read. -In addition, a _message_ can include bidirectional text in identifiers and literal values. +In addition, a _message_ can include bidirectional text in identifiers and literals. MessageFormat's syntax also uses a variety of "sigils" and markers to form the structure of a _message_. These sigils are ASCII punctuation characters that have neutral directionality. @@ -46,9 +46,10 @@ The original _message_ is often written by a software developer or user experien Translators need to work with the target-language versions of each _message_. Like many templating or domain-specific languages, MF2 uses neutrally-directional symbols to form portions of the syntax. -When the _message_ contains right-to-left (RTL) translations or uses values that are RTL, +When the _message_ contains right-to-left (RTL) characters in translations or +in portions of the syntax, the plain-text of the message and the Unicode Bidirectional Algorithm (UBA, UAX#9) -interact in ways that make the _message_ unintelligible or difficult to parse visually. 
+can interact in ways that make the _message_ unintelligible or difficult to parse visually. Machines do not have a problem parsing _messages_ that contain RTL characters, but users need to be able to discern what a _message_ does, @@ -59,11 +60,11 @@ In addition, it is possible to construct messages that use bidi characters to sp users into believing that a _message_ does something different than what it actually does. The current syntax does not permit bidi controls in _name_ tokens, -_unquoted_ literal values, +_unquoted_ literals, or in the whitespace portions of a _message_. Permitting the **isolate** controls and the standalone strongly-directional markers -would enable tools, including translation tools, and users who speak RTL languages +would enable tools, including translation tools, and users who are writing in RTL languages to format a _message_ so that its plain-text representation and its function are unambiguous. @@ -91,14 +92,15 @@ This means that the surrounding text treats it as-if the sequence were a single _What use-cases do we see? Ideally, quote concrete examples._ -1. Presentation of keys can change if values are not isolated: +1. Presentation of _keys_ can change if the text of the _key's_ _literal_ is not isolated: ``` .match {$م2صر :string}{$num :integer} م2صر 0 {{The {$م2صر} is actually the first key}} م2صر * {{This one appears okay}} ``` -2. Presentation in an expression can change if values are not isolated or restore LTR order: +2. Presentation in an expression can change if portions of the expression + are not isolated or do not restore LTR order: > In the following example, we use the same string with a number inserted into the middle of > the string to make the bidi effects visible. > The numbers correspond to: @@ -155,7 +157,8 @@ _name_, _option value_, or _literal_. 
These controls must not be included into the _identifier_, _name_, _option value_, or _literal_, -that is, it must be possible to distinguish these characters from the value in question. +that is, it must be possible to distinguish these characters from the identifier, +name, value, or literal in question. >``` > You can use {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} @@ -243,3 +246,16 @@ the _message_ would still be functional and would emit no undesired characters. ### Deeper Syntax Changes We could alter the syntax to make it more "bidi robust", such as by using strongly directional instead of neutrals. + +### Forbid RTL characters in `name` and/or `unquoted` +We could alter the syntax to forbid using RTL characters in names and unquoted literals. +This would make the syntax consist solely of LTR and neutral characters. +One flavor of this would be to restrict tokens to US ASCII. + +Cons: +- This would break compatibility with NCName/QName; we would be back to + defining our own idiosyncratic namespace +- Unicode could define more RTL characters in the future, making the syntax + brittle +- This is not friendly to non-English/non-Latin users and represents a usability + restriction in environments in which names can be non-ASCII values From b5e602ee0af84a23e3e274550c26deea22b1d08d Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 14:41:46 -0700 Subject: [PATCH 14/21] Update bidi-usability.md - add definitions for LRM/RLM/ALM - clarify all instances of value - remove the word 'normalize' - add example of namespace spillover --- exploration/bidi-usability.md | 47 ++++++++++++++++++++++++++++------- 1 file changed, 38 insertions(+), 9 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 66fc1bbf8..878a108a7 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -88,6 +88,17 @@ This means that the surrounding text treats it as-if the sequence were a single > This is an additional reason not to 
change over to quote marks (which are not enclosing) > around patterns. +This design also allows for the use of strongly directional marker characters. +These include: +- U+200E LEFT-TO-RIGHT MARK (LRM) +- U+200F RIGHT-TO-LEFT MARK (RLM) +- U+061C ARABIC LETTER MARK (ALM) + +These characters are invisible strongly-directional characters used in bidirectional +text to coerce certain directional behavior (usually to mark the end of +a sequence of characters that would otherwise be ambiguous or interact with +neutrals or opposite direction runs in an unhelpful way). + ## Use-Cases _What use-cases do we see? Ideally, quote concrete examples._ @@ -99,6 +110,13 @@ _What use-cases do we see? Ideally, quote concrete examples._ م2صر * {{This one appears okay}} ``` +> ![NOTE] +> The first _variant_ in the use case above is actually: +>``` +> \u06452\u0635\u0631 0 {{The {$\u06452\u0635\u0631} is actually the first key}} +>``` + + 2. Presentation in an expression can change if portions of the expression are not isolated or do not restore LTR order: > In the following example, we use the same string with a number inserted into the middle of @@ -119,8 +137,8 @@ You have {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} <- LRM after each RTL t I don't want to have to manage a lot of paired controls, when I can get the right effect using strongly directional mark characters (LRM, RLM, ALM) -4. As a translation tool or MF2 implementation, I want to automatically generate normalized - _messages_ that display correctly in RTL languages or containing RTL substrings with minimal user intervention. +4. As a translation tool or MF2 implementation, I want to automatically generate + _messages_ which display correctly when they contain RTL text or substring with minimal user intervention. ## Requirements @@ -158,7 +176,7 @@ _option value_, or _literal_. 
These controls must not be included into the _identifier_, _name_, _option value_, or _literal_, that is, it must be possible to distinguish these characters from the identifier, -name, value, or literal in question. +name, option value, or literal in question. >``` > You can use {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} @@ -201,12 +219,13 @@ close-isolate = %x2069 > productions because characters on the **_inside_** of these are part of the normal text. > We need to allow users to include bidi controls in the output of MF2. -Permit the use of LRM, RLM, or ALM controls immediately following: -- name (note that this includes _identifiers_ as well as names of - _functions_, _variables_, and _unquoted_ literals - -> The one tricky part with `name` is whether we permit it between the `namespace` and `name` -> part of an `identifier`. +Permit the use of LRM, RLM, or ALM controls immediately following any of the items that +**end** with the `name` production the ABNF. +This includes _identifiers_ found in the names of +_functions_ +and _options_, +plus the names of _variables_, +as well as the contents of _unquoted_ literals. This would change the ABNF as follows: ```abnf @@ -214,6 +233,16 @@ namespace = name-start *name-char ; same as name but lacks bidi close name = name-start *name-char [%x200E-200F / %x061C] ``` +> The one tricky part with `name` is whether we permit it between the `namespace` and `name` +> part of an `identifier`. +> Consider: +>``` +> {$a1 :b2:c3} +> {⁦$م1‎ :م2:ن⁩3‎} +>``` +> Notice that the namespace is `:م2` and the name is `:ن⁩3`, but the sequence is displayed +> with a spillover effect. 
+ > [!NOTE] > Ideally we do not want RLM/LRM/ALM to be part of the parsed > `name`, `variable`, `reserved-keyword`, `unquoted`, or any other term From 405810a7dcd138606cf86133df68939d20924c2d Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Fri, 29 Mar 2024 14:56:30 -0700 Subject: [PATCH 15/21] Update bidi-usability.md --- exploration/bidi-usability.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 878a108a7..cb2dc2ab8 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -110,7 +110,8 @@ _What use-cases do we see? Ideally, quote concrete examples._ م2صر * {{This one appears okay}} ``` -> ![NOTE] +> [!NOTE] +> > The first _variant_ in the use case above is actually: >``` > \u06452\u0635\u0631 0 {{The {$\u06452\u0635\u0631} is actually the first key}} @@ -241,7 +242,7 @@ name = name-start *name-char [%x200E-200F / %x061C] > {⁦$م1‎ :م2:ن⁩3‎} >``` > Notice that the namespace is `:م2` and the name is `:ن⁩3`, but the sequence is displayed -> with a spillover effect. +> with a spillover effect (the number, in each case, _trails_ the Arabic letter). > [!NOTE] > Ideally we do not want RLM/LRM/ALM to be part of the parsed From dab39485b8a10bddfa721fd630ac01fc0ed0a1be Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 30 Mar 2024 08:38:14 -0700 Subject: [PATCH 16/21] change the LRM/RLM/ALM approach --- exploration/bidi-usability.md | 69 ++++++++++++++++++++++------------- 1 file changed, 44 insertions(+), 25 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index cb2dc2ab8..0739292b4 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -183,6 +183,18 @@ name, option value, or literal in question. 
> You can use {$م1صر‎ :م2صر‎ م3صر‎=م4صر‎} >``` +To prevent RTL _namespace_ names from having spillover effects with _function_ names, +it should be possible to include "local effect" strongly directional marks in an _identifier_: +> In this example, the _namespace_ is `:م2` and the _name_ is `:ن⁩3`, but the sequence is displayed +> with a spillover effect. +> (Note that the number in each name _trails_ the Arabic letter: it appears to the left because the +> string is RTL!). +>``` +> {$a1 :b2:c3} +> {⁦$م1‎ :م2:ن⁩3‎} +>``` + + ## Constraints _What prior decisions and existing conditions limit the possible design?_ @@ -195,6 +207,15 @@ and other implementations of MessageFormat 2 serialization are strongly encouraged to provide paired isolates around any right-to-left syntax as described in this design so that _messages_ display appropriately as plain text. +Ideally we do not want RLM/LRM/ALM to be part of the parsed +`name`, `variable`, `reserved-keyword`, `unquoted`, or any other term +defined in terms of `name`. +This is complicated to do in ABNF because each of these tokens is followed either by +whitespace or by some closing marker such as `}`. +The workaround in #763 was to permit these characters _before_ or _after_ whitespace +using the various whitespace productions. +This works at the cost of allowing spurious markers. + ## Proposed Design _Describe the proposed solution. 
Consider syntax, formatting, errors, registry, tooling, interchange._ @@ -205,9 +226,11 @@ Permit isolating bidi controls to be used on the **outside** of the following: - quoted patterns This would change the ABNF as follows: +(Notice that this change includes a production `bidi` described further down +in this document) ```abnf -literal = ( open-isolate (quoted / unquoted) close-isolate) - / (quoted / unquoted) +literal = ( open-isolate (quoted / (unquoted [bidi])) close-isolate) + / (quoted / (unquoted [bidi])) quoted-pattern = ( open-isolate "{{" pattern "}}" close-isolate) / ("{{" pattern "}}") @@ -228,31 +251,27 @@ and _options_, plus the names of _variables_, as well as the contents of _unquoted_ literals. -This would change the ABNF as follows: -```abnf -namespace = name-start *name-char ; same as name but lacks bidi close -name = name-start *name-char [%x200E-200F / %x061C] -``` - -> The one tricky part with `name` is whether we permit it between the `namespace` and `name` -> part of an `identifier`. -> Consider: ->``` -> {$a1 :b2:c3} -> {⁦$م1‎ :م2:ن⁩3‎} ->``` -> Notice that the namespace is `:م2` and the name is `:ن⁩3`, but the sequence is displayed -> with a spillover effect (the number, in each case, _trails_ the Arabic letter). +> [!NOTE] +> Notice that _unquoted_ literals can also be surrounded by bidi isolates +> using the previous syntax modification just above. > [!NOTE] -> Ideally we do not want RLM/LRM/ALM to be part of the parsed -> `name`, `variable`, `reserved-keyword`, `unquoted`, or any other term -> defined in terms of `name`. -> This is complicated to do in ABNF because each of these tokens is followed either by -> whitespace or by some closing marker such as `}`. -> The workaround in #763 is to permit these characters _before_ or _after_ whitespace -> using the various whitespace productions. -> This works at the cost of allowing spurious markers. 
+> Notice that `reserved-annotation` is not in the ABNF changes because it already +> permits the marks in question. +> Any syntax derived from `reserved-annotation` +> (i.e. when unreserving a new statement in a future addition) +> would need to handle bidi explicitly using the model already established here. + +```abnf +variable-expression = "{" [s] variable [bidi] [s annotation] *(s attribute) [s] "}" +function = ":" identifier [bidi] *(s option) +option = identifier [bidi] [s] "=" [s] (literal / (variable [bidi])) +attribute = "@" identifier [bidi] [[s] "=" [s] (literal / (variable [bidi]))] +markup = "{" [s] "#" identifier [bidi] *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone + / "{" [s] "/" identifier [bidi] *(s option) *(s attribute) [s] "}" ; close +identifier = [(namespace [bidi] ":")] name +bidi = [ %x200E-200F / %x061C ] +``` ## Alternatives Considered From 68b4803b25d570d94b658da23f5698c0a1518416 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 30 Mar 2024 08:42:20 -0700 Subject: [PATCH 17/21] improve namespace example --- exploration/bidi-usability.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 0739292b4..579fabd72 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -191,7 +191,8 @@ it should be possible to include "local effect" strongly directional marks in an > string is RTL!).
>``` > {$a1 :b2:c3} -> {⁦$م1‎ :م2:ن⁩3‎} +> {⁦$م1‎ :م2:ن⁩3‎} bad +> {⁦$م1‎ :م2‎:ن3‎⁩} with isolates and LRMs >``` From fd41cceb555662c93d01c8ca9c6783d510ce7f92 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sat, 30 Mar 2024 08:50:22 -0700 Subject: [PATCH 18/21] Update bidi-usability.md --- exploration/bidi-usability.md | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 579fabd72..4b26d82ce 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -191,7 +191,7 @@ it should be possible to include "local effect" strongly directional marks in an > string is RTL!). >``` > {$a1 :b2:c3} -> {⁦$م1‎ :م2:ن⁩3‎} bad +> {$م1 :م2:ن3} spillover effects > {⁦$م1‎ :م2‎:ن3‎⁩} with isolates and LRMs >``` @@ -241,11 +241,12 @@ close-isolate = %x2069 > [!IMPORTANT] > The isolating controls go on the **_outside_** of the various _literal_ and _pattern_ -> productions because characters on the **_inside_** of these are part of the normal text. +> productions because characters on the **_inside_** of these are part of the _literal_'s +> or _pattern_'s textual content. > We need to allow users to include bidi controls in the output of MF2. -Permit the use of LRM, RLM, or ALM controls immediately following any of the items that -**end** with the `name` production the ABNF. +Permit the use of LRM, RLM, or ALM stronly directional marks immediately following any of the items that +**end** with the `name` production in the ABNF. 
This includes _identifiers_ found in the names of _functions_ and _options_, From 5ac8dd972c1438245d548ea645ccc8ea51a77ca8 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Mon, 8 Apr 2024 14:15:56 -0700 Subject: [PATCH 19/21] Add support for isolates in expressions and markup --- exploration/bidi-usability.md | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 4b26d82ce..3340f1fa9 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -245,6 +245,23 @@ close-isolate = %x2069 > or _pattern_'s textual content. > We need to allow users to include bidi controls in the output of MF2. +Permit isolating bidi controls to be used **immediately inside** the following: +- expressions +- markup + +This would change the ABNF as follows (assuming the above changes are also incorporated): +```abnf +expression = "{" open-isolate (literal-expression / variable-expression / annotation-expression) close-isolate "}" + / "{" (literal-expression / variable-expression / annotation-expression) "}" +literal-expression = [s] literal [s annotation] *(s attribute) [s] +variable-expression = [s] variable [s annotation] *(s attribute) [s] +annotation-expression = [s] annotation *(s attribute) [s] +markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone + / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close + / "{" open-isolate [s] "#" identifier *(s option) *(s attribute) [s] ["/"] close-isolate "}" ; open and standalone + / "{" open-isolate [s] "/" identifier *(s option) *(s attribute) [s] close-isolate "}" ; close +``` + Permit the use of LRM, RLM, or ALM strongly directional marks immediately following any of the items that **end** with the `name` production in the ABNF.
This includes _identifiers_ found in the names of From df1cd1d975ef29cf584b9675998e0aac39dce1ed Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Tue, 9 Apr 2024 13:36:31 -0700 Subject: [PATCH 20/21] Setting line-by-line base direction --- exploration/bidi-usability.md | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 3340f1fa9..85aff715d 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -195,6 +195,17 @@ it should be possible to include "local effect" strongly directional marks in an > {⁦$م1‎ :م2‎:ن3‎⁩} with isolates and LRMs >``` +Newlines inside of messages should not harm later syntax. + +``` +* * {{\u0645
\u0646}} 123 456 {{ No LRM==bad }} +* * {{م +ن}} 123 456 {{ No LRM==bad }} + +* * {{\u0645
\u0646}}\u200e 123 456 {{ LRM }} +* * {{م +ن}}‎ 123 456 {{ LRM }} +``` ## Constraints @@ -221,6 +232,16 @@ This works at the cost of allowing spurious markers. _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ +To start with, we should establish that _message_ editing should always use a left-to-right +base direction. +Further, each _line_ of a message should be displayed for editing with a base paragraph direction of LTR. +This is because the syntax of a _message_ depends on LTR word tokens, +as well as token ordering (as in a placeholder or with variant keys). +This is not the disadvantage to RTL languages that it might first appear: +- Bidi inside of patterns works normally; + only placeholders/markup have special usage of bidi controls and this usage is isolated + so that placeholders and markup are treated as neutrals. + Permit isolating bidi controls to be used on the **outside** of the following: - unquoted literals - quoted literals From 2e1419cf127c69af6d36f8967cd541fb0d199cf8 Mon Sep 17 00:00:00 2001 From: Addison Phillips Date: Sun, 14 Apr 2024 08:35:10 -0700 Subject: [PATCH 21/21] Address comments Make patterns strictly LTR. - Only allow LRI/PDI in _expression_ and _markup_ - Require LTR display/edit - Add an alternative matching my original proposal - Add illustrations of some of the problems with RTL editing --- exploration/bidi-usability.md | 88 +++++++++++++++++++++++++++++------ 1 file changed, 73 insertions(+), 15 deletions(-) diff --git a/exploration/bidi-usability.md b/exploration/bidi-usability.md index 85aff715d..67ca0c0e4 100644 --- a/exploration/bidi-usability.md +++ b/exploration/bidi-usability.md @@ -10,7 +10,7 @@ Status: **Proposed**
First proposed
2024-03-27
Pull Requests
-
#000
+
#754
@@ -232,21 +232,29 @@ This works at the cost of allowing spurious markers. _Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._ -To start with, we should establish that _message_ editing should always use a left-to-right -base direction. -Further, each _line_ of a message should be displayed for editing with a base paragraph direction of LTR. -This is because the syntax of a _message_ depends on LTR word tokens, +Editing and display of a _message_ SHOULD always use a left-to-right base direction +both for the complete text of the _message_ as well as for each line (paragraph) +contained therein. + +We use LTR display because the syntax of a _message_ depends on LTR word tokens, as well as token ordering (as in a placeholder or with variant keys). -This is not the disadvantage to RTL languages that it might first appear: -- Bidi inside of patterns works normally; - only placeholders/markup have special usage of bidi controls and this usage is isolated - so that placeholders and markup are treated as neutrals. + +This is not the disadvantage to right-to-left languages that it might first appear: +- Bidi inside of _patterns_ works normally +- _Placeholders_ and _markup_ are isolated (treated as neutrals) so that they appear + in the correct location in an RTL _pattern_ +- _Expressions_ use isolates and directional marks to display internal tokens in the + correct order and without spillover effects Permit isolating bidi controls to be used on the **outside** of the following: - unquoted literals - quoted literals - quoted patterns +We permit any of the isolate starting controls (LRI, RLI, FSI) because we want to allow +the user to set the base direction of a _literal_ or _pattern_ according to its respective +actual contents. 
+ This would change the ABNF as follows: (Notice that this change includes a production `bidi` described further down in this document) @@ -266,21 +274,27 @@ close-isolate = %x2069 > or _pattern_'s textual content. > We need to allow users to include bidi controls in the output of MF2. -Permit isolating bidi controls to be used **immediately inside** the following: +Permit **left-to-right** isolating bidi controls (`U+2066`...`U+2069`) to be used **immediately inside** the following: - expressions - markup +We only permit the LTR isolates because the contents of an _expression_ +or _markup_ must be laid out left-to-right. +_Literal_ values can be right-to-left isolated within that or use strongly +directional marks to ensure correct display. + This would change the ABNF as follows (assuming the above changes are also incorporated): ```abnf -expression = "{" open-isolate (literal-expression / variable-expression / annotation-expression) close-isolate "}" +expression = "{" LRI (literal-expression / variable-expression / annotation-expression) close-isolate "}" / "{" (literal-expression / variable-expression / annotation-expression) "}" literal-expression = [s] literal [s annotation] *(s attribute) [s] variable-expression = [s] variable [s annotation] *(s attribute) [s] annotation-expression = [s] annotation *(s attribute) [s] -markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone - / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close - / "{" open-isolate [s] "#" identifier *(s option) *(s attribute) [s] ["/"] close-isolate "}" ; open and standalone - / "{" open-isolate [s] "/" identifier *(s option) *(s attribute) [s] close-isolate "}" ; close +markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] ["/"] "}" ; open and standalone + / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}" ; close + / "{" LRI [s] "#" identifier *(s option) *(s attribute) [s] ["/"] close-isolate "}" ; open and standalone 
+ / "{" LRI [s] "/" identifier *(s option) *(s attribute) [s] close-isolate "}" ; close +LRI = %x2066 ``` Permit the use of LRM, RLM, or ALM stronly directional marks immediately following any of the items that @@ -348,3 +362,47 @@ Cons: brittle - This is not friendly to non-English/non-Latin users and represents a usability restriction in environments in which names can be non-ASCII values + +### Allow more permissive use of bidi controls + +We could permit RLI/FSI to be used inside _expressions_ and _markup_. +This would be an advantage for simple _expressions_ containing only or primarily +RTL content. +For example: +``` +{⁧لت-123-م...⁩} // RLI isolated +{لت-123-م...} +``` + +We could also permit users/editors to use RTL base direction for editing. +This is tricky, as the syntax promotes the use of left-to-right runs +that will "stick together" unless isolated. +This is most visible in _selectors_ and _variant_ _keys_. + +Consider this message: +``` +.match {$\u06451\u0645}{$\u06462\u0646} +one two {{normal LTR}} +\u2067one\u2069 \u2067two\u2069 {{RLI around each key}} +\u2066one\u2069 \u2066two\u2069 {{LRI around each key}} +\u0645 \u0646 {{RTL}} +* \u0646 {{star is first}} +\u0645 * {{star is second}} +``` + +In an LTR context the _message_ displays like this (red lines around display errors): +![image](https://github.com/unicode-org/message-format-wg/assets/69082/f19cbf99-94f2-4f36-805b-8da0750bc5f2) + +In an RTL context, there is an equivalent case: +![image](https://github.com/unicode-org/message-format-wg/assets/69082/1b2e1c67-aebc-455b-98e9-99f9e620c543) + +Coercing proper display in both LTR and RTL contexts requires +complex sets of controls. + +**Pros** +- Can provide both LTR and RTL native editing experiences + +**Cons** +- Requires complex sets of bidi controls +- RTL editing/display is mostly a special case; + we already afford the ability to edit RTL in _patterns_ and _literals_
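
As a rough illustration of the tooling side of this design, the sketch below shows how a serializer might add an LRM after a name that contains RTL characters and wrap the inside of an _expression_ in LRI...PDI. It is a minimal sketch and not part of the proposal: the helper names `has_rtl`, `mark_name`, and `isolate_expression` are hypothetical, and the rule "add controls only when a strong RTL character is present" is an assumption made for the example.

```python
import unicodedata

LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def has_rtl(text: str) -> bool:
    """Return True if any character has a strong RTL bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def mark_name(name: str) -> str:
    """Hypothetical helper: append an LRM after a name containing RTL characters."""
    return name + LRM if has_rtl(name) else name


def isolate_expression(body: str) -> str:
    """Hypothetical helper: place LRI...PDI immediately inside the braces
    when the expression body contains RTL characters."""
    if has_rtl(body):
        return "{" + LRI + body + PDI + "}"
    return "{" + body + "}"
```

A parser following the design would then need to treat these controls as syntax rather than as part of the name or literal value, so that `mark_name` does not change which variable or function is referenced.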