[DESIGN] Bidi usability #754

aphillips · 2024-03-27T16:29:25Z

Addresses #746.

Provide a design for bidi isolation of MFv2 syntax elements, allowing users and tools to provide messages in right-to-left languages that look normal or have minimal disruption from the Unicode Bidirectional Algorithm.

This design does not require the use of isolates or markers, so it is still possible to create messages that look funky.

This design ignores the UAX31 whitespace definition requirements addressed by #673. Note that the design provided here is more closely tailored to ensure isolate sequences, once opened, are closed and that they tightly wrap only the things desired. There is some funkiness around name that we might improve (see the doc).
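The "once opened, are closed" goal can be illustrated with a small check. This is only a sketch: the function name `isolates_balanced` is hypothetical, and treating an unmatched PDI as an error is an assumption made here for illustration (the Unicode Bidirectional Algorithm itself simply ignores an unmatched PDI).

```python
# Bidi isolate controls as defined by Unicode.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
ISOLATE_INITIATORS = {LRI, RLI, FSI}


def isolates_balanced(text: str) -> bool:
    """Return True if every LRI/RLI/FSI is closed by a PDI and no PDI is unmatched.

    Hypothetical helper: the strictness about unmatched PDI is an assumption,
    not something the design document states.
    """
    depth = 0
    for ch in text:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                return False  # PDI with nothing to close
            depth -= 1
    return depth == 0
```

A tool could run a check like this over a message before saving it, so that an isolate opened in one expression cannot leak into the rest of the message.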

Addresses #746. DO NOT REVIEW YET

exploration/bidi-usability.md

Co-authored-by: Mark Davis <[email protected]>

eemeli

I didn't add line comments for them, but I continue to note that we should be consistent in calling this spec "MessageFormat 2" or "MF2".

eemeli · 2024-03-27T21:36:38Z

exploration/bidi-usability.md

+Permit the use of LRM or RLM controls immediately following:
+- name (note that this includes _identifiers_ as well as names of
+  _functions_, _variables_, and _unquoted_ literals


Why LRM/RLM rather than isolates?

Why allow for RLM?

Some implementations don't handle bidi isolates well yet.

The LRM and RLM are not stateful, and may be preferred in some circumstances.

We might allow paired isolates inside expressions. It might not be an either/or. Allowing LRM makes it easy to clean up the contents of an expression:

.input {$م1صر :م2صر} <- no bidi controls
.input {$م1صر‎ :م2صر} <- one LRM right after the id with 1 in it

I included RLM (and probably should have included ALM U+061C) so that an RTL literal or name that ends with a neutral can display correctly:

{م123+ :foo} <- with no RLM
{م123+‏ :foo} <- with RLM after +
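As a rough illustration of the two cases above, a tool could choose the mark by inspecting the bidi classes of the name. The function `mark_after` and its heuristic are assumptions for this sketch, not part of the proposal, and ALM is left out for simplicity.

```python
import unicodedata

LRM, RLM = "\u200e", "\u200f"
STRONG = ("L", "R", "AL")
RTL = ("R", "AL")


def mark_after(name: str) -> str:
    """Pick a directional mark to append after `name` (hypothetical heuristic).

    - no RTL characters: no mark is needed
    - RTL name ending in a neutral or weak character: RLM keeps that
      character attached to the RTL run
    - otherwise: LRM stops the RTL run from spilling into what follows
    """
    classes = [unicodedata.bidirectional(ch) for ch in name]
    strong = [c for c in classes if c in STRONG]
    if not any(c in RTL for c in strong):
        return ""
    if classes[-1] not in STRONG and strong[-1] in RTL:
        return RLM
    return LRM


# The two names from the examples above, written with escapes.
name_with_digit = "\u06451\u0635\u0631"  # ends with a strong Arabic letter
name_with_plus = "\u0645123+"            # ends with '+', which is not strong
```

This also shows why marks need content inspection: the right mark depends on what the name ends with.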

Given that we allow a number of neutral direction characters also in name-start, doesn't the same apply to the beginning of the name as well?

From an automation PoV, using FSI/PDI to wrap names seems like it would "just work", whereas LRM/RLM/ALM would require inspecting the contents of the string to figure out what might be needed.

If we are strictly talking about machines doing the "bidi annotation", tightly wrapping with isolates will generally work.

FSI is not always going to get the right results, as not all tokens have a strongly directional character of the correct direction nearest the front. There is an element of judgement (machines generally don't have enough information to decide this, although sometimes they do).
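To make the FSI concern concrete, here is a simplified sketch of the first-strong heuristic that FSI relies on. The function name is hypothetical, and the sketch ignores the rule about skipping over nested isolates.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Return 'ltr', 'rtl', or 'neutral' from the first strong character.

    Simplified: real first-strong detection also skips characters inside
    nested isolate sequences.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "neutral"


# A token that is mostly Hebrew but starts with a Latin letter resolves LTR.
mostly_hebrew = "a\u05d0\u05d7\u05d3"
```

A token like `mostly_hebrew` is the kind of case where FSI picks a direction from the front of the string that may not be what the author wants.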

But going back to my initial statement: humans write these strings and create translations of these strings. Sometimes the easiest way for them to make the message look correct is to add a strongly directional mark vs. wrapping. (Note that ICU produces marks on some number and date formats to coerce proper display).

You are correct that the design document should call out these use cases separately so that the reader (and ultimately the WG) can weigh supporting these mechanisms appropriately.

@macchiati:

Some implementations don't handle bidi isolates well yet.

Is there a list available anywhere of software that does not yet support bidi isolates?

exploration/bidi-usability.md

eemeli · 2024-03-27T21:40:16Z

exploration/bidi-usability.md

+Permit isolating bidi controls to be used on the **outside** of the following:
+- unquoted literals
+- quoted literals
+- quoted patterns


Should we also allow for an LRI/FSI pair immediately inside expressions and markup, or is there a reason not to do so?

We could do that also. It doesn't solve the problem of expression/markup internal bidi, though.

I'm mostly here thinking of content like:

a = 'אחד'
b = 'שתיים'
s = a + '{$' + b + '}'

where we have an RTL variable name inside a placeholder in an RTL pattern.

How, except with an LRI/FSI pair inside the braces, can we get that to render so that the $ is to the left of the name?

See #discussion_r1542105763
For those implementations, RLM/LRM are the best one can do.
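For comparison, the LRI/PDI variant being asked about could be produced along these lines. The helper `placeholder` is hypothetical, and whether the syntax accepts isolates immediately inside the braces is exactly what this thread is deciding.

```python
LRI, PDI = "\u2066", "\u2069"


def placeholder(variable_name: str, isolate: bool = True) -> str:
    """Build a `{$name}` placeholder, optionally wrapping the contents in LRI...PDI.

    Assumption: isolates directly inside the braces are permitted by the syntax.
    """
    body = "$" + variable_name
    if isolate:
        body = LRI + body + PDI
    return "{" + body + "}"


a = "\u05d0\u05d7\u05d3"              # the RTL pattern text from the example above
b = "\u05e9\u05ea\u05d9\u05d9\u05dd"  # the RTL variable name from the example above
s = a + placeholder(b)
```

With the LRI in place, the contents of the braces get an LTR base direction, which is what keeps the `$` to the left of the name.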

exploration/bidi-usability.md

eemeli · 2024-03-27T21:46:58Z

exploration/bidi-usability.md

+The characters inside an isolate sequence have the initial string (paragraph) direction
+corresponding to the starting control (LTR for LRI, RTL for RLI, auto for FSI).


Do all editors reset the paragraph direction after a newline? For example, if there's a newline between an LRI and an FSI, how is the paragraph direction of the second line determined?

The normal application of the bidi algorithm requires a reset on each paragraph, wherein a newline breaks paragraphs.

"The algorithm reorders text only within a paragraph; characters in one paragraph have no effect on characters in a different paragraph. Paragraphs are divided by the Paragraph Separator or appropriate Newline Function (for guidelines on the handling of CR, LF, and CRLF, see Section 4.4, Directionality, and Section 5.8, Newline Guidelines of [Unicode]). Paragraphs may also be determined by higher-level protocols: for example, the text in two different cells of a table will be in different paragraphs."

@macchiati is correct. That's why it's called "paragraph direction". Note that newlines don't help us that much: they are optional in our syntax (outside literals) and technically normalize to space (or nothing). That is, the newline doesn't help us if we end up writing the message as a single line.

Ok, so given that we allow for newlines within "code" and, specifically, expressions, I think we need to account for that so that we can keep the direction of the code as left-to-right, even when the first strongly directional character on the line is RTL.

As I understand it, not even an LRI/FSI pair inside the braces is always enough to keep the $ on the left side of its name if it's preceded by a newline:

a = 'אחד'
b = 'שתיים'
s = a + '{\u2066\n$' + b + '\u2069}'

That's correct. Getting the sigils to stay on the left side needs a base direction of LTR. An LRM doesn't help in your example either (except to prevent spillover with the following annotation if there were any). My proposal is not 100% bulletproof (and requires some action on the part of tools or users).
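A small sketch of why the newline matters, assuming an editor that auto-detects the base direction of each line from its first strong character (an editor with a fixed LTR base direction would not behave this way). The function name is hypothetical.

```python
import unicodedata


def auto_line_directions(source: str) -> list[str]:
    """Base direction an auto-detecting editor might pick for each line.

    Simplified first-strong detection per line; a line with no strong
    character falls back to LTR, as in UAX #9 rule P3.
    """
    directions = []
    for line in source.splitlines():
        direction = "ltr"
        for ch in line:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                break
            if bidi_class in ("R", "AL"):
                direction = "rtl"
                break
        directions.append(direction)
    return directions


b = "\u05e9\u05ea\u05d9\u05d9\u05dd"  # the RTL variable name from the example
s = "{\u2066\n$" + b + "\u2069}"
line_dirs = auto_line_directions(s)
```

The LRI sits on the first line, so it has no effect on the second line, whose first strong character is the RTL name.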

A bulletproof design would require more isolates and would probably be limited to using LRI/PDI pairs. It would be difficult to work with, given that there would be a lot of invisible control characters inside subcomponents of an expression, e.g.:

<LRI><LRI>option<PDI>[whitespace]=[whitespace]<LRI>value<PDI><PDI>
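A sketch of what generating that fully isolated form might look like. The helper names are hypothetical; `visualize` exists only to make the invisible controls readable.

```python
LRI, PDI = "\u2066", "\u2069"


def isolate(text: str) -> str:
    """Wrap text in an LRI...PDI pair."""
    return LRI + text + PDI


def isolated_option(name: str, value: str, space: str = "") -> str:
    """Isolate the option name, the option value, and the option as a whole."""
    return isolate(isolate(name) + space + "=" + space + isolate(value))


def visualize(text: str) -> str:
    """Replace the invisible controls with readable tags."""
    return text.replace(LRI, "<LRI>").replace(PDI, "<PDI>")
```

Even this single option carries six invisible characters, which illustrates the editing burden mentioned above.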

exploration/bidi-usability.md

Co-authored-by: Tim Chevalier <[email protected]>

macchiati · 2024-03-28T22:11:00Z

The W3C internationalization group might have some information on browsers, but one would also have to consider other applications and tools.


aphillips · 2024-03-28T23:55:28Z

@macchiati noted:

The W3C internationalization group might have some information on browsers,
but one would also have to consider other applications and tools.

@eemeli already spotted this: https://www.w3.org/International/i18n-tests/results/bidi-algorithm

The browsers already support isolates. Other environments, of course, may not (although much progress has been made there too).

catamorphism

Feel free to ignore some of my comments if you think it's already clear to a less naïve reader. However, there are a few things that are really ambiguous to me, like the syntax of examples.

exploration/bidi-usability.md

catamorphism · 2024-03-29T18:11:38Z

exploration/bidi-usability.md

+> [!IMPORTANT]
+> The isolating controls go on the **_outside_** of the various _literal_ and _pattern_
+> productions because characters on the **_inside_** of these are part of the normal text.
+> We need to allow users to include bidi controls in the output of MF2.


Is this worth adding to the "constraints" section? Constraint: we must allow bidi controls as either literal (interpreted by the MF2 parser) or escaped (treated as regular text). (Their position introduces an implicit escape.)

Not sure I follow? The point of the note is to show that bidi controls are just normal text inside literal contexts (the body of a pattern or inside of quoted literals)

catamorphism · 2024-03-29T18:15:27Z

exploration/bidi-usability.md

+- name (note that this includes _identifiers_ as well as names of
+  _functions_, _variables_, and _unquoted_ literals


Suggested change

-- name (note that this includes _identifiers_ as well as names of
-  _functions_, _variables_, and _unquoted_ literals
+- name (note that this includes _unquoted_ literals, _identifiers_, and _variables_;
+  and that _identifiers_ include the names of _functions_.)

(It's a bit confusing to say that an unquoted literal has a name.)

Perhaps, but that's how unquoted is defined:

unquoted = name / number-literal

exploration/bidi-usability.md

Co-authored-by: Tim Chevalier <[email protected]>

@eemeli

- replace the ambiguous term `value` with unambiguous terms (note that the term value remains for cases where we mean value)
- add @eemeli's alternative considered

- add definitions for LRM/RLM/ALM
- clarify all instances of value
- remove the word 'normalize'
- add example of namespace spillover

catamorphism

I don't have time to re-read in detail, but given that it's a design doc, I think that's fine; I followed it for the most part, and the parts that weren't clear to me are probably due to my lack of familiarity with the area.

aphillips · 2024-04-08T21:23:10Z

@eemeli Per today's (2024-04-08) call, I added optional isolates in expressions and markup.

A couple of tricky things here. One is closing markup. The syntax currently makes clear that the opening markup sigil is not attached to the opening bracket, but the standalone closing sigil is ambiguous. It's currently attached to the closing bracket (as no space is permitted): /}. However, if we put the closing isolate before the closing slash, the slash will be mirrored in an RTL context (like so):

(Recall that the brackets are mirrored because they are outside the isolate. The one on the left in the above picture is the trailing bracket.)

This adds pressure on the parser, since there are four "end of markup" sequences: }, \u2069}, /} and /\u2069}.

eemeli

This adds pressure on the parser, since there are four "end of markup" sequences: }, \u2069}, /} and /\u2069}.

Only two of those are valid at a time, though, as it's not ok to skip the end if you start it.
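The point about only two terminators being valid at a time can be sketched as a lookup keyed on how the markup was opened. The function name and the boolean flag are hypothetical, not taken from any parser.

```python
PDI = "\u2069"


def valid_markup_ends(opened_with_isolate: bool) -> set[str]:
    """Return the "end of markup" sequences acceptable for this placeholder.

    If the markup started with an isolate, the closing PDI is mandatory,
    so only the two PDI-bearing sequences remain valid.
    """
    if opened_with_isolate:
        return {PDI + "}", "/" + PDI + "}"}
    return {"}", "/}"}
```

A parser that records whether it consumed an opening isolate only ever has to test against two of the four sequences.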

exploration/bidi-usability.md

eemeli · 2024-04-14T09:41:28Z

exploration/bidi-usability.md

+Permit the use of LRM, RLM, or ALM strongly directional marks immediately following any of the items that
+**end** with the `name` production in the ABNF. 


I still think this is too messy, and doesn't solve the problem as well as isolation, but I'm ok with considering that separately.

My preferred overall solution would be to:

Optionally LR/RL/FS -isolate quoted-pattern, quoted, and name;

Optionally LR-isolate expression; and

Allow for a single LRM after a newline in code, or maybe at the end of whitespace containing a newline.

Put together, that should allow for rendering all code as LTR, and all possibly-RTL content as RTL.

Make patterns strictly LTR.
- Only allow LRI/PDI in _expression_ and _markup_
- Require LTR display/edit
- Add an alternative matching my original proposal
- Add illustrations of some of the problems with RTL editing

aphillips added 4 commits March 27, 2024 09:29

[DESIGN] Bidi usability

87d0463

Addresses #746. DO NOT REVIEW YET

Update bidi-usability.md

5a752ec

Add more examples and some use cases

280d520

Update bidi-usability.md

d98dd71

aphillips added syntax Issues related with MF Syntax design Design principles, decisions Action-Item Action item assigned by the WG labels Mar 27, 2024

Add ABNF changes and alternative designs

d6e3b38

aphillips requested review from stasm, catamorphism, eemeli, echeran, mihnita and macchiati March 27, 2024 21:05

macchiati approved these changes Mar 27, 2024

aphillips added the LDML46 LDML46 Release (Tech Preview - October 2024) label Mar 27, 2024

aphillips and others added 2 commits March 27, 2024 14:26

Update exploration/bidi-usability.md

b3298c2

Co-authored-by: Mark Davis <[email protected]>

Update exploration/bidi-usability.md

1086487

Co-authored-by: Mark Davis <[email protected]>

eemeli reviewed Mar 27, 2024

catamorphism reviewed Mar 28, 2024

aphillips and others added 2 commits March 28, 2024 07:22

Update exploration/bidi-usability.md

83e9d0f

Co-authored-by: Tim Chevalier <[email protected]>

Add additional user stories, clean up MF2 mentions, add ALM

0f52131

macchiati approved these changes Mar 28, 2024

catamorphism requested changes Mar 29, 2024

aphillips and others added 4 commits March 29, 2024 11:58

Update exploration/bidi-usability.md

308fc05

Co-authored-by: Tim Chevalier <[email protected]>

Update exploration/bidi-usability.md

125a7ae

Co-authored-by: Tim Chevalier <[email protected]>

Update exploration/bidi-usability.md

239f9ed

Co-authored-by: Tim Chevalier <[email protected]>

Address comments

4cf35cf

- replace the ambiguous term `value` with unambiguous terms (note that the term value remains for cases where we mean value)
- add @eemeli's alternative considered

aphillips requested a review from catamorphism March 29, 2024 19:24

aphillips added 5 commits March 29, 2024 14:41

Update bidi-usability.md

b5e602e

- add definitions for LRM/RLM/ALM
- clarify all instances of value
- remove the word 'normalize'
- add example of namespace spillover

Update bidi-usability.md

405810a

change the LRM/RLM/ALM approach

dab3948

improve namespace example

68b4803

Update bidi-usability.md

fd41cce

catamorphism approved these changes Apr 5, 2024

Add support for isolates in expressions and markup

5ac8dd9

aphillips requested a review from eemeli April 8, 2024 21:16

eemeli requested changes Apr 9, 2024

Setting line-by-line base direction

df1cd1d

aphillips requested a review from eemeli April 13, 2024 20:48

eemeli requested changes Apr 14, 2024

Address comments

2e1419c

Make patterns strictly LTR.
- Only allow LRI/PDI in _expression_ and _markup_
- Require LTR display/edit
- Add an alternative matching my original proposal
- Add illustrations of some of the problems with RTL editing

eemeli approved these changes Apr 14, 2024

aphillips merged commit 41285c2 into main Apr 15, 2024
1 check passed

aphillips deleted the aphillips-bidi-usability branch April 15, 2024 16:47

ZL91 approved these changes May 11, 2024
