[DESIGN] Bidi usability #754
Do all editors reset the paragraph direction after a newline? For example, if there's a newline between an LRI and an FSI, how is the paragraph direction of the second line determined?
The normal application of the bidi algorithm requires a reset at each paragraph, and a newline ends a paragraph.
"The algorithm reorders text only within a paragraph; characters in one paragraph have no effect on characters in a different paragraph. Paragraphs are divided by the Paragraph Separator or appropriate Newline Function (for guidelines on the handling of CR, LF, and CRLF, see Section 4.4, Directionality, and Section 5.8, Newline Guidelines of [Unicode]). Paragraphs may also be determined by higher-level protocols: for example, the text in two different cells of a table will be in different paragraphs."
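To make the per-paragraph reset concrete, here is a minimal sketch of how the base direction is picked for each paragraph under the default rules (first strong character, skipping isolated runs). The helper names `split_paragraphs` and `base_direction` are made up for illustration, CRLF is not treated as a single separator, and the sample message is hypothetical.

```python
import unicodedata

ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}
STRONG = {"L": "ltr", "R": "rtl", "AL": "rtl"}


def split_paragraphs(text):
    """Split text at every character of bidi class B (paragraph separator).

    Simplification: CRLF yields an extra empty paragraph here.
    """
    paragraphs, current = [], []
    for ch in text:
        if unicodedata.bidirectional(ch) == "B":
            paragraphs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        paragraphs.append("".join(current))
    return paragraphs


def base_direction(paragraph):
    """Return 'ltr' or 'rtl' from the first strong character outside isolates."""
    depth = 0
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls in ISOLATE_INITIATORS:
            depth += 1
        elif cls == "PDI":
            if depth > 0:
                depth -= 1
        elif depth == 0 and cls in STRONG:
            return STRONG[cls]
    return "ltr"  # no strong character found: default to LTR


# Hypothetical two-line message; the second line starts with an RTL name.
text = "{$name :string}\n{$\u05e9\u05dd :string}"
directions = [base_direction(p) for p in split_paragraphs(text)]
single_line = base_direction(text.replace("\n", " "))
```

With this input the two lines resolve to LTR and RTL respectively, while the same content joined onto one line resolves to LTR as a whole, because the first strong character is then the Latin name.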
@macchiati is correct. That's why it's called "paragraph direction". Note that newlines don't help us that much: they are optional in our syntax (outside literals) and technically normalize to space (or nothing). That is, the newline doesn't help us if we end up writing the message on a single line.
Ok, so given that we allow for newlines within "code" and, specifically, expressions, I think we need to account for that so that we can keep the direction of the code as left-to-right, even when the first strongly directional character on the line is RTL.
As I understand it, not even an LRI/FSI pair inside the braces is always enough to keep the `$` on the left side of its name if it's preceded by a newline:
That's correct. Getting the sigils to stay on the left side needs a base direction of LTR. An LRM doesn't help in your example either (except to prevent spillover with the following annotation if there were any). My proposal is not 100% bulletproof (and requires some action on the part of tools or users).
A bulletproof design would require more isolates and would probably be limited to using LRI/PDI pairs. It would be difficult to work with, given that there would be a lot of invisible control characters inside subcomponents of an expression, e.g.:
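As a rough illustration of how heavy that gets (a hypothetical sketch, not the example originally posted here), the snippet below wraps every subcomponent of an expression in its own LRI/PDI pair and then wraps the whole body once more. The placement of the isolates and the sample expression are assumptions made purely to count the controls; nothing here claims to be valid syntax under any draft.

```python
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate_ltr(part):
    """Wrap one piece of text in an LRI/PDI pair."""
    return f"{LRI}{part}{PDI}"


def isolate_expression(parts):
    """Isolate each subcomponent, then isolate the whole expression body."""
    body = " ".join(isolate_ltr(p) for p in parts)
    return "{" + isolate_ltr(body) + "}"


def show_controls(text):
    """Replace the invisible controls with visible tags for inspection."""
    return text.replace(LRI, "<LRI>").replace(PDI, "<PDI>")


# Hypothetical expression: an RTL variable name, a function, and one option.
parts = ["$\u05e9\u05dd", ":string", "opt=\u05e2\u05e8\u05da"]
expr = isolate_expression(parts)
control_count = sum(ch in (LRI, PDI) for ch in expr)

print(show_controls(expr))
print(control_count, "invisible characters")
```

Even this three-part expression ends up carrying eight invisible characters, which is the usability cost being described.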
Normalized?
(Not sure if this is what you meant, but "that... containing" doesn't read right to me.)
The spillover can also occur in declarations and the `.match` statement. It won't affect parsing, only the appearance to a user.
Can't we omit "identifier" since an identifier ends with a name?
Identifiers end with names, but also contain names in the namespace position. I wanted to be clear that we meant the end of an identifier in this case.
"Value" is very confusing since there are also "option values". I suggest "term" since names, option values, etc. are all terms in the grammar.
Or, it might be more precise to say something like "the characters should not appear in parsed output" (i.e. the relevant nodes in the data model).
Should we also allow for an LRI/FSI pair immediately inside expressions and markup, or is there a reason not to do so?
We could do that also. It doesn't solve the problem of expression/markup internal bidi, though.
I'm mostly here thinking of content like:
where we have an RTL variable name inside a placeholder in an RTL pattern.
How, except with an LRI/FSI pair inside the braces, can we get that to render so that the `$` is to the left of the name?
See #discussion_r1542105763
For those implementations, RLM/LRM are the best one can do.
I disagree with this change. Perhaps omit "normal" or perhaps:
Is this worth adding to the "constraints" section? Constraint: we must allow bidi controls as either literal (interpreted by the MF2 parser) or escaped (treated as regular text). (Their position introduces an implicit escape.)
Not sure I follow? The point of the note is to show that bidi controls are just normal text inside literal contexts (the body of a pattern or inside of quoted literals).
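A small sketch of that distinction, under stated assumptions: it handles only the inside of a single expression, assumes quoted literals are delimited by `|` with backslash escapes, and assumes the controls of interest are ALM, LRM, RLM, LRI, RLI, FSI and PDI. The function name `strip_syntactic_bidi` is made up, and this is not the parser's actual behavior, only one way to picture "syntax-level controls disappear, literal-level controls stay".

```python
# Assumed set of controls: ALM, LRM, RLM, LRI, RLI, FSI, PDI.
BIDI_CONTROLS = frozenset("\u061c\u200e\u200f\u2066\u2067\u2068\u2069")


def strip_syntactic_bidi(expression):
    """Drop bidi controls outside |quoted| literals; keep them inside."""
    out = []
    in_quoted = False
    i = 0
    while i < len(expression):
        ch = expression[i]
        if in_quoted:
            out.append(ch)
            if ch == "\\" and i + 1 < len(expression):
                # Keep the escaped character verbatim, e.g. \| or \\.
                out.append(expression[i + 1])
                i += 1
            elif ch == "|":
                in_quoted = False
        elif ch == "|":
            in_quoted = True
            out.append(ch)
        elif ch not in BIDI_CONTROLS:
            out.append(ch)
        i += 1
    return "".join(out)


# Hypothetical expression: isolates around the variable, RLMs inside a literal.
source = "{\u2066$x\u2069 :f opt=|\u200fabc\u200f|}"
parsed_view = strip_syntactic_bidi(source)
```

Here the LRI/PDI pair around `$x` is removed, while the two RLM characters inside the quoted literal survive as ordinary text.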
(It's a bit confusing to say that an unquoted literal has a name.)
Perhaps, but that's how `unquoted` is defined: