Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra spaces in markup #650

Closed
mihnita opened this issue Feb 15, 2024 · 16 comments
Closed

Extra spaces in markup #650

mihnita opened this issue Feb 15, 2024 · 16 comments
Labels
Agenda+ Requested for upcoming teleconference resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Seek-Feedback-in-Preview Issue should be something we seek feedback on in the tech preview period syntax Issues related with MF Syntax

Comments

@mihnita
Copy link
Collaborator

mihnita commented Feb 15, 2024

Our current grammar for markup is this:

markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] "}"  ; open
       / "{" [s] "#" identifier *(s option) *(s attribute) [s] "/}" ; standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close

That's what it looks like after adding adding options to close.
But the issue is independent.

I would like to make the case that the first space after { is not only unnecessary, but somewhat problematic.

  1. It forces us into a (possibly big) lookahead.
    We see the {, and then we might have to "consume" a lot of spaces to know where we are (markup or expression)

    I decided to strikethrough because it is the least compelling argument. It is not about the parsing, at all. [mihnita]
  2. The open / close attribute is something associated to the whole markup, not the identifier.
    It is this whole marker that is standalone or close, not the attribute
  3. It is somewhat internally inconsistent.
    The / at the end of the standalone is tied with the closing }, there is no [s] in between them.
    This was true before PR Add options to close (spec) #649, but it is more visible now.
  4. It is even invalid in HTML and XML.
    None of this works: < b> and < /b> and </ b>
@mihnita
Copy link
Collaborator Author

mihnita commented Feb 15, 2024

For convenience I will include the HTML below unprotected, so that we can see it in the issue:

<p>Say something <i>italic</i> and last (works).</p>
<p>Say something <  i>italic</i> and last (fails).</p>
<p>Say something <i  >italic</i> and last (works).</p>
<p>Say something <i>italic<  /i> and last (fails).</p>
<p>Say something <i>italic</  i> and last (fails).</p>
<p>Say something <i>italic</i  > and last (fails).</p>

Say something italic and last (works).

Say something < i>italic and last (fails).

Say something italic and last (works).

Say something italic< /i> and last (fails).

Say something italic and last (fails).

Say something italic and last (fails).

@mihnita
Copy link
Collaborator Author

mihnita commented Feb 15, 2024

Proposed change:

markup = "{" [s] "#" identifier *(s option) *(s attribute) [s] "}"  ; open
       / "{" [s] "#" identifier *(s option) *(s attribute) [s] "/}" ; standalone
       / "{" [s] "/" identifier *(s option) *(s attribute) [s] "}"  ; close

to:

markup = "{#" identifier *(s option) *(s attribute) [s] "}"  ; open
       / "{#" identifier *(s option) *(s attribute) [s] "/}" ; standalone
       / "{/" identifier *(s option) *(s attribute) [s] "}"  ; close

@mihnita
Copy link
Collaborator Author

mihnita commented Feb 15, 2024

Note: this is not the same as the expressions.
So there is no need to be consistent, because the similarity is only superficial.
They might look the same, but the meaning is not the same.

There detecting { tells us we are in an expression, no need for extra lookahead.
And the "decorations" { |foo| } or { $foo } or { :foo } DO belong to "foo".
It is "foo" that is a literal / operand / function, not the expression itself.

@aphillips
Copy link
Member

I think this is a non-starter? Our syntax is very consistent about optional whitespace: we are very lax when it's optional, especially inside expressions. The {/} are never part of any construct.

I think the thing that might be confusing here is that when you write the syntax using identifier directly in the markup production (which I think is a natural thing to do), so you don't have the insulation that function or annotation do, in which the sigil is clearly attached to the identifier:

markup = "{" markup-identifier *(s option) *(s attribute) [s] "}" <- this doesn't work because of standalone
markup-identifier = ( "#" / "/" ) identifier

All of our other identifiers and sigil-introduced tokens are of the sigil-identifier flavor. For consistency, this should be to. There is some lookahead to find type, but it's consuming whitespace (so can be optimized).

@aphillips aphillips added syntax Issues related with MF Syntax Agenda+ Requested for upcoming teleconference out-of-scope? LDML45 LDML45 Release (Tech Preview) labels Feb 15, 2024
@mihnita
Copy link
Collaborator Author

mihnita commented Feb 15, 2024

Our syntax is very consistent about optional whitespace: we are very lax when it's optional

Agree, but I don't think this is optional, and the sigils should not be attached to the identifier.
They mark the start / end of the markup.
In the same class as {{ and }}, where we don't allow spaces between the curly braces.
And with the end of standalone, where we don't allow space between / and }.

Right above your comment I explain why I think this is not at all the same.

the thing that might be confusing here is that when you write the syntax using identifier directly in the markup production

I am not judging this based on the grammar, I am judging it as a user seeing the syntax.

Start, {#foo}some text{/foo} and more.

What I see is {# ..... }, delimiters.

These are spaces that I agree are optional:

Start, {#   foo    }some text{/    foo   } and more.

{# marks the beginning of an open / standalone marker,
{/ the beginning of a close marker,
and /} marks the end of a standalone marker.
They are delimiters, same as {{...}}, or |...|.
One "squints" at {/....} and says: ah, this is a closing marker. The / is not a sigil of the name part.

There is not markup (that I know of) that allows these kind of spaces.
If there is one, I would like to see it.

@eemeli
Copy link
Collaborator

eemeli commented Feb 15, 2024

I think it's way too late to introduce this syntax change for consideration, and I do not think we should consider this for LDML 45.

As an implementer, I'd like to note that I had no difficulty dealing with the current syntax. In a pattern, from the { you can see that you have some expression or markup starting, and then you need to look past any subsequent whitespace to determine what that is.

@mihnita
Copy link
Collaborator Author

mihnita commented Feb 15, 2024

As an implementer, I'd like to note that I had no difficulty dealing with the current syntax

This is not about implementing. That is point 1 of 4, and in fact the weakest one.
Should not have been the first one (or included at all, probably)

It is that it does not make sense semantically, as a user, and no other system does that.

@aphillips aphillips added Seek-Feedback-in-Preview Issue should be something we seek feedback on in the tech preview period LDML46 LDML46 Release (Tech Preview - October 2024) and removed Agenda+ Requested for upcoming teleconference out-of-scope? LDML45 LDML45 Release (Tech Preview) labels Feb 18, 2024
@aphillips
Copy link
Member

We agreed to the syntax for 45 in the F2F. I'm going to mark this for consideration during tech preview.

@mihnita
Copy link
Collaborator Author

mihnita commented Feb 22, 2024

BTW, if we are to be "loose" with the spaces, why not allow spaces between the # and identifiers?

Current:

"{" [s] "#" identifier *(s option) *(s attribute) [s] "}" 

Extra space:

"{" [s] "#" [s] identifier *(s option) *(s attribute) [s] "}" 

The current syntax allows ...{ /bold }... but not ...{/ bold }...
Why make it mandatory to "glue" the / and # on the identifier?

I think this is all based on a superficial (visual) similarity with the placeholders.

@lucacasonato
Copy link
Contributor

I'd also just like to chime in that I agree with @mihnita - restricting the syntax here to ensure that there is no whitespace between { and # and { and /, aligning with the standalone close token (/}) would be great - it is visually much clearer than { # and { /

@aphillips
Copy link
Member

I think some care should be exercised about how we discuss the whitespace here. If one just looks at the sigil, the spaces looks weird. But notice that in the remainder of our syntax, the sigil is attached to something, e.g. .keyword, :function, $variable, |literal|. So one argument would be that the # and / sigils should attach to the markup identifier:

{   #strong   }

The counter argument would seem to be that markup is a fundamentally different type of expression, so it's not really a sigil, it's a different introducing sequence:

{#   strong   }

Looking at what HTML does is not really instructive, since any HTML would be produced from the markup syntax and we need to think about what other syntax's needs are as well.

I'd be curious how markup is currently parsed by implementations? Attaching the sigil to the starter would require a one character lookahead in each expression to check if the expression is markup. Attaching the sigil to the identifier would be more similar to seeking the next token (see list above). That doesn't mean that the lookahead is evil. I'm just curious if the difference in parsing is worth it.

@eemeli
Copy link
Collaborator

eemeli commented Jul 1, 2024

I'd be curious how markup is currently parsed by implementations?

I parse markup together with expressions. Because the constructions are syntactically so similar, it's easier to have just one handler for the stuff between curly braces.

As a user, I would find variance between the whitespace requirements of expressions and markup very confusing. I don't find the /} confusing, because it's the only such terminal syntax we have besides } and }}.

@catamorphism
Copy link
Collaborator

My parser does the lookahead after consuming the '{' -- if the next character is '#' or '/' a separate parseMarkup() function is called, which reuses the options and attributes parsing code but is separate from parseExpression(). It wouldn't cause a problem for me to discard whitespace between '#' and the identifier. I don't have a strong opinion on this, though.

@mihnita
Copy link
Collaborator Author

mihnita commented Aug 28, 2024

This is not about how hard / easy it is to parse.
It is what makes sense.
# is not a sigil on the markup, it is part of the { prefix.

{# says "this is a closing markup, we don't care what markup"
The same way that "/}" says "this is a standalone markup, we don't care what markup"

And the closing markup {# foo} closes the opening markup { foo}.
The markup is one and the same, foo.

Same as there is one strong element in html, not a strong and a /strong one.

@eemeli
Copy link
Collaborator

eemeli commented Aug 29, 2024

# is not a sigil on the markup, it is part of the { prefix.

That's not our syntax, though? As you note in the first post on this issue, the # and / sigils are separated from the { by optional whitespace, and they're attached to the following identifier. This matches what we're doing elsewhere in the syntax with $foo, :bar, and @baz. Attaching the # to the { would be a significant change to what we're doing now.

{# says "this is a closing markup, we don't care what markup" The same way that "/}" says "this is a standalone markup, we don't care what markup"

And the closing markup {# foo} closes the opening markup { foo}. The markup is one and the same, foo.

I'm very confused by this passage. Maybe you typo'd some of the sigils here?

@aphillips
Copy link
Member

The current syntax permits space between the bracket and the opening sigil (either # or /) and does not permit whitespace between the sigil and the identifier. We don't have any feedback to suggest that this is a problem--noting that we don't have any feedback in general. Markup is, in general, an unsupported feature and I don't think we have any implementation experience either.

My suggestion would be: let's reject this for now. A future version could allow spaces between the starting sigil and the identifier (but no version would be able to disallow spaces between the bracket and the sigil).

@aphillips aphillips added resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Agenda+ Requested for upcoming teleconference and removed LDML46 LDML46 Release (Tech Preview - October 2024) labels Sep 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Agenda+ Requested for upcoming teleconference resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. Seek-Feedback-in-Preview Issue should be something we seek feedback on in the tech preview period syntax Issues related with MF Syntax
Projects
None yet
Development

No branches or pull requests

5 participants