- Eemeli Aro - Mozilla (EAO)
- Mihai Nita - Google (MIH)
- Elango Cheran - Google (ECH)
- Staś Małolepszy - Google (STA)
- Richard Gibson - OpenJSF (RGN)
- David Filip - XLIFF TC, Huawei (DAF)
- review open issues
- discuss blockers to beginning prototyping
STA: A bit of cleanup is needed because we have a bunch of issues that are related. I propose this discussion in the issue that has the most discussion, which is #253.
EAO: I am closing #257 as redundant.
EAO: As I understand it, the intent of this PR is to make a purely editorial edit without changing anything in the proposal. This PR just makes the change that any character is a character in the Unicode code point space, with descriptions of ranges that are included or excluded. Once this PR lands, then we can have
MIH: I am not able to follow where he is coming from. I don't understand why JavaScript influences the decision, but Java properties doesn't.
EAO: I think the issue you are referring to is #268.
MIH: Yes, the discussion is split. RGN was saying that what I was referring to is in-memory, not applicable, but I don't understand the argument that we care about the valid characters in a serialization... I'm fine with the change. I just want to make sure that we understand the implications ... Depending on the programming language and/or operating system, you get strings serialized as UTF-8, or UTF-16, or potentially something else. The question is if you have invalid UTF-8 characters, do we say it's invalid, or not? That is the implication that we need to discuss.
RGN: Logically, when you are ingesting a representation of a message format into an in-memory MF structure, you need to define what is being ingested. In Java, you can say that it is treated as a sequence of Unicode code points. THe ingest process takes the code points and returns a data structure / object. The sequence of code points is going to be checked syntatically to understand it. There is a special meaning ascribed to the double quote code point, left bracket, etc. It is checked against the EBNF grammar defined. The followup question to the PR I started is what should be considered valid in the grammar. If you say it is literally any code point, other than an unescaped double quote or backslash, you run into the problem that some values cannot be represented UTF-8 string. The whole ecosystem prefers and defaults to UTF-8. You are introducing friction if you say that the message cannot be supported in UTF-8.
STA: I'm not as familiar with teh details of encodings. What sort of code points are not representable as UTF-8.
RGN: Surrogate code points. Those are code points reserved for representing code points in UTF-16 that are beyond the first plane (BMP) of 2^16 code points.
MIH: I understand what you mean. But we also implement this is C and Java, and so on. So what should we do if we receive a message with invalid UTF-8 code points. Do we expect to replace them with the replacement character, or do we just pass them through?
RGN: I think what you're asking about, using JavaScript as a concrete example, is that a JS string is allowed to have unpaired surrogates. So the question is a question for the JS adapter / implementation, but that's not a question for the standard itself.
MIH: So we leave it to the implementation?
RGN: Yes.
MIH: Okay, that is fine with me.
EAO: Given that this discussion is not the main portion of the PR, can we merge this PR? It sounds like we're okay with merging.
EAO: The other PR is the one I filed. It's to make markup placeholders use plus and minus signs to indicate open and close to indicate the type of placeholder/markup tag.
STA: I think we're on the verge of this PR becoming a point of discussion. I think this is related to the issue about needing different sigil names. We can go ahead with EAO's PR and then continue the discussion in #241.
MIH: To be fair, I like the HTML-like style with slashes to indicate open/close/standalone. The question I have is if we need a concept of a sigil in the first place, it feels like bikeshedding.
EAO: Through the discussion, I feel like we have less distance between formatting functions and markup placeholders. My understanding of what we will havei n the spec, data model / structure-wise, is whether we need to have arguments for open markup elements.
MIH: I think we do need to have arguments.
EAO: I don't think we do.
MIH: Okay, then this is not the place to discuss it, it's the issue that we've touched in the past few weeks.
EAO: Once we get to implementation, my sense is that we get things that represent markup elements and things that represent placeholders. I think we're getting away from the actual PR.
ECH: Why do we
STA: Maybe we can spend some time today talking about whether we need markup elements at all. There has been a good discussion in Github about this already. I think EAO's comments already covered what I wanted to say.
MIH: Are you referring to the issue #262.
STA: Yes, and there may have been another issue.
EAO: I would like us to get more to that discussion. Would people be okay with merging #283 so that we have something for the open and close markup tags.
STA: Do we need this?
EAO: I was convinced by RGN that because many templating systems use 2 character syntax to demarcate placeholders, we could use the plus and minus to distinguish open and close.
STA: I understand the idea, but I don't understand where the idea originated from. Maybe this is from MWS's idea that we shouldn't allow whitespace if possible, which I don't agree with, but I agree with the idea that the placeholder starts with the opening curly brace, and everything in between is part of the contents of the placeholder.
MIH: I agree that if you have {:...
}, the colon is a part of the function invocation, not a part of the curly brace.
EAO: I agree with you structurally, but I think it is useful to communicate the open and close, and this is better than the HTML style syntax of nothing for open and a slash for close.
STA: Actually, it's issue #240 that I wanted to spend 10-15 minutes to discuss. Let's switch to that.
I think by special-casing this class of placeholders, we enable tooling to know that they need to invoke special functions when formatting them.
EAO: What I want to get out of MF2 with respect to markup elements is that they're possible. I don't think they need to be special-cased. What occurred to me when writing my comment is that there are things in the registry that ___. I think what STA was saying is that maybe a function doesn't need to be invoked, and can allow a formatToParts.
MIH: I was trying to remember one of the comments from MWS, I think it's that a function can exist that can be pass-through for formatting, say :string
. I do agree that we should have formatToParts, which should be true parts (not the current MF notion of parts), and we can also have the pass-through function -- the identify function.
STA: I'm torn, myself. I thought I knew what my position was. I started out thinking that we can do everything as functions. A few months ago, I was convinced by discussion in the group to consider display elements as special because they are larger than just displaying things. Crucially, I think it's more of a stylistic choice as far as API design, they are functionally equivalent. If it is a part, then it is up to the runtime to decide what to do it. Which is appropriate, because the runtime is responsible for knowing how to display things, rather than inside a function. In fact, some functions can be called too early to know how to decide how to display things. So the function may have to pass through and let the runtime decide anyways. So think we need to decide on the special-ness of display elements.
ZBI: I'm in a similar position as STA as going back and forth on the special-ness of display elements. My anchor is display elements / markup elements do not share the same API as formatting functions. I think the CLDR-TC+ICU-TC decision came into the discussion pseudo-coding the solution using functions. It feels like pushing a square peg through a round hole. It might require tools to look into the registry to figure out what to do.
In the option where we treat markup elements, we have 3 types of tools: the tool that formats a placeholder without any intelligence, a tool that formats a placeholder by looking into the registry for a function to do the formatting, and a tool that can recognize the difference between markup elements and placeholders with functions and can do
EAO: I agree, but I think we can come out with an emergent property from the base rules, but without putting constraints on the extent of what a function can do.
MIH: Tools today already exist that take in the placeholder type open/close/standalone. Tools today also take in the type of markup tag and reason about / dispatch on the tag. There is a tendency to think that UI elements are just UI, but a lot of them have functionality. If you only think of JS, they have an identifier, and when they do, translators cannot delete them because they are used to trigger functionality, so it's a concept that is represented in XLIFF. I am not bringing up XLIFF to say we have to do everything it does, but designs in XLIFF support the needs of the localization world.
ECH: I think I can understand the plus and minus signs because we would need to represent open/close/standalone info either way. That would have to be some key-value option in the options list for a placeholder, and the EM proposal doesn't specify what the name of that key should be. But that is always something that we can specify later.
STA: I wanted to verify that I understand,
ZBI: One more thing that came to mind is that if we treat elements as regular placeholders, then we actually have a deeper problem. In our registry thinking, I don't think we have thought through how we communicate in the registry how a particular function interprets the types of the arguments. But if you say that we don't need to specialize functions for display elements, then we don't need to spend time discussing this.
EAO: I think we're mostly agreed, and we agree on the need for indicating open/close/standalone for placeholders. We still disagree on and need to discuss whether open and close placeholders can take arguments, or not. I propose
MIH: I think the discussion on that last topic in which we disagree can be better understood through the discussion in #262.
STA: Back in #240, or is #283, we are saying that there are 3, not 1 sigils for calling a function. But 2 of them will indicate that in addition to invoking a function, they also indicate the open/close/standalone value of the placeholder.
MIH: I still think we need a function for placeholders representing markup tags, because the tag names might be reused. Ex: <sub>
is in HTML and exists in SSML: https://cloud.google.com/text-to-speech/docs/ssml#sub
ECH: On the point in #240 in which it is proposed to not allow arguments or options for close placeholders, we need to be able to allow ids for close placeholders to match them up to their corresponding open placeholder, especially when there are multiple instances of the same tag name in a message, and they need to be paired.
EAO: This is exactly why I want to preclude them because adding that possibility to a standard is easy, but once you add something, it is hard to take away. I think our implementation work will give us real-world evidence of where it is needed and how.
ECH: I agree with the principle of it being easier to add things and hard to take them away. Even though I have a strong opinion on how this will end up, I am okay with starting with it not being allowed and allowing implementation work informing us soon enough.
STA: The topic that we just discussed seems harmless enough that allowing them or not allowing them won't affect people, but I am not strongly opinionated on that. On the topic from MIH that we need to have positional arguments to indicate the markup type as a function, would you consider having namespaced functions as a sufficient substitute?
To the point that MIH said about namespaces being the same as functions, I want to talk
EAO: Would it be appropriate for me to merge the PR to allow the syntax with the plus and minus signs, and due to lack of time, we can continue the discussion on whether we have arguments and/or options in open and/or close placeholders?
ECH: I am okay with going ahead in this way based on my interpretation that the plus minus signs are syntactic sugar to represent the open/close status of a placeholder, but that the data model wouldn't need to change, because this is just a special syntactical representation.
EAO: Okay. I think if you or MIH can create a PR to represent your changes to the syntax spec in the develop
branch, we can continue the discussion on that PR because we don't have enough time in the current meeting to do so.
STA: On #255, if there's a consensus that excludes me on using curly braces instead of square brackets for delimiting patterns, then I would be okay with curly braces for patterns. And that would make the issue of delimiting selector values really quick to close. I won't block on this forever, but I raised concerns, and no one has addressed those concerns, which is not okay.
EAO: Is it okay to record a consensus on curly braces and have a followup issue to continue looking into square brackets?
STA: That is not what I am saying. MWS is on vacation for 2 more weeks, and others on vacation, too. However, are there other decisions that are affected by a decision here, like how to delimit selector values.
MIH: I still have to do the implementation for ICU4J, so I can work on the data model part if that's okay.
STA: I still am not comfortable moving forward. It will be hard to reverse a decision here if we start implementing based on this and subsequent follow on decisions.
ZBI: I completely agree with STA that we should not move forward without addressing his concerns.