Skip to content

Latest commit

 

History

History
285 lines (148 loc) · 30.2 KB

notes-2024-07-01.md

File metadata and controls

285 lines (148 loc) · 30.2 KB

1 July 2024 | MessageFormat Working Group Teleconference

Attendees

  • Addison Phillips - Unicode (APP) - chair
  • Eemeli Aro - Mozilla (EAO)
  • Tim Chevalier - Igalia (TIM)
  • Harmit Goswami - (...)
  • Richard Gibson - OpenJSF (RGN)
  • Matt Radbourne - Bloomberg (MRR)

Scribe: TIM

Topic: Info Share

Topic: Tech Preview

Let’s review the Task List:

https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing

Topic: PR Review

Topic: Date/time composition PR

EAO: Would be best if we could discuss these – what they are doing rather than the exact words we’re using to express what they’re doing. It’s more important for us to find some agreement on the overall behavior it’s trying to describe. In the datetime PR, I use a phrase that Tim specifically mentioned on the string PR that is not exactly correct and I agree with him, but I couldn’t think of a better one. The intent is not to have that exact text land ultimately, but to have the intention behind it. Please look at it on that sort of level rather than figuring out if it’s the exact spec language we want.

APP: I agree, with the proviso that we should also be trying to shape it appropriately, so if people have better wording for the concept, suggesting that would be helpful. We need to land this plane. I agree with you about not nitpicking as long as we get the direction correct.

EAO: One possible result might be that we end up determining that defining something like a result value object with methods within the spec is a good way of defining these things, and that gives us a way of defining this behavior in the registry document.

Topic: 813

Proposal to make the default for dateStyle long rather than short

APP: Think carefully about our actions and whether we’ll be happy when they’re done. If this is a problem, figure out what our criteria are for setting defaults.

MIH: [missed] It doesn’t look like a long… Not sure what ICU does. I looked at what ECMAScript says and it doesn’t say anything (the spec itself). I looked at what ICU does and it has a date time default that you can use as an enum, which is defined to be medium. So I agree short is too short, but maybe long is too long. I would suggest a medium as default.

EAO: My suspicion is that on the existing legacy defaults, they’ve been set at a time and within a scope where, for example, not needing to include localized data for month names and so on might have had significant importance. I suspect that the context that MF2 is aiming for – localized, localizable human messages – that in this context, where it’s much much more likely that something like a date or a time or a datetime ends up being formatted within a phrase or otherwise, within a message, that the legacy defaults don’t necessarily think of this as the primary use case for their APIs. I believe that it’s valid for us to make decisions based on what we think is best for localizable messages, rather than what decisions have been made in the past. When looking at past examples, it seems like everyone has picked something different every time.

APP: First, I tried to follow what MDN said Intl.DateTimeFormat did. I didn’t test a date with a time in the course of writing the original thing. I happen to know that Intl.DateTimeFormat – I don’t think it was for the reasons you theorize, but I think that’s not important. What’s important is that we pick defaults that are the least surprising for people and don’t cause a lot of heartache for implementors, if possible. Talking mostly about en-US here; other locales do things a little differently. We should choose something; my hope was that we could pick something that’s already been chosen by someone. Maybe we should look at CLDR and see if there’s something useful there.

Topic: 812

APP: We have at least one request to allow space at the start or end of a complex message. I want to put aside starting with a space, because that gets back into detecting simple vs. complex. We’ve discussed it before and have consensus about it. Trailing whitespace, which is what this PR proposes to permit, doesn’t really harm anyone, doesn’t have any meaning in a complex message. It seems like it would be better to permit it than to be super rigorous and cause messages to emit a syntax error if they have a newline/space at the end. So I would propose merging this and adjusting text as we go. There doesn’t seem to be a reason to make ourselves pedantic about trailing whitespace.

APP: Should I merge it?

EAO: Not yet. I submitted this PR today; this is a normative spec syntax PR. I think we all end up agreeing, but proposing and merging something like this within a few hours is not fair. I’m perfectly happy for it to sit a week.

APP: Do you want to discuss the leading whitespace issue now?

Topic: Leading Whitespace in Complex Messages

See #809. The question of allowing leading whitespace in complex messages needs discussion. We previously rejected allowing leading whitespace because it makes simple messages harder to detect and would be inconsistent with the processing of simple messages. Let’s discuss.

APP: Current consensus is that we do not permit leading whitespace, b/c simple messages can start with whitespace and then contain almost anything that’s legal. A call-out from tech preview feedback provider and separately, EAO, you had a callout, is that people doing multiline stuff have a tendency to go open-quote, newline, here’s the message. That whitespace causes a syntax error or turns the message into a simple message unintentionally. Should we require simple message processing to look ahead and decide if it’s a complex message later? Or did we get it right the first time?

EAO: We need to be able to isolate a complex message pattern, which is double curlies. Right now it’s perfectly legal for a complex message to only contain double curlies. In order to isolate a pattern like this, either we would need to allow something like a (...) isolate at the start of a complex message that we would need to look past to see whether we’re in a complex or simple message. Or we need to put the complex message pattern isolate in the middle of the two curly braces, which is a technically valid position but it’s kind of weird

APP: Yes, and it has the potential to make selection really wacky-looking. If the isolate is an RLI.

EAO: There is a weird but technically valid way that we could solve that problem. The solution anyone would expect us to take would be to have the isolates on the outside of the double curlies. If we do that, then we need to account for the same thing – how do you select a starting message, whether something like an isolate at the start of the message is part of a wrapper around a complex pattern, or part of a simple message and part of the actual body that we don’t want to touch? Given that context, it becomes much less weird for us to do that and also allow whitespace at the start, and look at the first non-whitespace char of a message to determine if it’s simple or complex.

APP: To detect a simple pattern, we would read ahead and decide if the first character is a ‘.’ or double bracket, and that makes a complex message. That read-ahead action means that if you want a simple message to begin with a bunch of spaces and a ‘.’, you need to escape the dot. That’s not necessarily evil, but it’s an idiosyncrasy that we need to ensure that we handle.

TIM: It would be better to not have that lookahead, but there are other places in the syntax that already require it. Other places where you have to push whitespace back.

MIH: I have to check the code, but I don’t remember having anything like that. In this case, we don’t just skip them, but we remember them. If it’s a simple message, we have to put them back. I can’t remember any other cases like this.

APP: I think in most cases, the whitespace is ignorable later if you find a token

MIH: On the other side, I would not make the decision based on the parser. If it’s best for the users and the parser is a bit more complicated, it’s fine with me

APP: I’m zeroed in on the simple message thing because I’m concerned about it being reliable. I don’t want it to be surprising if my message turned into a syntax error when it’s obvious to me that it’s trying to be a simple message.

EAO: If you consider the detector “is this simple or complex?” then right now, it needs to check if the first character is a ‘.’ or the first two are ‘{{‘. The addition in complexity to this detector is that it would need to look at the start of a message and initially skip any characters that are in the set of whitespace that we say is whitespace, or the LRI/RIR/FSI character, and look at what’s after those, whether it’s a dot or ‘{{‘. I would be very surprised if doing this would make it any riskier than it currently is.

APP: How about: ... you have {$num} foos

EAO: You would have to escape the whole message.

APP: I don’t hear any objection to making the change

MIH: I didn’t say I’m happy with it, I said I would make the decision based on the user. Thinking of the user, I don’t like inconsistencies like “Sometimes spaces matter, sometimes they don’t.” Before the curly bracket, we are not inside a complex message yet. Of course, I’m not the typical user. As a user I’m not sure I’d like that.

APP: So you would be unhappy if we permitted pattern-exterior whitespace on complex messages, but only complex messages? Do you see the problem that was being called out by Luca?

MIH: I don’t understand the problem.

EAO: It’s a backtick, and that’s what we use in JS for multi-line code. Same situation would come in Python with a triple tick

MIH: But if they do that for a simple message, they will get trailing spaces

APP: No, because I think JS can go through the process of getting rid of the whitespaces

MIH: But not JS itself, but when you render HTML. It’s going to make them one space, not nothing. And I’ve seen HTML where one line was slightly indented –

APP: People typing a complex message know that that might be true but are counting on having double curlies around the pattern. They expect the MF processor to eat up the extra whitespace

MIH: Can they do that?

foo = `{
complex
}

APP: Inside the double curlies, the whitespace is meaningful

MIH: If I start with a curly and then ,input foo, the spaces after the curly are not relevant

EAO: Currently, the real-world problem comes up when we have a message with a selector where as we know from how we ourselves tend to write our messages, you tend to write it with the matcher on one line and then each of the cases on separate lines. Because this feels relatively natural. In at least a JS coding context, probably in other contexts, which do have support for a multi-line quoted string, it is very common for something that looks and feels and tries to act like code, for people to have that indented so the starts of lines align. To make that happen, you need a newline at the start and spaces for the indentation. In our syntax, in any other position except the start, this is perfectly valid and allowed. But in that specific situation, which shows up at least in JS and Python, this produces a syntax error at the moment. This is the problem that has been identified. We’re talking about this, but also how if we fix this, is the problem that comes up afterward just as bad? My position on that would be that this is a bigger problem than the problem we create by fixing it. My suggestion is to allow for leading whitespace and the isolation at the start of a complex message. That will be the expectation of people writing those messages.

APP: Maybe the way to say that is that people have tripped on it a few times

MIH: The trouble I see that in the example Addison gave, if we put some spaces before it, they’re preserved.

APP: By design. And they would still be preserved once you determine that it’s a simple message.

MIH: So the question now is, if I see some spaces and then .match, is that a simple message?

APP: Because that’s trying to be a complex message, and if you wanted it to be a simple message, you’d escape the ‘.’.

MIH: Before, we didn’t have to …

MIH: If I want to have ‘ …you have’, now I don’t have to put it in curlies. After the change, I would be forced to put it in curlies.

APP: That’s what the change is

MIH: Which as the user, makes it more complicated for me. Now I have to say that if you start with a bunch of spaces and bidi characters, and then it’s followed by a dot, then you have to put everything in curlies. As a user, it makes the rule more difficult for me to remember.

EAO: I would say that if you look at any corpus of messages, the number of actual real-world instances of having a message starting with whitespaces and then ‘.’ is vanishingly small. Given that we already have a requirement that a message starting with a ‘.’ needs to be quoted, I would be very surprised by finding any message starting with a ‘.’, even if it has whitespace at the start, that wasn’t explicitly quoted. I think that right now, with our current syntax, not allowing that whitespace is the single stumbling block that has now been reported twice by people writing messages with MF2. Much more important to fix than the case where you have spaces and then a dot, because that is really really rare.

APP: I’m already looking at bidi and whitespace in a few PRs and I have an action item to pull it together. Why don’t I finish that PR and include the two options? Then we’ll have to resolve, and I hope to do so quickly, about what to do? Can see the potential tripping hazard, but we’re going to have to choose and we should look at it all in one place, not in different pieces.

MIH: The only one comment I would have is – this is indeed a tripping hazard if you do that in code, but nobody does that in code, b/c you have to translate the messages. There will be some kind of storage bundle format, so you don’t really do the apostrophe thing, you do JSON or whatever. In JSON you don’t just drop newlines in the middle of a string.

EAO: I would be very happy for us to make a decision not to do this with what MIhai just said as an explicit reason for why we’re not doing this. The side effects of how it impacts our other decisions around quoting become very interesting to me.

APP: For various reasons, I’ve become enmeshed in a conversation, wearing my W3C hat, about extension localization (browser extensions). One of the things that’s happening there is they’re slipping towards discovering the need for MessageFormat and the need for resource bundle definitions. They have one but it’s half-baked. It’s a JSON thing. Would be cool if MessageBundle were a project that was actually happening, but I observe that resource bundles will interact with message strings. So they might do things like finding the space doesn’t count for authoring purposes. That could be external to us, but I’d rather we didn’t create headaches. I want people to adopt this standard and not invent their own.

MIH: I think the “half baked format” is ARB, Application Resource Bundle

EAO: I’ve had some conversations with our web extension people, looking to see if we could start building message resource bundles into a version 2 sort of solution for web app localization. I have gone ahead as part of the tooling work earlier this year; have done a bidirectional transform from the current message.json spec for messages into the message resource and MessageFormat 2 data model representation. I’m very interested in this space and would be happy for Addison to loop me into the conversations you mentioned.

APP: I have the action item to incorporate this into the bidi/whitespace cleanup

EAO: Just refreshing mentioning that if we end up not allowing whitespace at the start, would you (Mihai) say we should also not allow the isolation at the start of a complex message?

MIH: Yeah, I would say so.

EAO: I think, Addison, that means we put the isolation in the middle of the curlies. Can you clarify, Mihai, what your preference would be?

APP: Currently we don’t permit spaces between the isolates and the pattern quotes. We could, but that allows the isolates to become kind of free-floating invisibles embedded in space.

MIH: I think it depends on what we’re trying to prevent. Don’t understand why we want them in the front

APP: You want isolates on the outside of the pattern quotes because if they’re on the inside, they’re part of the pattern. You don’t want them inside the pattern quotes.

EAO: Alternative: {LRI{alternative}PDI}

APP: If it’s complex, we don’t have to rewind, because the LRI gets dropped. It’s only for presentation. It behaves like whitespace (optional whitespace) in that regard. So maybe it’s different that way. Whereas if you had space, space, LRI, some text, now you’re a simple message; the LRI is part of the pattern.

EAO: I thought I’d highlight/underline this given that we passed over it, but happy for the action there on Addison to take this forward. Mihai, of course that’ll require an expression on the bar. Explaining this as an easy and developer-friendly option is something I’d be happy to leave to the reader.

MIH: We don’t have a problem with LRIs, PDIs in that kind of context, right? The dot is already a complex thing. It’s only a problem if we want LRI{{.

let foo = `
 .match ($bar) 
* {{my pattern}}`
versus: 
let foo = `.match($bar) * {{pattern}}`

806

APP: Holding in abeyance while Eemeli works on his action items?

TIM: Yes

804

APP: Elango had an action item to update this based on our conversation. He applied some suggestions but there’s still work to do. Then Eemeli was going to have an action item to do the spec change. Elango is out on vacation.

EAO: Would it be OK for me to file the PR implementing this, as we know how what our sense is of what things should look like, rather than waiting for Elango to do his updates?

MIH: I think we have a PR already. I created a PR for that, and for it to be accepted, the reason was that you have to first document it in a design doc. I didn’t have time to catch up with the design doc – is it so radically different? Do we need a new PR?

APP: We agreed to changes to what’s in Elango’s design doc in last week’s call. Might want to read the notes. Off the top of my head, I don’t know that your PR still obtains or not. It might. I think we were going to check in the design doc and then approach the text again. I think it’s different than your proposed text.

EAO: During last week’s call, we spent most of the time actually talking about this, and the conclusion we ended up with is rather than going with what’s currently in the PR, to go with the separate… I don’t have access right now

APP: I’m of the opinion that we don’t have to wait for Elango to come back before implementing it; maybe we can work around the fact that he’s on vacation and get the design doc finished and merged. I don’t want to block for a month or several weeks on getting what I think is a consensus done.

MIH: What I’m a bit reluctant about is that I’ve seen things presented as consensus when they are not…

APP: Read last week’s notes, they’re quite extensive. See if you understand what we think the consensus is. Then we’ll discuss next week.

EAO: Thought I’d note that we identified also last week that the bit about functions being allowed to customize their fallback values is an unnecessary complication. We discussed last week that this would be included in the PR, which also needs to touch some of the formatting text around there.

EAO: Currently the spec says that a custom function is allowed to signal that it hit an error, but also provide an alternative non-default fallback value to use for itself, rather than the one in the spec. This is what I’m saying is an unnecessary complication.

MIH: I’ll try to parse the notes from last time

APP: I might ping Elango; if he’s not available, I might do some things to get his design doc in and then make changes to it that we agreed on, so we’re all looking at the same thing.

799

APP: Unify input and local declarations in the data model – no activity. I think we said last week that we would look at it with an eye towards merging or not merging.

TIM: If I’m the only one opposed, I would withdraw my objection

MIH: I might have an opinion, still catching up from vacation

EAO: In short: in the data model, there is no discernible benefit from marking if a declaration is an input or a local. When you look at the content of a declaration, what’s not allowed in a local is exactly the stuff that’s allowed in an input, and vice versa. So we can use a single data structure in the data model. Removes the possibility of representing an error case that is not really beneficial for anyone.

MIH: I understand what you’re trying to do; not sure about the implications.

APP: Maybe give it a read and if you have questions, maybe you and Eemeli can talk on Slack.

Topic: How do we approach registry format?

What is the shape of the data for the function registries?

EAO: We did a bunch of work earlier and then got stuck. We need to get ourselves unstuck so that we can either determine that this thing should not exist or doesn’t need to exist; or that it can exist and this is what it should look like. What we didn’t do well or didn’t know we ought to do is to define the audience or consumer of the data definition of the registry structure. We started by having it be a thing for tooling and humans, but now we’ve effectively gone and done the communicating to humans part separately. That communicating to humans and implementors to some extent includes stuff that never could land in the registry.xml, like what the functions do rather than what their interfaces are. I’m interested in figuring out what the audience is for the registry, and what the tools look like that currently do the stuff. I’m suspecting that registry.xml is primarily meant for validators and linters and so on, and it would behoove us to look at what tools are using this, what the formats they’re currently consuming are to do their work. To design what we are doing them and providing for them to consider these tooling consumers above others. Not necessarily have this be the source of truth, but to be an artifact generated by humans from what we describe in the text of the registry document. Who else is interested in this? Previously Stas, but he’s lost bandwidth somewhat. If it’s a smaller group than the whole WG, could we coordinate work?

APP: I hear that; I think in general it’s the right idea. The challenge is that there may not be tooling already extant. I think a key audience to me would be translation tooling. I know from experience that whenever I created syntax in my message formatter, I needed to talk to the translation tools team and say, for example, when I had a selector, this is how you know for a given locale what the explosion is of the keys. Here are what the options and their values are, and what they may be, so you can communicate them to translators and lint them and do stuff like that. I don’t think there’s any general-purpose tooling around that, but our wish is that people would develop it rather than doing it by communicating to humans. I’m unsure if there’s a runtime element to this. Whether the runtime cares about the registry description. We should maybe think about what the use cases are.

MIH: You pretty much said everything I wanted to say. I’m interested in this mostly for localization tools. Can be used by linters, but the part I care the most about is tools. I pasted a link to something that exists already, the internationalization tag set. Just as you know what the plurals are for Russian, you would know what the grammatical cases are for Romanian – stuff like that. I’m interested, I see it as mostly useful for tooling. Something similar that exists already, standard: the W3C "Internationalization Tag Set (ITS) Version 2.0"

EAO: While I agree that there probably isn’t anything extant that can consume stuff like this, there is more general-purpose than localization validation and other sorts of code tooling that definitely exist. I am interested and will be looking into how customizable they are with new syntaxes and new function definitions in particular, because that’s a large part of what we need to define. How to effectively describe interfaces for functions in a somewhat code-agnostic way, so that the tooling can read that well. And what tools already exist for consuming what formats, so that someone building a better localization tool, for example, would not need to invent everything, but we could make their life easier.

MIH: ITS is standard; other localization tools use proprietary formats. Some use DTDs, some use proprietary formats, there’s no universal anything.

APP: I would prefer if possible that we not replicate CLDR. That we be able to reference that without requiring people to copy all this stuff in. One of the things about the registry right now is that it’s more concerned about describing the shape of the function space and being able to point to things like “for Romanian plural rules, look at CLDR” Rather than copying it all in. The second thing is that maybe this should become an ancillary deliverable rather than something we have to deliver as stabilized and normative. Could become a reference thing that people can iterate on a little bit. Concentrate on those things and then focus on creating a registry of functions you must and should implement; creating an ecosystem for custom functionality.

EAO: I would count as an entirely valid solution for this to say: this machine-readable thing, we don’t need it. It might be nice, we might need to work on it later, but you can build an implementation and tooling just based on the registry.md definitions we have. I would be happy just concluding that; it removes a lot of work we would need to do. To specifically respond to what Addison said, I agree that if we do this thing, let’s do it in a way that allows it to be independently versioned.

APP: It would either have to be versioned with the registry or on its own.

EAO: Probably on its own. We already have the test suite effectively versioned that way. This would be the same sort of supporting document and element and definition.

APP: Does that mean there’s a proposal that we change our deliverable set to hive this off? That section of our spec is not currently part of the tech preview.

EAO: I think the question is, do we want to punt it later, or just punt it out for now?

APP: My proposal would be to make it – if we deliver anything in this space, to make it a non-normative ancillary thing, as we’ve done with a couple of other pieces. So not a deliverable.

EAO: So basically say, we’re not doing this for now and maybe leave a space somewhere saying that if you really want to do this thing, you can

APP: If we happen to deliver something, it’s a reference implementation. If you want this, here’s what it would look like.

EAO: For context, a likely result of this is that I don’t know if anyone other than me is working on an MF2 language server (meaning programming language back-end for using various editors). Effectively if we don’t define it here, I’m still going to need some definition for how we define a custom function. Giving context to what happens if we don’t spec it out; something like that will still come into existence, but it’s not going to be universal and will probably look different from what we would do within the WG.

MIH: I played with your VSCode thing before and I liked it. Implementing what you implement would inform the registry. After all, your thing would be a tool that can consume this. So I think it’s beneficial to build some tools that need this info; helps us decide what to put in the registry.

EAO: Put an action item on me to file a PR dropping that from our deliverables. I’ll do a thing so we can officially decide this later.

Topic: AOB?

EAO: Do we have an idea of what we’ll be doing next week or in a few weeks, so we can prepare? For example, contextual options. Is there anything else we want to touch in the near future?

APP: I want to get back to that, but what I hope to do is deliver the bidi and whitespace stuff for next time. Because of holidays, maybe no one sees it, but we’ve talked about it already so I hope it’s pretty baked. The other thing on our plate is function composition. I think if we look at your PRs and do comments and so on, a big block of our time next week should be to try to resolve what we’re doing. We want to close in on that, because it’s been holding it back. Those things, and if we can get the error thing in shape, which we’ve already mostly agreed to. If we can resolve those three things, I’d be happy with next week. Contextual options or attributes are important to me, and so I’d like to have that discussion, but don’t want to have four things all at once. Does that sound like a plan?

EAO: When are you going to get the bidi to a state where I can look at it again?

APP: I’ll take all of the optional-whitespace / whitespace and merge it into the bidi thing. The PR that says “don’t review this” is close on the bidi thing. I think there’s a couple of tweaks that I have left.

EAO: I would like it if we could separate the ABNF change, making the optional and non-optional whitespace, to be a different action and change than adding in the bidi stuff.

APP: I think if you look at 811, maybe I’ll just make 811 live, I think that makes the design doc for bidi match what we’ve talked about. I’ll double check it. Let’s get that done and agree about bidi. Maybe bidi, errors, and composition for next time? Then we’ll do whitespace and attributes the following week.

EAO: I think because of what we just discussed earlier, the bidi and leading whitespace discussions are somewhat linked. We can try to sort one of them out first, but if it’s bidi… we’ll see how it goes.

APP: They are linked. If we don’t agree on the approach to bidi… then it’s just a detail of the whitespace implementation. If we try to have both conversations at the same time, we’ll be switching between two very different conversations.