Add u: options namespace #846

Merged
merged 12 commits into from
Oct 21, 2024
3 changes: 2 additions & 1 deletion spec/README.md
@@ -16,7 +16,8 @@
1. [Data Model Errors](errors.md#data-model-errors)
1. [Resolution Errors](errors.md#resolution-errors)
1. [Message Function Errors](errors.md#message-function-errors)
1. [Default Function Registry](registry.md)
1. [Default Function Registry](registry/default.md)
1. [`u:` Unicode Registry](registry/unicode.md)
eemeli marked this conversation as resolved.
1. [Formatting](formatting.md)
1. [Interchange data model](data-model/README.md)

23 changes: 20 additions & 3 deletions spec/formatting.md
@@ -214,9 +214,21 @@ the following steps are taken:

3. Perform _option resolution_.

4. Call the function implementation with the following arguments:
4. Determine the **_<dfn>function context</dfn>_** for calling the function implementation.
This includes:

- The current _locale_.
- The current _locale_,
potentially including a fallback chain of locales.
- The base directionality of the _message_ and its _text_ tokens.

If the resolved mapping of _options_ includes any `u:` options
Member

This seems like special pleading on our part. The u: namespace is "just a namespace"?

Perhaps:

Suggested change
If the resolved mapping of _options_ includes any `u:` options
Implementations are encouraged to support _options_ defined in
the Unicode Reserved Namespace (`u:`).

Collaborator Author

It's not "just a namespace", because it needs special powers to affect the function context, though.

Collaborator

I think that the "function context" distinction is an implementation detail.

In most formatter implementations the locale and the options on how to format are passed to the constructor as parameters, at the same time.

I would rather look at this as "universal function parameters" that might be recognized and honored by several / all functions.

Member

I think the problem here is that the current set of u: options behaves that way, but we might introduce one that isn't function context affecting in the future? If our intention is to require that all such options be context-affecting, we should set that as a requirement in the design doc and elsewhere. We might need to contemplate an additional namespace in the future as well, although I can't think of any universal options just at the moment that we wouldn't just put in the default namespace.

Note: I am not disagreeing with doing this. Just making sure we're consistent and clear about it.

supported by the implementation,
process them as specified in the [Unicode Registry](/spec/registry/unicode.md).
eemeli marked this conversation as resolved.
Such `u:` options MAY be removed from the resolved mapping of _options_.

5. Call the function implementation with the following arguments:

- The _function context_.
- The resolved mapping of _options_.
- If the _expression_ includes an _operand_, its resolved value.
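A minimal Python sketch of steps 4 and 5, assuming a hypothetical `FunctionContext` record and a plain `dict` for the resolved mapping of _options_. It replaces the _locale_ and base directionality from `u:locale` and `u:dir`, always removes those two options, records a hypothetical `BadOption` error for unusable values (ignoring them), and then calls the function. Locale validation is deliberately reduced to splitting on commas; a real implementation would check each tag against BCP 47.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

DIR_VALUES = ("ltr", "rtl", "auto")


@dataclass
class FunctionContext:
    """Hypothetical function context: locale fallback chain plus base direction."""
    locales: list[str]
    dir: str


@dataclass
class BadOption:
    """Hypothetical record of a Bad Option error."""
    option: str
    value: Any


def call_function(
    func: Callable[[FunctionContext, dict, Any], Any],
    ctx: FunctionContext,
    options: dict[str, Any],
    operand: Optional[Any] = None,
    errors: Optional[list] = None,
) -> Any:
    if errors is None:
        errors = []
    opts = dict(options)  # work on a copy of the resolved options
    locales, base_dir = list(ctx.locales), ctx.dir

    # Step 4: u:locale and u:dir adjust the function context and are always removed.
    if "u:locale" in opts:
        value = opts.pop("u:locale")
        # Simplified check: a comma-separated list of non-empty strings.
        tags = [t.strip() for t in value.split(",")] if isinstance(value, str) else []
        if tags and all(tags):
            locales = tags
        else:
            errors.append(BadOption("u:locale", value))
    if "u:dir" in opts:
        value = opts.pop("u:dir")
        if isinstance(value, str) and value in DIR_VALUES:
            base_dir = value
        else:
            errors.append(BadOption("u:dir", value))

    # Step 5: call the function with the context, remaining options, and operand.
    return func(FunctionContext(locales, base_dir), opts, operand)
```

Here `u:id` is left in the mapping passed to the function, since the text only says such `u:` options MAY be removed.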

@@ -303,7 +315,7 @@ the following steps are taken:

Implementation-defined _functions_ SHOULD use an implementation-defined _namespace_.

5. If the call succeeds,
6. If the call succeeds,
resolve the value of the _expression_ as the result of that function call.

If the call fails or does not return a valid value,
@@ -344,6 +356,11 @@ The resolved value of _markup_ includes the following fields:
- The _identifier_ of the _markup_
- The resolved _options_ values after _option resolution_.

If the resolved mapping of _options_ includes any `u:` options
supported by the implementation,
process them as specified in the [Unicode Registry](/spec/registry/unicode.md).
Such `u:` options MAY be removed from the resolved mapping of _options_.
aphillips marked this conversation as resolved.

The resolution of _markup_ MUST always succeed.
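The markup rules above, combined with the `u:id`, `u:locale` and `u:dir` entries of the Unicode Registry, could be sketched in Python as follows. The part shape mirrors the markup parts in `test/tests/unicode.json`; the `("bad-option", ...)` error tuple, and treating numbers as the "stringifiable" values that `u:id` accepts, are assumptions for illustration. Errors are collected rather than raised, because resolution of _markup_ must always succeed.

```python
from __future__ import annotations

from typing import Any


def resolve_markup(kind: str, name: str, options: dict[str, Any], errors: list) -> dict[str, Any]:
    """Resolve markup into a formatted-part dict; never raises."""
    opts = dict(options)
    part: dict[str, Any] = {"type": "markup", "kind": kind, "name": name}

    # u:locale and u:dir have no effect on markup, so they are simply dropped.
    opts.pop("u:locale", None)
    opts.pop("u:dir", None)

    # u:id accepts strings or (by assumption here) numbers; other values are
    # reported as a Bad Option error and ignored.
    if "u:id" in opts:
        value = opts.pop("u:id")
        if isinstance(value, (str, int, float)) and not isinstance(value, bool):
            part["id"] = str(value)
        else:
            errors.append(("bad-option", "u:id", value))

    # Any remaining options stay on the part.
    if opts:
        part["options"] = opts
    return part
```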

### Fallback Resolution
File renamed without changes.
65 changes: 65 additions & 0 deletions spec/registry/unicode.md
@@ -0,0 +1,65 @@
# MessageFormat 2.0 Unicode Registry
eemeli marked this conversation as resolved.

The `u:` namespace is reserved for use by the Unicode Consortium.
Member

Perhaps quote the design document here?

Suggested change
The `u:` namespace is reserved for use by the Unicode Consortium.
This registry is for items in the namespace `u:`
which is reserved for use by the Unicode Consortium.
This registry can contain _functions_ or _options_.
Implementations are not required to implement entries found in this registry.
Items in this registry are stable and subject to stability guarantees.
This registry might sometimes be used to incubate functionality before promotion to the RGI or default registry in a future release.
In such cases, the `u:` namespace version is retained, but deprecated.

Collaborator Author

The first paragraph is mostly fine, but as #634 is only a PR at this time, it's way too early to refer to the RGI, for instance, or talk about incubation. We can add this language later, when we get consensus on these processes and their names.

I would also find it a bit odd to say e.g. that this "can contain functions", when AFAIK no such u: function has been proposed by anyone.

Member

Permission to do something does not obligate us to actually do it. No one has proposed a u: function, but there's no reason to forbid them.


Are you disagreeing with the design in #634? More specifically, do you disagree with my thinking about how to organize the registries? I do think we should lock-step these efforts to some degree. We'll probably rename RGI somewhere along the way, for example. But I think we'll likely end up with the same concepts at least.

We could replace my line 9 above with something like:

Items in this registry might be deprecated but will not be removed.

Collaborator Author

I am in general agreement about having definitions for non-required functions and options, but I'm honestly not really sure what the current state of #634 is, and whether e.g. the registry development process added to the PR last Friday is final, or whether it's being iterated on. When we last discussed this on a call in May, it was "Still in progress".

I'm also not sure whether the current design doc proposal's language "Implementations are not required to implement any values found in this registry and may adopt or ignore registry entries at their discretion." ought to be strengthened into something more like a SHOULD.

My preference would be to not lockstep processes that do not strictly need lockstep, and to focus first on the concrete (i.e. this PR), and use that experience to inform the registry maintenance plan. On which we ought to have a separate discussion, if/once it's in a stable state.

Collaborator

Why functions?

The idea for the parameters is "u: parameters are potentially recognized / honored by all functions"
What would be the meaning of a "u:" function?

If we think "u:function" is a Unicode function, it is redundant.
All functions in the standard registry are Unicode functions, since this is a Unicode spec.


## Options

This section describes common options which each implementation SHOULD support
for all _functions_ and _markup_.
Member

Perhaps:

Suggested change
This section describes common options which each implementation SHOULD support
for all _functions_ and _markup_.
This section describes _options_ which are intended to be common to all _functions_.
Implementations SHOULD support resolving and passing these _options_.
_Function_ authors SHOULD implement each _option_ as described here to ensure interoperability.

Collaborator Author

The proposed language doesn't quite match the accepted design, though. These are not intended to be options that are handled by each function, but by the implementation, and they e.g. apply changes to the function context, so that function authors explicitly do not need to do anything to enable their use.


### `u:id`

A string value that is included as an `id` or other suitable value
in the formatted parts for the _placeholder_,
or any other structured formatted results.

Ignored when formatting a message to a string.

Accepts string values, or values which can be stringified without error.
eemeli marked this conversation as resolved.
For other values, a _Bad Option_ error is emitted
and the `u:id` option is ignored.

### `u:locale`

A comma-delimited list of BCP 47 language tags,
or an implementation-defined list of such tags.
eemeli marked this conversation as resolved.

Replaces the _locale_ defined in the _function context_ for this _expression_.
The value is ignored when set on _markup_.
eemeli marked this conversation as resolved.

During processing, the `u:locale` option
is always removed from the resolved mapping of _options_.
aphillips marked this conversation as resolved.

Values matching the following ABNF are always accepted:
```abnf
u-locale-option = langtag *([s] "," [s] langtag)
```
using `langtag` as defined in [BCP 47](https://www.rfc-editor.org/rfc/bcp/bcp47.txt).
Note that `langtag` is the rule for "normal language tags",
and does not include private-use or grandfathered tags.
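To check a value against `u-locale-option`, the ABNF can be translated into a Python regular expression. This is a sketch, not a conformance-tested validator: it follows the `langtag` production of RFC 5646, section 2.1, and it assumes `[s]` is the MessageFormat whitespace production (space, tab, CR, LF and U+3000). The function name `parse_u_locale` is hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional

# Building blocks of the BCP 47 `langtag` rule (RFC 5646, section 2.1).
ALPHA, DIGIT, ALNUM = "[A-Za-z]", "[0-9]", "[A-Za-z0-9]"
EXTLANG = rf"{ALPHA}{{3}}(?:-{ALPHA}{{3}}){{0,2}}"
LANGUAGE = rf"(?:{ALPHA}{{2,3}}(?:-{EXTLANG})?|{ALPHA}{{4}}|{ALPHA}{{5,8}})"
SCRIPT = rf"{ALPHA}{{4}}"
REGION = rf"(?:{ALPHA}{{2}}|{DIGIT}{{3}})"
VARIANT = rf"(?:{ALNUM}{{5,8}}|{DIGIT}{ALNUM}{{3}})"
SINGLETON = "[0-9A-WYZa-wyz]"  # any alphanumeric except x/X
EXTENSION = rf"{SINGLETON}(?:-{ALNUM}{{2,8}})+"
PRIVATEUSE = rf"[xX](?:-{ALNUM}{{1,8}})+"
LANGTAG = (
    rf"{LANGUAGE}(?:-{SCRIPT})?(?:-{REGION})?"
    rf"(?:-{VARIANT})*(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)

# Assumed MessageFormat whitespace: SP, HTAB, CR, LF, U+3000.
WS = " \t\r\n\u3000"
S = f"[{WS}]+"

# u-locale-option = langtag *([s] "," [s] langtag)
U_LOCALE_OPTION = re.compile(rf"{LANGTAG}(?:(?:{S})?,(?:{S})?{LANGTAG})*")


def parse_u_locale(value: str) -> Optional[list[str]]:
    """Return the list of tags if value matches u-locale-option, else None."""
    if not U_LOCALE_OPTION.fullmatch(value):
        return None
    # Commas cannot occur inside a langtag, so splitting on them is safe.
    return [tag.strip(WS) for tag in value.split(",")]
```

A wholly private-use tag such as `x-private`, the irregular grandfathered tag `i-klingon`, and `en_US` are all rejected here, in line with the note above; an implementation could still accept them under the MAY clause that follows.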

Implementations MAY support additional language tags,
such as private-use or grandfathered tags,
or tags using `_` instead of `-` as a separator.
When the value of `u:locale` is set by a _variable_,
implementations MAY support non-string values otherwise representing locales.
eemeli marked this conversation as resolved.

For unsupported values, a _Bad Option_ error is emitted
and the value of the `u:locale` option is ignored.

### `u:dir`

Replaces the base directionality defined in
the _function context_ for this _expression_.
The value is ignored when set on _markup_.
eemeli marked this conversation as resolved.

During processing, the `u:dir` option
is always removed from the resolved mapping of _options_.

Accepts the following string values:
- `ltr`: left-to-right directionality
- `rtl`: right-to-left directionality
- `auto`: directionality determined from _expression_ contents

For other values, a _Bad Option_ error is emitted
and the value of the `u:dir` option is ignored.
2 changes: 1 addition & 1 deletion spec/syntax.md
@@ -548,7 +548,7 @@ whether an _operand_ is required,
what form the values of an _operand_ can take,
what _options_ and _option_ values are valid,
and what outputs might result.
See [function registry](./registry.md) for more information.
See [function registry](/spec/registry/default.md) for more information.

A _function_ starts with a prefix sigil `:` followed by an _identifier_.
The _identifier_ MAY be followed by one or more _options_.
2 changes: 2 additions & 0 deletions test/README.md
@@ -10,6 +10,8 @@ These test files are intended to be useful for testing multiple different messag
- `data-model-errors.json` - Strings that should produce Data Model Error when processed.
Error names are defined in ["MessageFormat 2.0 Errors"](../spec/errors.md) in the spec.

- `unicode.json` - Test cases for the `u:` Unicode Registry, using built-in functions.

- `functions/` — Test cases that correspond to built-in functions.
The behaviour of the built-in formatters is implementation-specific so the `exp` field is often
omitted and assertions are made on error cases.
3 changes: 3 additions & 0 deletions test/schemas/v0/tests.schema.json
@@ -269,6 +269,9 @@
"name": {
"type": "string"
},
"id": {
"type": "string"
},
"options": {
"type": "object"
}
105 changes: 105 additions & 0 deletions test/tests/unicode.json
@@ -0,0 +1,105 @@
{
"$schema": "https://raw.githubusercontent.com/unicode-org/message-format-wg/main/test/schemas/v0/tests.schema.json",
"scenario": "Unicode u: Registry",
"description": "Common options affecting the function context",
"defaultTestProperties": {
"locale": "en-US"
},
"tests": [
{
"src": "{#tag u:id=x u:dir=rtl u:locale=ar}content{/ns:tag u:id=x}",
"exp": "content",
"expParts": [
{
"type": "markup",
"kind": "open",
"id": "x",
"name": "tag"
},
{
"type": "literal",
"value": "content"
},
{
"type": "markup",
"kind": "close",
"id": "x",
"name": "tag"
}
]
},
{
"src": "hello {4.2 :number u:locale=fr}",
"exp": "hello 4,2"
},
{
"src": "hello {world :string u:dir=ltr u:id=foo}",
"exp": "hello world",
"expParts": [
{
"type": "literal",
"value": "hello "
},
{
"type": "string",
"source": "|world|",
"dir": "ltr",
"id": "foo",
"value": "world"
}
]
},
{
"src": "hello {world :string u:dir=rtl}",
"exp": "hello \u2067world\u2069",
"expParts": [
{
"type": "literal",
"value": "hello "
},
{
"type": "string",
"source": "|world|",
"dir": "rtl",
"value": "world"
}
]
},
{
"src": "hello {world :string u:dir=auto}",
"exp": "hello \u2068world\u2069",
"expParts": [
{
"type": "literal",
"value": "hello "
},
{
"type": "string",
"source": "|world|",
"dir": "auto",
"value": "world"
}
]
},
{
"locale": "ar",
"src": "أهلاً {بالعالم :string u:dir=rtl}",
"exp": "أهلاً \u2067بالعالم\u2069"
},
{
"locale": "ar",
"src": "أهلاً {بالعالم :string u:dir=auto}",
"exp": "أهلاً \u2068بالعالم\u2069"
},
{
"locale": "ar",
"src": "أهلاً {world :string u:dir=ltr}",
"exp": "أهلاً \u2066world\u2069"
},
{
"locale": "ar",
"src": "أهلاً {بالعالم :string}",
"exp": "أهلاً \u2067بالعالم\u2069"
}
]
}
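The `exp` strings in these tests wrap each formatted placeholder in Unicode isolates: U+2067 RIGHT-TO-LEFT ISOLATE for `rtl`, U+2068 FIRST STRONG ISOLATE for `auto`, U+2066 LEFT-TO-RIGHT ISOLATE for `ltr` inside a right-to-left message, and no isolation for `ltr` inside a left-to-right message, each isolate being closed with U+2069 POP DIRECTIONAL ISOLATE. The Python sketch below reproduces that pattern. The rule is inferred from these test cases only, it assumes a placeholder without `u:dir` takes the direction of the message locale (as the last test case suggests), and `locale_dir` with its small table of right-to-left languages is a hypothetical stand-in for real locale data.

```python
from typing import Optional

LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

# Hypothetical, deliberately tiny table; real code would use locale data.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def locale_dir(locale: str) -> str:
    """Guess a base direction from the primary language subtag."""
    return "rtl" if locale.split("-")[0].lower() in RTL_LANGUAGES else "ltr"


def isolate(text: str, part_dir: str, msg_dir: str) -> str:
    """Wrap a formatted placeholder following the pattern in the tests."""
    if part_dir == "ltr":
        return text if msg_dir == "ltr" else LRI + text + PDI
    if part_dir == "rtl":
        return RLI + text + PDI
    return FSI + text + PDI  # "auto": the first strong character decides


def format_placeholder(text: str, locale: str, u_dir: Optional[str] = None) -> str:
    """Isolate a placeholder, taking its direction from u:dir or the locale."""
    msg_dir = locale_dir(locale)
    part_dir = u_dir if u_dir is not None else msg_dir
    return isolate(text, part_dir, msg_dir)
```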