WIP DRAFT MessageFormat 2.0 Registry

Implementations and tooling can greatly benefit from a structured definition of formatting and matching functions available to messages at runtime. This specification is intended to provide a mechanism for storing such declarations in a portable manner.

Goals

This section is non-normative.

The registry provides a machine-readable description of MessageFormat 2 extensions (custom functions), in order to support the following goals and use-cases:

  • Validate semantic properties of messages. For example:
    • Type-check values passed into functions.
    • Validate that matching functions are only called in selectors.
    • Validate that formatting functions are only called in placeholders.
    • Verify the exhaustiveness of variant keys given a selector.
  • Support the localization roundtrip. For example:
    • Generate variant keys for a given locale during XLIFF extraction.
  • Improve the authoring experience. For example:
    • Forbid edits to certain function options (e.g. currency options).
    • Autocomplete function and option names.
    • Display on-hover tooltips for function signatures with documentation.
    • Display/edit known message metadata.
    • Restrict input in GUI by providing a dropdown with all viable option values.

Conformance and Use

This section is normative.

To be conformant with MessageFormat 2.0, an implementation MUST implement the functions, options and option values, operands and outputs described in the section Default Registry below.

Implementations MAY implement additional functions or additional options. In particular, implementations are encouraged to provide feedback on proposed options and their values.

Important

In the Tech Preview, the registry data model should be regarded as experimental. Changes to the format are expected during this period. Feedback on the registry's format and implementation is encouraged!

Implementations are not required to provide a machine-readable registry nor to read or interpret the registry data model in order to be conformant.

The MessageFormat 2.0 Registry was created to describe the core set of formatting and selection functions, including operands, options, and option values. This is the minimum set of functionality needed for conformance. By using the same names and values, messages can be used interchangeably by different implementations, regardless of programming language or runtime environment. This ensures that developers do not have to relearn core MessageFormat syntax and functionality when moving between platforms and that translators do not need to know about the runtime environment for most selection or formatting operations.

The registry provides a machine-readable description of functions suitable for tools, such as those used in translation automation, so that variant expansion and information about available options and their effects are available in the translation ecosystem. To that end, implementations are strongly encouraged to provide appropriately tailored versions of the registry for consumption by tools (even if not included in software distributions) and to encourage any add-on or plug-in functionality to provide a registry to support localization tooling.

Registry Data Model

This section is non-normative.

Important

This part of the specification is not part of the Tech Preview.

The registry contains descriptions of function signatures. registry.dtd describes its data model.

The main building block of the registry is the <function> element. It represents an implementation of a custom function available to translation at runtime. A function defines a human-readable <description> of its behavior and one or more machine-readable signatures of how to call it. Named <validationRule> elements can optionally define regex validation rules for literals, option values, and variant keys.

MessageFormat 2 functions can be invoked in two contexts:

  • inside placeholders, to produce a part of the message's formatted output; for example, a raw value of |1.5| may be formatted to 1,5 in a language which uses commas as decimal separators,
  • inside selectors, to contribute to selecting the appropriate variant among all given variants.

A single function name may be used in both contexts, regardless of whether it's implemented as one or multiple functions.

A signature defines one particular set of at most one argument and any number of named options that can be used together in a single call to the function. <formatSignature> corresponds to a function call inside a placeholder inside translatable text. <matchSignature> corresponds to a function call inside a selector.

A signature may define the positional argument of the function with the <input> element. If the <input> element is not present, the function is defined as a nullary function. A signature may also define one or more <option> elements representing named options to the function. An option can be omitted in a call to the function, unless the required attribute is present. Options accept either a finite enumeration of values (the values attribute) or validate their input with a regular expression (the validationRule attribute). Read-only options (the readonly attribute) can be displayed to translators in CAT tools, but may not be edited.

As the <input> and <option> rules may be locale-dependent, each signature can include an <override locales="..."> that extends and overrides the corresponding input and options rules. If multiple <override> elements would match the current locale, only the first one is used.

Matching-function signatures additionally include one or more <match> elements to define the keys against which they can match when used as selectors.

Functions may also include <alias> definitions, which provide shorthands for commonly used option baskets. An alias name may be used equivalently to a function name in messages. Its <setOption> values are always set, and may not be overridden in message annotations.

If a <function>, <input> or <option> includes multiple <description> elements, each SHOULD have a different xml:lang attribute value. This allows for the descriptions of these elements to be themselves localized according to the preferred locale of the message authors and editors.

Example

The following registry.xml is an example of a registry file which may be provided by an implementation to describe its built-in functions. For the sake of brevity, only locales="en" is considered.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="platform">
        <description>Match the current OS.</description>
        <matchSignature>
            <match values="windows linux macos android ios"/>
        </matchSignature>
    </function>

    <validationRule id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
    <validationRule id="positiveInteger" regex="[0-9]+"/>
    <validationRule id="currencyCode" regex="[A-Z]{3}"/>

    <function name="number">
        <description>
            Format a number.
            Match a **formatted** numerical value against CLDR plural categories or against a number literal.
        </description>

        <matchSignature>
            <input validationRule="anyNumber"/>
            <option name="type" values="cardinal ordinal"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <!-- Since this applies to both cardinal and ordinal, all plural options are valid. -->
            <match locales="en" values="one two few other" validationRule="anyNumber"/>
            <match values="zero one two few many other" validationRule="anyNumber"/>
        </matchSignature>

        <formatSignature>
            <input validationRule="anyNumber"/>
            <option name="minimumIntegerDigits" validationRule="positiveInteger"/>
            <option name="minimumFractionDigits" validationRule="positiveInteger"/>
            <option name="maximumFractionDigits" validationRule="positiveInteger"/>
            <option name="minimumSignificantDigits" validationRule="positiveInteger"/>
            <option name="maximumSignificantDigits" validationRule="positiveInteger"/>
            <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
            <option name="currency" readonly="true" validationRule="currencyCode"/>
        </formatSignature>

        <alias name="integer">
          <description>Locale-sensitive integral number formatting</description>
          <setOption name="maximumFractionDigits" value="0" />
          <setOption name="style" value="decimal" />
        </alias>
    </function>
</registry>

Given the above description, the :number function is defined to work both in a selector and a placeholder:

.match {$count :number}
1 {{One new message}}
* {{{$count :number} new messages}}

Furthermore, :number's <matchSignature> contains two <match> elements which allow the validation of variant keys. The element whose locales best matches the current locale using resource item lookup from LDML is used. An element with no locales attribute is the default (and is considered equivalent to the root locale).

  • <match locales="en" values="one two few other" .../> can be used in locales like en and en-GB to validate the when other variant by verifying that the other key is present in the list of enumerated values: one two few other.
  • <match ... validationRule="anyNumber"/> can be used to validate the when 1 variant by testing the 1 key against the anyNumber regular expression defined in the registry file.
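
A tool could apply the two checks above roughly as follows. This is a minimal Python sketch, not part of the registry data model: the function name is_valid_key and the way the <match> attributes are passed in are assumptions made for illustration. The regular expression is written with an optional fraction part and is assumed to be matched against the whole key.

```python
import re

def is_valid_key(key, values, validation_regex=None):
    """Return True if `key` is allowed by a <match> element.

    `values` is the whitespace-separated `values` attribute and
    `validation_regex` is the regex of the referenced <validationRule>,
    if any (a hypothetical calling convention for this sketch).
    """
    if key in values.split():
        return True
    if validation_regex is not None:
        # Assumption: the rule must match the entire key, not a substring.
        return re.fullmatch(validation_regex, key) is not None
    return False

ANY_NUMBER = r"-?[0-9]+(\.[0-9]+)?"
EN_VALUES = "one two few other"

print(is_valid_key("other", EN_VALUES, ANY_NUMBER))  # True
print(is_valid_key("1", EN_VALUES, ANY_NUMBER))      # True
print(is_valid_key("many", EN_VALUES, ANY_NUMBER))   # False
```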

A localization engineer can then extend the registry by defining the following customRegistry.xml file.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE registry SYSTEM "./registry.dtd">

<registry xml:lang="en">
    <function name="noun">
        <description>Handle the grammar of a noun.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
    </function>

    <function name="adjective">
        <description>Handle the grammar of an adjective.</description>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="plural" values="one other"/>
                <option name="case" values="nominative genitive" default="nominative"/>
            </override>
        </formatSignature>
        <formatSignature>
            <override locales="en">
                <input/>
                <option name="article" values="definite indefinite"/>
                <option name="accord"/>
            </override>
        </formatSignature>
    </function>
</registry>

Messages can now use the :noun and the :adjective functions. The following message references the first signature of :adjective, which expects the plural and case options:

You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!

The following message references the second signature of :adjective, which expects the accord option instead of plural and case:

.input {$object :noun case=nominative}
{{You see {$color :adjective article=indefinite accord=$object} {$object}!}}

Default Registry

Important

This part of the specification is part of the Tech Preview and is NORMATIVE.

This section describes the functions which each implementation MUST provide to be conformant with this specification.

String Value Selection and Formatting

The :string function

The function :string provides string selection and formatting.

Operands

The operand of :string is either any implementation-defined type that is a string or for which conversion to a string is supported, or any literal value. All other values produce an Invalid Expression error.

For example, in Java, implementations of the java.lang.CharSequence interface (such as java.lang.String or java.lang.StringBuilder), the type char, or the class java.lang.Character might be considered as the "implementation-defined types". Such an implementation might also support other classes via the method toString(). This might be used to enable selection of an enum value by name, for example.

Other programming languages would define string and character sequence types or classes according to their local needs, including, where appropriate, coercion to string.

Options

The function :string has no options.

Note

Proposals for string transformation options or implementation experience with user requirements are desired during the Tech Preview.

Selection

When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, the :string selector performs as described below.

  1. Let compare be the string value of resolvedSelector.
  2. Let result be a new empty list of strings.
  3. For each string key in keys:
    1. If key and compare consist of the same sequence of Unicode code points, then
      1. Append key as the last element of the list result.
  4. Return result.

Note

Matching of key and compare values is sensitive to the sequence of code points in each string. As a result, variations in how text can be encoded can affect the result of matching. The function :string does not perform case folding or Unicode Normalization of string values. Users SHOULD encode messages and their parts (such as keys and operands) in Unicode Normalization Form C (NFC) unless there is a very good reason not to. See also: String Matching
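
The selection steps above can be sketched in Python as follows. This is an illustration only: the function name match_string_keys is hypothetical, and str() stands in for whatever string conversion an implementation supports. The second call shows why normalization matters: the same visible text in NFD form does not match an NFC key.

```python
import unicodedata

def match_string_keys(resolved_selector, keys):
    """Return the keys equal, code point by code point, to the selector's string value."""
    compare = str(resolved_selector)  # step 1: string value of the selector
    result = []                       # step 2: new empty list
    for key in keys:                  # step 3: test each key in order
        if key == compare:            # Python compares str values by code point
            result.append(key)
    return result                     # step 4

nfc = unicodedata.normalize("NFC", "caf\u00e9")  # precomposed U+00E9
nfd = unicodedata.normalize("NFD", "caf\u00e9")  # "e" followed by U+0301

print(match_string_keys(nfc, ["cafe", nfc]) == [nfc])  # True
print(match_string_keys(nfd, [nfc]))                   # []
```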

Note

Unquoted string literals in a variant do not include spaces. If users wish to match strings that include whitespace (including U+3000 IDEOGRAPHIC SPACE) to a key, the key needs to be quoted.

For example:

.match {$string :string}
| space key | {{Matches the string " space key "}}
*             {{Matches the string "space key"}}

Formatting

The :string function returns the string value of the resolved value of the operand.

Numeric Value Selection and Formatting

The :number function

The function :number is a selector and formatter for numeric values.

Operands

The function :number requires a Number Operand as its operand.

Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.

Note

The names of options and their values were derived from the options in JavaScript's Intl.NumberFormat.

The following options and their values are required to be available on the function :number:

Note

The following options and option values are being developed during the Technical Preview period.

The following values for the option style are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

  • currency
  • unit

The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

  • currency
  • currencyDisplay
    • symbol (default)
    • narrowSymbol
    • code
    • name
  • currencySign
    • accounting
    • standard (default)
  • unit
    • (anything not empty)
  • unitDisplay
    • long
    • short (default)
    • narrow

Default Value of select Option

The value plural is the default for the option select because it is the most common use case for numeric selection. It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules. This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.

For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:

.match {$var :number}
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}

The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.

Percent Style

When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.

For example,

The total was {0.5 :number style=percent}.

should format in a manner similar to:

The total was 50%.
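
The scaling rule can be illustrated with a small Python sketch. It is deliberately locale-unaware and only shows the multiplication by 100. The helper name format_percent is hypothetical, and a real implementation would delegate the digits and the percent sign to a locale-aware number formatter.

```python
def format_percent(value):
    """Scale by 100 and append a percent sign (no locale handling)."""
    scaled = value * 100
    # The "g" presentation type drops a trailing ".0", so 50.0 is shown as "50".
    return f"{scaled:g}%"

print(f"The total was {format_percent(0.5)}.")  # The total was 50%.
```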

Selection

The function :number performs selection as described in Number Selection below.

The :integer function

The function :integer is a selector and formatter for matching or formatting numeric values as integers.

Operands

The function :integer requires a Number Operand as its operand.

Options

Some options do not have default values defined in this specification. The defaults for these options are implementation-dependent. In general, the default values for such options depend on the locale, the value of other options, or both.

Note

The names of options and their values were derived from the options in JavaScript's Intl.NumberFormat.

The following options and their values are required in the default registry to be available on the function :integer:

Note

The following options and option values are being developed during the Technical Preview period.

The following values for the option style are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

  • currency
  • unit

The following options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

  • currency
  • currencyDisplay
    • symbol (default)
    • narrowSymbol
    • code
    • name
  • currencySign
    • accounting
    • standard (default)
  • unit
    • (anything not empty)
  • unitDisplay
    • long
    • short (default)
    • narrow

Default Value of select Option

The value plural is the default for the option select because it is the most common use case for numeric selection. It can be used for exact value matches but also allows for the grammatical needs of languages using CLDR's plural rules. This might not be noticeable in the source language (particularly English), but can cause problems in target locales that the original developer is not considering.

For example, a naive developer might use a special message for the value 1 without considering a locale's need for a one plural:

.match {$var :integer}
1   {{You have one last chance}}
one {{You have {$var} chance remaining}}
*   {{You have {$var} chances remaining}}

The one variant is needed by languages such as Polish or Russian. Such locales typically also require other keywords such as two, few, and many.

Percent Style

When implementing style=percent, the numeric value of the operand MUST be multiplied by 100 for the purposes of formatting.

For example,

The total was {0.5 :integer style=percent}.

should format in a manner similar to:

The total was 50%.

Selection

The function :integer performs selection as described in Number Selection below.

Number Operands

The operand of a number function is either an implementation-defined type or a literal whose contents match the number-literal production in the ABNF. All other values produce an Invalid Expression error.

For example, in Java, any subclass of java.lang.Number plus the primitive types (byte, short, int, long, float, double, etc.) might be considered as the "implementation-defined numeric types". Implementations in other programming languages would define different types or classes according to their local needs.

Note

String values passed as variables in the formatting context's input mapping can be formatted as numeric values as long as their contents match the number-literal production in the ABNF.

For example, if the value of the variable num were the string -1234.567, it would behave identically to the local variable in this example:

.local $example = {|-1234.567| :number}
{{{$num :number} == {$example}}}

Note

Implementations are encouraged to provide support for compound types or data structures that provide additional semantic meaning to the formatting of number-like values. For example, in ICU4J, the type com.ibm.icu.util.Measure can be used to communicate a value that includes a unit or the type com.ibm.icu.util.CurrencyAmount can be used to set the currency and related options (such as the number of fraction digits).

Digit Size Options

Some options of number functions are defined to take a "digit size option". Implementations of number functions use these options to control aspects of numeric display such as the number of fraction, integer, or significant digits.

A "digit size option" is an option value that the function interprets as a small integer value greater than or equal to zero. Implementations MAY define an upper limit on the resolved value of a digit size option consistent with that implementation's practical limits.

In most cases, the value of a digit size option will be a string that encodes the value as a decimal integer. Implementations MAY also accept implementation-defined types as the value. When provided as a string, the representation of a digit size option matches the following ABNF:

digit-size-option = "0" / (("1"-"9") [DIGIT])
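
One way to read this production is as a regular expression: the single character 0, or a digit from 1 to 9 optionally followed by one more digit, which gives values from 0 through 99. The following Python sketch applies that reading. The function name parse_digit_size_option is an assumption made for illustration, and implementations remain free to accept other value types.

```python
import re

# "0", or a digit 1-9 optionally followed by one more digit (0 through 99).
DIGIT_SIZE_OPTION = re.compile(r"0|[1-9][0-9]?")

def parse_digit_size_option(text):
    """Return the integer value of a digit size option string, or raise ValueError."""
    if DIGIT_SIZE_OPTION.fullmatch(text) is None:
        raise ValueError(f"not a digit size option: {text!r}")
    return int(text)

print(parse_digit_size_option("0"))   # 0
print(parse_digit_size_option("42"))  # 42
```

Strings such as 007 or 100 do not match the production, so this sketch rejects them with a ValueError.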

Number Selection

Number selection has three modes:

  • exact selection matches the operand to explicit numeric keys exactly
  • plural selection matches the operand to explicit numeric keys exactly or to plural rule categories if there is no explicit match
  • ordinal selection matches the operand to explicit numeric keys exactly or to ordinal rule categories if there is no explicit match

When implementing MatchSelectorKeys(resolvedSelector, keys) where resolvedSelector is the resolved value of a selector expression and keys is a list of strings, numeric selectors perform as described below.

  1. Let exact be the JSON string representation of the numeric value of resolvedSelector. (See Determining Exact Literal Match for details)
  2. Let keyword be a string which is the result of rule selection on resolvedSelector.
  3. Let resultExact be a new empty list of strings.
  4. Let resultKeyword be a new empty list of strings.
  5. For each string key in keys:
    1. If the value of key matches the production number-literal, then
      1. If key and exact consist of the same sequence of Unicode code points, then
        1. Append key as the last element of the list resultExact.
    2. Else if key is one of the keywords zero, one, two, few, many, or other, then
      1. If key and keyword consist of the same sequence of Unicode code points, then
        1. Append key as the last element of the list resultKeyword.
    3. Else, emit a Selection Error.
  6. Return a new list whose elements are the concatenation of the elements (in order) of resultExact followed by the elements (in order) of resultKeyword.

Note

Implementations are not required to implement this exactly as written. However, the observed behavior must be consistent with what is described here.
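
As an illustration of the steps above, here is a minimal Python sketch. It is not a reference implementation. The names match_number_keys and SelectionError are hypothetical, the result of rule selection is passed in as the keyword argument rather than computed, and the regular expression is an approximation of the number-literal production based on the JSON number format.

```python
import json
import re

# Approximation of number-literal, following the JSON number format.
NUMBER_LITERAL = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?")
PLURAL_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}

class SelectionError(Exception):
    """Raised for a key that is neither a number literal nor a plural keyword."""

def match_number_keys(value, keyword, keys):
    """Return matching keys: exact numeric matches first, then keyword matches."""
    exact = json.dumps(value)  # step 1: JSON string form of the numeric value
    result_exact = []
    result_keyword = []
    for key in keys:
        if NUMBER_LITERAL.fullmatch(key):
            if key == exact:
                result_exact.append(key)
        elif key in PLURAL_KEYWORDS:
            if key == keyword:
                result_keyword.append(key)
        else:
            raise SelectionError(f"invalid key: {key!r}")
    return result_exact + result_keyword

print(match_number_keys(1, "one", ["one", "1"]))             # ['1', 'one']
print(match_number_keys(5, "other", ["one", "1", "other"]))  # ['other']
```

Passing an empty string as keyword, as rule selection does for select=exact, leaves only the exact matches in the result.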

Rule Selection

If the option select is set to exact, rule-based selection is not used. Return the empty string.

Note

Since valid keys cannot be the empty string in a numeric expression, returning the empty string disables keyword selection.

If the option select is set to plural, selection should be based on CLDR plural rule data of type cardinal. See charts for examples.

If the option select is set to ordinal, selection should be based on CLDR plural rule data of type ordinal. See charts for examples.

Apply the rules defined by CLDR to the resolved value of the operand and the function options, and return the resulting keyword. If no rules match, return other.

Example. In CLDR 44, the Czech (cs) plural rule set can be found here.

A message in Czech might be:

.match {$numDays :number}
one  {{{$numDays} den}}
few  {{{$numDays} dny}}
many {{{$numDays} dne}}
*    {{{$numDays} dní}}

Using the rules found above, the results of various operand values might look like:

Operand value   Keyword   Formatted Message
1               one       1 den
2               few       2 dny
5               other     5 dní
22              other     22 dní
27              other     27 dní
2.4             many      2,4 dne
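
The Czech cardinal rules can be sketched in Python as follows. The rule conditions are restated here from CLDR as an assumption: one applies when the integer part is 1 and there are no visible fraction digits, few when the integer part is 2 through 4 with no visible fraction digits, many when any fraction digits are visible, and other otherwise. The function name czech_cardinal_category is hypothetical, and a real implementation would read the rules from CLDR data rather than hard-code them.

```python
from decimal import Decimal

def czech_cardinal_category(number_text):
    """Return the plural category for a decimal string under the cs cardinal rules."""
    d = Decimal(number_text)
    exponent = d.as_tuple().exponent
    v = -exponent if exponent < 0 else 0  # number of visible fraction digits
    i = int(abs(d))                       # integer part of the absolute value
    if v != 0:
        return "many"
    if i == 1:
        return "one"
    if 2 <= i <= 4:
        return "few"
    return "other"

for text in ["1", "2", "5", "2.4"]:
    print(text, czech_cardinal_category(text))
# 1 one
# 2 few
# 5 other
# 2.4 many
```

The operand is taken as a string so that visible fraction digits are preserved: under these rules 1.0 selects many rather than one.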

Determining Exact Literal Match

Important

The exact behavior of exact literal match is only defined for non-zero-filled integer values. Annotations that use fraction digits or significant digits might work in specific implementation-defined ways. Users should avoid depending on these types of keys in message selection.

Number literals in the MessageFormat 2 syntax use the format defined for a JSON number. A resolvedSelector exactly matches a numeric literal key if, when the numeric value of resolvedSelector is serialized using the format for a JSON number, the two strings are equal.

Note

Only integer matching is required in the Technical Preview. Feedback describing use cases for fractional and significant digits-based selection would be helpful. Otherwise, users should avoid using matching with fractional numbers or significant digits.

Date and Time Value Formatting

This subsection describes the functions and options for date/time formatting. Selection based on date and time values is not required in this release.

Note

Selection based on date/time types is not required by MF2. Implementations should use care when defining selectors based on date/time types. The types of queries found in implementations such as java.time.TemporalAccessor are complex and user expectations may be inconsistent with good I18N practices.

The :datetime function

The function :datetime is used to format date/time values, including the ability to compose user-specified combinations of fields.

If no options are specified, this function defaults to the following:

  • {$d :datetime} is the same as {$d :datetime dateStyle=short timeStyle=short}

Note

The default formatting behavior of :datetime is inconsistent with Intl.DateTimeFormat in JavaScript and with {d,date} in ICU MessageFormat 1.0. This is because, unlike those implementations, :datetime is distinct from :date and :time.

Operands

The operand of the :datetime function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce an Invalid Expression error.

Options

The :datetime function can use either the appropriate style options or can use a collection of field options (but not both) to control the formatted output.

If both are specified, an Invalid Expression error MUST be emitted and a fallback value used as the resolved value of the expression.

Note

The names of options and their values were derived from the options in JavaScript's Intl.DateTimeFormat.

Style Options

The function :datetime has these style options.

  • dateStyle
    • full
    • long
    • medium
    • short
  • timeStyle
    • full
    • long
    • medium
    • short

Field Options

Field options describe which fields to include in the formatted output and what format to use for that field. The implementation may use this annotation to configure which fields appear in the formatted output.

Note

Field options do not have default values because they are only to be used to compose the formatter.

The field options are defined as follows:

Important

The value 2-digit for some field options must be quoted in the MessageFormat syntax because it starts with a digit but does not match the number-literal production in the ABNF.

.local $correct = {$someDate :datetime year=|2-digit|}
.local $syntaxError = {$someDate :datetime year=2-digit}

The function :datetime has the following options:

  • weekday
    • long
    • short
    • narrow
  • era
    • long
    • short
    • narrow
  • year
    • numeric
    • 2-digit
  • month
    • numeric
    • 2-digit
    • long
    • short
    • narrow
  • day
    • numeric
    • 2-digit
  • hour
    • numeric
    • 2-digit
  • minute
    • numeric
    • 2-digit
  • second
    • numeric
    • 2-digit
  • fractionalSecondDigits
    • 1
    • 2
    • 3
  • hourCycle (default is locale-specific)
    • h11
    • h12
    • h23
    • h24
  • timeZoneName
    • long
    • short
    • shortOffset
    • longOffset
    • shortGeneric
    • longGeneric
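
The rule that style options and field options cannot be combined could be checked along these lines. This Python sketch is illustrative only: the names check_datetime_options and InvalidExpressionError are hypothetical, and it assumes that every option listed under Field Options above, including hourCycle and timeZoneName, counts as a field option.

```python
STYLE_OPTIONS = {"dateStyle", "timeStyle"}
FIELD_OPTIONS = {
    "weekday", "era", "year", "month", "day", "hour", "minute", "second",
    "fractionalSecondDigits", "hourCycle", "timeZoneName",
}

class InvalidExpressionError(Exception):
    """Stands in for the Invalid Expression error named in the text."""

def check_datetime_options(options):
    """Return "style", "field", or "default" for a mapping of option names to values."""
    names = set(options)
    uses_style = bool(names & STYLE_OPTIONS)
    uses_fields = bool(names & FIELD_OPTIONS)
    if uses_style and uses_fields:
        raise InvalidExpressionError("style and field options cannot be combined")
    if uses_style:
        return "style"
    if uses_fields:
        return "field"
    return "default"  # no options given: the function's default formatting applies

print(check_datetime_options({"dateStyle": "long"}))                 # style
print(check_datetime_options({"year": "2-digit", "month": "long"}))  # field
print(check_datetime_options({}))                                    # default
```

After raising the error, a caller would use a fallback value as the resolved value of the expression, as required above.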

Note

The following options do not have default values because they are only to be used as overrides for locale-and-value dependent implementation-defined defaults.

The following date/time options are not part of the default registry. Implementations SHOULD avoid creating options that conflict with these, but are encouraged to track development of these options during Tech Preview:

The :date function

The function :date is used to format the date portion of date/time values.

If no options are specified, this function defaults to the following:

  • {$d :date} is the same as {$d :date style=short}

Operands

The operand of the :date function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce an Invalid Expression error.

Options

The function :date has these options:

  • style
    • full
    • long
    • medium
    • short (default)

The :time function

The function :time is used to format the time portion of date/time values.

If no options are specified, this function defaults to the following:

  • {$t :time} is the same as {$t :time style=short}

Operands

The operand of the :time function is either an implementation-defined date/time type or a date/time literal value, as defined in Date and Time Operands. All other operand values produce an Invalid Expression error.

Options

The function :time has these options:

  • style
    • full
    • long
    • medium
    • short (default)

Date and Time Operands

The operand of a date/time function is either an implementation-defined date/time type or a date/time literal value, as defined below. All other operand values produce an Invalid Expression error.

A date/time literal value is a non-empty string consisting of an ISO 8601 date, or an ISO 8601 datetime optionally followed by a timezone offset. As implementations differ slightly in their parsing of such strings, ISO 8601 date and datetime values not matching the following regular expression MAY also be supported. Furthermore, matching this regular expression does not guarantee validity, given the variable number of days in each month.

(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?

When the time is not present, implementations SHOULD use 00:00:00 as the time. When the offset is not present, implementations SHOULD use a floating time type (such as Java's java.time.LocalDateTime) to represent the time value. For more information, see Working with Timezones.
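
A literal could be checked and parsed roughly as follows. This Python sketch makes several assumptions: the function name parse_datetime_literal is hypothetical, the regular expression is applied to the whole string, and the standard library's datetime.fromisoformat does the actual parsing, which also rejects impossible dates that the regular expression lets through. A naive datetime (one without tzinfo) plays the role of the floating time type here.

```python
import re
from datetime import datetime

DATE_TIME_LITERAL = re.compile(
    r"(?!0000)[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"(T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]{1,3})?"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?)?"
)

def parse_datetime_literal(text):
    """Parse a date/time literal; raise ValueError if it is malformed or invalid."""
    if DATE_TIME_LITERAL.fullmatch(text) is None:
        raise ValueError(f"not a date/time literal: {text!r}")
    # Older Python versions do not accept a trailing "Z", so spell out the offset.
    # A date-only literal yields midnight; a literal without offset stays naive.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))

print(parse_datetime_literal("2024-02-06"))            # 2024-02-06 00:00:00
print(parse_datetime_literal("2024-02-06T16:40:00Z"))  # 2024-02-06 16:40:00+00:00
```

A string such as 2024-02-30 matches the regular expression but raises ValueError during parsing, which illustrates why matching alone does not guarantee validity. Fractional seconds with one or two digits are only accepted by fromisoformat in Python 3.11 and later.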

Important

The ABNF and syntax of MF2 do not formally define date/time literals. This means that a message can be syntactically valid but produce an Invalid Expression error at runtime.

Note

String values passed as variables in the formatting context's input mapping can be formatted as date/time values as long as their contents are date/time literals.

For example, if the value of the variable now were the string 2024-02-06T16:40:00Z, it would behave identically to the local variable in this example:

.local $example = {|2024-02-06T16:40:00Z| :datetime}
{{{$now :datetime} == {$example}}}

Note

True time zone support in serializations is expected to coincide with the adoption of Temporal in JavaScript. The form of these serializations is known and is a de facto standard. Support for these extensions is expected to be required in the post-tech preview. See: https://datatracker.ietf.org/doc/draft-ietf-sedate-datetime-extended/