Add `<when>` to help select the right `<match>` #558
This mechanism looks decent, but I don't think we should incorporate it. We do need to point to CLDR data (which this PR doesn't solve). I think that recreating plurals.xml in the registry is wasteful and will lead to a binding between the registry and CLDR releases. There is no reason for us to do that. The registry defines:
This PR is adding locale filters (or explosion rules, if you prefer) on the option values (but not the rules themselves). The formatter implementation will never read this or check it. This is for tools to use when generating target language variant matrices for translation. I think we should work on the pointer-to-data mechanism and maybe something like this PR for defining locale option sets using an ancillary format. That would mean either a transform of
I'd like to push back a bit on that characterization. This is not about adding data to the registry, but about mapping specific sets of MF2 selector options to subsets of that data and providing a form in which that data can be expressed for tools. To be explicit, I am not proposing to include any For example, the data on Polish plurals looks like this:
<pluralRules locales="pl">
<pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
<pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4, 22~24, 32~34, 42~44, 52~54, 62, 102, 1002, …</pluralRule>
<pluralRule count="many">v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14 @integer 0, 5~19, 100, 1000, 10000, 100000, 1000000, …</pluralRule>
<pluralRule count="other"> @decimal 0.0~1.5, 10.0, 100.0, 1000.0, 10000.0, 100000.0, 1000000.0, …</pluralRule>
</pluralRules>
Currently, in order to figure out that in Polish a selector on This PR is about providing an MF2-friendly language in which it's possible to express the selection data so that the problems of "figure out the relevant data" and "write a good message validator" can be solved separately, rather than needing everyone who might want to solve the problem doing it for themselves -- or not at all. For a possible next-step with the proposed
Yes, you're right. None of the registry contents has any bearing on a formatter implementation; it's all for tools and validators.
Agreed; that's what we have #538 for. The
<function name="number">
  ...
  <matchSignature>
    ...
    <option name="select" values="plural ordinal" default="plural" />
    ...
    <when option="select" values="ordinal">
      <matchRef href="path/to/ordinals.xml" transform="all-pluralRules.xsl" />
    </when>
    <when option="select" values="plural">
      <when option="maximumFractionDigits" values="0">
        <matchRef href="path/to/plurals.xml" transform="integer-pluralRules.xsl" />
      </when>
      <matchRef href="path/to/plurals.xml" transform="all-pluralRules.xsl" />
    </when>
    <match values="zero one two few many other" validationRule="anyNumber" />
  </matchSignature>
  ...
  <alias name="integer">
    <setOption name="maximumFractionDigits" value="0" />
  </alias>
</function>
Note there how two different data sources and two different transforms are used, and how the fallback is still defined directly. Each
Yes, although tools don't need to know which rule fires for a given value. They just need to know what rules can fire for a given selector. The What I'm getting at is: you're right that we could parse the data to produce an intermediate data file (and we should probably get CLDR to produce it going forwards). What's more, we need to handle the case of a custom function that doesn't use CLDR data directly. But I still don't think we should put any CLDR data into the registry itself.
I agree with Addison
Okay, let's fix all of it here then. I added If the registry does include a After prodding at this for quite a while and starting from the ideas of #538, I came to the realization that the transform of e.g. CLDR plural content to So we end up with the
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="cldr-plural-matches.xslt"?>
<cldrPluralMatches href="path/to/cldr/common/supplemental/plurals.xml" />
where the To prove that this can work, here's an implementation of the transforms required for CLDR ordinals and plurals, including integer handling: https://gist.github.com/eemeli/75a0380e57adb237305ab4c480929a1f You can test that locally by putting all the xsltproc match-plural-integers.xml
spec/registry.md (outdated)
Each `<matches>` MAY contain either one or more `<match>` elements, or an `href` attribute.
If an `href` attribute is set, its URL value MUST resolve to an XML document
with a root `<matches>` element with no `href` attribute,
Why not permit chaining?
Because we've no reason to, and this way we can rely on the external matches XML to resolve all of its dependencies completely.
So there are no cases in which part of a `<matches>` tree in the external XML refers to (say) CLDR data? Once you've implemented resolving an external file, it's just a question of recursion, no?
> So there are no cases in which part of a `<matches>` tree in the external XML refers to (say) CLDR data?

Correct. A `<matches>` can only contain `<match>` elements, which may only have `locales` and `values` attributes, and are otherwise empty. So there's no space for recursion.
Co-authored-by: Addison Phillips <[email protected]>
- `<when option="select" values="plural"><matches><match locales="en" values="one other" ... />`
  can be used in locales like `en` and `en-GB` if the selection type is known to be plural
  to validate that only `one`, `other` or numeric keys are used for variants.
I think we might want to make these into examples. And I think we might want to avoid "validate that only". Perhaps:
> For example,
> `<when option="select" values="plural"><matches><match locales="en" values="one other" ... />`
> could be used when validating translations for locales such as `en` and `en-GB`
> to check that variant keys `one` and `other` have been provided
> (in addition to any numeric keys).
I like the idea of `href` for `match`, but that's #538, so let's discuss it there. I'm not a fan of other changes in this PR.
I think this is trying to model the registry data in a way that specifies the logic to be applied to the data. It thus couples the definition of the data with the usage of the data. The same can be said of the changes in #534, which I commented on a few minutes ago.
We already have a built-in way of achieving what this PR is trying to do, if I understand correctly: multiple signature elements.
This PR proposes to allow creating nested element hierarchies that imitate code, like this:
<when option="select" values="plural">
  <matches validationRule="anyNumber">
    <match locales="en" values="one other"/>
  </matches>
</when>
<when option="select" values="ordinal">
  <matches validationRule="anyNumber">
    <match locales="en" values="one two few other"/>
  </matches>
</when>
<matches validationRule="anyNumber">
  <match values="zero one two few many other"/>
</matches>
We should instead be able to just describe the data:
<matchSignature>
  <match validationRule="anyNumber"/>
  <match values="zero one two few many other"/>
</matchSignature>
<matchSignature>
  <option name="select" values="plural"/>
  <match locales="en" values="one other"/>
  <match locales="pl" values="one few many other"/>
</matchSignature>
<matchSignature>
  <option name="select" values="ordinal"/>
  <match locales="en" values="one two few other"/>
  <match locales="pl" values="other"/>
</matchSignature>
In fact, as I mention in #534, I don't think we even need the `locales` attributes on `match` elements. Compare:
<matchSignature>
  <match validationRule="anyNumber"/>
  <match values="zero one two few many other"/>
</matchSignature>
<matchSignature locales="en">
  <option name="select" values="plural"/>
  <match values="one other"/>
</matchSignature>
<matchSignature locales="en">
  <option name="select" values="ordinal"/>
  <match values="one two few other"/>
</matchSignature>
<matchSignature locales="pl">
  <option name="select" values="plural"/>
  <match values="one few many other"/>
</matchSignature>
<matchSignature locales="pl">
  <option name="select" values="ordinal"/>
  <match values="other"/>
</matchSignature>
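To make the "just describe the data" form concrete, here is a minimal Python sketch of how a tool might read the flat signatures above. Everything procedural in it is an assumption rather than anything the thread specifies: the `applies` and `variant_keys` helper names are hypothetical, the `<function name="number">` wrapper is only there to give the snippet a root element, locale comparison is exact (no fallback from `en-GB` to `en`), and a signature with a `locales` attribute is assumed to take precedence over one without.

```python
import xml.etree.ElementTree as ET

# The flat signatures from the last snippet, wrapped in a root element.
REGISTRY = ET.fromstring("""
<function name="number">
  <matchSignature>
    <match validationRule="anyNumber"/>
    <match values="zero one two few many other"/>
  </matchSignature>
  <matchSignature locales="en">
    <option name="select" values="plural"/>
    <match values="one other"/>
  </matchSignature>
  <matchSignature locales="en">
    <option name="select" values="ordinal"/>
    <match values="one two few other"/>
  </matchSignature>
  <matchSignature locales="pl">
    <option name="select" values="plural"/>
    <match values="one few many other"/>
  </matchSignature>
  <matchSignature locales="pl">
    <option name="select" values="ordinal"/>
    <match values="other"/>
  </matchSignature>
</function>
""")


def applies(signature, options, locale):
    """True if the signature's locale and option constraints are all met."""
    locales = signature.get("locales")
    if locales is not None and locale not in locales.split():
        return False
    return all(
        options.get(opt.get("name")) in opt.get("values").split()
        for opt in signature.findall("option")
    )


def variant_keys(registry, options, locale):
    """Category keys a tool could offer for this locale and option set."""
    candidates = [
        sig for sig in registry.findall("matchSignature")
        if applies(sig, options, locale)
    ]
    if not candidates:
        return None
    # Assumption: a locale-specific signature beats the generic one.
    best = max(candidates, key=lambda sig: sig.get("locales") is not None)
    return [key for m in best.findall("match") for key in m.get("values", "").split()]
```

With these assumptions, `variant_keys(REGISTRY, {"select": "plural"}, "pl")` gives the four Polish cardinal categories, while a locale with no signature of its own falls through to the generic list.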
@stasm I agree that the purpose of the registry data (as I've described it elsewhere) is to inform tools and such about available variant keys, but not to replicate either CLDR data or functionality about which key gets chosen when. I like the middle example with
I fully agree. I used plural matching for the sake of example, and because #538 is still open.
I can see the appeal of the middle form; I like it as well. The bottom one is equally expressive but perhaps goes one step too far in trying to avoid extending the registry's schema, which ends up being inconvenient. Even if we expect registries to be generated and consumed by tools, I think they may also be authored by people. I'd call the middle snippet reasonably convenient for that purpose. To further explain why I think multiple signatures are more expressive than
Should we hold a separate call on the registry, to align ourselves on its core user stories? We do have the registry's Goals section, but it's rather focused on end users. It leaves out library and implementation developer concerns like:
Considering these is leading me at least towards the thoughts expressed in #561, i.e. dropping the
In the 2024-01-15 call we agreed to tag this for Future and that I would set up a separate call about registry format.
IMO, the key work for v45 review is a list of the valid function / selector
identifiers, and for each one a brief description of what it does and list
of its valid option identifiers, and a brief description of what each
option does, and (ideally) a pointer to a reference that explains the
functions/options more precisely.
I think the registry format needs considerable work, and I don't think
there is time for that review in the time remaining.
For example, we need to be able to have the validation of an option or
input format depend on an external specification: the matchSignature
and formatSignature are far too sketchy to provide much value, and cannot
really be complete without copying a great deal of information
that is complicated to get right. Even the very simple case of
minimumIntegerDigits etc cannot be specified by something as simple as
"positiveInteger", since there are interactions between options
(minimumIntegerDigits=3 and maximumIntegerDigits=2). And we don't want to
repeat all the data in, for example,
https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml,
or even just the categories, in the registry. Similarly, it is a mistake to
enforce the same valid plural categories for en plural and en ordinals;
that does not let localization tooling do what it needs to do for
expansion. The best way to do that is to have basic information in the
registry but point to external specifications for details.
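The option-interaction point can be illustrated with a small Python sketch. It is not taken from any MessageFormat implementation: `validate_digit_options` is a hypothetical helper, and the only thing it assumes is that both options take positive integers and that the minimum must not exceed the maximum. Each value below would pass a per-option "positiveInteger" check, yet the combination is invalid.

```python
def validate_digit_options(options):
    """Return a list of problems found in the integer-digit options."""
    errors = []
    names = ("minimumIntegerDigits", "maximumIntegerDigits")

    # Per-option check: roughly what a "positiveInteger" rule can express.
    for name in names:
        value = options.get(name)
        if value is not None and (not isinstance(value, int) or value < 1):
            errors.append(f"{name} must be a positive integer")

    # Cross-option check: not expressible by a per-option rule.
    low, high = (options.get(name) for name in names)
    if not errors and low is not None and high is not None and low > high:
        errors.append("minimumIntegerDigits must not exceed maximumIntegerDigits")
    return errors


conflicting = {"minimumIntegerDigits": 3, "maximumIntegerDigits": 2}
consistent = {"minimumIntegerDigits": 2, "maximumIntegerDigits": 3}
```

Here `validate_digit_options(conflicting)` reports the cross-option problem and `validate_digit_options(consistent)` reports nothing, which is the kind of rule that would more naturally live in an external specification than in a registry attribute.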
…On Sun, Jan 21, 2024 at 3:01 PM Addison Phillips ***@***.***> wrote:
In the 2024-01-15 call we agreed to tag this for Future and that I would
set up a separate call about registry format.
Closing due to #815.
As noted by @macchiati in #471 (comment):
Thinking about this myself, I think we need a mechanism for picking the right set of keys based on multiple option values. Polish is a decent example here:
- `one`, `few`, and `many`
- `one`, `few`, `many` and `other`
- `other`
The way we have aliases set up, they set option values like `maximumFractionDigits: 0` for their parent function, so if we start solving this via aliases, then we're likely to end up in a place where `:integer` and `:number maximumFractionDigits=0` have different behaviour.
So here I propose adding `<when>` elements to the `<matchSignature>` as a way of resolving that. Continuing with the above example, they work like this:
The way that should be used is that the `option`/`values` combo of each `<when>` is tested in order to pick a first set of `<match>` to check against the current locale, and that's repeated until one of the sets provides a match.
So if we start with a selector expression `{$x :integer}`, its resolved options are:
Let's consider what happens with a few different locales `en`, `pl`, and `fr` (standing in for "any other locale"):
- `<when option="select" values="ordinal">` does not match, so its contents are ignored.
- `<when option="select" values="plural">` does match, let's consider its contents:
  - `<when option="maximumFractionDigits" values="0">` also matches:
    - For `pl`, its set of `<match>` elements does provide a Lookup match, so we use that and don't consider any later ones.
    - For `en` and `fr`, its set of `<match>` elements does not provide a Lookup match.
  - For `en` and `fr`, we continue with the next set `<match locales="en" ... />`, `<match locales="pl" ... />`:
    - For `en` we do find a Lookup match, and use that.
    - For `fr`, no match at this level.
- For `fr`, we ultimately fall back to the last `<match>` without a `locales` which implicitly matches.
So we end up with these sets of category keys to use for the selector:
- `en`: `one other`
- `pl`: `one few many`
- `fr`: `zero one two few many other` (due to fallback)
Please note that while the example here does perform selection on `:number` and that's still being discussed in #471, this approach would be equally valid on a non-alias selector like `:plural`, for which being able to account for integral input values would be just as useful.