Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The spec should not force implementations to go against their standard error handling patterns #782

Closed
mihnita opened this issue May 6, 2024 · 13 comments
Assignees
Labels
errors Issues related to the errors section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) resolve-candidate This issue appears to have been answered or resolved, and may be closed soon.

Comments

@mihnita
Copy link
Collaborator

mihnita commented May 6, 2024

The current spec (spec/errors.md) says:

In all cases, when encountering a runtime error,
a message formatter MUST provide some representation of the message.
An informative error or errors MUST also be separately provided.

These are two MUST, with an "and"

This goes against the way many programming languages and frameworks handle errors.

For example ICU4J either returns a fallback string, or throws.

Rust returns a Result, with Err or Ok fields.
But it is an either / or, can't set both.

Similar for JavaScript, and in general many programming languages and frameworks.

Many platforms have mechanisms that load a string from resources and format with parameters, in one step.
With the API designed many-many years ago, and can't be changed.

For ICU Tim filed https://unicode-org.atlassian.net/browse/ICU-22762

But I think that the ICU answer to that issue is not relevant to this.


TLDR:

This is a standard that applies to all programming languages and frameworks out there, now and in the future.
It is not its job to force an implementation to do error handing against its native style.

We can recommend, and an implementation might choose to come up with a mechanism, which might or might not be the one described above. For example might provide format and formatStrict methods, which are also against the current spec.

@aphillips aphillips added Agenda+ Requested for upcoming teleconference errors Issues related to the errors section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) labels May 6, 2024
@sffc
Copy link
Member

sffc commented May 6, 2024

I think it is reasonable for the spec to say that implementations must provide a way to produce the fallback message representation, but it should allow this to be done with separate functions such as format and formatStrict.

With regard to error handling, I think the spec might be doing a little far to require access to all errors simultaneously; it seems returning just one error, such as the first, would be sufficient. In ICU4X, we have a policy of returning Copy errors, which means that the layout must be known at compile time and cannot contain heap memory. We made this decision because we had data showing that expensive error types contribute negatively to both performance and code size. We would be violating this rule by being required to return a vector of errors. That said, we do now offer a mechanism, TryWriteable, to annotate error substrings via a formatToParts-like mechanism (basically, generate a parts annotation when an error occurs). Note that the annotations are not very expressive (they basically just say "an error occurred here"). It's not clear whether this behavior is conformant with the spec as written, but it seems like it should be.

In other words, I agree with the OP.

@aphillips
Copy link
Member

aphillips commented May 6, 2024

Thanks @sffc

I think there is general agreement that the first MUST requires only that there be a way to get the fallback string.

I think the spec might be doing a little far to require access to all errors simultaneously; it seems returning just one error, such as the first, would be sufficient.

I don't think we require that an implementation expose more than one error. In fact, we go so far as to recommend what should happen for implementations that don't provide multiple errors:

When a message contains more than one error, or contains some error which leads to further errors, an implementation which does not emit all of the errors SHOULD prioritise Syntax Errors and Data Model Errors over others.

The MUST that @mihnita quotes only requires that, when there is an error, the implementation make the error or error(s) accessible to the caller.

@mihnita
Copy link
Collaborator Author

mihnita commented May 7, 2024

Yes, the spec allows for an implementation to return just one error.
But (at least in my reading) the spec requires an implementation to return both a fallback string AND errors (at least one).

Most common error handling styles (that I can think of) allow one or the other, but not both.
If you got an error, then you don't get a result.

The one exception is probably ICU4C, which can also set the status to a WARNING or an ERROR.
If it is an error, you should not try to use the result.
So the proposed MF2 errors would have to be in fact reported as warnings.
It would not be very different from what it has now.

But that is not the case for a lot of other frameworks.


I am also unhappy about forcing this on everything present and future because not everybody needs this.
Exact error details might be handy for tooling, or a low level library like ICU.
But not necesarilly for a higher level API implemented on top of it.

For example in Android one can do String foo = resources.getString(stringId, arg1, arg2, arg3).
That's the API, and can be easily adapted for the string to be in MF2 syntax.
But it can't be adapted to return an error.
It is a very old API (15 years or so).
And nobody cares about that error at that level anyway.
What is one do to about it? Max you can do is log it.
And as a dev you can't access the log on someone's phone.
So asking to report the error is against the established functonality, and useless.

@aphillips
Copy link
Member

For example in Android one can do String foo = resources.getString(stringId, arg1, arg2, arg3).
That's the API, and can be easily adapted for the string to be in MF2 syntax.
But it can't be adapted to return an error.
It is a very old API (15 years or so).

Resource APIs are super interesting in this space. They sometimes error, but that error is almost always the local equivalent of MissingResourceException. The implementation we had at Amazon hid MF behind the resource manager, as do many localization support libraries.

Resource code that wraps MF2 would need to handle any errors and consume the appropriate fallback string (or errors, or both).

But (at least in my reading) the spec requires an implementation to return both a fallback string AND errors (at least one).

I agree that this is a valid reading of the current text. Please note that we discussed this in yesterday's call and there was some sentiment to "produce both". A stronger sentiment was: make concrete text suggestions (a PR) to make the spec do the right thing.

@mihnita
Copy link
Collaborator Author

mihnita commented May 7, 2024

Resource code that wraps MF2 would need to handle any errors and consume the appropriate fallback string (or errors, or both).

Right.

But for the end user, that API would "support MF2".
For all purposes, that is an implementation of MF2.

So the spec, by being so strict, declares that any such APIs are non-compliant.

@catamorphism
Copy link
Collaborator

What about replacing this text in errors.md

In all cases, when encountering a runtime error, a message formatter MUST provide some representation of the message. An informative error or errors MUST also be separately provided.

with something like the following?

Message formatters MUST provide an entry point that returns a representation of the message even if a runtime error is encountered. Message formatters MUST also provide an entry point that signals an informative error or errors, without necessarily providing a representation of the message.

This would allow (in ICU4C, for example) something like:

class MessageFormatter {
    UnicodeString formatToStringBestEffort(UErrorCode&);
    UnicodeString formatToStringStrict(UErrorCode&);
// other methods...
}

The caller can choose whether to call the first method, which ignores MessageFormat runtime errors; or the second method, which sets its error code when there's a MessageFormat runtime error. (Both would still signal syntax and data model errors.)

@mihnita -- would this satisfy your requirements?

@alerque
Copy link
Contributor

alerque commented May 8, 2024

That wording is certainly more flexible, but off the top of my head many languages are just going to implement the "best effort version" as:

function formatToStringBestEffort (args)
  return formatToStringStrict(args) or ""
end

In what circumstances could an implementation return anything other than an empty string (or the message id as a string or the error message content as a string or something to that effect)? If the formatting operation is throwing a data model error there isn't any valid formatted string to return, and as @mihnita was pointing out most programming languages do not even allow for anything else. This still feels like it's trying to foist a programming paradigm on the implementation side of things that will specifically class with most languages.

@eemeli
Copy link
Collaborator

eemeli commented May 8, 2024

@alerque For data model errors you're right, there isn't necessarily any better fallback than the message id or our ultimate fallback �. However, for runtime errors we do define fallback representations for placeholders for which formatting has failed, such as missing but expected formatting arguments, or formatting arguments with unexpected types.

This allows us to format a message like

Click {$button} to continue

with effectively the same message as its source, if $button is not defined.

@echeran
Copy link
Collaborator

echeran commented May 9, 2024

In Monday's weekly meeting, people requested ICU-TC to give feedback on this topic. There was a thorough discussion about this in today's ICU-TC meeting. Since the notes doc is public, instead of pasting the notes here inline, here is a link to the relevant discussion notes.

@echeran
Copy link
Collaborator

echeran commented May 9, 2024

Notes from today's ICU4X-WG meeting:

@Manishearth Specs that are not tied to a language should not specify how the errors get bubbled out, but they can specify the error cases. The Unicode Source Code WG spent a long time about how different compilers work out errors. You should not get into that rabbit hole, and instead talk about error signals. I like it when specs provide an ontology of errors. I would say a MessageFormatter must be able to get reasons behind the error, not necessarily the error itself. When you have the spec, you're defining the abstract operation, not the nitty gritty of things like errors. So I strongly agree with @mihnita

@sffc I like the idea of an ontology of errors. I think there are 2 questions in this thread. 1) Does the formatter support a fallback representation of a message. 2) what vehicle does it use and how those errors are structured? We support that in ICU4X via TryWriteable, and the model that we use there can be used here. According to what @aphillips said, he thinks it's not required that we return multiple errors, but I'm not entirely convinced. The spec text says "an informative error oerrors MUST also be separately provided"

@Manishearth As an implementer, I see that line and I would just shrug and ignore it. That is not the kind of reaction that you want implementers to have. It's not actionable, it's not clear what to do. But if you now say "MUST", now an implementer has to go and implement that without knowing what that means.

@echeran Could you explain a little about TryWriteable?

@sffc I don't know what "informative error" even means, but I can show what TryWriteable does

@Manishearth "Informative error" just means warning. But even that ontology of "informative error" is fraught. With the Source Code WG, we had that issue. Different compilers even for the same language do things differently, let alone for other languages.

@sffc TryWriteable Rustdoc.

Example:

// Failure case, including the ERROR part:
writeable::assert_try_writeable_parts_eq!(
    HelloWorldWriteable { name: None },
    "Hello, nobody!",
    Err(HelloWorldWriteableError::MissingName),
    [(7, 13, writeable::Part::ERROR)]
);

In the above example, the name placeholder is missing, so "nobody" is printed out, and an error is returned, both as a typed error and a part span. If there are multiple errors, multiple spans will be annotated, but only the first error will be returned as an upgraded type. It is expensive in Rust to return multiple errors because that would require allocating memory.

@aphillips aphillips added Action-Item Action item assigned by the WG and removed Agenda+ Requested for upcoming teleconference labels May 13, 2024
@aphillips
Copy link
Member

@mihnita to file a pr

@eemeli
Copy link
Collaborator

eemeli commented Sep 7, 2024

I'm pretty sure that this was resolved by #804.

@aphillips
Copy link
Member

Closing because it appears to have been implemented in the spec. Reopen if there is disagreement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
errors Issues related to the errors section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) resolve-candidate This issue appears to have been answered or resolved, and may be closed soon.
Projects
None yet
Development

No branches or pull requests

7 participants