Remove markdown from API error messages #117

eliperelman · 2018-04-03T12:08:27Z

This thought arose from a recent bug, and it just got me wondering if this would be a good idea or not. Right now the APIs return error messages rich with information in Markdown format. This is mostly useful only if you have a Markdown renderer, the only one of which I know lives in Tools. This means that the site has less control of how to render error messages than if it knew how to handle individual error types itself. Not to mention, it is also one less XSS injection point we would have to worry about if we weren't rendering the Markdown from these messages.

Something interesting I have seen React do lately is shipping error code handlers in production, which generates a link that consumers can follow for an expanded error message (example error message):

https://reactjs.org/docs/error-decoder.html?invariant=130&args[]=undefined&args[]=

This keeps the payload small, and also gives control of rendering the messages back to the UI, at the expense of having to follow this step to see the full error context. This is not a problem for the Tools site since it can render whatever messages it wants directly. This would be more of a problem for other consumers that would log these errors manually.

Thoughts? What are the pros and cons of doing something like this?

djmitche · 2018-04-03T12:28:44Z

Even in a non-UI context, the markdown still "looks nice", or at least the idea of the design was that someone else could later make it look nice. In particular, we have lists of scopes which are horrifying when line-wrapped together but readable when formatted as a bulleted list. Still, right now the only other formatted data is the scope expression, and that's just an indented JSON dump.

So potentially the error messages could be text, with the expectation that it would be rendered in a monospace font, or at least that newlines and leading space would be respected. We wouldn't lose any useful expressiveness.

eliperelman · 2018-04-03T12:32:56Z

Another drive-by comment: if the Tools site wanted to display errors in its own way, right now that would be painful to do since it would mean parsing the Markdown for error messages. There isn't (at least that I know of) a way to get structured information about an error other than parsing the message.

djmitche · 2018-04-03T12:39:51Z

Sure there is

djmitche · 2018-04-03T23:11:15Z

Hm, reading the first comment again.. we do provide a details field in the response body for an error, but we don't make any commitments about what data is there, and the same code can have different details depending on the context (for example, 400 Bad Input). So teaching tools about all of those errors would require standardizing and documenting them in the references first, which would be quite a job.

We need to provide useful error messages for consumers of our APIs, which are usually not the tools site. So we still need to include a user-readable message in the response body.

So, I think the current arrangement of including a message along with the details and error.code works nicely.

There is at least one case where we can do a little better in tools (403's) and we're already doing so. It's possible we could do better with schema validation errors, too, but that wouldn't be easy. I don't know if any other errors would benefit at all.

jonasfj · 2018-04-04T22:34:07Z

This means that the site has less control of how to render error messages than if it knew how to handle individual error types itself.

I disagree, you can handle the errors just fine, if you want to... It's true that we haven't formalized everything.

Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '<errorCode>', message: '<markdown>', details: {<free-form>}} (maybe we even documented it)

All variables injected into the <markdown> messages are accessible from the details structure.
Take a look a the code.

But it's correct that we could:
A) publish a list of <errorCode> in references, and,
B) publish a list of details properties available for a given <errorCode>

If you want to take on these formalization tasks I'm very supportive of any such efforts.

But I note that there is probably a lot of clients that don't want to handle all error messages.
For those clients it's nice that there is a markdown message which can be displayed as plain text or formatted html...

I think the errors for which it's interesting to make custom error messages are largely:

AuthenticationFailed
InsufficientScopes
InputValidationError
ResourceNotFound

For most of the other messages I think most clients just wants a generic error message, where markdown is pretty decent. But if you want to formalize what details certain error codes should carry that would be cool :)

jonasfj · 2018-04-04T22:35:08Z

I suggest closing this issue and instead focus on further formalization of errors taskcluster-lib-api.

IMO, it's not worth formalizing all errors and it's very inflexible, but formalizing that some properties are required in details for some errors would be neat.

jonasfj · 2018-04-04T22:36:05Z

Also publishing a schema for error messages from tc-lib-api would be cool.

djmitche · 2018-04-04T22:43:20Z

Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '', message: '', details: {}} (maybe we even documented it)

This is documented; the "free-form" is the part that's not documented.

eliperelman · 2018-04-05T02:48:59Z

I disagree, you can handle the errors just fine,

I meant by following the Markdown strictly. This is the API dictating UI, rather than the UI dictating it based on the error data.

Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '<errorCode>', message: '<markdown>', details: {<free-form>}} (maybe we even documented it)

Right, I forgot we had the code and details portion. In the case that the UI wants to have more control over how errors are rendered, we would essentially have to ignore the message.

djmitche · 2018-04-05T03:31:31Z

I don't think that would be a good idea. We don't have a "registry" of error messages, nor do we want to have to document every new error and implement it in tools (and all the clients?). Jonas pointed out a few of the more common errors where I, too, would be willing to document and commit to particular properties for details, and if we were to do so then it would be OK for the tools site to treat those specially (still falling back to the show-the-message approach when those details are missing, to allow backward- and forward-compatibility). Two of the things that makes me willing for those errors is that they are generated by a library common to all services, and that they are fairly commonly seen by users.

But take https://github.com/taskcluster/taskcluster-hooks/blob/ff9fb7b88cb213fdf5b1fff0870525196319758d/src/v1.js#L234-L241 for example. That has code InputError, but is kind of ridiculously specific to hooks and the cron-parser input format. The resulting message is self-contained, and i can't see any benefit to documenting it -- the programmatic meaning is just what it says: there was an error in in the input.

This is the API dictating UI

That is intentional - it's how Taskcluster is designed. Fundamentally Taskcluster is a service with well-defined APIs, and the browser UI is one of many consumers of that API. That's not an absolutist position: there's room for compromise to make tools more than a "taskcluster-client-by-pointing-and-clicking". For example, it's OK to add API methods to support that such as a summarizeTaskGroup API endpoint that lists only taskIds and statuses, to make the task group browser more efficient.. But fundamentally the API is always first. I suspect this difference in perspective is behind a lot of misunderstandings within the team!

I would not want to get into a situation where the error message that a user of the API sees is less informative than that they see in the tools UI. I'm comfortable with the tools UI special-casing some of the more common errors that contain big blobs of data (InsufficientScopes is the big offender IMHO), so that it can represent that data more appropriately onscreen, but that can't come at the expense of understandability of that error by a "normal" API consumer.

djmitche · 2018-04-05T03:43:37Z

(by "that" in the first sentence, I was referring to "..ignore the message")

eliperelman · 2018-04-05T17:06:58Z

Most of my points were me playing devil's advocate, so I pretty much agree with everything said. The one part where I disagree is that API should dictate UI, which is orthogonal to this issue.

That is intentional - it's how Taskcluster is designed.

I don't think this is true, because...

Fundamentally Taskcluster is a service with well-defined APIs, and the browser UI is one of many consumers of that API.

...this would be a conflation of API with consumers of the API. If that were really true, then any consumer of an API would have a similar UI for a given method, and this just isn't the case.

But fundamentally the API is always first.

I agree. I'm not saying we should change parts of the API to make the front-end always happy at the expense of positive parts of the API.

What I am saying I'll try to elaborate on via some examples.

Just because an API returns a list of entities, does not mean that those entities should always be displayed in a table.
Just because an API returns a Markdown list, does not mean that the UI should be forced to render a bunch of bullet points.

Does that makes sense? For relevancy back to this issue, I'm just saying it would be nice to have more control over how some of these things get rendered without having to parse Markdown.

TLDR; I think good APIs inform UIs, but shouldn't dictate UX.

djmitche · 2018-04-05T19:46:43Z

then any consumer of an API would have a similar UI for a given method, and this just isn't the case.

I don't understand what that means. What is the "UI" for a decision task, or a Bitbucket app that runs tasks when specially-formatted comments are added to a merge request?

I think my concern with errors is that if the UI wants to understand them better to be able to render them differently, then we need to document and commit to maintaining that additional kind of detail (error code names, contents of details, or whatever else we would add). And we would need to commit to implementing parallel logic in every other consumer of the API. And we could need to commit to updating all of that logic, in a compatible way, whenever the errors changed.

Maybe it would help to be concrete. I know you used it as an example, but "an error occurred and I am forced to use bullet points" isn't very concrete. What error? How would you like to represent it? Maybe ultimately all that's required here is to modify the code that generates a few errors to generate better messages. Or maybe we only commit to the format of a few specific errors.

It might be good to have a look at how these errors are generated. You can find them as res.reportError calls in the services. Many of the more commonly-encountered errors are from calls to that function are in taskcluster-lib-api.

eliperelman · 2018-04-05T21:12:27Z

I don't understand what that means.

I know, I'm not doing a very good job of articulating what I mean. 😞

What is the "UI" for a decision task, or a Bitbucket app that runs tasks when specially-formatted comments are added to a merge request?

The UI is different for every UI consumer. The CLI can show different messages in the terminal based on API responses. Any number of web apps that consume an API's response may choose completely different ways to visualize that response. What's important is that they have access to the data.

I think my concern with errors is that if the UI wants to understand them better to be able to render them differently

I honestly don't care about understanding them better. All I really care about is that the information is accessible to control rendering. These may be the same concern.

Maybe it would help to be concrete. I know you used it as an example, but "an error occurred and I am forced to use bullet points" isn't very concrete. What error?

Take the error in https://bugzilla.mozilla.org/show_bug.cgi?id=1449848. It uses Markdown in the error response, which could feasibly use any kind of layout Markdown specifies, like tables, bullets, anchors, etc. Some of that is cool, but other times you may want to not render a table when a table was given to you. If we have access to the raw data, then that's nice and we can render whatever we want based on it. When it's embedded in Markdown, we would either just render the Markdown, or parse it and extract what's needed.

It might be good to have a look at how these errors are generated.

For me this discussion is more about bikeshedding than to be prescriptive about direction. I don't foresee us doing a lot of work to not render error messages using Markdown, hence why I only opened this as an RFC. 😄

jonasfj · 2018-04-05T22:15:44Z

My two cents is the introduction of code, details and message, was intended to ensure a best-effort contract that could be formalized further.

If tc-tools start handling InsufficientScopeErrors and ignored the message that's fine. While we haven't strictly committed to certain properties in details, best-effort might be good enough...

Or a UI like tools could check if the error has an expected form, code helps enable this, and in such case ignore the message and use special logic...

In order words when tc-tools doesn't know how to handle an error it can always default to using the markdown.

All of this said, defining JSON schemas for a subset of error codes and validating those would be super nice. I think that's the next natural step in making error handling more reliable. IMO tc-tools can already do a best effort handling that relies on informal convention.

True, API shouldn't dictate UI/UX. But in some cases the fallback has to be some generic error object that can be displayed, markdown is a good choice. Formalizing some error codes is a good step in the right direction though. Formalizing all error codes it's probably too verbose, and breaks the API promise whenever we introduce a new error code (as previous consumers won't know how to handle it).

jhford · 2018-04-10T15:05:15Z

Have we done a cost/benefit analysis on the cost of slightly larger error messages vs. the engineering effort to implement all the required stuff? Having markdown in the error messages, without processing it's mostly readable formatting and with rendering it can be pretty.

eliperelman · 2018-04-11T14:08:32Z

Have we done a cost/benefit analysis on the cost of slightly larger error messages vs. the engineering effort to implement all the required stuff?

No, I only raised this RFC for discussion due to a bug.

Having markdown in the error messages, without processing it's mostly readable formatting and with rendering it can be pretty.

I agree. The tradeoff is needing to be more careful about what gets put into the Markdown from a security perspective, and the lack of ability to control the rendering, as mentioned at #117 (comment).

djmitche · 2018-04-20T14:53:36Z

I'm still working on fixing the isshe. As you can see, I haven't been successful yet!

When that's done, I think the markdown rendering will be better.

Possible directions to take this RFC:

define a few error codes more precisely, so that they can be interpreted by the UI
make slightly richer error data structures with a title, summary, details, icon, etc.
nothing (close the RFC)

@eliperelman what's your feeling?

eliperelman · 2018-04-21T12:16:02Z

I like your first 2 bullet points. 😀

jhford · 2018-04-23T10:23:59Z

I like the idea of having something a richer error data structure, for example:

{
  "code": "MissingWorkerType",
  "level": "info" | "warning" | "error",
  "msg": "Worker Type not found",
  "longMsg": "The provisioner tried to load a worker type called 'testworkertype' but it was not found",
  "context": {"workerType": "testWorkerType", "provisionerId": "aws-provisioner-v1"},
  "details": "'Error: for-stack\\n    at repl:1:1\n    at ContextifyScript.Script.runInThisContext (vm.js:50:33)\n    at REPLServer.defaultEval (repl.js:240:29)\n    at bound (domain.js:301:14)\n    at REPLServer.runBound [as eval] (domain.js:314:12)\n    at REPLServer.onLine (repl.js:441:10)\n    at emitOne (events.js:121:20)\n    at REPLServer.emit (events.js:211:7)\n    at REPLServer.Interface._onLine (readline.js:282:10)\n    at REPLServer.Interface._line (readline.js:631:8)'"
}

This would be nice because we could let the error-creator be responsible for figuring out the details of the error without having to figure out the UI and it makes it possible to deploy new errors without having to have every error code have a matching tools.tc.net deployment.

In this structure, my idea was that the properties would be interpreted as follows:

code: (mandatory) this is the error code, as we have now
level: (defaults to error) this could be used in UI to decide whether to have a green, orange or red tooltip.
msg: (mandatory) this is a really brief message, e.g. for a tooltip over a red error marker
longMsg: (optional) this is a longer, but still prose description of the problem. This could be used in a details modal popup, with the context and details being hidden until requested
context: (optional) mapping of useful debugging information, could be drawn in a table in the modal popup.
details: (optional) only ever rendered in a <pre> element, without any markdown rendering. Markdown is still possible, but we would never render it, either client or server side. This could be used in the same modal as long, but with a show/hide "details" button. Ideally, lib-api would do escaping to ensure safety, but the client side should also do this.

I'm a little rusty with frontend stuff, otherwise I'd write up a quick p-o-c.

jonasfj · 2018-04-23T10:41:26Z

I think what we have is pretty good. Adding too many fields won't help making them useful.
I could see adding context as a table, and formalizing details for some error codes.
But I love the generality of saying that message is markdown, because this ensures that the frontend doesn't need to know every possible error code.

If anyone wants to bikeshed this I think we should setup a vidyo call.

Who is interested? (I am)

jhford · 2018-04-23T10:51:00Z

Let's not define behaviour for specific error codes, since that would require frontend changes to alter behaviour of each error code. Instead, why don't we define extra values which the UI can decide to use as appropriate to render better errors? I think there's clearly room for adding more information -- as evidenced by this RFC existing. The crux of the matter is where we put that information.

Do we encode information about errors it in the error message and make the UI data-driven or do we encode them in the UI and make the UI and services tightly coupled.

If we make the message markdown and add nothing, there's basically nothing between an extremely terse "NoWorkerTypeFound" and an entire screen of stack traces. Based on my personal usage of our system, I really do think there's some room for a longer form explanation that's short of a full stack trace.

In the absence of a longMsg field, the UI could just skip rendering that and show the details. Basically, in this proposal, the UI should be able to render something based on just having a message and code, but will give us better errors when we give it better data.

Since the details field is likely only going to be a stack trace anyway, why not skip the effort of getting all the rendering perfect and just stick them in <pre>${error.details}</pre>?

jonasfj · 2018-04-23T11:15:32Z

minimized ramblings, but I note

If we make the message markdown and add nothing, there's basically nothing between an extremely terse "NoWorkerTypeFound" and an entire screen of stack traces.

I never expose stack traces. Internal error should return an incidentId. I assume this was just a bad example :)

jonasfj · 2018-04-23T13:12:19Z

So jhford and I talk a bit about this... And maybe what we need is another layer between code and message, like a summary that's a single line summary of what when wrong.

Maybe, this is where we should ask Eli what he wants :)

djmitche · 2018-04-23T13:22:13Z

We're already careful not to produce stack traces in production, and we should keep doing so :)

So, there are a few partially overlapping approaches on the table here:

keep status quo
add fields to all errors to support better presentation (e.g., code/level/msg/longMesage/context/details)
formalize the details of some error codes so that frontends can interpret them better (e.g., InsufficientScopes)
formalize the details of all error codes and require frontends to interpret them

The first and last options seem quite unpopular. Perhaps there's a middle ground between the second and third options? Maybe fewer additional fields, and only one or two errors formalized?

djmitche · 2018-04-23T13:22:52Z

Oh, yes, I didn't see jonas's last comment -- sounds great :)

djmitche mentioned this issue Apr 3, 2018

Bug 1449848 - Bugfix for JSON substitutions (+ prettify) taskcluster/taskcluster-lib-api#92

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove markdown from API error messages #117

Remove markdown from API error messages #117

eliperelman commented Apr 3, 2018 •

edited

Loading

djmitche commented Apr 3, 2018

eliperelman commented Apr 3, 2018

djmitche commented Apr 3, 2018

djmitche commented Apr 3, 2018

jonasfj commented Apr 4, 2018

jonasfj commented Apr 4, 2018

jonasfj commented Apr 4, 2018

djmitche commented Apr 4, 2018 •

edited

Loading

eliperelman commented Apr 5, 2018

djmitche commented Apr 5, 2018

djmitche commented Apr 5, 2018

eliperelman commented Apr 5, 2018

djmitche commented Apr 5, 2018

eliperelman commented Apr 5, 2018

jonasfj commented Apr 5, 2018

jhford commented Apr 10, 2018

eliperelman commented Apr 11, 2018

djmitche commented Apr 20, 2018

eliperelman commented Apr 21, 2018

jhford commented Apr 23, 2018

jonasfj commented Apr 23, 2018

jhford commented Apr 23, 2018

jonasfj commented Apr 23, 2018 •

edited

Loading

jonasfj commented Apr 23, 2018

djmitche commented Apr 23, 2018

djmitche commented Apr 23, 2018

Remove markdown from API error messages #117

Remove markdown from API error messages #117

Comments

eliperelman commented Apr 3, 2018 • edited Loading

djmitche commented Apr 3, 2018

eliperelman commented Apr 3, 2018

djmitche commented Apr 3, 2018

djmitche commented Apr 3, 2018

jonasfj commented Apr 4, 2018

jonasfj commented Apr 4, 2018

jonasfj commented Apr 4, 2018

djmitche commented Apr 4, 2018 • edited Loading

eliperelman commented Apr 5, 2018

djmitche commented Apr 5, 2018

djmitche commented Apr 5, 2018

eliperelman commented Apr 5, 2018

djmitche commented Apr 5, 2018

eliperelman commented Apr 5, 2018

jonasfj commented Apr 5, 2018

jhford commented Apr 10, 2018

eliperelman commented Apr 11, 2018

djmitche commented Apr 20, 2018

eliperelman commented Apr 21, 2018

jhford commented Apr 23, 2018

jonasfj commented Apr 23, 2018

jhford commented Apr 23, 2018

jonasfj commented Apr 23, 2018 • edited Loading

jonasfj commented Apr 23, 2018

djmitche commented Apr 23, 2018

djmitche commented Apr 23, 2018

eliperelman commented Apr 3, 2018 •

edited

Loading

djmitche commented Apr 4, 2018 •

edited

Loading

jonasfj commented Apr 23, 2018 •

edited

Loading