Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove markdown from API error messages #117

Open
eliperelman opened this issue Apr 3, 2018 · 26 comments
Open

Remove markdown from API error messages #117

eliperelman opened this issue Apr 3, 2018 · 26 comments

Comments

@eliperelman
Copy link

eliperelman commented Apr 3, 2018

This thought arose from a recent bug, and it just got me wondering if this would be a good idea or not. Right now the APIs return error messages rich with information in Markdown format. This is mostly useful only if you have a Markdown renderer, the only one of which I know lives in Tools. This means that the site has less control of how to render error messages than if it knew how to handle individual error types itself. Not to mention, it is also one less XSS injection point we would have to worry about if we weren't rendering the Markdown from these messages.

Something interesting I have seen React do lately is shipping error code handlers in production, which generates a link that consumers can follow for an expanded error message (example error message):

screen shot 2018-04-03 at 7 03 48 am

https://reactjs.org/docs/error-decoder.html?invariant=130&args[]=undefined&args[]=

This keeps the payload small, and also gives control of rendering the messages back to the UI, at the expense of having to follow this step to see the full error context. This is not a problem for the Tools site since it can render whatever messages it wants directly. This would be more of a problem for other consumers that would log these errors manually.

Thoughts? What are the pros and cons of doing something like this?

@djmitche
Copy link
Contributor

djmitche commented Apr 3, 2018

Even in a non-UI context, the markdown still "looks nice", or at least the idea of the design was that someone else could later make it look nice. In particular, we have lists of scopes which are horrifying when line-wrapped together but readable when formatted as a bulleted list. Still, right now the only other formatted data is the scope expression, and that's just an indented JSON dump.

So potentially the error messages could be text, with the expectation that it would be rendered in a monospace font, or at least that newlines and leading space would be respected. We wouldn't lose any useful expressiveness.

@eliperelman
Copy link
Author

Another drive-by comment: if the Tools site wanted to display errors in its own way, right now that would be painful to do since it would mean parsing the Markdown for error messages. There isn't (at least that I know of) a way to get structured information about an error other than parsing the message.

@djmitche
Copy link
Contributor

djmitche commented Apr 3, 2018

Sure there is

@djmitche
Copy link
Contributor

djmitche commented Apr 3, 2018

Hm, reading the first comment again.. we do provide a details field in the response body for an error, but we don't make any commitments about what data is there, and the same code can have different details depending on the context (for example, 400 Bad Input). So teaching tools about all of those errors would require standardizing and documenting them in the references first, which would be quite a job.

We need to provide useful error messages for consumers of our APIs, which are usually not the tools site. So we still need to include a user-readable message in the response body.

So, I think the current arrangement of including a message along with the details and error.code works nicely.

There is at least one case where we can do a little better in tools (403's) and we're already doing so. It's possible we could do better with schema validation errors, too, but that wouldn't be easy. I don't know if any other errors would benefit at all.

@jonasfj
Copy link

jonasfj commented Apr 4, 2018

This means that the site has less control of how to render error messages than if it knew how to handle individual error types itself.

I disagree, you can handle the errors just fine, if you want to... It's true that we haven't formalized everything.


Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '<errorCode>', message: '<markdown>', details: {<free-form>}} (maybe we even documented it)

All variables injected into the <markdown> messages are accessible from the details structure.
Take a look a the code.

But it's correct that we could:
A) publish a list of <errorCode> in references, and,
B) publish a list of details properties available for a given <errorCode>

If you want to take on these formalization tasks I'm very supportive of any such efforts.


But I note that there is probably a lot of clients that don't want to handle all error messages.
For those clients it's nice that there is a markdown message which can be displayed as plain text or formatted html...

I think the errors for which it's interesting to make custom error messages are largely:

  • AuthenticationFailed
  • InsufficientScopes
  • InputValidationError
  • ResourceNotFound

For most of the other messages I think most clients just wants a generic error message, where markdown is pretty decent. But if you want to formalize what details certain error codes should carry that would be cool :)

@jonasfj
Copy link

jonasfj commented Apr 4, 2018

I suggest closing this issue and instead focus on further formalization of errors taskcluster-lib-api.

IMO, it's not worth formalizing all errors and it's very inflexible, but formalizing that some properties are required in details for some errors would be neat.

@jonasfj
Copy link

jonasfj commented Apr 4, 2018

Also publishing a schema for error messages from tc-lib-api would be cool.

@djmitche
Copy link
Contributor

djmitche commented Apr 4, 2018

Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '', message: '', details: {}} (maybe we even documented it)

This is documented; the "free-form" is the part that's not documented.

@eliperelman
Copy link
Author

I disagree, you can handle the errors just fine,

I meant by following the Markdown strictly. This is the API dictating UI, rather than the UI dictating it based on the error data.

Currently, errors aren't formally part of the references, but we have a very strong convention that errors are on the form: {code: '<errorCode>', message: '<markdown>', details: {<free-form>}} (maybe we even documented it)

Right, I forgot we had the code and details portion. In the case that the UI wants to have more control over how errors are rendered, we would essentially have to ignore the message.

@djmitche
Copy link
Contributor

djmitche commented Apr 5, 2018

I don't think that would be a good idea. We don't have a "registry" of error messages, nor do we want to have to document every new error and implement it in tools (and all the clients?). Jonas pointed out a few of the more common errors where I, too, would be willing to document and commit to particular properties for details, and if we were to do so then it would be OK for the tools site to treat those specially (still falling back to the show-the-message approach when those details are missing, to allow backward- and forward-compatibility). Two of the things that makes me willing for those errors is that they are generated by a library common to all services, and that they are fairly commonly seen by users.

But take https://github.com/taskcluster/taskcluster-hooks/blob/ff9fb7b88cb213fdf5b1fff0870525196319758d/src/v1.js#L234-L241 for example. That has code InputError, but is kind of ridiculously specific to hooks and the cron-parser input format. The resulting message is self-contained, and i can't see any benefit to documenting it -- the programmatic meaning is just what it says: there was an error in in the input.

This is the API dictating UI

That is intentional - it's how Taskcluster is designed. Fundamentally Taskcluster is a service with well-defined APIs, and the browser UI is one of many consumers of that API. That's not an absolutist position: there's room for compromise to make tools more than a "taskcluster-client-by-pointing-and-clicking". For example, it's OK to add API methods to support that such as a summarizeTaskGroup API endpoint that lists only taskIds and statuses, to make the task group browser more efficient.. But fundamentally the API is always first. I suspect this difference in perspective is behind a lot of misunderstandings within the team!

I would not want to get into a situation where the error message that a user of the API sees is less informative than that they see in the tools UI. I'm comfortable with the tools UI special-casing some of the more common errors that contain big blobs of data (InsufficientScopes is the big offender IMHO), so that it can represent that data more appropriately onscreen, but that can't come at the expense of understandability of that error by a "normal" API consumer.

@djmitche
Copy link
Contributor

djmitche commented Apr 5, 2018

(by "that" in the first sentence, I was referring to "..ignore the message")

@eliperelman
Copy link
Author

Most of my points were me playing devil's advocate, so I pretty much agree with everything said. The one part where I disagree is that API should dictate UI, which is orthogonal to this issue.

That is intentional - it's how Taskcluster is designed.

I don't think this is true, because...

Fundamentally Taskcluster is a service with well-defined APIs, and the browser UI is one of many consumers of that API.

...this would be a conflation of API with consumers of the API. If that were really true, then any consumer of an API would have a similar UI for a given method, and this just isn't the case.

But fundamentally the API is always first.

I agree. I'm not saying we should change parts of the API to make the front-end always happy at the expense of positive parts of the API.

What I am saying I'll try to elaborate on via some examples.

  • Just because an API returns a list of entities, does not mean that those entities should always be displayed in a table.
  • Just because an API returns a Markdown list, does not mean that the UI should be forced to render a bunch of bullet points.

Does that makes sense? For relevancy back to this issue, I'm just saying it would be nice to have more control over how some of these things get rendered without having to parse Markdown.

TLDR; I think good APIs inform UIs, but shouldn't dictate UX.

@djmitche
Copy link
Contributor

djmitche commented Apr 5, 2018

then any consumer of an API would have a similar UI for a given method, and this just isn't the case.

I don't understand what that means. What is the "UI" for a decision task, or a Bitbucket app that runs tasks when specially-formatted comments are added to a merge request?

I think my concern with errors is that if the UI wants to understand them better to be able to render them differently, then we need to document and commit to maintaining that additional kind of detail (error code names, contents of details, or whatever else we would add). And we would need to commit to implementing parallel logic in every other consumer of the API. And we could need to commit to updating all of that logic, in a compatible way, whenever the errors changed.

Maybe it would help to be concrete. I know you used it as an example, but "an error occurred and I am forced to use bullet points" isn't very concrete. What error? How would you like to represent it? Maybe ultimately all that's required here is to modify the code that generates a few errors to generate better messages. Or maybe we only commit to the format of a few specific errors.

It might be good to have a look at how these errors are generated. You can find them as res.reportError calls in the services. Many of the more commonly-encountered errors are from calls to that function are in taskcluster-lib-api.

@eliperelman
Copy link
Author

I don't understand what that means.

I know, I'm not doing a very good job of articulating what I mean. 😞

What is the "UI" for a decision task, or a Bitbucket app that runs tasks when specially-formatted comments are added to a merge request?

The UI is different for every UI consumer. The CLI can show different messages in the terminal based on API responses. Any number of web apps that consume an API's response may choose completely different ways to visualize that response. What's important is that they have access to the data.

I think my concern with errors is that if the UI wants to understand them better to be able to render them differently

I honestly don't care about understanding them better. All I really care about is that the information is accessible to control rendering. These may be the same concern.

Maybe it would help to be concrete. I know you used it as an example, but "an error occurred and I am forced to use bullet points" isn't very concrete. What error?

Take the error in https://bugzilla.mozilla.org/show_bug.cgi?id=1449848. It uses Markdown in the error response, which could feasibly use any kind of layout Markdown specifies, like tables, bullets, anchors, etc. Some of that is cool, but other times you may want to not render a table when a table was given to you. If we have access to the raw data, then that's nice and we can render whatever we want based on it. When it's embedded in Markdown, we would either just render the Markdown, or parse it and extract what's needed.

It might be good to have a look at how these errors are generated.

For me this discussion is more about bikeshedding than to be prescriptive about direction. I don't foresee us doing a lot of work to not render error messages using Markdown, hence why I only opened this as an RFC. 😄

@jonasfj
Copy link

jonasfj commented Apr 5, 2018

My two cents is the introduction of code, details and message, was intended to ensure a best-effort contract that could be formalized further.

If tc-tools start handling InsufficientScopeErrors and ignored the message that's fine. While we haven't strictly committed to certain properties in details, best-effort might be good enough...

Or a UI like tools could check if the error has an expected form, code helps enable this, and in such case ignore the message and use special logic...

In order words when tc-tools doesn't know how to handle an error it can always default to using the markdown.

All of this said, defining JSON schemas for a subset of error codes and validating those would be super nice. I think that's the next natural step in making error handling more reliable. IMO tc-tools can already do a best effort handling that relies on informal convention.


True, API shouldn't dictate UI/UX. But in some cases the fallback has to be some generic error object that can be displayed, markdown is a good choice. Formalizing some error codes is a good step in the right direction though. Formalizing all error codes it's probably too verbose, and breaks the API promise whenever we introduce a new error code (as previous consumers won't know how to handle it).

@jhford
Copy link
Contributor

jhford commented Apr 10, 2018

Have we done a cost/benefit analysis on the cost of slightly larger error messages vs. the engineering effort to implement all the required stuff? Having markdown in the error messages, without processing it's mostly readable formatting and with rendering it can be pretty.

@eliperelman
Copy link
Author

Have we done a cost/benefit analysis on the cost of slightly larger error messages vs. the engineering effort to implement all the required stuff?

No, I only raised this RFC for discussion due to a bug.

Having markdown in the error messages, without processing it's mostly readable formatting and with rendering it can be pretty.

I agree. The tradeoff is needing to be more careful about what gets put into the Markdown from a security perspective, and the lack of ability to control the rendering, as mentioned at #117 (comment).

@djmitche
Copy link
Contributor

I'm still working on fixing the isshe. As you can see, I haven't been successful yet!

screenshot from 2018-04-19 17-48-49

When that's done, I think the markdown rendering will be better.

Possible directions to take this RFC:

  • define a few error codes more precisely, so that they can be interpreted by the UI
  • make slightly richer error data structures with a title, summary, details, icon, etc.
  • nothing (close the RFC)

@eliperelman what's your feeling?

@eliperelman
Copy link
Author

I like your first 2 bullet points. 😀

@jhford
Copy link
Contributor

jhford commented Apr 23, 2018

I like the idea of having something a richer error data structure, for example:

{
  "code": "MissingWorkerType",
  "level": "info" | "warning" | "error",
  "msg": "Worker Type not found",
  "longMsg": "The provisioner tried to load a worker type called 'testworkertype' but it was not found",
  "context": {"workerType": "testWorkerType", "provisionerId": "aws-provisioner-v1"},
  "details": "'Error: for-stack\\n    at repl:1:1\n    at ContextifyScript.Script.runInThisContext (vm.js:50:33)\n    at REPLServer.defaultEval (repl.js:240:29)\n    at bound (domain.js:301:14)\n    at REPLServer.runBound [as eval] (domain.js:314:12)\n    at REPLServer.onLine (repl.js:441:10)\n    at emitOne (events.js:121:20)\n    at REPLServer.emit (events.js:211:7)\n    at REPLServer.Interface._onLine (readline.js:282:10)\n    at REPLServer.Interface._line (readline.js:631:8)'"
}

This would be nice because we could let the error-creator be responsible for figuring out the details of the error without having to figure out the UI and it makes it possible to deploy new errors without having to have every error code have a matching tools.tc.net deployment.

In this structure, my idea was that the properties would be interpreted as follows:

  • code: (mandatory) this is the error code, as we have now
  • level: (defaults to error) this could be used in UI to decide whether to have a green, orange or red tooltip.
  • msg: (mandatory) this is a really brief message, e.g. for a tooltip over a red error marker
  • longMsg: (optional) this is a longer, but still prose description of the problem. This could be used in a details modal popup, with the context and details being hidden until requested
  • context: (optional) mapping of useful debugging information, could be drawn in a table in the modal popup.
  • details: (optional) only ever rendered in a <pre> element, without any markdown rendering. Markdown is still possible, but we would never render it, either client or server side. This could be used in the same modal as long, but with a show/hide "details" button. Ideally, lib-api would do escaping to ensure safety, but the client side should also do this.

I'm a little rusty with frontend stuff, otherwise I'd write up a quick p-o-c.

@jonasfj
Copy link

jonasfj commented Apr 23, 2018

I think what we have is pretty good. Adding too many fields won't help making them useful.
I could see adding context as a table, and formalizing details for some error codes.
But I love the generality of saying that message is markdown, because this ensures that the frontend doesn't need to know every possible error code.

If anyone wants to bikeshed this I think we should setup a vidyo call.

Who is interested? (I am)

@jhford
Copy link
Contributor

jhford commented Apr 23, 2018

Let's not define behaviour for specific error codes, since that would require frontend changes to alter behaviour of each error code. Instead, why don't we define extra values which the UI can decide to use as appropriate to render better errors? I think there's clearly room for adding more information -- as evidenced by this RFC existing. The crux of the matter is where we put that information.

Do we encode information about errors it in the error message and make the UI data-driven or do we encode them in the UI and make the UI and services tightly coupled.

If we make the message markdown and add nothing, there's basically nothing between an extremely terse "NoWorkerTypeFound" and an entire screen of stack traces. Based on my personal usage of our system, I really do think there's some room for a longer form explanation that's short of a full stack trace.

In the absence of a longMsg field, the UI could just skip rendering that and show the details. Basically, in this proposal, the UI should be able to render something based on just having a message and code, but will give us better errors when we give it better data.

Since the details field is likely only going to be a stack trace anyway, why not skip the effort of getting all the rendering perfect and just stick them in <pre>${error.details}</pre>?

@jonasfj
Copy link

jonasfj commented Apr 23, 2018

minimized ramblings, but I note

If we make the message markdown and add nothing, there's basically nothing between an extremely terse "NoWorkerTypeFound" and an entire screen of stack traces.

I never expose stack traces. Internal error should return an incidentId. I assume this was just a bad example :)

@jonasfj
Copy link

jonasfj commented Apr 23, 2018

So jhford and I talk a bit about this... And maybe what we need is another layer between code and message, like a summary that's a single line summary of what when wrong.

Maybe, this is where we should ask Eli what he wants :)

@djmitche
Copy link
Contributor

We're already careful not to produce stack traces in production, and we should keep doing so :)

So, there are a few partially overlapping approaches on the table here:

  • keep status quo
  • add fields to all errors to support better presentation (e.g., code/level/msg/longMesage/context/details)
  • formalize the details of some error codes so that frontends can interpret them better (e.g., InsufficientScopes)
  • formalize the details of all error codes and require frontends to interpret them

The first and last options seem quite unpopular. Perhaps there's a middle ground between the second and third options? Maybe fewer additional fields, and only one or two errors formalized?

@djmitche
Copy link
Contributor

Oh, yes, I didn't see jonas's last comment -- sounds great :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants