Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: demonstrate uses of the errors library #38127

Merged
merged 22 commits into from
Jun 18, 2019
Merged

Conversation

knz
Copy link
Contributor

@knz knz commented Jun 10, 2019

This continues the discussion from #37765 - the original PR was auto-closed by github (due to a bug?) so I opened this new PR.

This PR adapts various places in CockroachDB to demonstrate benefits of the errors library introduced in #36987 / #37121.

The main goals of this PR is to remove most usages of pgerror.Error, and replace them by regular error objects.

The benefit of this change is that all errors details are then preserved throughout distsql execution.

This is achieved by the following high level steps:

  1. the introduction of a new function pgerror.Flatten() that can convert a regular error into a *pgerror.Error.
    This step occurs in a commit in the PR near the beginning, called "pgerror: add Flatten() to convert errors into pgerror.Error"

    This function is meant to be used at "boundaries" only:

    • when sending an error packet to a SQL client
    • when reporting a distsql error to a v19.1 sql gateway
  2. changing the implementation of the pgerror.XXX() constructor functions to use the new library's constructors instead.
    This step is achieved by another commit in the PR near the end, called "pgwire, distsqlpb: let native errors flow through"

    From that point, most of the in-flight error objects inside CockroachDB are not instances of pgerror.Error any more.
    (The exception is a pgerror.Error object received on a distsql flow from a v19.1 node).
    This is thus also the moment where calls to pgerror.Flatten() really become necessary at the aforementioned boundaries.

The end result is that most of the "interior" of CockroachDB sees plain error objects and cannot expect a *pgerror.Error object any more.
The remaining commits in this PR are necessary to make these thing work together well:

  • the few commits towards the beginning, prior to step 1 are ancillary commits that ensure that SQL errors are produced where appropriate.

    (two actually belong to other PRs sql/row: use assertion failures instead of panics #37802 sem/tree: avoid deconstructing the error when parsing a datum #37806 and will disappear when those PRs are merged)

  • the commits in-between steps 1 and 2 changes the "interior" code to not rely on the presence of a pgerror.Error object in the causal chain.

    These commits must occur after step 1 because they require the presence of a "pg error code" which is computed by Flatten()'s companion function GetPGCode().

  • the commits after step 2 are mostly cosmetic and remove more dependencies on pgerror.Error in other places.

@knz knz requested a review from a team as a code owner June 10, 2019 12:00
@knz knz requested review from a team June 10, 2019 12:00
@knz knz requested a review from a team as a code owner June 10, 2019 12:00
@knz knz requested review from a team June 10, 2019 12:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Jun 10, 2019

Last comments from @andreimatei

I've looked at those 3 commits.
LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @mjibson)


pkg/cli/sql.go, line 1479 at r28 (raw file):

// decomposition in the first return value. If it is not, the function
// assembles a pgerror.Error with suitable Detail and Hint fields.
func (c *cliState) serverSideParse(sql string) (stmts []string, pgErr *pgerror.Error) {

I was gonna ask if this method could stop using pgErr but I see you've done that in a future PR :plause:


pkg/sql/distsqlpb/data.go, line 149 at r28 (raw file):

	resErr.FullError = &fullError

	// Now populate compatibility fields for 19.1 nodes.

consider gating this code on a cluster version check for 19.2


pkg/sql/distsqlpb/data.go, line 171 at r28 (raw file):

	if e.FullError != nil {
		// If there's a 19.1-forward full error, decode and use that.

you mean 19.2 here I think


pkg/sql/distsqlpb/data.proto, line 58 at r28 (raw file):

  // 19.1 nodes and further are expected to use full_error directly where
  // all error decorations are preserved.
  optional errorspb.EncodedError full_error = 3;

writing this field must have been satisfying :)


pkg/sql/pgwire/pgerror/errors.proto, line 52 at r28 (raw file):

  //   populated with the first key found in errors.GetTelemetryKeys().
  //
  // After 19.1 is outphased, this field can be removed.

nit: just say "TODO: remove in 19.3" here and elsewhere

@knz
Copy link
Contributor Author

knz commented Jun 10, 2019

More comments from @andreimatei

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @mjibson)


pkg/sql/schema_changer.go, line 195 at r26 (raw file):

	// The Backfill will grab a new timestamp to read at for the rest
	// of the backfill.
	// TODO(knz): this should really use errors.Is(). However until/unless

Have you considered doing gating the use of errors.Is() here on a cluster version check? That'll remind us to remove the old code next release.


pkg/sql/schema_changer.go, line 219 at r26 (raw file):

	// TODO(knz): This should really use the errors library with
	// a custom marker.
	if _, ok := errors.UnwrapAll(err).(errTableVersionMismatch); ok {

Why not errors.Is() here?
Please put more words in this comment; as written I don't think it'll mean much to people (it doesn't even mean much to me :P)


pkg/sql/schema_changer.go, line 840 at r26 (raw file):

			// As we are dismissing the error, go through the recording motions.
			// This ensures that any important error gets reported to Sentry, etc.
			sqltelemetry.RecordError(ctx, err, &sc.settings.SV)

do we still need the logging above if we're doing RecordError()? Seems to me that RecordError could log if it doesn't already.


pkg/sql/schema_changer.go, line 865 at r26 (raw file):

			log.Warningf(ctx, "while reasing schema change lease: %+v", err)
			// Go through the recording motions. See comment above.
			sqltelemetry.RecordError(ctx, err, &sc.settings.SV)

same about the logging above

@knz
Copy link
Contributor Author

knz commented Jun 10, 2019

Last comments from @RaduBerinde

I tried but I got stuck. There is an inversion of control in the error generation from the LALR parsing code. I'll take suggestions!

It's not a big deal if it's not easy. One idea would be to use WithDetailf instead of Wrapf("at or near"). That also makes more sense conceptually, as we are providing detail about the error.

By the way - WithDetailf is a bit awkward - it looks like we have to prepend a space or the regular error message is bad but then the extra space shows up over pgwire (e.g. DETAIL: at or near ",") [edit: there are two spaces before at, github doesn't show it right)

Another thing - when looking through the WithXX functions available, it was hard to understand whether the info I'm adding would be visible if you just print the error (in general, it's hard to know how the info would show up in various contexts, like printing or going over pgwire). It would be good to add to the comments for these.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/sql/schema_changer.go, line 197 at r12 (raw file):

Have you considered doing gating the use of errors.Is() here on a cluster version check? That'll remind us to remove the old code next release.

Yes I have considered, however there are more places that test this particular error in this way. Filed #38128 to address them comprehensively.


pkg/sql/schema_changer.go, line 218 at r12 (raw file):

Why not errors.Is() here?
Please put more words in this comment; as written I don't think it'll mean much to people (it doesn't even mean much to me :P)

Good idea. Done. Also, removed the comment.


pkg/sql/schema_changer.go, line 837 at r12 (raw file):

do we still need the logging above if we're doing RecordError()? Seems to me that RecordError could log if it doesn't already.

RecordError only logs (with log.Error) when the error is an internal (assertion) error.
This logging here ensures that any error in the schema changer gets logged as a warning.


pkg/sql/schema_changer.go, line 865 at r12 (raw file):

same about the logging above

ditto


pkg/sql/distsqlpb/data.go, line 149 at r13 (raw file):

consider gating this code on a cluster version check for 19.2

I like the idea (and doing it in other places), however here it's not possible because this is also called from a String() method which does not have access to the cluster version.


pkg/sql/distsqlpb/data.go, line 171 at r13 (raw file):

you mean 19.2 here I think

yes I did, thanks. Fixed.


pkg/sql/pgwire/pgerror/errors.proto, line 52 at r13 (raw file):

nit: just say "TODO: remove in 19.3" here and elsewhere

Done.

@knz
Copy link
Contributor Author

knz commented Jun 10, 2019

It's not a big deal if it's not easy. One idea would be to use WithDetailf instead of Wrapf("at or near"). That also makes more sense conceptually, as we are providing detail about the error.

We already do this (at the end of populateErrorDetails). The reasoning is as follows:

  • the approximate location ("at or near") is included in the main error message (that returned by Error() and the main "Message" payload in pgwire)
  • the detailed location (with a position caret0 is included in the error detail string (that returned by GetAllDetails() and the "Detail" payload in pgwire)
  • the reason why we want it in both is that not all clients are able to propagate the pgwire detail field to the eyes of a human user:
    • most of our tests do not propagate the detail field
    • some SQL UI tools I've used in the past do not either

By the way - WithDetailf is a bit awkward - it looks like we have to prepend a space or the regular error message is bad but then the extra space shows up over pgwire (e.g. DETAIL: at or near ",")

I do not understand this. I have tried manually to use it and I do not see what you see. can you tell more?

Another thing - when looking through the WithXX functions available, it was hard to understand whether the info I'm adding would be visible if you just print the error (in general, it's hard to know how the info would show up in various contexts, like printing or going over pgwire). It would be good to add to the comments for these.

Agreed - this merits a summary in the RFC. I think this table will help:

Error annotation format %s/%q/%v %+v pgwire
message visible visible (first line) message payload
wrap prefix visible (as prefix) visible (details on subsequent lines) message payload
pg code not visible visible (details on subsequent lines) code payload
hint not visible visible (details on subsequent lines) hint payload
detail not visible visible (details on subsequent lines) detail payload
telemetry keys not visible visible (details on subsequent lines) not visible
secondary errors not visible visible (details on subsequent lines) not visible
barrier origins not visible visible (details on subsequent lines) not visible
issue links not visible visible (details on subsequent lines) translated to hint
assertion failure annotation not visible visible (details on subsequent lines) translated to hint

@knz knz force-pushed the 20190523-errlib branch from 1b13e64 to 03b3992 Compare June 10, 2019 14:39
@knz knz force-pushed the 20190523-errlib-diffs branch 2 times, most recently from 11ef5d4 to e1a2ffc Compare June 10, 2019 14:43
@knz knz force-pushed the 20190523-errlib branch from 03b3992 to 2744cd9 Compare June 11, 2019 07:39
@knz knz force-pushed the 20190523-errlib-diffs branch from e1a2ffc to bf152aa Compare June 11, 2019 07:39
@knz knz force-pushed the 20190523-errlib branch from 2744cd9 to 5188198 Compare June 11, 2019 08:13
@knz knz force-pushed the 20190523-errlib-diffs branch from bf152aa to ef803ab Compare June 11, 2019 08:14
@RaduBerinde
Copy link
Member

I do not understand this. I have tried manually to use it and I do not see what you see. can you tell more?

Try this diff:

--- a/pkg/sql/parser/lexer.go
+++ b/pkg/sql/parser/lexer.go
@@ -194,7 +194,8 @@ func (l *lexer) populateErrorDetails() {
 			// parser encounters a parsing error.
 			l.lastError = errors.Wrap(l.lastError, "syntax error")
 		}
-		l.lastError = errors.Wrapf(l.lastError, "at or near \"%s\"", lastTok.str)
+		l.lastError = errors.WithDetailf(l.lastError, "at or near \"%s\"", lastTok.str)
 	}

Then run make test PKG=./pkg/sql/parser. I see errors like:

            syntax error: AS OF specified multiple timesat or near "EOF"

I think this table will help:

It would be great if we included this information for each WithXXX helper, e.g:

 // WithDetail decorates an error with a textual detail.
 // The detail may contain PII and thus will not reportable.
-// Note: the detail does not appear in the main error message
-// returned with Error().
+//
+// Visibility of the detail:
+//  - Error(): not visible
+//  - %+v: visible (on subsequent lines)
+//  - pgwire: detail payload
+//

knz added 21 commits June 18, 2019 17:42
... since the error library's Wrap is decent enough.

Release note: None
The new function `Flatten()` provides the bridge between the new error
library on the remainder of the code using pgerror.Error.

Release note: None
This replaces code that mutates `pgerror.Error` objects in place by
code that adds hints and detail annotations agnostic of the particular
error type.

Release note: None
Prior to this patch, the opt code was special-casing pgerror.Error
and required an object of this type to be present in the causal
chain to work properly.

This patch removes the requirement by using the facilities from the
errors library.

Release note: None
`GetPGCause()` is annoying/counter-productive because it peels all the
decorations around a `pgerror.Error`, including details that may be
helpful to troubleshooting. It is also not friendly to the new errors
library which is able to define pg error code without using
`pgerror.Error`.

This patch removes all its uses but one (last remaining in
`distsqlpb`, modified in a later patch) and replaces them, when
applicable, by `GetPGCode()` which is friendly to the new errors
library.

Release note: None
This patch introduces support for receiving
decorated (non-`pgerror.Error`) errors from distsql flows.  It also
redefines the `pgerror` helpers to use the new errors library, so as
to not instantiate `pgerror.Error` object until/unless `Flatten()` is
called.

This way errors remain as regular errors (and not `pgerror.Error`)
throughout most of the CockroachDB internals, and only become
`pgerror.Error` ("flattened") at very few points:

- to generate a compatibility payload when sending an error to 19.1 nodes
- in pgwire to generate packets for clients
- in the output of SHOW SYNTAX
- in some tests

Release note: None
A while ago I had changed many calls to
`errors.Errorf`/`errors.New`/`fmt.Errorf`/`errors.Wrap` to use
`pgerror` error constructors instead. The goal was to ensure the error
objects were decorated with safe details suitable for reporting.

However using those APIs also required a pg error code, even in cases
where none was obviously available. So I chose then to use the pg code
"CodeDataException" as a dummy value.

Now that the `errors` library is able to annotate errors in all
the good ways, the `pgerror` interface is not needed any more.
This commit reverts the past change.

This change is nearly all mechanical, using perl.
The only manual change is to the function `wrapRowErr` in the
`importccl` package.

Release note: None
... instead of the aliased definitions in `pgerror`.

(This change is entirely mechanical, using perl)

Release note: None
... where appropriate. This drops the dependency on `util/log` in many
cases.

(This change was mechanical, using perl)

Release note: None
... and remove pgwire/pgerror/codes.go

Release note: None
Previously `teamcity-testrace.sh` would run `testrace` passing all the
modified packages simultaneously in the PKG make var. This causes `go
test` to issue all the tests concurrently, regardless of available
hardware. If there are sufficiently many packages modified, this
overloads the machines, makes test run much slower than usual, and
triggers bad behavior (timeouts).

This patch alleviates the potential problem by running the tests one
after the other.

Release note: None
@knz knz force-pushed the 20190523-errlib branch from 5cba9e7 to 17f2718 Compare June 18, 2019 15:44
@knz knz force-pushed the 20190523-errlib-diffs branch from 684b49b to dc13807 Compare June 18, 2019 15:45
craig bot pushed a commit that referenced this pull request Jun 18, 2019
37121: errors: introduce a general-purpose modular error library with good properties r=knz a=knz

This PR introduces an error library implemented after the principles laid out in #36987.

It subsumes the following two PRs:

- #38127 *: demonstrate uses of the errors library 
  - review approval: #38127 (comment)
- #37813 sql, *: simplify remove dependencies on pgerror, util/log
  - review approval: #37813 (comment)

Release note: None

Co-authored-by: Raphael 'kena' Poss <[email protected]>
sql, *: simplify remove dependencies on pgerror, util/log
@knz knz requested review from a team June 18, 2019 17:14
@knz knz merged commit fdf2ba8 into 20190523-errlib Jun 18, 2019
@knz knz deleted the 20190523-errlib-diffs branch June 18, 2019 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants