Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: make conversions between string and usernames more explicit #55398

Merged
merged 1 commit into from
Oct 27, 2020

Conversation

knz
Copy link
Contributor

@knz knz commented Oct 9, 2020

Informs (but does not fix) #55396

Informs #54696
Informs #55389

tldr: the conversions between "external" strings and internal
usernames was unprincipled, and it turns out, incorrect in some
cases. This patch cleans this up by introducing a strict conversion
API.

Background

CockroachDB currently performs case folding and unicode NFC
normalization upon receiving usernames, specifically usernames received
from as SQL login principals.

Internally, usernames are often—but not always—considered
pre-normalized for the purpose of comparisons, privilege checks, role
membership checks and the like.

Finally, sometimes usernames are reported "back to outside".
In error messages, log files etc, but also:

  • in the SQL syntax produced by SHOW CREATE, SHOW SYNTAX etc.
  • to generate OS-level file names for exported files.

New API

This patch introduces a new data type security.SQLUsername.
It is incomparable and non-convertible with the Go string.

Engineers must now declare intent about how to do the conversion:

  • security.MakeSQLUsernameFromUserInput converts an "outside" string
    to a username.
  • security.MakeSQLUsernameFromPreNormalizedString promises that
    its argument has already been previously normalized.
  • security.MakeSQLUsernameFromPreNormalizedStringChecked also checks
    that its argument has already been previously normalized.

To output usernames, the following APIs are also available.

  • username.Normalized() produces the username itself, without
    decorations. These corresponds to the raw string (after
    normalization).

  • username.SQLIdentifier() produces the username in valid
    SQL syntax, so that it can be injected safely in a SQL
    statement.

  • (*tree.FmtCtx).FormatUsername() takes a username and properly
    handles quoting and anonymization, like FormatName() does for
    tree.Name already.

Likewise, conversion from/to protobuf is now regulated, via the new
APIs username.EncodeProto() and usernameproto.Decode().

Problems being solved

  • the usernames "from outside" were normalized sometimes, but not
    consistently
    :

    1. they were in the arguments of CREATE/DROP/ALTER ROLE. This was
      not changed.
    2. they were not consistently converted in cockroach cert. This was
      corrected.
    3. they were not in the cockroach userfile commands. This
      has been adjusted with a reference to issue importccl: the userfile table name generation algorithm breaks with special usernames, and confused client-side and server-side usernames #55389.
    4. they are not in GRANT/REVOKE. This patch does not change
      this behavior, but highlights it by spelling out
      MakeSQLUsernameFromPreNormalizedString() in the implementation.
    5. ditto for CREATE SCHEMA ... AUTHORIZATION and ALTER ... OWNER TO
    6. they were in the argument to LoginRequest. This was not
      changed.
    7. they were not in the argument of the other API requests that
      allow usernames, for example ListSessions or CancelQuery.
      This was not changed, but is now documented in the API.
  • the usernames "inside" were incorrectly directly injected
    in SQL statements, even though they may contain special
    characters that create SQL syntax errors.

    This has been corrected by suitable uses of the new
    SQLIdentifier() method.

  • There was an outright bug in a call to CreateGCJobRec (something
    about GCing jobs), where a Description field was passed in lieu
    of a username for a User field. The implications of this are unclear.

Status after this change

The new API makes it possible to audit exactly where "sensitive"
username/string conversion occurs. After this patch, we find the
following uses:

  • MakeSQLUsernameFromUserInput:

    • pgwire user auth
    • CLI URL parsing
    • cockroach userfile
    • cockroach cert
    • (*rpc.SecurityContext).PGURL() (unsure whether that's a good thing)
    • CREATE/DROP/ALTER ROLE
    • when using literal strings as role_spec in the SQL grammar
  • MakeSQLUsernameFromPreNormalizedString:

    • role membership checks inside SQL based on data read
      from system tables.
    • in GRANT/REVOKE (this is surprising, see above)
  • MakeSQLUsernameFromPreNormalizedStringChecked:

    • when intepreting the username in API query parameters, for
      those API documented as using pre-normalized usernames.

Release note: None

@knz knz requested review from solongordon, arulajmani, aaron-crl and a team October 9, 2020 22:03
@knz knz requested a review from a team as a code owner October 9, 2020 22:03
@knz knz requested review from miretskiy and removed request for a team October 9, 2020 22:03
@knz
Copy link
Contributor Author

knz commented Oct 9, 2020

NB: I'm not this change needs to be backported; however:

  • @ajwerner @spaskob I'd like you to check the fix to the CreateGCJobRec invocation and determine if the severity of this needs to be back-ported to 20.2.

    Specifically, this diff (schema_changer.go):

@@ -849,7 +849,7 @@ func (sc *SchemaChanger) rollbackSchemaChange(ctx context.Context, err error) er
                // Queue a GC job.
                jobRecord := CreateGCJobRecord(
                        "ROLLBACK OF "+sc.job.Payload().Description,
-                       sc.job.Payload().Description,
+                       sc.job.Payload().UsernameProto.Decode(),
                        jobspb.SchemaChangeGCDetails{
                                Tables: []jobspb.SchemaChangeGCDetails_DroppedID{
                                        {

Thanks

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz knz force-pushed the 20201009-sql-username branch 4 times, most recently from dcdbfae to a3d6b20 Compare October 10, 2020 13:05
@knz knz requested a review from a team as a code owner October 10, 2020 13:05
@dt
Copy link
Member

dt commented Oct 10, 2020

IMO none of the userfile implications are severe enough to call for a 20.2.0 backport -- worse-case you get an error if the default table name picked for your user is non-conforming, right? and then you just try again with a a valid table name explicitly specified? That seems slightly annoying but IMO not a release blocker.

@knz
Copy link
Contributor Author

knz commented Oct 11, 2020

Thanks David!

(Still interested in the implication for the schema change diff but maybe that's more a question for Andrew)

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good modulo @solongordon 's feedback. I like pulling this principles into the security package as it's great hygiene and will help with tracking usage as @knz mentioned.

I have a nit around zero length usernames that I'd like others thoughts on. It shouldn't block this PR but I think it's worth discussing.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a nit around zero length usernames that I'd like others thoughts on. It shouldn't block this PR but I think it's worth .

And it looks like reviewable ate my nit comment. I promise it was better written the first time.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @miretskiy, and @solongordon)


pkg/security/username.go, line 119 at r1 (raw file):

func MakeSQLUsernameFromUserInput(u string, purpose UsernamePurpose) (res SQLUsername, err error) {
	// Perform case folding and NFC normalization.
	res.u = lex.NormalizeName(u)

Nit: We still allow zero length usernames to be used within the application. Do we need/want to support this going forward?

It looks like NormalizeName will happily return a SQLUsername{""} if provided a zero length string and while we check length during user creation as part of the regex in ValidateForCreation() we don't check it outside of this case.

In addition Undefined() returns true when provided a zero length SQLUsername meaning that user1.Undefined() == user2.Undefine() can evaluate true which unsettles the math student in me.

Do we use/rely on this being able to accommodate a zero length field within the application? Given how many of the string tests we run will evaluate to vacuously true this makes me queasy. Would it be unreasonable to check min/max length in the constructor after normalization to protect against this case if we don't actively use this as a zero length value?

@knz
Copy link
Contributor Author

knz commented Oct 12, 2020

I agree the validation function should check the username is non-empty. I will make this change.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good, and thanks for laying the foundations for #54696 (among other issues). I left mostly minor comments and a couple questions (mostly for my understanding). I'll let @solongordon have the final review on this, though.

Reviewed 158 of 231 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @knz, @miretskiy, and @solongordon)


pkg/cli/cli_test.go, line 1392 at r1 (raw file):

	// cert create-client ,foo
	// ERROR: failed to generate client certificate and key: username ",foo" invalid
	// SQLSTATE: 42602

Not sure if these comments have any bearing on the tests, but was this change intentional?


pkg/security/username.go, line 128 at r1 (raw file):

// UsernamePurpose indicates the purpose of the resulting
// SQLUsername in MakeSQLUsernameFromUserInput.
type UsernamePurpose bool

nit: int (and iota below) instead of bool maybe?


pkg/security/username.go, line 180 at r1 (raw file):

// a canonical username. The caller of this promises that the
// argument is pre-normalized.
func MakeSQLUsernameFromPreNormalizedString(u string) SQLUsername {

Can we check the "pre-normalization" explicitly here instead of relying on the caller knowing what they're doing?


pkg/security/username.go, line 219 at r1 (raw file):

// EncodeProto turns a username into its proto representation.
func (u SQLUsername) EncodeProto() SQLUsernameProto { return SQLUsernameProto(u.u) }

super nit: my IDE is complaining about receiver names being different, can we change this to s?


pkg/server/testserver.go, line 71 at r1 (raw file):

	// TestUser is a fixed user used in unittests.
	// It has valid embedded client certs.
	TestUser = security.TestUser

Can we get rid of this completely? I tried replacing all usages of server.TestUser with security.TestUser and didn't run into any dependency errors, so I wonder if we no longer need this.


pkg/server/serverpb/status.proto, line 547 at r1 (raw file):

  // Username of the user making this request.
  // The caller is responsible to normalize the username
  // (= case fold and perform unicode NFC normalization).

Instead of relying on the caller, can we be a bit more prescriptive about this without breaking stuff?

What if we replace the type of this to be SQLUsernameProto instead of string, and enforce that the username is normalized in the EncodeProto and Decode methods?


pkg/sql/grant_revoke.go, line 65 at r1 (raw file):

	// REASSIGN / OWNER TO do normalize.
	// Related: https://github.com/cockroachdb/cockroach/issues/54696
	grantees := make([]security.SQLUsername, len(n.Grantees))

What's the reason to consider not normalizing here? Backwards compat?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @arulajmani, @miretskiy, and @solongordon)


pkg/cli/cli_test.go, line 1392 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Not sure if these comments have any bearing on the tests, but was this change intentional?

yes, the comment is the test (expected test output)

the cert commands are not SQL client commands, so having a SQLSTATE for them is nonsensical


pkg/security/username.go, line 119 at r1 (raw file):

We still allow zero length usernames to be used within the application.

No we don't. Added the check here.

It looks like NormalizeName will happily return a SQLUsername{""} if provided a zero length string and while we check length during user creation as part of the regex in ValidateForCreation() we don't check it outside of this case.

Done

In addition Undefined() returns true when provided a zero length SQLUsername meaning that user1.Undefined() == user2.Undefine() can evaluate true which unsettles the math student in me.

These comparisons do not occur in the source code.

Do we use/rely on this being able to accommodate a zero length field within the application?

No


pkg/security/username.go, line 128 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: int (and iota below) instead of bool maybe?

Nah, as long as there are just 2 values bool just works fine.


pkg/security/username.go, line 180 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Can we check the "pre-normalization" explicitly here instead of relying on the caller knowing what they're doing?

Discussed off the issue: I don't think we want this for performance reasons. The usernames fly over the wire for every SQL query.


pkg/security/username.go, line 219 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

super nit: my IDE is complaining about receiver names being different, can we change this to s?

Done.


pkg/server/testserver.go, line 71 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Can we get rid of this completely? I tried replacing all usages of server.TestUser with security.TestUser and didn't run into any dependency errors, so I wonder if we no longer need this.

Done.


pkg/server/serverpb/status.proto, line 547 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Instead of relying on the caller, can we be a bit more prescriptive about this without breaking stuff?

What if we replace the type of this to be SQLUsernameProto instead of string, and enforce that the username is normalized in the EncodeProto and Decode methods?

Discussed this out of review: I am not keen to force checks in the proto conversion, for performance reason.

However, I think we can be prescriptive here by adding a check when receiving an API request. Done.


pkg/sql/grant_revoke.go, line 65 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

What's the reason to consider not normalizing here? Backwards compat?

Fixed.

@knz knz force-pushed the 20201009-sql-username branch 2 times, most recently from 4dde711 to a74a778 Compare October 22, 2020 11:36
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @arulajmani, @miretskiy, and @solongordon)

@knz knz force-pushed the 20201009-sql-username branch from a74a778 to 691f128 Compare October 23, 2020 18:06
@knz
Copy link
Contributor Author

knz commented Oct 23, 2020

Had to split package lex into lexbase / lex during rebase to avoid a cycle dependency due to the sessiondata refactor.

@knz knz force-pushed the 20201009-sql-username branch 2 times, most recently from 9bb4a52 to 1b7dc43 Compare October 24, 2020 21:40
@knz knz force-pushed the 20201009-sql-username branch from 1b7dc43 to ca1a0c8 Compare October 25, 2020 22:42
Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Could you also update the commit message to include a reference to MakeSQLUsernameFromPreNormalizedStringChecked as well?

Reviewed 6 of 69 files at r2, 53 of 88 files at r3, 11 of 15 files at r4.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aaron-crl, @arulajmani, @knz, @miretskiy, and @solongordon)


pkg/sql/grant_revoke.go, line 65 at r1 (raw file):

Previously, knz (kena) wrote…

Fixed.

Which layer was changed to fix this, I wasn't fully able to follow what happened here.

Also, could you update the TODO (above and below)


pkg/sql/grant_revoke.go, line 163 at r4 (raw file):

		if _, ok := users[grantee]; !ok {
			sqlName := tree.Name(n.grantees[i].Normalized())
			return errors.Errorf("user or role %s does not exist", &sqlName)

nit: you can just use grantee.Normalized() here directly instead of this sqlName stuff

@knz knz force-pushed the 20201009-sql-username branch 3 times, most recently from 6d88855 to 414f647 Compare October 26, 2020 21:23
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aaron-crl, @arulajmani, @miretskiy, and @solongordon)


pkg/sql/grant_revoke.go, line 65 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Which layer was changed to fix this, I wasn't fully able to follow what happened here.

Also, could you update the TODO (above and below)

Sorry I did not change anything here. So the TODO stands.
All in all this PR pushes the ball down the road on this question, which is why the linked issue remains open for now.


pkg/sql/grant_revoke.go, line 163 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: you can just use grantee.Normalized() here directly instead of this sqlName stuff

No that's incorrect: tree.Name has different quoting rules, that of identifiers. for example the username foo-bar would quote as "foo-bar". I did not want to change this behavior in this PR.

@knz
Copy link
Contributor Author

knz commented Oct 26, 2020

Could you also update the commit message to include a reference to MakeSQLUsernameFromPreNormalizedStringChecked as well?

done

@knz knz force-pushed the 20201009-sql-username branch 3 times, most recently from 96dd09d to a3ae4e2 Compare October 27, 2020 09:39
tldr: the conversions between "external" strings and internal
usernames was unprincipled, and it turns out, incorrect in some
cases. This patch cleans this up by introducing a strict conversion
API.

**Background**

CockroachDB currently performs case folding and unicode NFC
normalization upon receiving usernames, specifically usernames received
from as SQL login principals.

Internally, usernames are often—but not always—considered
pre-normalized for the purpose of comparisons, privilege checks, role
membership checks and the like.

Finally, sometimes usernames are reported "back to outside".
In error messages, log files etc, but also:

- in the SQL syntax produced by SHOW CREATE, SHOW SYNTAX etc.
- to generate OS-level file names for exported files.

**New API**

This patch introduces a new data type `security.SQLUsername`.
It is incomparable and non-convertible with the Go `string`.

Engineers must now declare intent about how to do the conversion:

- `security.MakeSQLUsernameFromUserInput` converts an "outside" string
  to a username.
- `security.MakeSQLUsernameFromPreNormalizedString` promises that
  its argument has already been previously normalized.
- `security.MakeSQLUsernameFromPreNormalizedStringChecked` also checks
  that its argument has already been previously normalized.

To output usernames, the following APIs are also available.

- `username.Normalized()` produces the username itself, without
  decorations. These corresponds to the raw string (after
  normalization).

- `username.SQLIdentifier()` produces the username in valid
  SQL syntax, so that it can be injected safely in a SQL
  statement.

- `(*tree.FmtCtx).FormatUsername()` takes a username and properly
  handles quoting and anonymization, like `FormatName()` does for
  `tree.Name` already.

Likewise, conversion from/to protobuf is now regulated, via the new
APIs `username.EncodeProto()` and `usernameproto.Decode()`.

**Problems being solved**

- the usernames "from outside" were normalized sometimes, *but not
  consistently*:

  1. they were in the arguments of CREATE/DROP/ALTER ROLE. This was
     not changed.
  2. they were not consistently converted in `cockroach cert`. This was
     corrected.
  3. they were not in the `cockroach userfile` commands. This
     has been adjusted with a reference to issue cockroachdb#55389.
  4. they are *not* in GRANT/REVOKE. This patch does not change
     this behavior, but highlights it by spelling out
     `MakeSQLUsernameFromPreNormalizedString()` in the implementation.
  5. ditto for CREATE SCHEMA ... AUTHORIZATION and ALTER ... OWNER TO
  6. they were in the argument to `LoginRequest`. This was not
     changed.
  7. they were not in the argument of the other API requests that
     allow usernames, for example `ListSessions` or `CancelQuery`.
     This was not changed, but is now documented in the API.

- the usernames "inside" were incorrectly directly injected
  in SQL statements, even though they may contain special
  characters that create SQL syntax errors.

  This has been corrected by suitable uses of the new
  `SQLIdentifier()` method.

- There was an outright bug in a call to `CreateGCJobRec` (something
  about GCing jobs), where a `Description` field was passed in lieu
  of a username for a `User` field. The implications of this are unclear.

**Status after this change**

The new API makes it possible to audit exactly where "sensitive"
username/string conversion occurs. After this patch, we find the
following uses:

- `MakeSQLUsernameFromUserInput`:

  - pgwire user auth
  - CLI URL parsing
  - `cockroach userfile`
  - `cockroach cert`
  - `(*rpc.SecurityContext).PGURL()` (unsure whether that's a good thing)
  - CREATE/DROP/ALTER ROLE
  - when using literal strings as `role_spec` in the SQL grammar

- `MakeSQLUsernameFromPreNormalizedString`:

  - role membership checks inside SQL based on data read
    from `system` tables.
  - in GRANT/REVOKE (this is surprising, see above)

- `MakeSQLUsernameFromPreNormalizedStringChecked`:

  - when intepreting the username in API query parameters, for
    those API documented as using pre-normalized usernames.

Release note: None
@knz knz force-pushed the 20201009-sql-username branch from a3ae4e2 to 1d05979 Compare October 27, 2020 13:06
@knz
Copy link
Contributor Author

knz commented Oct 27, 2020

bors r=aaron-crl,arulajmani

@craig
Copy link
Contributor

craig bot commented Oct 27, 2020

Build succeeded:

@craig craig bot merged commit 8652b7a into cockroachdb:master Oct 27, 2020
@knz knz deleted the 20201009-sql-username branch October 27, 2020 14:53
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 28, 2020
We picked up a dependency on pkg/security within the parser package
after cockroachdb#55398.

We have to pin go dependencies by hand if they're only present in
auto-generated code. It's because it's not otherwise visible to
bazel/gazelle when generating the BUILD files (during the analysis
phase).

Release note: None
craig bot pushed a commit that referenced this pull request Oct 28, 2020
56064: storage: Bring back MaxSyncDuration env var, fix disk-stalled roachtest r=itsbilal a=itsbilal

The disk-stalled roachtest relies on the ability to control disk stall
detection / fatal intervals, as charybdefs only injects 50ms of
delay per syscall. This change adds an env variable, similar to the
one removed in #55186 to set max sync duration, except now it governs
the default of the cluster setting. The roachtest now modifies that
env variable to let disk stall detection trip on short syscall
delays.

Fixes #54332.

Release note: None.

56092: parser: pin pkg/security as a bazel dep for pkg/parser r=irfansharif a=irfansharif

We picked up a dependency on pkg/security within the parser package
after #55398.

We have to pin go dependencies by hand if they're only present in
auto-generated code. It's because it's not otherwise visible to
bazel/gazelle when generating the BUILD files (during the analysis
phase).

Release note: None

56095: partialidx: add benchmarks for two-variable comparisons r=RaduBerinde a=mgartner

Two-variable comparison implication performs similarly to other types of
implications.

    BenchmarkImplicator/single-exact-match-16                         76.5 ns/op
    BenchmarkImplicator/single-inexact-match-16                      342 ns/op
    BenchmarkImplicator/range-inexact-match-16                       782 ns/op
    BenchmarkImplicator/two-var-comparison-16                        302 ns/op
    BenchmarkImplicator/single-exact-match-extra-filters-16          310 ns/op
    BenchmarkImplicator/single-inexact-match-extra-filters-16        609 ns/op
    BenchmarkImplicator/multi-column-and-exact-match-16               82.4 ns/op
    BenchmarkImplicator/multi-column-and-inexact-match-16            722 ns/op
    BenchmarkImplicator/multi-column-and-two-var-comparisons-16      611 ns/op
    BenchmarkImplicator/multi-column-or-exact-match-16                76.1 ns/op
    BenchmarkImplicator/multi-column-or-exact-match-reverse-16       595 ns/op
    BenchmarkImplicator/multi-column-or-inexact-match-16            1081 ns/op
    BenchmarkImplicator/in-implies-or-16                             976 ns/op
    BenchmarkImplicator/and-filters-do-not-imply-pred-16            3710 ns/op
    BenchmarkImplicator/or-filters-do-not-imply-pred-16              917 ns/op
    BenchmarkImplicator/many-columns-exact-match10-16                296 ns/op
    BenchmarkImplicator/many-columns-inexact-match10-16             6853 ns/op
    BenchmarkImplicator/many-columns-exact-match100-16             19817 ns/op
    BenchmarkImplicator/many-columns-inexact-match100-16          447894 ns/op

Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants