Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: audit usages of tenantPrefix.PrefixEnd() and codec.TenantSpan() #104928

Closed
arulajmani opened this issue Jun 14, 2023 · 0 comments · Fixed by #110241
Closed

sql: audit usages of tenantPrefix.PrefixEnd() and codec.TenantSpan() #104928

arulajmani opened this issue Jun 14, 2023 · 0 comments · Fixed by #110241
Assignees
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-multitenant Issues owned by the multi-tenant virtual team

Comments

@arulajmani
Copy link
Collaborator

arulajmani commented Jun 14, 2023

Describe the problem

Recent investigations have revealed that #104606 is caused by split points created during tenant creation, which are generated in:

tenantStartKey := roachpb.RKey(ms.codec.TenantPrefix())
tenantEndKey := tenantStartKey.PrefixEnd()
splits = []roachpb.RKey{tenantStartKey, tenantEndKey}

The problem here is the split point that corresponds to tenantEndKey. Normally, there is no difference between MakeTenantPrefix(ID).PrefixEnd() and MakeTenantPrefix(ID + 1). However, this doesn't hold true at the boundary points of EncodeUvarintAscending -- @nvanbenschoten demonstrated this over at: https://go.dev/play/p/1KKbBMwENyX

Interestingly, because these two keys aren't the same at certain ID values, it means that if the key was generated using MakeTenantPrefix(ID).PrefixEnd(), a tenant ID cannot be parsed from it. That's what caused the fatals at the linked issue above.

All this points to a misunderstanding, in various places in the code, where we aren't being intentional about which one we need or do not need. We should audit all places in the code that use codec.TenantSpan() and <tenant_prefix>.PrefixEnd() to ensure we're doing the right thing (given our new found understanding of the subtleties here).

cc @stevendanna @knz @ecwall

Jira issue: CRDB-28780
Epic: CRDB-26091

@arulajmani arulajmani added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-multitenant Issues owned by the multi-tenant virtual team labels Jun 14, 2023
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 14, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as
`MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID
corresponds to one of the boundary points in `EncodeUvarintAscending`.
Prior to this patch, various places in the code did not appreciate this
subtlety. This proved problematic when we started creating split points
at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal
to one of these boundary points (e.g. 109), decoding the key
`MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant
ID (in our case, 110). Instead, it results in an error.

Worse yet, various places in KV assume doing such tenant decoding at a
range's start key is kosher. We've since disallowed KV from accepting
such splits in cockroachdb#104802.
This patch fixes the root cause of why these splits were being issued
in the first place -- over in
cockroachdb@ac54eba,
we started creating spilt points at the end of a tenant's keyspace
(in addition to the start of it). We did so using `PrefixEnd()`, which
we have since learned is not what we wanted here.

In addition to the initial split point, we also fix the span config
records we seed during tenant creation. We've also added a test to
ensure all split points created during cluster creation are kosher.
The test uses randomized tenant IDs -- I confirmed that it fails with
tenant ID = 109 (without my changes).

Lastly, there's a bit more auditing work that needs to be done here
about these assumptions. That's captured in a followup issue
cockroachdb#104928.

Prevents cockroachdb#104606 from
happening.

Release note (bug fix): Fixes a bug where tenant creation for certain
IDs would always fail because of invalid split points. Additionally,
such tenant creation failure could leave the host cluster's span
config state entirely busted -- we prevent that as well.
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 15, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as
`MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID
corresponds to one of the boundary points in `EncodeUvarintAscending`.
Prior to this patch, various places in the code did not appreciate this
subtlety. This proved problematic when we started creating split points
at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal
to one of these boundary points (e.g. 109), decoding the key
`MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant
ID (in our case, 110). Instead, it results in an error.

Worse yet, various places in KV assume doing such tenant decoding at a
range's start key is kosher. We've since disallowed KV from accepting
such splits in cockroachdb#104802.
This patch fixes the root cause of why these splits were being issued
in the first place -- over in
cockroachdb@ac54eba,
we started creating spilt points at the end of a tenant's keyspace
(in addition to the start of it). We did so using `PrefixEnd()`, which
we have since learned is not what we wanted here.

In addition to the initial split point, we also fix the span config
records we seed during tenant creation. We've also added a test to
ensure all split points created during cluster creation are kosher.
The test uses randomized tenant IDs -- I confirmed that it fails with
tenant ID = 109 (without my changes).

Lastly, there's a bit more auditing work that needs to be done here
about these assumptions. That's captured in a followup issue
cockroachdb#104928.

Prevents cockroachdb#104606 from
happening.

Release note (bug fix): Fixes a bug where tenant creation for certain
IDs would always fail because of invalid split points. Additionally,
such tenant creation failure could leave the host cluster's span
config state entirely busted -- we prevent that as well.
craig bot pushed a commit that referenced this issue Jun 15, 2023
104920: sql: refrain from performing invalid splits during tenant creation  r=nvanbenschoten,ecwall a=arulajmani

Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as
`MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID
corresponds to one of the boundary points in `EncodeUvarintAscending`.
Prior to this patch, various places in the code did not appreciate this
subtlety. This proved problematic when we started creating split points
at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal
to one of these boundary points (e.g. 109), decoding the key
`MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant
ID (in our case, 110). Instead, it results in an error.

Worse yet, various places in KV assume doing such tenant decoding at a
range's start key is kosher. We've since disallowed KV from accepting
such splits in #104802.
This patch fixes the root cause of why these splits were being issued
in the first place -- over in
ac54eba,
we started creating spilt points at the end of a tenant's keyspace
(in addition to the start of it). We did so using `PrefixEnd()`, which
we have since learned is not what we wanted here.

In addition to the initial split point, we also fix the span config
records we seed during tenant creation. We've also added a test to
ensure all split points created during cluster creation are kosher.
The test uses randomized tenant IDs -- I confirmed that it fails with
tenant ID = 109 (without my changes).

Lastly, there's a bit more auditing work that needs to be done here
about these assumptions. That's captured in a followup issue
#104928.

Prevents #104606 from
happening.

Release note (bug fix): Fixes a bug where tenant creation for certain
IDs would always fail because of invalid split points. Additionally,
such tenant creation failure could leave the host cluster's span
config state entirely busted -- we prevent that as well.

Co-authored-by: Arul Ajmani <[email protected]>
arulajmani added a commit to arulajmani/cockroach that referenced this issue Jun 15, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as
`MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID
corresponds to one of the boundary points in `EncodeUvarintAscending`.
Prior to this patch, various places in the code did not appreciate this
subtlety. This proved problematic when we started creating split points
at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal
to one of these boundary points (e.g. 109), decoding the key
`MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant
ID (in our case, 110). Instead, it results in an error.

Worse yet, various places in KV assume doing such tenant decoding at a
range's start key is kosher. We've since disallowed KV from accepting
such splits in cockroachdb#104802.
This patch fixes the root cause of why these splits were being issued
in the first place -- over in
cockroachdb@ac54eba,
we started creating spilt points at the end of a tenant's keyspace
(in addition to the start of it). We did so using `PrefixEnd()`, which
we have since learned is not what we wanted here.

In addition to the initial split point, we also fix the span config
records we seed during tenant creation. We've also added a test to
ensure all split points created during cluster creation are kosher.
The test uses randomized tenant IDs -- I confirmed that it fails with
tenant ID = 109 (without my changes).

Lastly, there's a bit more auditing work that needs to be done here
about these assumptions. That's captured in a followup issue
cockroachdb#104928.

Prevents cockroachdb#104606 from
happening.

Release note (bug fix): Fixes a bug where tenant creation for certain
IDs would always fail because of invalid split points. Additionally,
such tenant creation failure could leave the host cluster's span
config state entirely busted -- we prevent that as well.
craig bot pushed a commit that referenced this issue Jun 15, 2023
104677: ui: Redirect Range Report page to Hot Ranges page r=gtr a=gtr

Fixes: #102377.

Previously, when a user selected a range ID from the Hot Ranges page,
the left side menu would switch to Advanced Debug and the back button
would also redirect to the Advanced Dedug page. This commit ensures that
when a range ID is selected, the left side menu will stay on the Hot
Ranges page and also changes the back button to redirect back to the
Hot Ranges page.

<img width="791" alt="image" src="https://github.com/cockroachdb/cockroach/assets/35943354/41afa924-7395-4101-a14c-bb4cdeccec0c">

Release note (ui change): The Range Report page (route
`/reports/range/:rangeID`) shows the "Hot Ranges" menu item as
highlighted in the left side menu. The back button in the Range Report
page redirects back to the Hot Ranges page.

104929: opt: resolve TupleStars in UDFs and views r=mgartner a=mgartner

Star expressions have been allowed in UDFs since #95710 and in views
since #97515. This commit lifts a restriction that prevented table
references from being resolved as TupleStars in SELECT lists inside UDFs
and views. For example, an expression like `SELECT tbl FROM tbl` is now
allowed inside views and UDFs.

Fixes #104927
Fixes #97602 

Release note (sql change): Table names are now allowed in SELECT lists
inside view and UDF definitions.


104946: sql: update tenant_span builtins for consistent output r=arulajmani a=stevendanna

The codec's TenantSpan() function was changed to avoid PrefixEnd(). Here, we update overloads of crdb_internal.tenant_span to all use TenantSpan() to avoid inconsistent output from different overloads.

Note, I don't believe that the use of PrefixEnd() in this function constituted a problem in our current usage as the only real use of this function was in calls to crdb_internal.fingerprint() or crdb_internal.scan() where the EndKey is only used as an upper-bound for iteration.

Informs #104928

Epic: none

Release note: None

Co-authored-by: gtr <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Steven Danna <[email protected]>
@shralex shralex added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Jul 17, 2023
@craig craig bot closed this as completed in b7840ea Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-multitenant Issues owned by the multi-tenant virtual team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants