-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: v23.1.2-custom: failed to decode tenant prefix from key for replica: insufficient bytes to decode uvarint value #104606
Labels
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-sentry
Originated from an in-the-wild panic report.
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
T-kv
KV Team
Comments
cockroach-teamcity
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-sentry
Originated from an in-the-wild panic report.
labels
Jun 8, 2023
yuzefovich
changed the title
sentry: replica_init.go:371: log.Fatal: failed to decode tenant prefix from key for replica [n1,s1,r&{5708/1:{-} × ×}]: insufficient bytes to decode uvarint value: ×
(1) attached stack trace
-- stack trace:...
kvserver: v23.1.2-custom: failed to decode tenant prefix from key for replica: insufficient bytes to decode uvarint value
Jun 10, 2023
kvoli
added
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
labels
Jun 13, 2023
The invalid tenant prefix should be caught in We don't know where the split came from, however logging from splunk shows it wasn't from the splitQueue - as the bad split points were manual. #104802 is a fix to validate tenant ID of the split key. Which should remove the release blocker once backported to 23.1 #104850. |
Closed by #104850 |
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Jun 14, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as `MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID corresponds to one of the boundary points in `EncodeUvarintAscending`. Prior to this patch, various places in the code did not appreciate this subtlety. This proved problematic when we started creating split points at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal to one of these boundary points (e.g. 109), decoding the key `MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant ID (in our case, 110). Instead, it results in an error. Worse yet, various places in KV assume doing such tenant decoding at a range's start key is kosher. We've since disallowed KV from accepting such splits in cockroachdb#104802. This patch fixes the root cause of why these splits were being issued in the first place -- over in cockroachdb@ac54eba, we started creating spilt points at the end of a tenant's keyspace (in addition to the start of it). We did so using `PrefixEnd()`, which we have since learned is not what we wanted here. In addition to the initial split point, we also fix the span config records we seed during tenant creation. We've also added a test to ensure all split points created during cluster creation are kosher. The test uses randomized tenant IDs -- I confirmed that it fails with tenant ID = 109 (without my changes). Lastly, there's a bit more auditing work that needs to be done here about these assumptions. That's captured in a followup issue cockroachdb#104928. Prevents cockroachdb#104606 from happening. Release note (bug fix): Fixes a bug where tenant creation for certain IDs would always fail because of invalid split points. Additionally, such tenant creation failure could leave the host cluster's span config state entirely busted -- we prevent that as well.
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Jun 15, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as `MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID corresponds to one of the boundary points in `EncodeUvarintAscending`. Prior to this patch, various places in the code did not appreciate this subtlety. This proved problematic when we started creating split points at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal to one of these boundary points (e.g. 109), decoding the key `MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant ID (in our case, 110). Instead, it results in an error. Worse yet, various places in KV assume doing such tenant decoding at a range's start key is kosher. We've since disallowed KV from accepting such splits in cockroachdb#104802. This patch fixes the root cause of why these splits were being issued in the first place -- over in cockroachdb@ac54eba, we started creating spilt points at the end of a tenant's keyspace (in addition to the start of it). We did so using `PrefixEnd()`, which we have since learned is not what we wanted here. In addition to the initial split point, we also fix the span config records we seed during tenant creation. We've also added a test to ensure all split points created during cluster creation are kosher. The test uses randomized tenant IDs -- I confirmed that it fails with tenant ID = 109 (without my changes). Lastly, there's a bit more auditing work that needs to be done here about these assumptions. That's captured in a followup issue cockroachdb#104928. Prevents cockroachdb#104606 from happening. Release note (bug fix): Fixes a bug where tenant creation for certain IDs would always fail because of invalid split points. Additionally, such tenant creation failure could leave the host cluster's span config state entirely busted -- we prevent that as well.
craig bot
pushed a commit
that referenced
this issue
Jun 15, 2023
104920: sql: refrain from performing invalid splits during tenant creation r=nvanbenschoten,ecwall a=arulajmani Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as `MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID corresponds to one of the boundary points in `EncodeUvarintAscending`. Prior to this patch, various places in the code did not appreciate this subtlety. This proved problematic when we started creating split points at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal to one of these boundary points (e.g. 109), decoding the key `MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant ID (in our case, 110). Instead, it results in an error. Worse yet, various places in KV assume doing such tenant decoding at a range's start key is kosher. We've since disallowed KV from accepting such splits in #104802. This patch fixes the root cause of why these splits were being issued in the first place -- over in ac54eba, we started creating spilt points at the end of a tenant's keyspace (in addition to the start of it). We did so using `PrefixEnd()`, which we have since learned is not what we wanted here. In addition to the initial split point, we also fix the span config records we seed during tenant creation. We've also added a test to ensure all split points created during cluster creation are kosher. The test uses randomized tenant IDs -- I confirmed that it fails with tenant ID = 109 (without my changes). Lastly, there's a bit more auditing work that needs to be done here about these assumptions. That's captured in a followup issue #104928. Prevents #104606 from happening. Release note (bug fix): Fixes a bug where tenant creation for certain IDs would always fail because of invalid split points. Additionally, such tenant creation failure could leave the host cluster's span config state entirely busted -- we prevent that as well. Co-authored-by: Arul Ajmani <[email protected]>
arulajmani
added a commit
to arulajmani/cockroach
that referenced
this issue
Jun 15, 2023
Normally, `MakeTenantPrefix(ID).PrefixEnd()` is the same as `MakeTenantPrefix(ID + 1)`. However, this isn't true if the ID corresponds to one of the boundary points in `EncodeUvarintAscending`. Prior to this patch, various places in the code did not appreciate this subtlety. This proved problematic when we started creating split points at `MakeTenantPrefix(ID).PrefixEnd()`. Now, if the tenant ID was equal to one of these boundary points (e.g. 109), decoding the key `MakeTenantPrefix(ID).PrefixEnd()` does not return the expected tenant ID (in our case, 110). Instead, it results in an error. Worse yet, various places in KV assume doing such tenant decoding at a range's start key is kosher. We've since disallowed KV from accepting such splits in cockroachdb#104802. This patch fixes the root cause of why these splits were being issued in the first place -- over in cockroachdb@ac54eba, we started creating spilt points at the end of a tenant's keyspace (in addition to the start of it). We did so using `PrefixEnd()`, which we have since learned is not what we wanted here. In addition to the initial split point, we also fix the span config records we seed during tenant creation. We've also added a test to ensure all split points created during cluster creation are kosher. The test uses randomized tenant IDs -- I confirmed that it fails with tenant ID = 109 (without my changes). Lastly, there's a bit more auditing work that needs to be done here about these assumptions. That's captured in a followup issue cockroachdb#104928. Prevents cockroachdb#104606 from happening. Release note (bug fix): Fixes a bug where tenant creation for certain IDs would always fail because of invalid split points. Additionally, such tenant creation failure could leave the host cluster's span config state entirely busted -- we prevent that as well.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
branch-release-23.1
Used to mark GA and release blockers, technical advisories, and bugs for 23.1
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-sentry
Originated from an in-the-wild panic report.
release-blocker
Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
T-kv
KV Team
This issue was autofiled by Sentry. It represents a crash or reported error on a live cluster with telemetry enabled.
Sentry link: https://cockroach-labs.sentry.io/issues/4236862053/?referrer=webhooks_plugin
Panic message:
Stacktrace (expand for inline code snippets):
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go#L370-L372 in pkg/kv/kvserver.(*Replica).setDescLockedRaftMuLocked
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_init.go#L212-L214 in pkg/kv/kvserver.(*Replica).initRaftMuLockedReplicaMuLocked
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/store_split.go#L256-L258 in pkg/kv/kvserver.prepareRightReplicaForSplit
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/store_split.go#L180-L182 in pkg/kv/kvserver.splitPostApply
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_result.go#L298-L300 in pkg/kv/kvserver.(*Replica).handleSplitResult
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go#L319-L321 in pkg/kv/kvserver.(*replicaStateMachine).handleNonTrivialReplicatedEvalResult
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_application_state_machine.go#L186-L188 in pkg/kv/kvserver.(*replicaStateMachine).ApplySideEffects
cockroach/pkg/kv/kvserver/apply/cmd.go
Lines 209 to 211 in 429af13
cockroach/pkg/kv/kvserver/apply/task.go
Lines 294 to 296 in 429af13
cockroach/pkg/kv/kvserver/apply/task.go
Lines 250 to 252 in 429af13
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go#L1008-L1010 in pkg/kv/kvserver.(*Replica).handleRaftReadyRaftMuLocked
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/replica_raft.go#L721-L723 in pkg/kv/kvserver.(*Replica).handleRaftReady
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/store_raft.go#L645-L647 in pkg/kv/kvserver.(*Store).processReady
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go#L394-L396 in pkg/kv/kvserver.(*raftSchedulerShard).worker
https://github.com/cockroachdb/cockroach/blob/429af1379325bf0521dde42dd284b2aaf6d4282c/pkg/kv/kvserver/pkg/kv/kvserver/scheduler.go#L301-L303 in pkg/kv/kvserver.(*raftScheduler).Start.func2
cockroach/pkg/util/stop/stopper.go
Lines 469 to 471 in 429af13
GOROOT/src/runtime/asm_amd64.s#L1593-L1595 in runtime.goexit
v23.1.2-224-g429af137932
Jira issue: CRDB-28626
The text was updated successfully, but these errors were encountered: