Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: harden lease-upgrade logic #88301

Closed
3 tasks done
irfansharif opened this issue Sep 20, 2022 · 4 comments
Closed
3 tasks done

kvserver: harden lease-upgrade logic #88301

irfansharif opened this issue Sep 20, 2022 · 4 comments
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Sep 20, 2022

Is your feature request related to a problem? Please describe.

In #85629 we changed our lease transfer protocol to only ever transfer expiration-based leases, and have recipients later upgrade them to the more efficient epoch based ones. This was done to limit the effects of ill-advised lease transfers since the incoming leaseholder would need to recognize itself as such within a few seconds -- so we wanted this upgrade happen after having received the lease. This issue tracks some hardening work we want to do before ship the release (tracking using the GA-blocker label).

  • Version gate logic to transfer out expiration based leases, and write a test. We don't want to transfer expiration based leases to 22.1 nodes that don't know to upgrade them immediately (done in #88408).
  • Add cluster setting to disable this code out of an abundance of caution (done in #88408).
  • Consider retaining the same lease sequence number across the lease upgrade process (might need to account for this logic). The change in sequence numbers causes us to reject requests proposed under the old lease, which seems unnecessary.

Jira issue: CRDB-19766

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) GA-blocker labels Sep 20, 2022
@irfansharif irfansharif self-assigned this Sep 20, 2022
@blathers-crl
Copy link

blathers-crl bot commented Sep 20, 2022

Hi @irfansharif, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@irfansharif irfansharif added stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 and removed stability-period Intended to be worked on during a stability period. Use with the Milestone field to specify version. labels Sep 20, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 21, 2022
Informs cockroachdb#88301. EnableLeaseUpgrade version gates a change in the lease
transfer protocol whereby we only ever transfer expiration-based leases
(and have recipients later upgrade them to the more efficient epoch
based ones). This was done to limit the effects of ill-advised lease
transfers since the incoming leaseholder would need to recognize itself
as such within a few seconds. This needs version gating so that in
mixed-version clusters, as part of lease transfers, we don't start
sending out expiration based leases to nodes that (i) don't expect them
for certain keyspans, and (ii) don't know to upgrade them to efficient
epoch-based ones.

While here, we also introduce a (hidden, default=true) cluster setting
kv.transfer_expiration_leases_first.enabled to feature gate this
protocol change.
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 23, 2022
Informs cockroachdb#88301. EnableLeaseUpgrade version gates a change in the lease
transfer protocol whereby we only ever transfer expiration-based leases
(and have recipients later upgrade them to the more efficient epoch
based ones). This was done to limit the effects of ill-advised lease
transfers since the incoming leaseholder would need to recognize itself
as such within a few seconds. This needs version gating so that in
mixed-version clusters, as part of lease transfers, we don't start
sending out expiration based leases to nodes that (i) don't expect them
for certain keyspans, and (ii) don't know to upgrade them to efficient
epoch-based ones.

While here, we also introduce a (hidden, default=true) cluster setting
kv.transfer_expiration_leases_first.enabled to feature gate this
protocol change.
craig bot pushed a commit that referenced this issue Sep 23, 2022
85965: ccl/jwtauthccl: Implement JWT based auth for CRDB SSO r=kpatron-cockroachlabs a=kpatron-cockroachlabs

In order to enable future work to allow customers to log into
CRDB instances using SSO, this change implements a new type of
auth based on JWTs. Its enablement and configuration is controlled
by cluster settings instead of being added as an hba_conf option.

In particular, this change introduces three cluster settings:
server.jwt_authentication.enable, server.jwt_authentication.jwks,
and server.jwt_authentication.issuers. If during login, the custom
option `--jwt.auth=true` is sent, the JWT code path is activated
and the contents of the "password" field are analyzed as a token,
if the enabled cluster setting is set to true.

The token signature is validated with the jwks cluster setting.
The token's expiration time is validated, the subject field is
matched to the username requested, the audience is field is
matched against the cluster's ID and the issuer is matched to
the value of the issuers cluster setting.

Finally, if the token passes all of those checks, CRDB ensures
that an enterprise license is active.

Release note (enterprise change): A new JWT based auth method
has been introduced as an option. This work lays the ground
work for users to have SSO based login into their
CockroachDB Dedicated clusters.

87292: Fix show_backup diagram. r=nickvigilante a=stbof

DOC-5517

The show_backup diagram doesn't unlink `location_opt_list`, so links to it in the SQL grammar. The master branch build fails.

Release justification: diagram update.

87932: Diagrams for UDF statements. r=chengxiong-ruan a=stbof



88408: kvserver,clusterversion: {version,feature}-gate lease upgrade logic r=irfansharif a=irfansharif

Informs #88301. EnableLeaseUpgrade version gates a change in the lease transfer protocol whereby we only ever transfer expiration-based leases (and have recipients later upgrade them to the more efficient epoch based ones). This was done to limit the effects of ill-advised lease transfers since the incoming leaseholder would need to recognize itself as such within a few seconds. This needs version gating so that in mixed-version clusters, as part of lease transfers, we don't start sending out expiration based leases to nodes that (i) don't expect them for certain keyspans, and (ii) don't know to upgrade them to efficient epoch-based ones.

While here, we also introduce a (hidden, default=true) cluster setting kv.transfer_expiration_leases_first.enabled to feature gate this protocol change.

Co-authored-by: Kyle Patron <[email protected]>
Co-authored-by: Stephanie Bodoff <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 23, 2022
Informs cockroachdb#88301. EnableLeaseUpgrade version gates a change in the lease
transfer protocol whereby we only ever transfer expiration-based leases
(and have recipients later upgrade them to the more efficient epoch
based ones). This was done to limit the effects of ill-advised lease
transfers since the incoming leaseholder would need to recognize itself
as such within a few seconds. This needs version gating so that in
mixed-version clusters, as part of lease transfers, we don't start
sending out expiration based leases to nodes that (i) don't expect them
for certain keyspans, and (ii) don't know to upgrade them to efficient
epoch-based ones.

While here, we also introduce a (hidden, default=true) cluster setting
kv.transfer_expiration_leases_first.enabled to feature gate this
protocol change.
@nvanbenschoten
Copy link
Member

@irfansharif should we remove the GA blocker label from this issue? Have we found a low-risk way of addressing the final point here?

@irfansharif
Copy link
Contributor Author

I haven't looked into yet, but I don't think that warrants a GA-blocker (removed). I'd be ok addressing it on master, letting it bake, and backporting to a point release after some confidence.

@irfansharif irfansharif removed their assignment Mar 10, 2023
@nvanbenschoten
Copy link
Member

Consider retaining the same lease sequence number across the lease upgrade process (might need to account for this logic). The change in sequence numbers causes us to reject requests proposed under the old lease, which seems unnecessary.

Rediscoverd and subsumed by #117630.

Closing this issue, as the other two tasks have been completed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
None yet
Development

No branches or pull requests

2 participants