Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: avoid *Transaction -> []LockUpdate translation in IntentResolver.ResolveIntents #77219

Closed
nvanbenschoten opened this issue Mar 1, 2022 · 0 comments · Fixed by #86988
Closed
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. E-quick-win Likely to be a quick win for someone experienced. T-kv KV Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Mar 1, 2022

In #76282, we saw OOMs during a highly concurrent data ingestion workload. The OOMs were permitted in part because CRDB did not push back on the client fast enough, allowing too much async work to build up in the IntentResolver.

To avoid some memory amplification along the intent resolution path, we could rework the contract of IntentResolver.ResolveIntents to avoid the eager *Transaction -> []LockUpdate translation.

Instead of calling LocksAsLockUpdates here, which allocates memory proportionally to the number of intent spans in a txn, we should have ResolveIntents accept an interface which lazily transforms each LockSpan in a Transaction to a LockUpdate in this loop.

Jira issue: CRDB-13478

@nvanbenschoten nvanbenschoten added C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. A-kv-transactions Relating to MVCC and the transactional model. E-quick-win Likely to be a quick win for someone experienced. labels Mar 1, 2022
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 1, 2022
shralex added a commit to shralex/cockroach that referenced this issue Aug 28, 2022
…ting

Previously, we called LocksAsLockUpdates before calling ResolveIntents, which
pre-allocated memory for all LockUpdates. In this PR we change the interface
of ResolveIntents to avoid this memory allocation and perform the translation
of Span to LockUpdate lazily, as we iterate over them in ResolveIntents.

Release justification: stability change that may help avoid OOM.
Release note: None

Resolves: cockroachdb#77219
Jira issue: https://cockroachlabs.atlassian.net/browse/CRDB-13478
shralex added a commit to shralex/cockroach that referenced this issue Aug 29, 2022
…ting

Previously, we called LocksAsLockUpdates before calling ResolveIntents, which
pre-allocated memory for all LockUpdates. In this PR we change the interface
of ResolveIntents to avoid this memory allocation and perform the translation
of Span to LockUpdate lazily, as we iterate over them in ResolveIntents.

Release justification: stability change that may help avoid OOM.
Release note: None

Resolves: cockroachdb#77219
Jira issue: https://cockroachlabs.atlassian.net/browse/CRDB-13478
craig bot pushed a commit that referenced this issue Sep 6, 2022
86702: ui: add decommissioning relevant graphs to metrics replication dashboard r=Santamaura a=Santamaura

This change adds new graphs to the metrics replication
dashboard. New metrics visualized on the dashboard can be used
to help triage decommissioning issues. Metrics visualized
include:
- queue.replicate.addreplica.(success|error)
- queue.replicate.removereplica.(success|error)
- queue.replicate.replacedeadreplica.(success|error)
- queue.replicate.removedeadreplica.(success|error)
- queue.replicate.replacedecommissioningreplica.(success|error)
- queue.replicate.removedecommissioningreplica.(success|error)
- range.snapshots.recv-queue
- queue.replicate.purgatory
- range.snapshots.rebalancing.rcvd-bytes
- range.snapshots.recovery.rcvd-bytes

Release justification: low risk, high benefit changes to
existing functionality.

Resolves #86599

Release note (ui change): introduce new graphs on metrics
replication dashboard to improve decommissioning observability

86988: kvserver: lazily translate Spans to LockUpdates instead of pre-alloca… r=shralex a=shralex

…ting

Previously, we called LocksAsLockUpdates before calling ResolveIntents, which
pre-allocated memory for all LockUpdates. In this PR we change the interface
of ResolveIntents to avoid this memory allocation and perform the translation
of Span to LockUpdate lazily, as we iterate over them in ResolveIntents.

Release justification: stability change that may help avoid OOM.
Release note: None

Resolves: #77219
Jira issue: https://cockroachlabs.atlassian.net/browse/CRDB-13478

Co-authored-by: Santamaura <[email protected]>
Co-authored-by: shralex <[email protected]>
@craig craig bot closed this as completed in 5ac62b8 Sep 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. E-quick-win Likely to be a quick win for someone experienced. T-kv KV Team
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant