
kvcoord: Consider increasing max_intents_bytes default #54029

Closed
bdarnell opened this issue Sep 8, 2020 · 12 comments
Labels
A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. N-followup Needs followup. O-postmortem Originated from a Postmortem action item. T-kv KV Team

Comments

@bdarnell
Contributor

bdarnell commented Sep 8, 2020

When a transaction's write set exceeds the kv.transaction.max_intents_bytes size, performance gets dramatically worse since point intent resolutions are changed to range intent resolutions. This can cause unexpectedly high contention between transactions if batched point deletes (for example) must perform intent resolution over large ranges. A motivating example is a "garbage collection" transaction that does a lot of point deletes - if it is big enough to exceed max_intents_bytes, it can have contention with (high-priority) read-only transactions even if they do not touch any garbage-collected keys.

The default value of 256KB was chosen without much thought IIRC; we should consider increasing it. We know of at least one customer who runs with this setting increased to 5MB. The downside to making this setting too high is that if you have a lot of concurrent transactions with large write sets, you could use a lot of unaccounted-for memory. If this were hooked up to proper memory accounting, it would be feasible to make the limit very high (perhaps limited only by the accounting system).

Similar arguments may apply to the max_refresh_spans_bytes setting, although this has not been observed to be as much of a problem in practice.

These settings are (deliberately) soft limits, so that we can accept arbitrarily large transactions with controlled memory usage, but the resulting discontinuity in performance is dangerous. It may be worth providing an option to turn these into hard limits, so that transactions will fail rather than have surprising performance implications.

Epic: CRDB-8282

@blathers-crl

blathers-crl bot commented Sep 8, 2020

Hi @bdarnell, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@bdarnell bdarnell added A-kv-transactions Relating to MVCC and the transactional model. C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels Sep 8, 2020
@jordanlewis
Member

cc @tbg, this is a better writeup of what we've discussed a couple of times.

@ajwerner
Contributor

ajwerner commented Sep 9, 2020

Relates to but is more pragmatic than #52768.

@aliher1911
Contributor

Condensable Span Set Budgeting

Problem

The transaction coordinator keeps track of intents in a number of condensableSpanSets so that it can clean them up when the transaction is committed or aborted. For large transactions the size of this set can become considerable. To limit the memory consumed, CockroachDB uses a configurable limit: once it is exceeded, spans are merged ("condensed") to reduce the memory footprint.
In the current implementation the limit is set quite low, to bound the damage it could do by exhausting all available memory. We could increase the limit to 2x-4x-?x, or use a shared memory budget that lets us raise the limit while keeping total memory usage constrained.
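For readers who have not looked at the spanset code, here is an illustrative Go sketch (not the actual condensableSpanSet, and deliberately cruder) of the trade-off condensing makes: collapsing many point spans into one covering range keeps memory bounded, but later intent resolution must then scan keys the transaction never wrote.

```go
// Illustrative sketch only: condense a set of point spans into a single
// covering range, trading tracking precision for memory.
package main

import (
	"fmt"
	"sort"
)

// span is a simplified key span; endKey == "" means a point span.
type span struct{ key, endKey string }

type spanSet struct {
	spans []span
	bytes int64
}

func (s *spanSet) add(sp span) {
	s.spans = append(s.spans, sp)
	s.bytes += int64(len(sp.key) + len(sp.endKey))
}

// condense collapses everything into one [min, max) range. The real
// implementation merges more carefully, but the effect is the same: fewer
// bytes tracked, coarser spans to resolve later.
func (s *spanSet) condense() {
	if len(s.spans) < 2 {
		return
	}
	keys := make([]string, 0, 2*len(s.spans))
	for _, sp := range s.spans {
		keys = append(keys, sp.key)
		if sp.endKey != "" {
			keys = append(keys, sp.endKey)
		}
	}
	sort.Strings(keys)
	// "\x00" models an exclusive end key one past the largest tracked key.
	s.spans = []span{{key: keys[0], endKey: keys[len(keys)-1] + "\x00"}}
	s.bytes = int64(len(s.spans[0].key) + len(s.spans[0].endKey))
}

func main() {
	var s spanSet
	for _, k := range []string{"a", "m", "q", "z"} {
		s.add(span{key: k})
	}
	fmt.Println("before:", len(s.spans), "point spans,", s.bytes, "bytes")
	s.condense()
	fmt.Printf("after:  %d span covering [%q, %q)\n", len(s.spans), s.spans[0].key, s.spans[0].endKey)
}
```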

Naive Solution

We could introduce an overall memory budget that spansets borrow from as they grow and return to when the transaction finishes.
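A minimal sketch of what such a shared pool could look like, assuming the naive solution above. The names (budgetPool, Grow, Release) are invented for the example and are not existing CockroachDB APIs.

```go
// Illustrative sketch only: a node-wide budget that spansets borrow from as
// they grow and release when the transaction finishes.
package main

import (
	"fmt"
	"sync"
)

type budgetPool struct {
	mu    sync.Mutex
	limit int64 // total bytes available to all spansets on the node
	used  int64
}

// Grow tries to reserve delta bytes; it reports false when the pool is
// exhausted, in which case the caller should condense instead of growing.
func (p *budgetPool) Grow(delta int64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.used+delta > p.limit {
		return false
	}
	p.used += delta
	return true
}

// Release returns bytes to the pool, e.g. when the interceptor is closed.
func (p *budgetPool) Release(delta int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.used -= delta
}

func main() {
	pool := &budgetPool{limit: 1 << 20} // 1 MiB shared across all txns
	if pool.Grow(600 << 10) {
		fmt.Println("txn 1 reserved 600 KiB")
	}
	if !pool.Grow(600 << 10) {
		fmt.Println("txn 2 is over budget: condense its spanset instead")
	}
	pool.Release(600 << 10) // txn 1 finished
}
```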

Issues

There are a number of immediate issues this approach could cause.

What to do when the memory budget is exhausted?
The budget would act as a soft limit and would not prevent new spans from being added. We could treat it as a budget for non-condensed spansets: if you have no budget left, you must condense the spanset. We could also have a hard limit above which we always try to condense.

Would large transactions consuming too much budget unnecessarily affect smaller ones?
This could be resolved by having a minimum size threshold under which we would not try to condense spansets.

What happens if the budget setting is changed and the new value is already violated by non-condensed spansets?
Applying budget changes to existing spansets would require too much synchronization for what is essentially local data; it would be too complicated and would introduce too much contention. Instead we could just look at the currently set value whenever a spanset is ready to condense and check whether it needs condensing. Once all the old spansets are gone, usage converges to the new memory budget. A sketch of this decision logic follows below.
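A small sketch tying the answers above together; all names are hypothetical. The decision never condenses a spanset below a per-transaction minimum allowance, and it reads the budget setting's current value only at decision time, so a changed setting takes effect lazily as spansets come up for condensing.

```go
// Illustrative sketch only: when a spanset has just grown, decide whether it
// must condense, honoring a per-txn minimum allowance and the current value
// of a (mutable) budget setting.
package main

import (
	"fmt"
	"sync/atomic"
)

// currentBudgetBytes stands in for the cluster-wide budget setting, which can
// change at any time.
var currentBudgetBytes atomic.Int64

// shouldCondense is consulted after a spanset grows. poolUsedBytes is what
// all other spansets on the node currently hold.
func shouldCondense(spanSetBytes, poolUsedBytes, minTxnAllowance int64) bool {
	if spanSetBytes <= minTxnAllowance {
		return false // small transactions keep precise tracking regardless
	}
	// Read the setting only now; existing spansets are not chased down when
	// the setting changes, they simply see the new value next time.
	return poolUsedBytes+spanSetBytes > currentBudgetBytes.Load()
}

func main() {
	currentBudgetBytes.Store(4 << 20) // e.g. a 4 MiB node-wide budget
	fmt.Println(shouldCondense(64<<10, 0, 128<<10))       // false: under the allowance
	fmt.Println(shouldCondense(1<<20, 3800<<10, 128<<10)) // true: pool nearly full
}
```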

Details

Current spanset usage

  1. txn_interceptor_pipeliner iteratively updates the intent spans while processing the request and response in Send. Once the send has been processed, it checks whether the spanset needs to be condensed.
  2. It also updates the spanset on epoch bumps and savepoint rollbacks, but without checking the limit.
  3. The spanset is destroyed when the interceptor is finished, in closeLocked.

Changes

This logic could be changed to track unbudgeted intents and, whenever new data is added, check whether it would run over the budget. The spanset would then condense and return the freed-up memory if possible, or register the additional memory if condensing didn't free enough (see the sketch below).
Memory would be returned when the spanset is destroyed in closeLocked.
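A rough sketch of where those hooks could live in the interceptor's lifecycle, using hypothetical names rather than the real txn_interceptor_pipeliner: charge the budget once a Send has been processed, condense when the charge is refused, and release everything in closeLocked.

```go
// Illustrative sketch only: budget accounting hooked into a pipeliner-like
// lifecycle (grow after Send, condense when refused, release on close).
package main

import "fmt"

type memBudget interface {
	Grow(delta int64) bool
	Release(delta int64)
}

// fixedBudget is a trivial stand-in for the shared pool sketched earlier.
type fixedBudget struct{ limit, used int64 }

func (b *fixedBudget) Grow(delta int64) bool {
	if b.used+delta > b.limit {
		return false
	}
	b.used += delta
	return true
}

func (b *fixedBudget) Release(delta int64) { b.used -= delta }

type pipeliner struct {
	budget   memBudget
	reserved int64 // bytes this transaction has charged to the budget
}

// onResponse would run once a Send has been processed and the new intent
// spans are known.
func (p *pipeliner) onResponse(newSpanBytes int64) {
	if p.budget.Grow(newSpanBytes) {
		p.reserved += newSpanBytes
		return
	}
	// Over budget: a real implementation would condense, re-measure, and
	// charge only what condensing could not free.
	fmt.Println("over budget: condensing spanset instead of growing the reservation")
}

// closeLocked returns all reserved memory when the transaction finishes.
func (p *pipeliner) closeLocked() {
	p.budget.Release(p.reserved)
	p.reserved = 0
}

func main() {
	p := &pipeliner{budget: &fixedBudget{limit: 256 << 10}}
	p.onResponse(200 << 10) // fits within the budget
	p.onResponse(100 << 10) // refused: would exceed the budget
	p.closeLocked()
	fmt.Println("after closeLocked, reserved =", p.reserved)
}
```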

Other uses

The spanset is also used in txn_interceptor_span_refresher, where it can be updated, condensed, and cleared in multiple places, but the pattern is similar: we collect unbudgeted spans as they are added and consult the budget when we can condense. We also return memory when we either clear the set explicitly or destroy the interceptor.

Alternatively…

We could simply bump the current limit, since it is set too conservatively.
In that case the questions are what the new limit should be, and which users are most exposed to it so that we can ask them about their experience.

@bdarnell
Contributor Author

Yes, I think a shared memory budget is the right answer (this is what I was referring to with "If this were hooked up to proper memory accounting" in the original post).

Would large transactions consuming too much budget unnecessarily affect smaller ones

This is the biggest potential issue in my view. As you suggest below, I think we need some minimum allowance per transaction so that smallish transactions can get consistent performance.

Thinking further about this, should we have multiple budgets (per tenant or per user) so that there's less risk of a noisy neighbor disrupting other tenant/users' operations?

@aliher1911
Contributor

Ideally I think it would be nice to have something similar to how cgroups manage CPU slices: every tenant has a guaranteed allocation, but any unused slices are up for grabs, and as soon as the owner wants them back the freeloaders are slowed down. That works for CPU because slices are ephemeral. For lazy condensing it would be trickier, because a long-running bulk transaction would hold its spans uncondensed while others were already starving.

I imagine it could work with multiple "subpools", where well-behaved tenants could actually go above the global pool limit while a misbehaving one would be capped on the next update.

Or maybe have a separate "guaranteed" pool per tenant for small transactions and a shared one for larger transactions.

If the budget/policy concept that a spanset interacts with is well defined, we can start with a global pool and later change it to accommodate per-tenant/per-client pools.

It would be helpful to know how much transaction-size imbalance we actually have within a node on multi-tenant clusters.

@sumeerbhola
Collaborator

Yes, I think a shared memory budget is the right answer (this is what I was referring to with "If this were hooked up to proper memory accounting" in the original post).

There's going to be plumbing to account for memory once #66362 merges. It would be good to account in other places in KV-land.

Also, do we need this if we have #66268 for 21.2?

@aliher1911
Contributor

Also, do we need this if we have #66268 for 21.2?

That would allow us to use ranged intent resolution more aggressively without the current penalties, right? In that case increasing the limit might not make sense, though having a cap on memory usage might still be helpful.

Do we have any other resources that we track/limit per tenant inside KV to shield them from each other?

@aliher1911
Contributor

There's another aspect of range cleanup that I missed: we add the range to the latch span, and condensed ranges won't be latch-free until there's latchless cleanup. So even in the presence of the Pebble optimizations this still looks like an improvement.
Memory-budget-wise it is less clear: there is some demand for better resource accounting overall, but it will need some plumbing for client/tenant propagation.

@lunevalex lunevalex added N-followup Needs followup. O-postmortem Originated from a Postmortem action item. labels Jun 25, 2021
@andreimatei
Contributor

Would large transactions consuming too much budget unnecessarily affect smaller ones

What I was thinking of doing is maintaining a per-txn maximum in addition to the global limit, but having it be much larger - maybe 100MB. Particularly since we're also in the process of introducing an option to reject txns over their limit instead of condensing the lock spans.

@sumeerbhola
Collaborator

Do we have any other resources that we track/limit per tenant inside KV to shield them from each other?

There is the rate limiting (tenantrate and tenantcostmodel packages).
Note that the memory accounting being added is not per tenant, though we should consider extending it to be per-tenant (e.g. no tenant can consume more than the limit/4).

craig bot pushed a commit that referenced this issue Jun 29, 2021
66859: kvcoord: increase max_intent_bytes default to 4MB r=andreimatei,erikgrinaker a=aliher1911

Previously the limit was set conservatively to 256KB. Such a
conservative limit can cause intent cleanup batches to condense
aggressively, resorting to range cleanups, which are more expensive.
Increasing the default will reduce contention during cleanup at the
expense of increased memory for large transactions on the
tx_coord_sender.

Release note (performance improvement): Increase default value for
kv.transaction.max_intents_bytes cluster setting from 256KB to 4MB
to improve transaction performance for large transactions at the
expense of increased memory usage. Transactions above this size
limit use slower cleanup mode for commits and aborts.

#54029

Co-authored-by: Oleg Afanasyev <[email protected]>
@lunevalex
Collaborator

#66859 sorts this
