*: lower default GC TTL #89233

irfansharif · 2022-10-03T19:49:53Z

This is a tracking issue to lower the default GC TTL from 25h. An internal document spells out the motivations and requisite steps more fully; its text is copied below. There's also this internal Slack thread.

Is your feature request related to a problem? Please describe.

The default GC TTL today is 25h, which means values overwritten within the last 25h are retained. This translates to higher storage use. For outbox-like workloads where rows are deleted frequently, it can also make reads costlier in CPU terms, since they have to scan over a larger number of overwritten values to get to the one of interest (this is not fundamental). Since all versions of a key are stored within a single range (we split along key boundaries, not between different versions of a key), we've seen incidents where large (>> 512 MB) unsplittable ranges cause cluster instability: snapshot timeouts get triggered, other snapshots get queued behind the stuck one, and the range starts backpressuring writes (which can be indistinguishable from an outage).
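To make the retention effect concrete, here is a toy model (not CockroachDB's actual GC code) of which versions of a single key a TTL-based collector could drop. The `Version` type, the `collectable` function, and the rule "a version is garbage once the version that overwrote it is older than `now - ttl`" are simplifying assumptions for illustration; tombstone handling and protected timestamps are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    """One MVCC version of a key (hypothetical toy type)."""
    ts_hours: float  # write timestamp, in hours since some epoch
    value: str


def collectable(versions: List[Version], now_hours: float, ttl_hours: float) -> List[Version]:
    """Return the versions that a TTL-based GC could drop in this toy model.

    A version is treated as garbage once the version that overwrote it
    was written at or before the GC threshold (now - ttl). The newest
    version is always kept.
    """
    threshold = now_hours - ttl_hours
    ordered = sorted(versions, key=lambda v: v.ts_hours)
    garbage = []
    for older, newer in zip(ordered, ordered[1:]):
        if newer.ts_hours <= threshold:
            garbage.append(older)
    return garbage


# Three writes to the same key at hours 0, 1 and 20; "now" is hour 26.
history = [Version(0, "a"), Version(1, "b"), Version(20, "c")]
kept_25h = len(history) - len(collectable(history, now_hours=26, ttl_hours=25))
kept_4h = len(history) - len(collectable(history, now_hours=26, ttl_hours=4))
print(kept_25h, kept_4h)
```

With a 25h TTL two of the three versions are still retained at hour 26, while a 4h TTL leaves only the live one, which is the storage and scan-cost difference described above.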

We chose a default of 25h originally to accommodate daily incremental backups with revision history. These backups fail (informatively) if the data you are trying to back up was GC-ed, which happens when incremental backups are taken less frequently than the GC period of any object in the base backup. In 22.2, however, scheduled backups “chain together” protected timestamp records, which lets a scheduled backup protect only what's needed and ensures coverage of revision history across each incremental backup. The short of it is that we no longer need a 25h default, hence this issue.

Describe the solution you'd like

Lower it to something like 4h or 1.5h.
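Since `gc.ttlseconds` is expressed in seconds, the candidate values translate as follows. This is a minimal sketch; the helper name `gc_ttl_statement` is hypothetical, and it only formats the `ALTER ... CONFIGURE ZONE USING gc.ttlseconds = ...` statement quoted elsewhere in this thread without running it against a cluster.

```python
from datetime import timedelta


def gc_ttl_statement(ttl: timedelta, target: str = "RANGE default") -> str:
    """Format a zone-config statement that sets the GC TTL (hypothetical helper).

    `target` is whatever follows ALTER, e.g. "RANGE default" or
    "TABLE mydb.mytable" (the table name here is made up).
    """
    seconds = int(ttl.total_seconds())
    if seconds <= 0:
        raise ValueError("GC TTL must be positive")
    return f"ALTER {target} CONFIGURE ZONE USING gc.ttlseconds = {seconds};"


for hours in (25, 4, 1.5):
    print(gc_ttl_statement(timedelta(hours=hours)))
```

The three statements carry 90000, 14400 and 5400 seconds respectively.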

Jira issue: CRDB-20144

irfansharif · 2022-10-03T19:49:59Z

@shralex, @nvanbenschoten, @williamkulju, @mwang1026.

ajwerner · 2022-10-03T19:51:24Z

@fqazi is actively working on #84911. It should block any change to the GC TTL. A fix should come together soon.

miretskiy · 2022-10-05T22:23:17Z

CDC would need to address #89450

And add an upgrade for existing clusters to explicitly retain whatever value they were using pre-upgrade. Fixes cockroachdb#89233.


93836: *: lower default GC TTL to 4h r=irfansharif a=irfansharif

Fixes #89233.

Release note (general change): The GC TTL previously defaulted to 25h. This value was configurable using `ALTER RANGE DEFAULT CONFIGURE ZONE USING gc.ttlseconds = <whatever>`, and could also be scoped to specific schema objects using `ALTER {DATABASE,TABLE,INDEX} CONFIGURE ZONE USING ...`. This value determines how long overwritten values are retained. The `RANGE DEFAULT` value is now lowered to 4h, but only for freshly created clusters. When existing clusters upgrade onto this release, we will respect whatever value they were using before the upgrade for all their schema objects. This will be 25h if the GC TTL was never altered, or some specific value if set explicitly. Full cluster backups taken on earlier version clusters, when restored to clusters that started off at v23.1, will use the GC TTL recorded in the backup image.

We've found the 25h value to translate to higher-than-necessary storage costs, especially for workloads where rows are deleted frequently. It can also make for costlier reads with respect to CPU since we currently have to scan over overwritten values to get to the one of interest. Finally, we've also observed cluster instability due to large unsplittable ranges that have accumulated an excessive amount of MVCC garbage.

We chose a default of 25h originally to accommodate daily incremental backups with revision history. But with the scheduled backups introduced in 22.2, we no longer need a large GC TTL. Scheduled backups "chain together" and prevent garbage collection of relevant data to ensure coverage of revision history across backups, decoupling it from whatever value is used for GC TTL. So we no longer need a 25h default, hence this change.

The GC TTL determines how far back AS OF SYSTEM TIME queries can go. Queries going further back than `now()-4h` will now start failing informatively. To support larger windows for AS OF SYSTEM TIME queries, users are encouraged to pick a more appropriate GC TTL and set it using `ALTER ... CONFIGURE ZONE USING gc.ttlseconds = <whatever>`. The earlier considerations around storage use, read costs, and stability still apply.

Release note (backward-incompatible change): See release note above. Technically this is not a backwards-incompatible change since we're only changing the default value that new clusters are initialized with -- existing clusters will remain unaffected. But it might be worth highlighting this change more prominently to our users for added scrutiny.

Co-authored-by: irfan sharif
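The relationship between the GC TTL and the AS OF SYSTEM TIME window can be sketched as a simple client-side check. This is an illustrative assumption about how an application might guard its historical reads, not something CockroachDB exposes; the function name `within_gc_window` is hypothetical, and the server remains the authority on whether a given timestamp is still readable.

```python
from datetime import datetime, timedelta, timezone


def within_gc_window(as_of: datetime, now: datetime, gc_ttl: timedelta) -> bool:
    """Return True if `as_of` is no older than `now - gc_ttl`.

    Reads at timestamps older than that are the ones the release note
    says will start failing informatively.
    """
    return as_of >= now - gc_ttl


now = datetime(2023, 1, 6, 12, 0, tzinfo=timezone.utc)
three_hours_ago = now - timedelta(hours=3)
ten_hours_ago = now - timedelta(hours=10)

# Under the new 4h default only the 3h-old read stays inside the window;
# under the old 25h default both do.
print(within_gc_window(three_hours_ago, now, timedelta(hours=4)))
print(within_gc_window(ten_hours_ago, now, timedelta(hours=4)))
print(within_gc_window(ten_hours_ago, now, timedelta(hours=25)))
```

An application that relies on reads ten hours in the past would therefore need a zone configuration with a GC TTL of at least ten hours on the relevant objects.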

See #1158 and cockroachdb/cockroach#89233. Cockroach is changing their default to 4h in Dedicated and 1.25h in Serverless. This commit changes to the lowest common denominator, sans some padding.

irfansharif added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Oct 3, 2022

blathers-crl bot added the T-kv KV Team label Oct 3, 2022

irfansharif mentioned this issue Dec 14, 2022

changefeedccl: cleanup deprecated pts handling #89975

Merged

irfansharif self-assigned this Dec 15, 2022

irfansharif mentioned this issue Dec 16, 2022

*: lower default GC TTL to 4h #93836

Merged

craig bot closed this as completed in 1d01816 Jan 6, 2023

This was referenced Jan 11, 2023

*: (maybe) undo lowering of GC TTL #95122

Closed

spanconfig,kv: enable spanconfig.storage_coalesce_adjacent by default #81008

Closed

chancesm mentioned this issue Feb 9, 2023

CockroachDB GC Window Change authzed/spicedb#1158

Closed

vroldanbet mentioned this issue Feb 10, 2023

makes default GC window 1h authzed/spicedb#1161

Closed

irfansharif mentioned this issue Mar 6, 2023

*: add user-facing documentation for lowered GC TTL #98105

Closed

rmloveland mentioned this issue Mar 16, 2023

Update all GC TTL and range size mentions to DRY cockroachdb/docs#16506

Merged
