Update row level TTL RFC #83908

Merged on Jul 7, 2022 (1 commit)
96 changes: 53 additions & 43 deletions docs/RFCS/20220120_row_level_ttl.md
@@ -3,13 +3,16 @@
* Status: in-progress
* Start Date: 2021-12-14
* Authors: Oliver Tan
* RFC PR: [#75189](#75189)
* Cockroach Issue: [#20239](#20239)
* RFC PR: [#75189]
* Cockroach Issue: [#20239]

# Changelog
* 2022-07-07: Removed `ttl_automatic_column` to simplify configuration - [PR][#83908]

# Summary
Row-level "time to live" (TTL) is a mechanism in which rows from a table
automatically get deleted once the row surpasses an expiration time (the "TTL").
This has been a [feature commonly asked for](#20239).
This has been a [feature commonly asked for][#20239].

This RFC proposes a CockroachDB level mechanism to support row-level TTL, where
rows will be deleted after a certain period of time. As a further extension in a
@@ -29,7 +32,7 @@ CREATE TABLE tbl (
text TEXT,
expiration TIMESTAMPTZ,
should_delete BOOL,
) WITH (ttl = 'on', ttl_expiration_expression = 'if(should_delete, expiration, NULL)');
) WITH (ttl_expiration_expression = 'if(should_delete, expiration, NULL)');
Contributor review comment:
can you add a section near the top here with something like

# Changelog

* 2022-07-07: Removed the `ttl_automatic_column` <brief summary about why> <link to this PR>

for a paper trail?

```

By implementing row-level TTL, we are saving developers from writing a complex
@@ -44,7 +47,7 @@ Today, developers who need row-level TTL need to roll out their own mechanism
for deleting rows as well as adding application logic to filter out the expired
rows.

The deletion job itself can get complex to write. We have a [guide](cockroach TTL advice)
The deletion job itself can get complex to write. We have a [guide][cockroach TTL advice]
which was difficult to perfect - when we implemented it ourselves,
we found there were multiple issues related to performance. Developers have to
implement and manage several knobs to balance deletion time and performance on
@@ -84,7 +87,7 @@ CREATE TABLE tbl (

This automatically creates a repeating scheduled job for the given table, as
well as adding the HIDDEN column `crdb_internal_expiration` to symbolize the
TTL and implicitly adds the `ttl` and `ttl_automatic_column` parameters:
TTL and implicitly adds the `ttl` parameter:

```sql
CREATE TABLE tbl (
@@ -95,7 +98,7 @@ CREATE TABLE tbl (
NOT NULL
DEFAULT current_timestamp() + '5 minutes'
ON UPDATE current_timestamp() + '5 minutes'
) WITH (ttl = 'on', ttl_automatic_column = 'on', ttl_expire_after = '5 minutes')
) WITH (ttl = 'on', ttl_expire_after = '5 minutes')
```
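
As a rough illustration of how the hidden column looks from a client's point of view, the statements below insert a row and then read its expiration back. This is a sketch under assumptions: it assumes `tbl` has the `id` and `text` columns used in the other examples of this RFC, and that an expired row can remain visible until the scheduled job deletes it, which is why the last query filters such rows out explicitly.

```sql
-- Assumption: tbl has columns (id INT PRIMARY KEY, text TEXT) plus the
-- hidden crdb_internal_expiration column added by ttl_expire_after.
INSERT INTO tbl (id, text) VALUES (1, 'hello');

-- Hidden columns are not returned by SELECT *, so name the column explicitly.
SELECT id, text, crdb_internal_expiration FROM tbl WHERE id = 1;

-- Optional application-side filter for rows that have expired but have not
-- yet been removed by the scheduled deletion job.
SELECT id, text FROM tbl WHERE crdb_internal_expiration > current_timestamp();
```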

Users can also opt to use their own expression which evaluates to a NULLABLE
@@ -127,30 +130,37 @@ CREATE TABLE tbl (
id INT PRIMARY KEY,
text TEXT,
should_delete BOOL,
) WITH (ttl = 'on', ttl_expiration_expression = 'if(should_delete, crdb_internal_expiration, NULL)', ttl_expire_after = '10 mins');
) WITH (ttl_expiration_expression = 'if(should_delete, crdb_internal_expiration, NULL)', ttl_expire_after = '10 mins');
```

TTL metadata is stored on the TableDescriptor:
```protobuf
message TableDescriptor {
message RowLevelTTL {
// DurationExpr is the automatically assigned interval for when the TTL should apply to a row.
optional string duration_expr = 1 [(gogoproto.nullable)=false];
// DeletionCron is the cron-syntax scheduling of the deletion job.
optional string deletion_cron = 2 [(gogoproto.nullable)=false];
// DeletionPause is true if the TTL job should not run.
// Intended to be a temporary pause.
optional bool deletion_pause = 3 [(gogoproto.nullable)=false];
// DeleteBatchSize is the number of rows to delete in each batch.
optional int64 delete_batch_size = 4;
// SelectBatchSize is the number of rows to select at a time.
optional int64 select_batch_size = 5;
// MaximumRowsDeletedPerSecond controls the amount of rows to delete per second.
// At zero, it will not impose any limit.
optional int64 max_rows_deleted_per_second = 6;
// RangeConcurrency controls the amount of ranges to delete at a time.
// Defaults to 0 (number of CPU cores).
optional int64 range_concurrency = 7;
optional string duration_expr = 1 [(gogoproto.nullable)=false, (gogoproto.casttype)="Expression"];
// SelectBatchSize is the amount of rows that should be fetched at a time
optional int64 select_batch_size = 2 [(gogoproto.nullable)=false];
// DeleteBatchSize is the amount of rows that should be deleted at a time.
optional int64 delete_batch_size = 3 [(gogoproto.nullable)=false];
// DeletionCron signifies how often the TTL deletion job runs in a cron format.
optional string deletion_cron = 4 [(gogoproto.nullable)=false];
// ScheduleID is the ID of the row-level TTL job schedules.
optional int64 schedule_id = 5 [(gogoproto.customname)="ScheduleID",(gogoproto.nullable)=false];
// RangeConcurrency is the number of ranges to process at a time.
optional int64 range_concurrency = 6 [(gogoproto.nullable)=false];
// DeleteRateLimit is the maximum amount of rows to delete per second.
optional int64 delete_rate_limit = 7 [(gogoproto.nullable)=false];
// Pause is set if the TTL job should not run.
optional bool pause = 8 [(gogoproto.nullable)=false];
// RowStatsPollInterval is the interval to report row statistics (number of rows on table, number of expired
// rows on table) during row level TTL. If zero, no statistics are reported.
optional int64 row_stats_poll_interval = 9 [(gogoproto.nullable)=false, (gogoproto.casttype)="time.Duration"];
// LabelMetrics is true if metrics for the TTL job should add a label containing
// the relation name.
optional bool label_metrics = 10 [(gogoproto.nullable) = false];
// ExpirationExpr is the custom assigned expression for calculating when the TTL should apply to a row.
optional string expiration_expr = 11 [(gogoproto.nullable)=false, (gogoproto.casttype)="Expression"];
}

// ...
@@ -163,19 +173,18 @@ message TableDescriptor {
As part of the `(option = value, …)` storage parameter syntax, we will support
the following options to control the TTL job:

Option | Description
--- | ---
`ttl` | Automatically set option. Signifies if a TTL is active. Not used for the job.
`ttl_automatic_column` | Automatically set option if automatic connection is enabled. Not used for the job.
`ttl_expire_after` | When a TTL would expire. Accepts any interval. Defaults to ''30 days''. Minimum of `'5 minutes'`.
`ttl_expiration_expression` | If set, uses the expression specified as the TTL expiration. Defaults to just using the `crdb_internal_expiration` column.
`ttl_select_batch_size` | How many rows to fetch from the range that have expired at a given time. Defaults to 500. Must be at least `1`.
`ttl_delete_batch_size` | How many rows to delete at a time. Defaults to 100. Must be at least `1`.
`ttl_range_concurrency` | How many concurrent ranges are being worked on at a time. Defaults to `cpu_core_count`. Must be at least `1`.
`ttl_delete_rate_limit` | Maximum number of rows to be deleted per second (acts as the rate limit). Defaults to 0 (signifying none).
`ttl_row_stats_poll_interval` | Whilst the TTL job is running, counts rows and expired rows on the table to report as prometheus metrics. By default unset, meaning no stats are fetched.
`ttl_pause` | Stops the TTL job from executing.
`ttl_job_cron` | Frequency the job runs, specified using the CRON syntax.
| Option | Description |
|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ttl` | Automatically set option. Signifies if a TTL is active. Not used for the job. |
| `ttl_expire_after` | When a TTL would expire. Accepts any interval. Defaults to `'30 days'`. Minimum of `'5 minutes'`. |
| `ttl_expiration_expression` | If set, uses the expression specified as the TTL expiration. Defaults to just using the `crdb_internal_expiration` column. |
| `ttl_select_batch_size` | How many rows to fetch from the range that have expired at a given time. Defaults to 500. Must be at least `1`. |
| `ttl_delete_batch_size` | How many rows to delete at a time. Defaults to 100. Must be at least `1`. |
| `ttl_range_concurrency` | How many concurrent ranges are being worked on at a time. Defaults to `cpu_core_count`. Must be at least `1`. |
| `ttl_delete_rate_limit` | Maximum number of rows to be deleted per second (acts as the rate limit). Defaults to 0 (signifying none). |
| `ttl_row_stats_poll_interval` | Whilst the TTL job is running, counts rows and expired rows on the table to report as prometheus metrics. By default unset, meaning no stats are fetched. |
| `ttl_pause` | Stops the TTL job from executing. |
| `ttl_job_cron` | Frequency the job runs, specified using the CRON syntax. |
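
To make the options above concrete, the following sketch tunes a few of them on an existing table using the `SET` / `RESET` storage parameter syntax. The values are illustrative assumptions rather than recommendations, and `true` is assumed to be an accepted boolean form for `ttl_pause`.

```sql
-- Illustrative values only; tbl is assumed to already have TTL configured.
ALTER TABLE tbl SET (
  ttl_select_batch_size = 200,   -- rows fetched per select batch
  ttl_delete_batch_size = 50,    -- rows deleted per delete batch
  ttl_delete_rate_limit = 1000,  -- at most 1000 rows deleted per second
  ttl_job_cron = '0 * * * *'     -- run the job at the start of every hour
);

-- Temporarily stop the job from executing, then return to the default.
ALTER TABLE tbl SET (ttl_pause = true);
ALTER TABLE tbl RESET (ttl_pause);
```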

### Applying or Altering TTL for a table
TTL can be configured using `ALTER TABLE`:
@@ -196,16 +205,16 @@ will not apply whilst the deletion job is running; the user must
restart the job for the settings to take effect. A HINT will be displayed to the
user if this is required.

### Converting between `ttl_automatic_column` and `ttl_expiration_expression`
### Converting between `ttl_expire_after` and `ttl_expiration_expression`

Users can convert from using the automatic column to the expiration expression
by re-using the `SET` syntax:

```sql
ALTER TABLE tbl SET (ttl_expiration_expression = 'other_column');
-- Resetting the ttl_automatic_column will drop the TTL column.
-- Resetting ttl_expire_after will drop the TTL column.
-- This step is optional in case the automatic column is still used.
ALTER TABLE tbl RESET (ttl_automatic_column);
ALTER TABLE tbl RESET (ttl_expire_after);
```

To go the other way:
@@ -287,7 +296,7 @@ performs:

### Admission Control
To ensure the deletion job does not affect foreground traffic, we plan on using
[admission control](admission control) on a SQL transaction at a low value
[admission control] on a SQL transaction at a low value
(`-100`). This leaves room for lower values in future.

From previous experimentation, this largely regulates the amount of knob
@@ -378,7 +387,7 @@ a few problems:
process. This adds further complexity to CDC.

As row-level TTL is a "SQL level" feature, it makes sense that something in the
SQL layer would be most appropriate to handle it. See [comparison doc](comparison doc)
SQL layer would be most appropriate to handle it. See [comparison doc]
for other observations.

### Alternative TTL columns
@@ -440,7 +449,7 @@ ELSE max(crdb_internal_expiration, current_timestamp() + 'table ttl value')

However, due to the limitation that `ON UPDATE` cannot reference a table
expression, this cannot yet be implemented. We may choose to use
this as a later default when we implement [triggers](#28296).
this as a later default when we implement [triggers][#28296].

## Future Improvements

@@ -475,6 +484,7 @@ N/A
[#20239]: https://github.com/cockroachdb/cockroach/issues/20239
[#75189]: https://github.com/cockroachdb/cockroach/pull/75189
[#28296]: https://github.com/cockroachdb/cockroach/issues/28296
[#83908]: https://github.com/cockroachdb/cockroach/pull/83908
[cockroach TTL advice]: https://www.cockroachlabs.com/docs/stable/bulk-delete-data.html
[admission control]: https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/admission_control.md
[comparison doc]: https://docs.google.com/document/d/1HkFg3S-k3s2PahPRQhTgUkCR4WIAtjkSNVylarMC-gY/edit#heading=h.o6cn5faoiokv