Update all GC TTL and range size mentions to DRY #16506

Merged 1 commit on Mar 29, 2023
_includes/v22.2/zone-configs/variables.md (1 addition, 1 deletion)

@@ -2,7 +2,7 @@ Variable | Description
------|------------
`range_min_bytes` | <a name="range-min-bytes"></a> The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `134217728` (128 MiB)
`range_max_bytes` | <a name="range-max-bytes"></a> The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.<br><br>**Default:** `536870912` (512 MiB)
- `gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} Ensure that you set `gc.ttlseconds` long enough to accommodate your [backup schedule](create-schedule-for-backup.html), otherwise your incremental backups will fail with [this error](common-errors.html#protected-ts-verification-error). For example, if you set up your backup schedule to recur daily, but you set `gc.ttlseconds` to less than one day, all your incremental backups will fail.{{site.data.alerts.end}} **Default:** `90000` (25 hours) <br> However, all {{ site.data.products.serverless }} clusters have a default `gc.ttlseconds` of 4500 seconds (1 hour and 15 minutes) that cannot be altered.
+ `gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} If you are not yet using [scheduled backups](create-schedule-for-backup.html), and instead issuing [`BACKUP`](backup.html) statements manually, you must ensure that you set `gc.ttlseconds` long enough to accommodate your manual backup schedule. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). We recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed. {{site.data.alerts.end}} **Default:** `90000` (25 hours) <br> However, all {{ site.data.products.serverless }} clusters have a default `gc.ttlseconds` of 4500 seconds (1 hour and 15 minutes) that cannot be altered.
`num_replicas` | <a name="num_replicas"></a> The number of replicas in the zone, also called the "replication factor".<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.<br /><br />For [multi-region databases configured to survive region failures](multiregion-overview.html#surviving-region-failures), the default value is `5`; this will include both [voting](#num_voters) and [non-voting replicas](architecture/replication-layer.html#non-voting-replicas).
`constraints` | <a name="constraints"></a> An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled. To prevent this error, make sure at least one active node is configured to match the constraint. For example, apply `constraints = '[+region=west]'` only if you had set `--locality=region=west` for at least one node while starting the cluster.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` <a name="lease_preferences"></a> | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/glossary.html#architecture-leaseholder). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific availability zones](configure-replication-zones.html#constrain-leaseholders-to-specific-availability-zones).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.
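For reference, these variables are set with `ALTER ... CONFIGURE ZONE`. A minimal sketch of adjusting the GC TTL discussed above (the table name is hypothetical):

~~~ sql
-- Retain overwritten values for 24 hours instead of the default 25.
-- movr.rides is a hypothetical example table.
ALTER TABLE movr.rides CONFIGURE ZONE USING gc.ttlseconds = 86400;

-- Inspect the resulting zone configuration.
SHOW ZONE CONFIGURATION FROM TABLE movr.rides;
~~~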
_includes/v23.1/faq/auto-generate-unique-ids.html (1 addition, 1 deletion)

@@ -66,7 +66,7 @@
(3 rows)
~~~

- In either case, generated IDs will be 128-bit, large enough for there to be virtually no chance of generating non-unique values. Also, once the table grows beyond a single key-value range (more than 512 MiB by default), new IDs will be scattered across all of the table's ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load.
+ In either case, generated IDs will be 128-bit, large enough for there to be virtually no chance of generating non-unique values. Also, once the table grows beyond a single key-value range's [default size](configure-replication-zones.html#range-max-bytes), new IDs will be scattered across all of the table's ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load.

This approach has the disadvantage of creating a primary key that may not be useful in a query directly, which can require a join with another table or a secondary index.
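A minimal sketch of the pattern described above, assuming a hypothetical `users` table:

~~~ sql
-- 128-bit UUIDs are generated locally on the node, so inserts need
-- no cross-node coordination.
CREATE TABLE users (
    id UUID NOT NULL DEFAULT gen_random_uuid() PRIMARY KEY,
    name STRING
);

-- RETURNING surfaces the generated key so the application can use it
-- for the follow-up join or secondary-index lookup mentioned above.
INSERT INTO users (name) VALUES ('Petee') RETURNING id;
~~~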

_includes/v23.1/misc/basic-terms.md (1 addition, 1 deletion)

@@ -14,7 +14,7 @@ An individual instance of CockroachDB. One or more nodes form a cluster.
<a name="architecture-range"></a>
CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a sorted map of key-value pairs. This keyspace is divided into contiguous chunks called _ranges_, such that every key is found in one range.

- From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the _primary index_ because the table is sorted by the primary key) or a single row in a secondary index. As soon as the size of a range reaches 512 MiB ([the default](../configure-replication-zones.html#range-max-bytes)), it is split into two ranges. This process continues for these new ranges as the table and its indexes continue growing.
+ From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the _primary index_ because the table is sorted by the primary key) or a single row in a secondary index. As soon as the size of a range reaches [the default range size](../configure-replication-zones.html#range-max-bytes), it is [split into two ranges](distribution-layer.html#range-splits). This process continues for these new ranges as the table and its indexes continue growing.
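To observe this in practice, a quick sketch (hypothetical table name) that lists a table's ranges, including each range's start and end keys and its size:

~~~ sql
SHOW RANGES FROM TABLE users;
~~~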

### Replica
<a name="architecture-replica"></a>
_includes/v23.1/sql/range-splits.md (1 addition, 1 deletion)

@@ -1,4 +1,4 @@
- CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ranges below a size of 512 MiB. To do this, the system will automatically [split a range](architecture/distribution-layer.html#range-splits) if it grows larger than this limit. For most use cases, this automatic range splitting is sufficient, and you should never need to worry about when or where the system decides to split ranges.
+ CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ranges below [the default range size](configure-replication-zones.html#range-max-bytes). To do this, the system will automatically [split a range](architecture/distribution-layer.html#range-splits) if it grows larger than this limit. For most use cases, this automatic range splitting is sufficient, and you should never need to worry about when or where the system decides to split ranges.
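For illustration, a sketch of lowering that limit for one table's zone (the table name and value are hypothetical; `268435456` bytes is 256 MiB):

~~~ sql
-- Ranges in this table's zone will now split at roughly 256 MiB.
ALTER TABLE user_events CONFIGURE ZONE USING range_max_bytes = 268435456;
~~~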

However, there are reasons why you may want to perform manual splits on the ranges that store tables or indexes:

_includes/v23.1/zone-configs/variables.md (2 additions, 2 deletions)

@@ -1,8 +1,8 @@
Variable | Description
------|------------
`range_min_bytes` | <a name="range-min-bytes"></a> The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.<br><br>**Default:** `134217728` (128 MiB)
- `range_max_bytes` | <a name="range-max-bytes"></a> The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.<br><br>**Default:** `536870912` (512 MiB)
- `gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} Ensure that you set `gc.ttlseconds` long enough to accommodate your [backup schedule](create-schedule-for-backup.html), otherwise your incremental backups will fail with [this error](common-errors.html#protected-ts-verification-error). For example, if you set up your backup schedule to recur daily, but you set `gc.ttlseconds` to less than one day, all your incremental backups will fail.{{site.data.alerts.end}} **Default:** `90000` (25 hours)
+ `range_max_bytes` | <a name="range-max-bytes"></a> The maximum size, in bytes, for a [range]({{link_prefix}}architecture/glossary.html#architecture-range) of data in the zone. When a range reaches this size, CockroachDB will [split it]({{link_prefix}}architecture/distribution-layer.html#range-splits) into two ranges.<br><br>**Default:** `536870912` (512 MiB)
+ `gc.ttlseconds` | <a name="gc-ttlseconds"></a> The number of seconds overwritten [MVCC values]({{link_prefix}}architecture/storage-layer.html#mvcc) will be retained before [garbage collection]({{link_prefix}}architecture/storage-layer.html#garbage-collection). Smaller values can save disk space if values are frequently overwritten; larger values increase the interval allowed for [`AS OF SYSTEM TIME`](as-of-system-time.html) queries, also known as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).<br><br>It is not recommended to set this below `600` (10 minutes); doing so will cause problems for [long-running queries](manage-long-running-queries.html). Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than [the default range size](#range-max-bytes); such oversized ranges could contribute to the server [running out of memory](cluster-setup-troubleshooting.html#out-of-memory-oom-crash) or other problems. {{site.data.alerts.callout_info}} If you are not yet using [scheduled backups](create-schedule-for-backup.html), and instead issuing [`BACKUP`](backup.html) statements manually, you must ensure that you set `gc.ttlseconds` long enough to accommodate your manual backup schedule. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). We recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed. {{site.data.alerts.end}} **Default:** `14400` (4 hours)
`num_replicas` | <a name="num_replicas"></a> The number of replicas in the zone, also called the "replication factor".<br><br>**Default:** `3`<br><br>For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.<br /><br />For [multi-region databases configured to survive region failures](multiregion-overview.html#surviving-region-failures), the default value is `5`; this will include both [voting](#num_voters) and [non-voting replicas](architecture/replication-layer.html#non-voting-replicas).
`constraints` | <a name="constraints"></a> An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.<br/><br/>To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled. To prevent this error, make sure at least one active node is configured to match the constraint. For example, apply `constraints = '[+region=west]'` only if you had set `--locality=region=west` for at least one node while starting the cluster.<br/><br/>**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities.
`lease_preferences` <a name="lease_preferences"></a> | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/glossary.html#architecture-leaseholder). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all. <br /><br /> If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.<br /><br />Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."<br /><br /> For a usage example, see [Constrain leaseholders to specific availability zones](configure-replication-zones.html#constrain-leaseholders-to-specific-availability-zones).<br /><br />**Default**: No lease location preferences are applied if this field is not specified.
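A sketch of how several of these variables combine in one statement (the database name and locality values are hypothetical and must match `--locality` values passed to at least one node, per the `constraints` note above):

~~~ sql
ALTER DATABASE movr CONFIGURE ZONE USING
    num_replicas = 5,
    lease_preferences = '[[+zone=us-east-1b, +ssd], [+zone=us-east-1a]]';
~~~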
v21.2/backup.md (1 addition, 1 deletion)

@@ -71,7 +71,7 @@ To view the contents of a backup created with the `BACKUP` statement, use [`SHOW
`subdirectory` | The name of the specific backup (e.g., `2021/03/23-213101.37`) in the collection to which you want to add an [incremental backup](take-full-and-incremental-backups.html#incremental-backups). To view available backup subdirectories, use [`SHOW BACKUPS IN destination`](show-backup.html). If the backup `subdirectory` is not provided, a [full backup](take-full-and-incremental-backups.html#full-backups) will be created in the collection using a date-based naming scheme (i.e., `<year>/<month>/<day>-<timestamp>`).<br><br>**Warning:** If you use an arbitrary `STRING` as the subdirectory, a new full backup will be created, but it will never be shown in `SHOW BACKUPS IN`. We do not recommend using arbitrary strings as subdirectory names.
`LATEST` | Append an incremental backup to the latest completed full backup's subdirectory.
`destination` | The URL where you want to store the backup.<br/><br/>For information about this URL structure, see [Backup File URLs](#backup-file-urls).
- `timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is [configurable per table](configure-replication-zones.html#replication-zone-variables)).
+ `timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your data's garbage collection TTL (which is controlled by the [`gc.ttlseconds` replication zone variable](configure-replication-zones.html#gc-ttlseconds)).
`backup_options` | Control the backup behavior with a comma-separated list of [these options](#options).
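For reference, a minimal sketch of a backup as of a recent timestamp (the collection URL is hypothetical; the offset must stay within the data's GC TTL):

~~~ sql
-- Back up cluster data as it existed 10 seconds ago.
BACKUP INTO 's3://backup-bucket/daily?AUTH=implicit'
    AS OF SYSTEM TIME '-10s';
~~~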

### Targets