diff --git a/_includes/v22.2/zone-configs/variables.md b/_includes/v22.2/zone-configs/variables.md index 1eef61dbdde..db9bec1a79d 100644 --- a/_includes/v22.2/zone-configs/variables.md +++ b/_includes/v22.2/zone-configs/variables.md @@ -2,7 +2,7 @@ Variable | Description ------|------------ `range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.

**Default:** `134217728` (128 MiB) `range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.

**Default:** `536870912` (512 MiB) -`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} Ensure that you set `gc.ttlseconds` long enough to accommodate your [backup schedule](create-schedule-for-backup.html), otherwise your incremental backups will fail with [this error](common-errors.html#protected-ts-verification-error). For example, if you set up your backup schedule to recur daily, but you set `gc.ttlseconds` to less than one day, all your incremental backups will fail.{{site.data.alerts.end}} **Default:** `90000` (25 hours)
However, all {{ site.data.products.serverless }} clusters have a default `gc.ttlseconds` of 4500 seconds (1 hour and 15 minutes) that cannot be altered. +`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the interval allowed for `AS OF SYSTEM TIME` queries, also known as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} If you are not yet using [scheduled backups](create-schedule-for-backup.html), and instead issuing [`BACKUP`](backup.html) statements manually, you must ensure that you set `gc.ttlseconds` long enough to accommodate your manual backup schedule. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). We recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed. {{site.data.alerts.end}} **Default:** `90000` (25 hours)
However, all {{ site.data.products.serverless }} clusters have a default `gc.ttlseconds` of 4500 seconds (1 hour and 15 minutes) that cannot be altered. `num_replicas` | The number of replicas in the zone, also called the "replication factor".

**Default:** `3`

For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.

For [multi-region databases configured to survive region failures](multiregion-overview.html#surviving-region-failures), the default value is `5`; this will include both [voting](#num_voters) and [non-voting replicas](architecture/replication-layer.html#non-voting-replicas). `constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.

To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled. To prevent this error, make sure at least one active node is configured to match the constraint. For example, apply `constraints = '[+region=west]'` only if you had set `--locality=region=west` for at least one node while starting the cluster.

**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities. `lease_preferences` | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/glossary.html#architecture-leaseholder). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all.

If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.

Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."

For a usage example, see [Constrain leaseholders to specific availability zones](configure-replication-zones.html#constrain-leaseholders-to-specific-availability-zones).

**Default**: No lease location preferences are applied if this field is not specified. diff --git a/_includes/v23.1/faq/auto-generate-unique-ids.html b/_includes/v23.1/faq/auto-generate-unique-ids.html index ee56e21b7e0..d96933a2c18 100644 --- a/_includes/v23.1/faq/auto-generate-unique-ids.html +++ b/_includes/v23.1/faq/auto-generate-unique-ids.html @@ -66,7 +66,7 @@ (3 rows) ~~~ -In either case, generated IDs will be 128-bit, large enough for there to be virtually no chance of generating non-unique values. Also, once the table grows beyond a single key-value range (more than 512 MiB by default), new IDs will be scattered across all of the table's ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load. +In either case, generated IDs will be 128-bit, large enough for there to be virtually no chance of generating non-unique values. Also, once the table grows beyond a single key-value range's [default size](configure-replication-zones.html#range-max-bytes), new IDs will be scattered across all of the table's ranges and, therefore, likely across different nodes. This means that multiple nodes will share in the load. This approach has the disadvantage of creating a primary key that may not be useful in a query directly, which can require a join with another table or a secondary index. diff --git a/_includes/v23.1/misc/basic-terms.md b/_includes/v23.1/misc/basic-terms.md index 71f437af4a6..60badf799b3 100644 --- a/_includes/v23.1/misc/basic-terms.md +++ b/_includes/v23.1/misc/basic-terms.md @@ -14,7 +14,7 @@ An individual instance of CockroachDB. One or more nodes form a cluster. CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a sorted map of key-value pairs. This keyspace is divided into contiguous chunks called _ranges_, such that every key is found in one range. 
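The guarantee that "every key is found in one range" follows from the keyspace being sorted and cut into contiguous pieces. A toy Python sketch, assuming ranges are identified only by their start keys, shows the idea; the `Keyspace` class and its `range_for` and `split_at` methods are hypothetical and do not reflect how CockroachDB actually locates ranges (which goes through the `meta` ranges).

~~~ python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keyspace:
    """Hypothetical toy model of a sorted keyspace split into contiguous ranges.

    Range i covers [starts[i], starts[i + 1]); the last range has no upper bound.
    """
    # Initially a single range covers the whole keyspace (b"" is the smallest key).
    starts: List[bytes] = field(default_factory=lambda: [b""])

    def range_for(self, key: bytes) -> int:
        # The owning range is the one with the greatest start key <= key,
        # so every key maps to exactly one range.
        return bisect.bisect_right(self.starts, key) - 1

    def split_at(self, key: bytes) -> None:
        # A split only adds a boundary; the ranges stay contiguous.
        i = bisect.bisect_left(self.starts, key)
        if i == len(self.starts) or self.starts[i] != key:
            self.starts.insert(i, key)


ks = Keyspace()
ks.split_at(b"m")
ks.split_at(b"t")
print([ks.range_for(k) for k in (b"apple", b"mango", b"zebra")])  # → [0, 1, 2]
~~~

Because a split only inserts a new boundary, the ranges never overlap or leave gaps, which is why a key can always be routed to exactly one range.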
-From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the _primary index_ because the table is sorted by the primary key) or a single row in a secondary index. As soon as the size of a range reaches 512 MiB ([the default](../configure-replication-zones.html#range-max-bytes)), it is split into two ranges. This process continues for these new ranges as the table and its indexes continue growing. +From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the _primary index_ because the table is sorted by the primary key) or a single row in a secondary index. As soon as the size of a range reaches [the default range size](../configure-replication-zones.html#range-max-bytes), it is [split into two ranges](distribution-layer.html#range-splits). This process continues for these new ranges as the table and its indexes continue growing. ### Replica diff --git a/_includes/v23.1/sql/range-splits.md b/_includes/v23.1/sql/range-splits.md index f83cafb80e5..eb94ce14533 100644 --- a/_includes/v23.1/sql/range-splits.md +++ b/_includes/v23.1/sql/range-splits.md @@ -1,4 +1,4 @@ -CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ranges below a size of 512 MiB. To do this, the system will automatically [split a range](architecture/distribution-layer.html#range-splits) if it grows larger than this limit. For most use cases, this automatic range splitting is sufficient, and you should never need to worry about when or where the system decides to split ranges. +CockroachDB breaks data into ranges. By default, CockroachDB attempts to keep ranges below [the default range size](configure-replication-zones.html#range-max-bytes). 
To do this, the system will automatically [split a range](architecture/distribution-layer.html#range-splits) if it grows larger than this limit. For most use cases, this automatic range splitting is sufficient, and you should never need to worry about when or where the system decides to split ranges. However, there are reasons why you may want to perform manual splits on the ranges that store tables or indexes: diff --git a/_includes/v23.1/zone-configs/variables.md b/_includes/v23.1/zone-configs/variables.md index c80cf0921dc..02778872aaa 100644 --- a/_includes/v23.1/zone-configs/variables.md +++ b/_includes/v23.1/zone-configs/variables.md @@ -1,8 +1,8 @@ Variable | Description ------|------------ `range_min_bytes` | The minimum size, in bytes, for a range of data in the zone. When a range is less than this size, CockroachDB will merge it with an adjacent range.

**Default:** `134217728` (128 MiB) -`range_max_bytes` | The maximum size, in bytes, for a range of data in the zone. When a range reaches this size, CockroachDB will split it into two ranges.

**Default:** `536870912` (512 MiB) -`gc.ttlseconds` | The number of seconds overwritten values will be retained before garbage collection. Smaller values can save disk space if values are frequently overwritten; larger values increase the range allowed for `AS OF SYSTEM TIME` queries, also know as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for long-running queries. Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than 512 MiB; such oversized ranges could contribute to the server running out of memory or other problems. {{site.data.alerts.callout_info}} Ensure that you set `gc.ttlseconds` long enough to accommodate your [backup schedule](create-schedule-for-backup.html), otherwise your incremental backups will fail with [this error](common-errors.html#protected-ts-verification-error). For example, if you set up your backup schedule to recur daily, but you set `gc.ttlseconds` to less than one day, all your incremental backups will fail.{{site.data.alerts.end}} **Default:** `90000` (25 hours) +`range_max_bytes` | The maximum size, in bytes, for a [range]({{link_prefix}}architecture/glossary.html#architecture-range) of data in the zone. When a range reaches this size, CockroachDB will [split it]({{link_prefix}}architecture/distribution-layer.html#range-splits) into two ranges.

**Default:** `536870912` (512 MiB) +`gc.ttlseconds` | The number of seconds overwritten [MVCC values]({{link_prefix}}architecture/storage-layer.html#mvcc) will be retained before [garbage collection]({{link_prefix}}architecture/storage-layer.html#garbage-collection). Smaller values can save disk space if values are frequently overwritten; larger values increase the interval allowed for [`AS OF SYSTEM TIME`](as-of-system-time.html) queries, also known as [Time Travel Queries](select-clause.html#select-historical-data-time-travel).

It is not recommended to set this below `600` (10 minutes); doing so will cause problems for [long-running queries](manage-long-running-queries.html). Also, since all versions of a row are stored in a single range that never splits, it is not recommended to set this so high that all the changes to a row in that time period could add up to more than [the default range size](#range-max-bytes); such oversized ranges could contribute to the server [running out of memory](cluster-setup-troubleshooting.html#out-of-memory-oom-crash) or other problems. {{site.data.alerts.callout_info}} If you are not yet using [scheduled backups](create-schedule-for-backup.html), and instead issuing [`BACKUP`](backup.html) statements manually, you must ensure that you set `gc.ttlseconds` long enough to accommodate your manual backup schedule. Otherwise, your incremental backups will fail with [the error message `protected ts verification error`](common-errors.html#protected-ts-verification-error). We recommend using [scheduled backups](create-schedule-for-backup.html) instead, which automatically [use protected timestamps](create-schedule-for-backup.html#protected-timestamps-and-scheduled-backups) to ensure they succeed. {{site.data.alerts.end}} **Default:** `14400` (4 hours) `num_replicas` | The number of replicas in the zone, also called the "replication factor".

**Default:** `3`

For the `system` database and `.meta`, `.liveness`, and `.system` ranges, the default value is `5`.

For [multi-region databases configured to survive region failures](multiregion-overview.html#surviving-region-failures), the default value is `5`; this will include both [voting](#num_voters) and [non-voting replicas](architecture/replication-layer.html#non-voting-replicas). `constraints` | An array of required (`+`) and/or prohibited (`-`) constraints influencing the location of replicas. See [Types of Constraints](configure-replication-zones.html#types-of-constraints) and [Scope of Constraints](configure-replication-zones.html#scope-of-constraints) for more details.

To prevent hard-to-detect typos, constraints placed on [store attributes and node localities](configure-replication-zones.html#descriptive-attributes-assigned-to-nodes) must match the values passed to at least one node in the cluster. If not, an error is signalled. To prevent this error, make sure at least one active node is configured to match the constraint. For example, apply `constraints = '[+region=west]'` only if you had set `--locality=region=west` for at least one node while starting the cluster.

**Default:** No constraints, with CockroachDB locating each replica on a unique node and attempting to spread replicas evenly across localities. `lease_preferences` | An ordered list of required and/or prohibited constraints influencing the location of [leaseholders](architecture/glossary.html#architecture-leaseholder). Whether each constraint is required or prohibited is expressed with a leading `+` or `-`, respectively. Note that lease preference constraints do not have to be shared with the `constraints` field. For example, it's valid for your configuration to define a `lease_preferences` field that does not reference any values from the `constraints` field. It's also valid to define a `lease_preferences` field with no `constraints` field at all.

If the first preference cannot be satisfied, CockroachDB will attempt to satisfy the second preference, and so on. If none of the preferences can be met, the lease will be placed using the default lease placement algorithm, which is to base lease placement decisions on how many leases each node already has, trying to make all the nodes have around the same amount.

Each value in the list can include multiple constraints. For example, the list `[[+zone=us-east-1b, +ssd], [+zone=us-east-1a], [+zone=us-east-1c, +ssd]]` means "prefer nodes with an SSD in `us-east-1b`, then any nodes in `us-east-1a`, then nodes in `us-east-1c` with an SSD."

For a usage example, see [Constrain leaseholders to specific availability zones](configure-replication-zones.html#constrain-leaseholders-to-specific-availability-zones).

**Default**: No lease location preferences are applied if this field is not specified. diff --git a/v21.2/backup.md b/v21.2/backup.md index 75e634c85ed..8ea9495c8e0 100644 --- a/v21.2/backup.md +++ b/v21.2/backup.md @@ -71,7 +71,7 @@ To view the contents of a backup created with the `BACKUP` statement, use [`SHOW `subdirectory` | The name of the specific backup (e.g., `2021/03/23-213101.37`) in the collection to which you want to add an [incremental backup](take-full-and-incremental-backups.html#incremental-backups). To view available backup subdirectories, use [`SHOW BACKUPS IN destination`](show-backup.html). If the backup `subdirectory` is not provided, a [full backup](take-full-and-incremental-backups.html#full-backups) will be created in the collection using a date-based naming scheme (i.e., `//-`).

**Warning:** If you use an arbitrary `STRING` as the subdirectory, a new full backup will be created, but it will never be shown in `SHOW BACKUPS IN`. We do not recommend using arbitrary strings as subdirectory names. `LATEST` | Append an incremental backup to the latest completed full backup's subdirectory. `destination` | The URL where you want to store the backup.

For information about this URL structure, see [Backup File URLs](#backup-file-urls). -`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is [configurable per table](configure-replication-zones.html#replication-zone-variables)). +`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than the current time minus your data's garbage collection TTL (set by the [`gc.ttlseconds` replication zone variable](configure-replication-zones.html#gc-ttlseconds)). `backup_options` | Control the backup behavior with a comma-separated list of [these options](#options). ### Targets diff --git a/v21.2/delete-data.md b/v21.2/delete-data.md index e1a60985c6a..351a57f25dd 100644 --- a/v21.2/delete-data.md +++ b/v21.2/delete-data.md @@ -220,7 +220,7 @@ with conn.cursor() as cur: ## Performance considerations -Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process. This process runs every 25 hours by default to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and running [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The garbage collection interval is controlled by the [`gc.ttlseconds`](configure-replication-zones.html#replication-zone-variables) setting. +Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process.
Once the marked records are older than [the specified TTL interval](configure-replication-zones.html#gc-ttlseconds), they are eligible to be removed. This TTL, controlled by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) setting, is designed to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The practical implications of the above are: diff --git a/v22.1/backup.md b/v22.1/backup.md index 7b6b2d7c797..4761a974d53 100644 --- a/v22.1/backup.md +++ b/v22.1/backup.md @@ -73,7 +73,7 @@ CockroachDB stores full backups in a backup collection. Each full backup in a co `LATEST` | Append an incremental backup to the latest completed full backup's subdirectory. `collectionURI` | The URI where you want to store the backup. (Or, the default locality for a locality-aware backup.)

For information about this URL structure, see [Backup File URLs](#backup-file-urls). `localityURI` | The URI containing the `COCKROACH_LOCALITY` parameter for a non-default locality that is part of a single locality-aware backup. -`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is [configurable per table](configure-replication-zones.html#replication-zone-variables)). +`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than the current time minus your data's garbage collection TTL (set by the [`gc.ttlseconds` replication zone variable](configure-replication-zones.html#gc-ttlseconds)). `backup_options` | Control the backup behavior with a comma-separated list of [these options](#options). ### Targets diff --git a/v22.1/delete-data.md b/v22.1/delete-data.md index 376ac09c5b6..88205d26865 100644 --- a/v22.1/delete-data.md +++ b/v22.1/delete-data.md @@ -220,7 +220,7 @@ with conn.cursor() as cur: ## Performance considerations -Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process. This process runs every 25 hours by default to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and running [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The garbage collection interval is controlled by the [`gc.ttlseconds`](configure-replication-zones.html#replication-zone-variables) setting. +Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process.
Once the marked records are older than [the specified TTL interval](configure-replication-zones.html#gc-ttlseconds), they are eligible to be removed. This TTL, controlled by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) setting, is designed to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The practical implications of the above are: diff --git a/v22.2/backup.md b/v22.2/backup.md index cc2d3f4d098..4366d8d91db 100644 --- a/v22.2/backup.md +++ b/v22.2/backup.md @@ -81,7 +81,7 @@ CockroachDB stores full backups in a backup collection. Each full backup in a co `LATEST` | Append an incremental backup to the latest completed full backup's subdirectory. `collectionURI` | The URI where you want to store the backup. (Or, the default locality for a locality-aware backup.)

For information about this URL structure, see [Backup File URLs](#backup-file-urls). `localityURI` | The URI containing the `COCKROACH_LOCALITY` parameter for a non-default locality that is part of a single locality-aware backup. -`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is [configurable per table](configure-replication-zones.html#replication-zone-variables)). +`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than the current time minus your data's garbage collection TTL (set by the [`gc.ttlseconds` replication zone variable](configure-replication-zones.html#gc-ttlseconds)). `backup_options` | Control the backup behavior with a comma-separated list of [these options](#options). ### Targets diff --git a/v22.2/delete-data.md b/v22.2/delete-data.md index 376ac09c5b6..88205d26865 100644 --- a/v22.2/delete-data.md +++ b/v22.2/delete-data.md @@ -220,7 +220,7 @@ with conn.cursor() as cur: ## Performance considerations -Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process. This process runs every 25 hours by default to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and running [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The garbage collection interval is controlled by the [`gc.ttlseconds`](configure-replication-zones.html#replication-zone-variables) setting. +Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process.
Once the marked records are older than [the specified TTL interval](configure-replication-zones.html#gc-ttlseconds), they are eligible to be removed. This TTL, controlled by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) setting, is designed to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The practical implications of the above are: diff --git a/v23.1/architecture/distribution-layer.md b/v23.1/architecture/distribution-layer.md index 052af92569d..78b8df414d3 100644 --- a/v23.1/architecture/distribution-layer.md +++ b/v23.1/architecture/distribution-layer.md @@ -47,9 +47,9 @@ Each node caches values of the `meta2` range it has accessed before, which optim After the node's meta ranges is the KV data your cluster stores. -Each table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as a range reaches 512 MiB in size, it splits into two ranges. This process continues as a table and its indexes continue growing. Once a table is split across multiple ranges, it's likely that the table and secondary indexes will be stored in separate ranges. However, a range can still contain data for both the table and a secondary index. +Each table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as a range reaches [the default range size](../configure-replication-zones.html#range-max-bytes), it splits into two ranges.
This process continues as a table and its indexes continue growing. Once a table is split across multiple ranges, it's likely that the table and secondary indexes will be stored in separate ranges. However, a range can still contain data for both the table and a secondary index. -The default 512 MiB range size represents a sweet spot for us between a size that's small enough to move quickly between nodes, but large enough to store a meaningfully contiguous set of data whose keys are more likely to be accessed together. These ranges are then shuffled around your cluster to ensure survivability. +The [default range size](../configure-replication-zones.html#range-max-bytes) represents a sweet spot for us between a size that's small enough to move quickly between nodes, but large enough to store a meaningfully contiguous set of data whose keys are more likely to be accessed together. These ranges are then shuffled around your cluster to ensure survivability. These table ranges are replicated (in the aptly named replication layer), and have the addresses of each replica stored in the `meta2` range. @@ -186,13 +186,13 @@ All of these updates to the range descriptor occur locally on the range, and the ### Range splits -By default, CockroachDB attempts to keep ranges/replicas at the default range size (currently 512 MiB). Once a range reaches that limit, we split it into two smaller ranges (composed of contiguous key spaces). +By default, CockroachDB attempts to keep ranges/replicas at [the default range size](../configure-replication-zones.html#range-max-bytes). Once a range reaches that limit, we split it into two smaller ranges (composed of contiguous key spaces). During this range split, the node creates a new Raft group containing all of the same members as the range that was split. 
The fact that there are now two ranges also means that there is a transaction that updates `meta2` with the new keyspace boundaries, as well as the addresses of the nodes using the range descriptor. ### Range merges -By default, CockroachDB automatically merges small ranges of data together to form fewer, larger ranges (up to the default range size). This can improve both query latency and cluster survivability. +By default, CockroachDB automatically merges small ranges of data together to form fewer, larger ranges (up to [the default range size](../configure-replication-zones.html#range-max-bytes)). This can improve both query latency and cluster survivability. #### How range merges work diff --git a/v23.1/architecture/overview.md b/v23.1/architecture/overview.md index 01871014e59..904be7d9911 100644 --- a/v23.1/architecture/overview.md +++ b/v23.1/architecture/overview.md @@ -53,7 +53,7 @@ Once the CockroachDB cluster is initialized, developers interact with CockroachD After receiving SQL remote procedure calls (RPCs), nodes convert them into key-value (KV) operations that work with our [distributed, transactional key-value store](transaction-layer.html). -As these RPCs start filling your cluster with data, CockroachDB starts [algorithmically distributing your data among the nodes of the cluster](distribution-layer.html), breaking the data up into 512 MiB chunks that we call ranges. Each range is replicated to at least 3 nodes by default to ensure survivability. This ensures that if any nodes go down, you still have copies of the data which can be used for: +As these RPCs start filling your cluster with data, CockroachDB starts [algorithmically distributing your data among the nodes of the cluster](distribution-layer.html), breaking the data up into chunks that we call ranges. Each range is replicated to at least 3 nodes by default to ensure survivability. 
This ensures that if any nodes go down, you still have copies of the data which can be used for: - Continuing to serve reads and writes. - Consistently replicating the data to other nodes. diff --git a/v23.1/backup.md b/v23.1/backup.md index f3a064cd847..a8abb973498 100644 --- a/v23.1/backup.md +++ b/v23.1/backup.md @@ -81,7 +81,7 @@ CockroachDB stores full backups in a backup collection. Each full backup in a co `LATEST` | Append an incremental backup to the latest completed full backup's subdirectory. `collectionURI` | The URI where you want to store the backup. (Or, the default locality for a locality-aware backup.)

For information about this URL structure, see [Backup File URLs](#backup-file-urls). `localityURI` | The URI containing the `COCKROACH_LOCALITY` parameter for a non-default locality that is part of a single locality-aware backup. -`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must be more recent than your cluster's last garbage collection (which defaults to occur every 25 hours, but is [configurable per table](configure-replication-zones.html#replication-zone-variables)). +`timestamp` | Back up data as it existed as of [`timestamp`](as-of-system-time.html). The `timestamp` must fall within your data's garbage collection window, which is controlled by the [`gc.ttlseconds` replication zone variable](configure-replication-zones.html#gc-ttlseconds). `backup_options` | Control the backup behavior with a comma-separated list of [these options](#options). ### Targets diff --git a/v23.1/common-errors.md b/v23.1/common-errors.md index 8aba49d3168..5576bee9d73 100644 --- a/v23.1/common-errors.md +++ b/v23.1/common-errors.md @@ -128,7 +128,7 @@ When running a multi-node CockroachDB cluster, if you see an error like the one ## split failed while applying backpressure; are rows updated in a tight loop? -In CockroachDB, a table row is stored on disk as a key-value pair. Whenever the row is updated, CockroachDB also stores a distinct version of the key-value pair to enable concurrent request processing while guaranteeing consistency (see [multi-version concurrency control (MVCC)](architecture/storage-layer.html#mvcc)). All versions of a key-value pair belong to a larger ["range"](architecture/overview.html#architecture-range) of the total key space, and the historical versions remain until the garbage collection period defined by the `gc.ttlseconds` variable in the applicable [zone configuration](configure-replication-zones.html#gc-ttlseconds) has passed (25 hours by default). 
Once a range reaches a size threshold (512 MiB by default), CockroachDB splits the range into two ranges. However, this message indicates that a range cannot be split as intended. +In CockroachDB, a table row is stored on disk as a key-value pair. Whenever the row is updated, CockroachDB also stores a distinct version of the key-value pair to enable concurrent request processing while guaranteeing consistency (see [multi-version concurrency control (MVCC)](architecture/storage-layer.html#mvcc)). All versions of a key-value pair belong to a larger ["range"](architecture/overview.html#architecture-range) of the total key space, and the historical versions remain until the garbage collection period defined by the `gc.ttlseconds` variable in the applicable [zone configuration](configure-replication-zones.html#gc-ttlseconds) has passed. Once a range reaches a [size threshold](configure-replication-zones.html#range-max-bytes), CockroachDB [splits the range](architecture/distribution-layer.html#range-splits) into two ranges. However, this message indicates that a range cannot be split as intended. One possible cause is that the range consists only of MVCC version data due to a row being repeatedly updated, and the range cannot be split because doing so would spread MVCC versions for a single row across multiple ranges. diff --git a/v23.1/create-changefeed.md b/v23.1/create-changefeed.md index a08079c9dfc..3a341fb170a 100644 --- a/v23.1/create-changefeed.md +++ b/v23.1/create-changefeed.md @@ -151,7 +151,7 @@ Option | Value | Description `avro_schema_prefix` | Schema prefix name | Provide a namespace for the schema of a table in addition to the default, the table name. This allows multiple databases or clusters to share the same schema registry when the same table name is present in multiple databases.

Example: `CREATE CHANGEFEED FOR foo WITH format=avro, confluent_schema_registry='registry_url', avro_schema_prefix='super'` will register subjects as `superfoo-key` and `superfoo-value` with the namespace `super`. `compression` | `gzip` | Compress changefeed data files written to a [cloud storage sink](changefeed-sinks.html#cloud-storage-sink). Currently, only [Gzip](https://www.gnu.org/software/gzip/) is supported for compression. `confluent_schema_registry` | Schema Registry address | The [Schema Registry](https://docs.confluent.io/current/schema-registry/docs/index.html#sr) address is required to use `avro`.

{% include {{ page.version.version }}/cdc/confluent-cloud-sr-url.md %} -`cursor` | [Timestamp](as-of-system-time.html#parameters) | Emit any changes after the given timestamp, but does not output the current state of the table first. If `cursor` is not specified, the changefeed starts by doing an initial scan of all the watched rows and emits the current value, then moves to emitting any changes that happen after the scan.

When starting a changefeed at a specific `cursor`, the `cursor` cannot be before the configured garbage collection window (see [`gc.ttlseconds`](configure-replication-zones.html#replication-zone-variables)) for the table you're trying to follow; otherwise, the changefeed will error. With default garbage collection settings, this means you cannot create a changefeed that starts more than 25 hours in the past.

`cursor` can be used to [start a new changefeed where a previous changefeed ended.](#start-a-new-changefeed-where-another-ended)

Example: `CURSOR='1536242855577149065.0000000000'` +`cursor` | [Timestamp](as-of-system-time.html#parameters) | Emit any changes after the given timestamp without first outputting the current state of the table. If `cursor` is not specified, the changefeed starts by doing an initial scan of all the watched rows and emits the current value, then moves to emitting any changes that happen after the scan.

When starting a changefeed at a specific `cursor`, the `cursor` must fall within the configured garbage collection window (see [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds)) for the table you're trying to follow; otherwise, the changefeed will error. With default garbage collection settings, this means you cannot create a changefeed that starts more than [the default garbage collection TTL](configure-replication-zones.html#gc-ttlseconds) in the past.

`cursor` can be used to [start a new changefeed where a previous changefeed ended.](#start-a-new-changefeed-where-another-ended)

Example: `CURSOR='1536242855577149065.0000000000'` `diff` | N/A | Publish a `before` field with each message, which includes the value of the row before the update was applied. `end_time` | [Timestamp](as-of-system-time.html#parameters) | Indicate the timestamp up to which the changefeed will emit all events and then complete with a `successful` status. Provide a future timestamp to `end_time` in number of nanoseconds since the [Unix epoch](https://en.wikipedia.org/wiki/Unix_time). For example, `end_time="1655402400000000000"`. You cannot use `end_time` and [`initial_scan = 'only'`](#initial-scan) simultaneously. `envelope` | `key_only` / `row`* / `wrapped` | `key_only` emits only the key and no value, which is faster if you only want to know when the key changes.

`row` emits the row without any additional metadata fields in the message. *You can only use `row` with Kafka sinks or sinkless changefeeds. `row` does not support [`avro` format](#format).

`wrapped` emits the full message including any metadata fields. See [Responses](changefeed-messages.html#responses) for more detail on message format.

Default: `envelope=wrapped` diff --git a/v23.1/delete-data.md b/v23.1/delete-data.md index 376ac09c5b6..88205d26865 100644 --- a/v23.1/delete-data.md +++ b/v23.1/delete-data.md @@ -220,7 +220,7 @@ with conn.cursor() as cur: ## Performance considerations -Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process. This process runs every 25 hours by default to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and running [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html). The garbage collection interval is controlled by the [`gc.ttlseconds`](configure-replication-zones.html#replication-zone-variables) setting. +Because of the way CockroachDB works under the hood, deleting data from the database does not immediately reduce disk usage. Instead, records are marked as "deleted" and processed asynchronously by a background garbage collection process. Once the marked records are older than the garbage collection TTL, they become eligible for removal. The TTL is controlled by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) setting and is designed to allow sufficient time for running [backups](take-full-and-incremental-backups.html) and [time travel queries using `AS OF SYSTEM TIME`](as-of-system-time.html).
The practical implications of the above are: diff --git a/v23.1/delete.md b/v23.1/delete.md index 14a9ae08f4c..d07d8745da5 100644 --- a/v23.1/delete.md +++ b/v23.1/delete.md @@ -104,9 +104,9 @@ You can use the `@primary` alias to use the table's primary key in your query if ### Preserving `DELETE` performance over time -CockroachDB relies on [multi-version concurrency control (MVCC)](architecture/storage-layer.html#mvcc) to process concurrent requests while guaranteeing [strong consistency](frequently-asked-questions.html#how-is-cockroachdb-strongly-consistent). As such, when you delete a row, it is not immediately removed from disk. The MVCC values for the row will remain until the garbage collection period defined by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) variable in the applicable [zone configuration](show-zone-configurations.html) has passed. By default, this period is 25 hours. +CockroachDB relies on [multi-version concurrency control (MVCC)](architecture/storage-layer.html#mvcc) to process concurrent requests while guaranteeing [strong consistency](frequently-asked-questions.html#how-is-cockroachdb-strongly-consistent). As such, when you delete a row, it is not immediately removed from disk. The MVCC values for the row will remain until the garbage collection period defined by the [`gc.ttlseconds`](configure-replication-zones.html#gc-ttlseconds) variable in the applicable [zone configuration](show-zone-configurations.html) has passed. -This means that with the default settings, each iteration of your `DELETE` statement must scan over all of the rows previously marked for deletion within the last 25 hours. If you try to delete 10,000 rows 10 times within the same 25 hour period, the 10th command will have to scan over the 90,000 rows previously marked for deletion. 
+This means that with the default settings, each iteration of your `DELETE` statement must scan over all of the rows previously marked for deletion within [the defined GC TTL window](configure-replication-zones.html#gc-ttlseconds). If you try to delete 10,000 rows 10 times within the GC TTL window, the 10th command will have to scan over the 90,000 rows previously marked for deletion. To preserve performance over iterative `DELETE` queries, we recommend taking one of the following approaches: diff --git a/v23.1/demo-replication-and-rebalancing.md b/v23.1/demo-replication-and-rebalancing.md index 1367e21c2c2..38040c967dd 100644 --- a/v23.1/demo-replication-and-rebalancing.md +++ b/v23.1/demo-replication-and-rebalancing.md @@ -242,7 +242,7 @@ You'll use a non-`root` user for running a client workload and accessing the DB Concept | Description --------|------------ - **Range** | CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.

From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as that range reaches 512 MiB in size, it splits into two ranges. This process continues for these new ranges as the table and its indexes continue growing. + **Range** | CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.

From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as that range reaches [the maximum range size](configure-replication-zones.html#range-max-bytes), it [splits into two ranges](architecture/distribution-layer.html#range-splits). This process continues for these new ranges as the table and its indexes continue growing. **Replica** | CockroachDB replicates each range (3 times by default) and stores each replica on a different node. 1. With those concepts in mind, open the DB Console at http://localhost:8080 and log in with the `maxroach` user. diff --git a/v23.1/disaster-recovery.md b/v23.1/disaster-recovery.md index ee9858aabe4..97ce06dadf2 100644 --- a/v23.1/disaster-recovery.md +++ b/v23.1/disaster-recovery.md @@ -301,7 +301,7 @@ To give yourself more time to recover and clean up the corrupted data, put your ### Run differentials -If you are within the [garbage collection window](configure-replication-zones.html#replication-zone-variables) (default is 25 hours), run [`AS OF SYSTEM TIME`](as-of-system-time.html) queries and use [`CREATE TABLE AS … SELECT * FROM`](create-table-as.html) to create comparison data and run differentials to find the offending rows to fix. +If you are within the [garbage collection window](configure-replication-zones.html#gc-ttlseconds), run [`AS OF SYSTEM TIME`](as-of-system-time.html) queries and use [`CREATE TABLE AS … SELECT * FROM`](create-table-as.html) to create comparison data and run differentials to find the offending rows to fix. If you are outside of the garbage collection window, you will need to use a [backup](backup.html) to run comparisons. 
@@ -312,10 +312,10 @@ If you are outside of the garbage collection window, you will need to use a [bac ### Create a new backup -If your cluster is running, you do not have a backup that encapsulates the time you want to [restore](restore.html) to, and the data you want to recover is still in the [garbage collection window](configure-replication-zones.html#replication-zone-variables), there are two actions you can take: +If your cluster is running, you do not have a backup that encapsulates the time you want to [restore](restore.html) to, and the data you want to recover is still in the [garbage collection window](configure-replication-zones.html#gc-ttlseconds), there are two actions you can take: -- If you are a core user, trigger a [backup](backup.html) using [`AS OF SYSTEM TIME`](as-of-system-time.html) to create a new backup that encapsulates the specific time. The `AS OF SYSTEM TIME` must be within the [garbage collection window](configure-replication-zones.html#replication-zone-variables) (default is 25 hours). -- If you are an {{ site.data.products.enterprise }} user, trigger a new [backup `with_revision_history`](take-backups-with-revision-history-and-restore-from-a-point-in-time.html) and you will have a backup you can use to restore to the desired point in time within the [garbage collection window](configure-replication-zones.html#replication-zone-variables) (default is 25 hours). +- If you are a core user, trigger a [backup](backup.html) using [`AS OF SYSTEM TIME`](as-of-system-time.html) to create a new backup that encapsulates the specific time. The `AS OF SYSTEM TIME` must be within the [garbage collection window](configure-replication-zones.html#gc-ttlseconds). 
+- If you are an {{ site.data.products.enterprise }} user, trigger a new [backup `with_revision_history`](take-backups-with-revision-history-and-restore-from-a-point-in-time.html) and you will have a backup you can use to restore to the desired point in time within the [garbage collection window](configure-replication-zones.html#gc-ttlseconds). ### Recover from corrupted data in a database or table diff --git a/v23.1/frequently-asked-questions.md b/v23.1/frequently-asked-questions.md index 10ddaa78ec3..9498c82bd18 100644 --- a/v23.1/frequently-asked-questions.md +++ b/v23.1/frequently-asked-questions.md @@ -54,7 +54,7 @@ For more details, see [Choose a Deployment Option](choose-a-deployment-option.ht CockroachDB scales horizontally with minimal operator overhead. -At the key-value level, CockroachDB starts off with a single, empty range. As you put data in, this single range eventually reaches a threshold size (512 MiB by default). When that happens, the data splits into two ranges, each covering a contiguous segment of the entire key-value space. This process continues indefinitely; as new data flows in, existing ranges continue to split into new ranges, aiming to keep a relatively small and consistent range size. +At the key-value level, CockroachDB starts off with a single, empty range. As you put data in, this single range eventually reaches [a threshold size](configure-replication-zones.html#range-max-bytes). When that happens, the data [splits into two ranges](architecture/distribution-layer.html#range-splits), each covering a contiguous segment of the entire key-value space. This process continues indefinitely; as new data flows in, existing ranges continue to split into new ranges, aiming to keep a relatively small and consistent range size. When your cluster spans multiple nodes (physical machines, virtual machines, or containers), newly split ranges are automatically rebalanced to nodes with more capacity. 
CockroachDB communicates opportunities for rebalancing using a peer-to-peer [gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol) by which nodes exchange network addresses, store capacity, and other information. diff --git a/v23.1/known-limitations.md b/v23.1/known-limitations.md index f3f80d0a9a9..7019dce2d2b 100644 --- a/v23.1/known-limitations.md +++ b/v23.1/known-limitations.md @@ -587,7 +587,7 @@ pq: unsupported binary operator: || ### Max size of a single column family -When creating or updating a row, if the combined size of all values in a single [column family](column-families.html) exceeds the max range size (512 MiB by default) for the table, the operation may fail, or cluster performance may suffer. +When creating or updating a row, if the combined size of all values in a single [column family](column-families.html) exceeds the [max range size](configure-replication-zones.html#range-max-bytes) for the table, the operation may fail, or cluster performance may suffer. As a workaround, you can either [manually split a table's columns into multiple column families](column-families.html#manual-override), or you can [create a table-specific zone configuration](configure-replication-zones.html#create-a-replication-zone-for-a-table) with an increased max range size. diff --git a/v23.1/migrate-from-oracle.md b/v23.1/migrate-from-oracle.md index c4b986e9280..89d30ee3207 100644 --- a/v23.1/migrate-from-oracle.md +++ b/v23.1/migrate-from-oracle.md @@ -235,7 +235,7 @@ When moving from Oracle to CockroachDB data types, consider the following: - [Schema changes within transactions](known-limitations.html#schema-changes-within-transactions) - [Schema changes between executions of prepared statements](online-schema-changes.html#no-online-schema-changes-between-executions-of-prepared-statements) - If [`JSON`](jsonb.html) columns are used only for payload, consider switching to [`BYTES`](bytes.html). -- Max size of a single column family (512 MiB by default). 
+- Max size of a single [column family](column-families.html) (by default, the [maximum size of a range](configure-replication-zones.html#range-max-bytes)). For more information, see [Known Limitations](known-limitations.html), [Online Schema Changes](online-schema-changes.html), and [Transactions](transactions.html). diff --git a/v23.1/show-jobs.md b/v23.1/show-jobs.md index 3d1f8554ca4..ef410bb00b3 100644 --- a/v23.1/show-jobs.md +++ b/v23.1/show-jobs.md @@ -26,7 +26,7 @@ To block a call to `SHOW JOBS` that returns after all specified job ID(s) have a - For jobs older than 12 hours, query the `crdb_internal.jobs` table. - Jobs are deleted after 14 days. This interval can be changed via the `jobs.retention_time` [cluster setting](cluster-settings.html). - While the `SHOW JOBS WHEN COMPLETE` statement is blocking, it will time out after 24 hours. -- Garbage collection jobs are created for [dropped tables](drop-table.html) and [dropped indexes](drop-index.html), and will execute after the [GC TTL](configure-replication-zones.html#replication-zone-variables) has elapsed (default is 25 hours). These jobs cannot be canceled. +- Garbage collection jobs are created for [dropped tables](drop-table.html) and [dropped indexes](drop-index.html), and will execute after the [GC TTL](configure-replication-zones.html#gc-ttlseconds) has elapsed. These jobs cannot be canceled. - CockroachDB automatically retries jobs that fail due to [retry errors](transaction-retry-error-reference.html) or job coordination failures, with [exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff). The `jobs.registry.retry.initial_delay` [cluster setting](cluster-settings.html) sets the initial delay between retries and `jobs.registry.retry.max_delay` sets the maximum delay. 
## Required privileges diff --git a/v23.1/sql-tuning-with-explain.md b/v23.1/sql-tuning-with-explain.md index 26242899b69..67b6043e168 100644 --- a/v23.1/sql-tuning-with-explain.md +++ b/v23.1/sql-tuning-with-explain.md @@ -126,7 +126,7 @@ To understand why the performance improved, use [`EXPLAIN`](explain.html) to see This shows you that CockroachDB starts with the secondary index (`users@users_name_idx`). Because it is sorted by `name`, the query can jump directly to the relevant value (`/'Cheyenne Smith' - /'Cheyenne Smith'`). However, the query needs to return values not in the secondary index, so CockroachDB grabs the primary key (`city`/`id`) stored with the `name` value (the primary key is always stored with entries in a secondary index), jumps to that value in the primary index, and then returns the full row. -Because the `users` table is under 512 MiB, the primary index and all secondary indexes are contained in a single range with a single leaseholder. If the table were bigger, however, the primary index and secondary index could reside in separate ranges, each with its own leaseholder. In this case, if the leaseholders were on different nodes, the query would require more network hops, further increasing latency. +Because the `users` table is under [the maximum range size](configure-replication-zones.html#range-max-bytes), the primary index and all secondary indexes are contained in a single range with a single leaseholder. If the table were bigger, however, the primary index and secondary index could reside in separate ranges, each with its own leaseholder. In this case, if the leaseholders were on different nodes, the query would require more network hops, further increasing latency. ### Solution: Filter by a secondary index storing additional columns