admission: enable disk bandwidth bottleneck resource #86857
Comments
Part of cockroachdb#86857. This commit eliminates the need to provide the disk name in common environments, e.g. Linux with the store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now. So in a sense this machinery is still "disabled by default".

Next steps:
- automatically measure provisioned bandwidth, using something like github.com/irfansharif/probe, gated behind envvars or cluster settings;
- add roachtests that make use of these disk bandwidth tokens;
- roll it out in managed environments;
- roll it out elsewhere.

Release note: None
Part of cockroachdb#86857. This commit eliminates the need to provide the disk name in common environments, e.g. Linux with the store on EBS or GCP PD. To make use of AC's disk bandwidth tokens, users still need to specify the provisioned bandwidth, for now; they can also do this through the kvadmission.store.provisioned_bandwidth cluster setting. So in a sense this machinery is still "disabled by default".

Next steps:
- add roachtests that make use of these disk bandwidth tokens;
- automatically measure provisioned bandwidth, using something like github.com/irfansharif/probe, gated behind envvars or cluster settings;
- roll it out in managed environments;
- roll it out elsewhere.

Release note: None
Integration test for disk bandwidth tokens, copying over what we ran in cockroachdb#82813. Part of cockroachdb#86857. Release note: None
We should remember to also extend the disk bandwidth control to encompass disk writes of sstables due to incoming range snapshots.
This commit cleans up changes from cockroachdb#119885. There is no longer a need for users to specify a disk name when specifying the provisioned bandwidth since we can now automatically infer disk names from the `StoreSpec.Path` and the underlying block device. Informs: cockroachdb#86857. Epic: None. Release note (ops change): The provisioned-rate field, if specified, should no longer accept a disk-name or an optional bandwidth field. To use the disk bandwidth constraint the store-spec must contain provisioned-rate=bandwidth=<bandwidth-bytes/s>, otherwise the cluster setting kv.store.admission.provisioned_bandwidth will be used.
120895: admission: remove `DiskName` from `StoreSpec.ProvisionedRateSpec` r=sumeerbhola a=CheranMahalingam Informs: #86857. Epic: None. Co-authored-by: Cheran Mahalingam
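As an illustration of the release note above, here is a minimal Go sketch of how a `provisioned-rate=bandwidth=<bandwidth-bytes/s>` value could be validated and how the cluster setting could serve as the fallback. The names `parseProvisionedRate` and `effectiveBandwidth` are hypothetical and are not the functions used in CockroachDB, and the sketch assumes the bandwidth is given as a plain integer number of bytes per second.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseProvisionedRate is a hypothetical helper (not CockroachDB's parser).
// It accepts only the "bandwidth=<bytes/s>" form; any other form, such as
// one that starts with a disk name, is rejected.
// Assumption: the value is a plain integer count of bytes per second.
func parseProvisionedRate(field string) (int64, error) {
	const prefix = "bandwidth="
	if !strings.HasPrefix(field, prefix) {
		return 0, fmt.Errorf("provisioned-rate must be %s<bytes/s>, got %q", prefix, field)
	}
	v, err := strconv.ParseInt(strings.TrimPrefix(field, prefix), 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid bandwidth in %q: %w", field, err)
	}
	if v <= 0 {
		return 0, fmt.Errorf("bandwidth must be positive, got %d", v)
	}
	return v, nil
}

// effectiveBandwidth picks the store-spec value when one was given (> 0)
// and otherwise falls back to the value of the cluster setting.
func effectiveBandwidth(storeSpecBytesPerSec, clusterSettingBytesPerSec int64) int64 {
	if storeSpecBytesPerSec > 0 {
		return storeSpecBytesPerSec
	}
	return clusterSettingBytesPerSec
}
```

With this sketch, `parseProvisionedRate("bandwidth=125000000")` yields 125000000 bytes/s, while a value that still carries a disk name returns an error.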
As a part of online restore, we'll need admission control to control downloading of external sstables so that we can download as fast as we can without affecting foreground workload latencies.
Previously, we would calculate elastic disk bandwidth tokens using arbitrary load thresholds and an estimate of incoming bytes into the LSM through flushes and ingestions. This calculation did not account for write amplification in the LSM.

This patch simplifies the disk bandwidth limiter by removing the disk load watcher and adjusting tokens using only the known provisioned disk bandwidth. For token deduction, we create a write-amp model that relates incoming LSM bytes to actual disk writes.

The token granting semantics are as follows:
- elastic writes: deduct tokens, and wait for positive count in bucket.
- regular writes: deduct tokens, but proceed even with no tokens available.

The requested write bytes are "inflated" using the estimated write amplification to account for async compactions in the LSM.

This patch also lays the framework for future integrations where we can account for range snapshot ingestions separately, as those don't incur the same write amplification as flushed LSM bytes do.

Informs: cockroachdb#86857

Release note: None
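The write-amp model mentioned in this commit message can be pictured with a small Go sketch. This is not the code from the patch: the type `writeAmpModel`, the exponential smoothing, the constant `smoothingAlpha`, and the lower bound of 1 are all assumptions made for illustration. The only idea taken from the text is that the model relates incoming LSM bytes to actual disk writes.

```go
package main

// smoothingAlpha is an assumed smoothing factor; the real limiter may
// weigh new observations differently.
const smoothingAlpha = 0.5

// writeAmpModel is a hypothetical model that tracks how many bytes are
// written to disk for each byte that enters the LSM.
type writeAmpModel struct {
	estimate float64 // estimated disk bytes per incoming LSM byte
}

// update folds one interval's observation into the estimate.
// incomingLSMBytes counts flushed and ingested bytes in the interval;
// diskWriteBytes counts all bytes written to disk in the same interval.
func (m *writeAmpModel) update(incomingLSMBytes, diskWriteBytes int64) {
	if incomingLSMBytes <= 0 {
		return // nothing entered the LSM, so keep the previous estimate
	}
	observed := float64(diskWriteBytes) / float64(incomingLSMBytes)
	if observed < 1 {
		// Assumption: every incoming byte is written at least once.
		observed = 1
	}
	m.estimate = smoothingAlpha*observed + (1-smoothingAlpha)*m.estimate
}
```

Starting from an estimate of 1, an interval with 100 incoming LSM bytes and 500 bytes of disk writes moves the estimate to 3 under these assumptions.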
129005: admission: account for write-amp in disk bandwidth limiter r=sumeerbhola a=aadityasondhi Informs: #86857 Release note: None Co-authored-by: Aaditya Sondhi
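The token granting semantics of #129005 (elastic writes wait for a positive bucket, regular writes always proceed, and both deduct inflated bytes) can be sketched in Go as follows. The type `diskTokenBucket` and its methods are hypothetical and simplified: the sketch is non-blocking, so an elastic write that would have to wait is reported as not admitted instead of being queued.

```go
package main

import (
	"math"
	"sync"
)

// WorkClass separates latency-sensitive writes from throttleable ones.
type WorkClass int

const (
	RegularWork WorkClass = iota
	ElasticWork
)

// diskTokenBucket is a hypothetical stand-in for the disk bandwidth
// limiter's bucket. The token count is allowed to go negative.
type diskTokenBucket struct {
	mu       sync.Mutex
	tokens   int64   // available disk-write tokens, in bytes
	writeAmp float64 // estimated disk bytes per incoming LSM byte
}

// inflate converts requested LSM bytes into estimated disk bytes.
func (b *diskTokenBucket) inflate(lsmBytes int64) int64 {
	return int64(math.Ceil(float64(lsmBytes) * b.writeAmp))
}

// tryAdmit applies the granting rules: elastic work is admitted only
// while the bucket is positive, regular work is always admitted. Admitted
// work of either class deducts its inflated size from the bucket.
func (b *diskTokenBucket) tryAdmit(class WorkClass, lsmBytes int64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if class == ElasticWork && b.tokens <= 0 {
		return false // elastic work waits for a positive count
	}
	b.tokens -= b.inflate(lsmBytes)
	return true
}

// addTokens refills the bucket, for example with the share of the
// provisioned bandwidth that belongs to one refill interval.
func (b *diskTokenBucket) addTokens(n int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tokens += n
}
```

Because regular writes deduct tokens even when the bucket is empty, they push the count further below zero, which in turn delays elastic writes until enough refills have brought the bucket back above zero.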
@aadityasondhi @nicktrav Is this in progress, done, or still in the backlog? If it is in progress or done, can we update the status?
The functionality should be complete as of #129005. The remaining work would be to:
The "enable" part means setting the cluster setting or store config flags for the provisioned bandwidth.
Informs cockroachdb#86857 Release note: None
This patch contains some small improvements to better test the bandwidth subtest of the snapshot ingest roachtest. Informs cockroachdb#86857 Release note: None
135019: roachtest: use disk stall utility to limit bandwidth in AC tests r=itsbilal a=aadityasondhi Informs #86857 Release note: None Co-authored-by: Aaditya Sondhi
135022: roachtest: snapshot ingest roachtest improvements r=sumeerbhola a=aadityasondhi Informs #86857 Release note: None Co-authored-by: Aaditya Sondhi
At this point, this is complete. The remaining work to enable in cloud is tracked internally in Jira.
Followup to #82898 which created the basic infrastructure, configuration scheme, and did experimentation with regular and elastic kv0 traffic.
cc: @irfansharif @andrewbaptist
Jira issue: CRDB-18968
Epic: CRDB-37479