-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: improve planning heuristics #113898
Comments
cc @cockroachdb/cdc |
@jayshrivastava FYI: with #114710 the work on this issue might become |
Linking this related issue: https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1701101549923459 |
This commit introduces several changes. It makes `replicaoracle.BinPackingChoice` the explicit default for changefeeds. Previously, we would use `physicalplan.DefaultReplicaChooser`, which would point to the bin packing oracle. This change is better since it gives the changefeed package more control over its dependencies. A new cluster setting `changefeed.default_range_distribution_strategy` is added to specify how changefeeds should distribute work. In the default case, `none`, we defer to distsql to select nodes and distribute work among them. This is the same behavior as running a changefeed today. In the other case, `balanced_simple`, we still let distsql choose nodes for us, but we attempt to evently distribute ranges over them. This case is exactly the same as setting `changefeed.balance_range_distribution.enable = true`, but it is allowed to be used outside of initial scans (`changefeed.balance_range_distribution.enable` was only allowed with `initial_scan=only`. This change also deprecates `changefeed.balance_range_distribution.enable` in favor of `changefeed.default_range_distribution_strategy` because the latter can be expanded in the future to add more functionality. This change also moves changefeed distribution tests to a dedicated file: `changefeed_dist_test.go`. This file contains a tester which can be used to start a cluster with a given topology and range distribution. Changefeed tests can use the tester to perform various load balancing strategies. Right now, the new tests assert how ranges are distributed after planning changefeeds, which implicitly tests code outside of the changefeedccl package (ie. `dsp.PartitionSpans()`). However, this is important since changefeeds rely on the behavior of this black box to perform well. In the future, cdc will implement more distribution strategies and likely implement its own partitioning logic. Tests for these strategies can be added in this file. Over time, cdc can move away from distsql planning. Informs: cockroachdb#113898 Epic: None Release note (enterprise change): `changefeed.balance_range_distribution.enable` is now deprecated. Users should use a new cluster setting `changefeed.default_range_distribution_strategy` instead. `changefeed.default_range_distribution_strategy='balanced_simple'` has the same effect as setting `changefeed.balance_range_distribution.enable=true`. It does not require `initial_scan='only'`, which was required by the old setting.
This commit introduces several changes. It makes `replicaoracle.BinPackingChoice` the explicit default for changefeeds. Previously, we would use `physicalplan.DefaultReplicaChooser`, which would point to the bin packing oracle. This change is better since it gives the changefeed package more control over its dependencies. A new cluster setting `changefeed.default_range_distribution_strategy` is added to specify how changefeeds should distribute work. In the default case, `none`, we defer to distsql to select nodes and distribute work among them. This is the same behavior as running a changefeed today. In the other case, `balanced_simple`, we still let distsql choose nodes for us, but we attempt to evently distribute ranges over them. This case is exactly the same as setting `changefeed.balance_range_distribution.enable = true`, but it is allowed to be used outside of initial scans (`changefeed.balance_range_distribution.enable` was only allowed with `initial_scan=only`. This change also deprecates `changefeed.balance_range_distribution.enable` in favor of `changefeed.default_range_distribution_strategy` because the latter can be expanded in the future to add more functionality. This change also moves changefeed distribution tests to a dedicated file: `changefeed_dist_test.go`. This file contains a tester which can be used to start a cluster with a given topology and range distribution. Changefeed tests can use the tester to perform various load balancing strategies. Right now, the new tests assert how ranges are distributed after planning changefeeds, which implicitly tests code outside of the changefeedccl package (ie. `dsp.PartitionSpans()`). However, this is important since changefeeds rely on the behavior of this black box to perform well. In the future, cdc will implement more distribution strategies and likely implement its own partitioning logic. Tests for these strategies can be added in this file. Over time, cdc can move away from distsql planning. Informs: cockroachdb#113898 Epic: None Release note (enterprise change): `changefeed.balance_range_distribution.enable` is now deprecated. Users should use a new cluster setting `changefeed.default_range_distribution_strategy` instead. `changefeed.default_range_distribution_strategy='balanced_simple'` has the same effect as setting `changefeed.balance_range_distribution.enable=true`. It does not require `initial_scan='only'`, which was required by the old setting.
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: cockroachdb#113898 Epic: None
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: cockroachdb#113898 Epic: None
This change adds support for `changefeed.default_range_distribution_strategy='balanced_full'`. `balanced_full` distributes ranges evenly across all healthy nodes. The implementation first gets a plan from sql and rebalances it. Rebalancing is better than balancing / assigning ranges directly because the final plan will leverage some optimizations from distsql. For example, distsql tends to assign local ranges to nodes. If a node has many local ranges and is assigned more ranges compared to other nodes, then we reassign some of those ranges during rebalancing. The node still gets to keep as many local ranges as possible. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies. This setting avoids rebalancing a small changefeed across many nodes. The default value is 1024. This change also retires the `changefeed.balance_range_distribution.sensitivity` setting. This setting existed to reduce the amount of work to perform during rebalancing (ie. only rebalance a partition to within 5% of the ideal number of ranges per partition). This optimization is not necessary at this moment because rebalancing is relatively cheap and only happens at the start of a changefeed. Also, mechanism would sometimes cause bad distributions. Consider a scenario where we 25 ranges are rebalanced over 5 nodes. With some sensitivity, 6 ranges would be allowed per node. This can result in the imbalanced distribution: 1, 6, 6, 6, 6. Removing sensitivity results in the ideal distribution: 5, 5, 5, 5, 5 (or close to the ideal distribution when ranges cannot be divided evenly). Informs: cockroachdb#113898 Epic: None Release note (enterprise change): This change adds a new value `balanced_full` for the setting `changefeed.default_range_distribution_strategy`. Setting this value will make changefeeds attempt to balance work evenly over at many nodes as possible (while obeying locality restrictions). This is different than `balanced_simple` which does not always leverage all possible nodes. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies (ie. changing `changefeed.default_range_distribution_strategy`) will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies.
This change adds support for `changefeed.default_range_distribution_strategy='balanced_full'`. `balanced_full` distributes ranges evenly across all healthy nodes. The implementation first gets a plan from sql and rebalances it. Rebalancing is better than balancing / assigning ranges directly because the final plan will leverage some optimizations from distsql. For example, distsql tends to assign local ranges to nodes. If a node has many local ranges and is assigned more ranges compared to other nodes, then we reassign some of those ranges during rebalancing. The node still gets to keep as many local ranges as possible. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies. This setting avoids rebalancing a small changefeed across many nodes. The default value is 1024. This change also retires the `changefeed.balance_range_distribution.sensitivity` setting. This setting existed to reduce the amount of work to perform during rebalancing (ie. only rebalance a partition to within 5% of the ideal number of ranges per partition). This optimization is not necessary at this moment because rebalancing is relatively cheap and only happens at the start of a changefeed. Also, mechanism would sometimes cause bad distributions. Consider a scenario where we 25 ranges are rebalanced over 5 nodes. With some sensitivity, 6 ranges would be allowed per node. This can result in the imbalanced distribution: 1, 6, 6, 6, 6. Removing sensitivity results in the ideal distribution: 5, 5, 5, 5, 5 (or close to the ideal distribution when ranges cannot be divided evenly). Informs: cockroachdb#113898 Epic: None Release note (enterprise change): This change adds a new value `balanced_full` for the setting `changefeed.default_range_distribution_strategy`. Setting this value will make changefeeds attempt to balance work evenly over at many nodes as possible (while obeying locality restrictions). This is different than `balanced_simple` which does not always leverage all possible nodes. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies (ie. changing `changefeed.default_range_distribution_strategy`) will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies.
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: cockroachdb#113898 Epic: None
This change adds support for `changefeed.default_range_distribution_strategy='balanced_full'`. `balanced_full` distributes ranges evenly across all healthy nodes. The implementation first gets a plan from sql and rebalances it. Rebalancing is better than balancing / assigning ranges directly because the final plan will leverage some optimizations from distsql. For example, distsql tends to assign local ranges to nodes. If a node has many local ranges and is assigned more ranges compared to other nodes, then we reassign some of those ranges during rebalancing. The node still gets to keep as many local ranges as possible. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies. This setting avoids rebalancing a small changefeed across many nodes. The default value is 1024. This change also retires the `changefeed.balance_range_distribution.sensitivity` setting. This setting existed to reduce the amount of work to perform during rebalancing (ie. only rebalance a partition to within 5% of the ideal number of ranges per partition). This optimization is not necessary at this moment because rebalancing is relatively cheap and only happens at the start of a changefeed. Also, mechanism would sometimes cause bad distributions. Consider a scenario where we 25 ranges are rebalanced over 5 nodes. With some sensitivity, 6 ranges would be allowed per node. This can result in the imbalanced distribution: 1, 6, 6, 6, 6. Removing sensitivity results in the ideal distribution: 5, 5, 5, 5, 5 (or close to the ideal distribution when ranges cannot be divided evenly). Informs: cockroachdb#113898 Epic: None Release note (enterprise change): This change adds a new value `balanced_full` for the setting `changefeed.default_range_distribution_strategy`. Setting this value will make changefeeds attempt to balance work evenly over at many nodes as possible (while obeying locality restrictions). This is different than `balanced_simple` which does not always leverage all possible nodes. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies (ie. changing `changefeed.default_range_distribution_strategy`) will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies.
This change adds support for `changefeed.default_range_distribution_strategy='balanced_full'`. `balanced_full` distributes ranges evenly across all healthy nodes. The implementation first gets a plan from sql and rebalances it. Rebalancing is better than balancing / assigning ranges directly because the final plan will leverage some optimizations from distsql. For example, distsql tends to assign local ranges to nodes. If a node has many local ranges and is assigned more ranges compared to other nodes, then we reassign some of those ranges during rebalancing. The node still gets to keep as many local ranges as possible. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies. This setting avoids rebalancing a small changefeed across many nodes. The default value is 1024. This change also retires the `changefeed.balance_range_distribution.sensitivity` setting. This setting existed to reduce the amount of work to perform during rebalancing (ie. only rebalance a partition to within 5% of the ideal number of ranges per partition). This optimization is not necessary at this moment because rebalancing is relatively cheap and only happens at the start of a changefeed. Also, mechanism would sometimes cause bad distributions. Consider a scenario where we 25 ranges are rebalanced over 5 nodes. With some sensitivity, 6 ranges would be allowed per node. This can result in the imbalanced distribution: 1, 6, 6, 6, 6. Removing sensitivity results in the ideal distribution: 5, 5, 5, 5, 5 (or close to the ideal distribution when ranges cannot be divided evenly). Informs: cockroachdb#113898 Epic: None Release note (enterprise change): This change adds a new value `balanced_full` for the setting `changefeed.default_range_distribution_strategy`. Setting this value will make changefeeds attempt to balance work evenly over at many nodes as possible (while obeying locality restrictions). This is different than `balanced_simple` which does not always leverage all possible nodes. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies (ie. changing `changefeed.default_range_distribution_strategy`) will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies.
This change adds support for `changefeed.default_range_distribution_strategy='balanced_full'`. `balanced_full` distributes ranges evenly across all healthy nodes. The implementation first gets a plan from sql and rebalances it. Rebalancing is better than balancing / assigning ranges directly because the final plan will leverage some optimizations from distsql. For example, distsql tends to assign local ranges to nodes. If a node has many local ranges and is assigned more ranges compared to other nodes, then we reassign some of those ranges during rebalancing. The node still gets to keep as many local ranges as possible. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies. This setting avoids rebalancing a small changefeed across many nodes. The default value is 1024. This change also retires the `changefeed.balance_range_distribution.sensitivity` setting. This setting existed to reduce the amount of work to perform during rebalancing (ie. only rebalance a partition to within 5% of the ideal number of ranges per partition). This optimization is not necessary at this moment because rebalancing is relatively cheap and only happens at the start of a changefeed. Also, mechanism would sometimes cause bad distributions. Consider a scenario where we 25 ranges are rebalanced over 5 nodes. With some sensitivity, 6 ranges would be allowed per node. This can result in the imbalanced distribution: 1, 6, 6, 6, 6. Removing sensitivity results in the ideal distribution: 5, 5, 5, 5, 5 (or close to the ideal distribution when ranges cannot be divided evenly). Informs: cockroachdb#113898 Epic: None Release note (enterprise change): This change adds a new value `balanced_full` for the setting `changefeed.default_range_distribution_strategy`. Setting this value will make changefeeds attempt to balance work evenly over at many nodes as possible (while obeying locality restrictions). This is different than `balanced_simple` which does not always leverage all possible nodes. This change also adds the setting `changefeed.range_distribution_threshold` which controls when distribution strategies (ie. changing `changefeed.default_range_distribution_strategy`) will take effect. It defines the minimum number of ranges which a changefeed must watch to trigger distribution strategies.
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: cockroachdb#113898 Epic: None
Reducing priority level to P-3. We already have a rebalancing strategy (balanced simple distribution) that addresses the overload concerns in the support issue, so we do not have an urgent need for additional strategies. |
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: cockroachdb#113898 Epic: None
…ans) Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: #113898 Epic: None
115375: changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) r=jayshrivastava a=jayshrivastava ### sql: count ranges per partition in PartitionSpans This change updates span partitioning to count ranges while making partitions. This allows callers to rebalance partitions based on range counts without having to iterate over the spans to count ranges. Release note: None Epic: None ### changefeedccl: reduce rebalancing memory usage from O(ranges) to O(spans) #115375 Previously, the `rebalanceSpanPartitions` would use O(ranges) memory. This change rewrites it to use range iterators, reducing the memory usage to O(spans). This change also adds a randomized test to assert that all spans are accounted for after rebalancing. It also adds one more unit test. Informs: #113898 Epic: None ### changefeedccl: add rebalancing checks This change adds extra test coverage for partition rebalancing in changefeeds. It adds checks which are performed after rebalancing to assert that the output list of spans covers exactly the same keys as the input list of spans. These checks are expensive so they only run if the environment variable `COCKROACH_CHANGEFEED_TESTING_REBALANCING_CHECKS` is true. This variable is true in cdc roachtests and unit tests. Release note: None Epic: None 119885: storage: support per-store IO metrics with fine granularity r=jbowens,abarganier a=CheranMahalingam Currently, timeseries metrics are collected on a 10s interval which hides momentary spikes in IO. This commit introduces a central disk monitoring system that polls for disk stats at a 100ms interval. Additionally, the current system accumulates disk metrics across all block devices which includes noise from unrelated processes. This commit also adds support for exporting per-store IO metrics (i.e. IO stats for block devices that map to stores used by Cockroach). These changes will be followed up by a PR to remove the need for customers to specify disk names when setting the provisioned bandwidth for each store as described in #109350. Fixes: #104114, #112898. Informs: #89786. Epic: None. Release note: None. 120649: changefeedccl: avoid undefined behavior in distribution test r=wenyihu6 a=jayshrivastava The `rangeDistributionTester` would sometimes calculate log(0) when determining the node to move a range too. Most of the time, this would be some garbage value which gets ignored. Sometimes, this may return a valid node id, causing the range distribution to be wrong and failing the test failures. This change updates the tester to handle this edge case. Closes: #120470 Release note: None Co-authored-by: Jayant Shrivastava <[email protected]> Co-authored-by: Cheran Mahalingam <[email protected]>
See https://github.com/cockroachlabs/support/issues/2679.
The problem:
In this scenario, there's a changefeed running with
execution_locality = foo
on a table which is configured to with a leaseholder preference in regionbar
. The table foo has 150k ranges. Regionbar
has ~20 nodes and regionfoo
has ~10.We observed the following problems:
foo
was assigned ~60k ranges to watch with the next highest being ~3k ranges. This imbalance makes the changefeed run slower. When the leaseholders are non-local, we don't know what heuristics distsql will use to assign work to change aggregators. We should investigate and consider making our own planning logic. We should consider usingchangefeed.balance_range_distribution.enable
always, even after initial scans.The solution:
We want to add "planning modes" where we can choose how to distribute work to nodes when we plan a changefeed. Namely, we want 3 modes:
Jira issue: CRDB-33248
The text was updated successfully, but these errors were encountered: