-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: avoid estimated row counts of 0 #32578
Comments
@RaduBerinde, can you take a look? |
The estimated row count after "offset 9999" is 0, which makes the sort "free", and the plan with the sort is considered first. There are two fixes (we probably want both):
|
Add a one-time cpuCostFactor to all operators. This reflects the time taken to set up execution for the operator, and will result in plans with fewer operators all else being equal (e.g. when the estimated row count is 0). Informs cockroachdb#32578. Release note: None
32616: opt: add one-time cost for all operators r=RaduBerinde a=RaduBerinde Add a one-time cpuCostFactor to all operators. This reflects the time taken to set up execution for the operator, and will result in plans with fewer operators all else being equal (e.g. when the estimated row count is 0). Informs #32578. Release note: None Note: while this fixes the specific issue mentioned and seems like a reasonable idea on its own, I think having row count = 0 is also problematic because everything above that operator won't be optimized properly. 0 should be reserved for the case where we know for sure there are 0 rows. I don't know how to make that happen without making the "count" something more complicated (e.g. a count + a variance, or a confidence interval). Co-authored-by: Radu Berinde <[email protected]>
The specific case reported here is fixed, however I'm going to repurpose the issue to track the larger problem of 0 row counts. We should avoid estimated row counts of 0 short of situations where we know it's exactly 0. A row count of 0 prevents the optimizer from making reasonable choices since operators have the same cost. Note that in this example, without stats, we are subtracting a definite count from an "arbitrary unit" which is also problematic. |
I think we could fix this without causing ripples by always adding 1 to the estimated row count. That way, it's always > 0, but still smoothly preserves relative costs between plans. |
Every operator would add 1? So for example, a project would have 1 more row than its input? Or maybe just in those cases where we are making an estimation? Related to this, I think that whenever we are using real table stats that say 0 rows, we should override them to 1 row (or a few rows). |
I was mostly thinking about table stats. I'm suggesting we could always add 1 to them so we never get an estimate of 0 rows. I'm assuming that most (all?) other estimates will be non-zero as long as they have non-zero inputs. The only operators that should return 0 rows would be those where we know there are zero rows, like a |
Oh, yeah, that makes sense. |
This commit improves our statistics estimates so that we never estimate zero rows unless the row count is provably zero (e.g., SELECT ... WHERE false). We want to avoid estimating zero rows since the stats may be stale, and we can end up with weird and inefficient plans if we estimate zero rows. Therefore, this commit changes the logic in the statisticsBuilder so that a row count of 0 is replaced with 1, unless that would be inconsistent with the cardinality. This commit also updates all estimates for distinct count and null count to ensure that they are never larger than the row count. We also ensure that there is at least one distinct or null value if row count > 0. Fixes cockroachdb#32578 Release note: None
This commit improves our statistics estimates so that we never estimate zero rows unless the row count is provably zero (e.g., SELECT ... WHERE false). We want to avoid estimating zero rows since the stats may be stale, and we can end up with weird and inefficient plans if we estimate zero rows. This commit also updates all estimates for distinct count and null count to ensure that they are never larger than the row count. We also ensure that there is at least one distinct or null value if row count > 0. Fixes cockroachdb#32578 Release note: None
37729: opt: avoid estimating row count = 0 r=rytaft a=rytaft This commit improves our statistics estimates so that we never estimate zero rows unless the row count is provably zero (e.g., `SELECT ... WHERE false`). We want to avoid estimating zero rows since the stats may be stale, and we can end up with weird and inefficient plans if we estimate zero rows. This commit also updates all estimates for distinct count and null count to ensure that they are never larger than the row count. We also ensure that there is at least one distinct or null value if row count > 0. Fixes #32578 Release note: None Co-authored-by: Rebecca Taft <[email protected]>
Given the simple table:
Check out the following two queries. I expect the output to look like the first one, always, and never the second one.
Good plan (only a single sort):
Bad plan (two sorts):
As far as I can tell, the table should already be sorted after the offset - so there should be no reason to put another sort in between the offset and the limit.
The text was updated successfully, but these errors were encountered: