sql/stats: convert between histograms and quantile functions
To predict histograms in statistics forecasts, we will use linear
regression over quantile functions. (Quantile functions are another
representation of histogram data, in a form more amenable to statistical
manipulation.)
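To illustrate what a quantile function is, here is a minimal Go sketch. It assumes a piecewise-linear representation: a sorted list of vertices `{p, v}`, each meaning that a fraction `p` of rows have values at or below `v`. The names `quantile`, `quantilePoint`, and `at` are hypothetical and do not come from this commit's diff.

```go
package main

import (
	"fmt"
	"sort"
)

// quantilePoint is one vertex of a piecewise-linear quantile function:
// a fraction p in [0, 1] of rows have values <= v. (Hypothetical type.)
type quantilePoint struct {
	p, v float64
}

// quantile is a piecewise-linear quantile function. This sketch assumes the
// points are sorted by p, run from p=0 to p=1, have non-decreasing v, and
// that the slice is non-empty.
type quantile []quantilePoint

// at evaluates the quantile function at fraction p by linearly interpolating
// between the two points that surround p.
func (q quantile) at(p float64) float64 {
	// Index of the first point whose p is >= the requested p.
	i := sort.Search(len(q), func(i int) bool { return q[i].p >= p })
	if i == 0 {
		return q[0].v
	}
	if i == len(q) {
		return q[len(q)-1].v
	}
	lo, hi := q[i-1], q[i]
	if hi.p == lo.p {
		return hi.v
	}
	return lo.v + (p-lo.p)/(hi.p-lo.p)*(hi.v-lo.v)
}

func main() {
	// Half the rows are spread over (0, 10); the other half equal 10,
	// which shows up as a flat segment from p=0.5 to p=1.
	q := quantile{{0, 0}, {0.5, 10}, {1, 10}}
	fmt.Println(q.at(0.25), q.at(0.75))
}
```

A rising segment stands for rows spread across a range of values, while a flat segment stands for many rows sharing one value, which is how a histogram bucket's range and equality counts can both be expressed.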

This commit defines quantile functions and adds methods to convert
between histograms and quantile functions.

This code was originally part of #77070 but has been pulled out to
simplify that PR. A few changes have been made:
- Common code has been factored into closures.
- More checks have been added for positive values.
- In `makeQuantile` we now trim leading empty buckets as well as
  trailing empty buckets.
- The logic in `quantile.toHistogram` that steals from `NumRange` when
  `NumEq` is zero now checks that `NumRange` will still be >= 1.
- More tests have been added.

Assists: #79872

Release note: None
michae2 committed Jul 11, 2022
1 parent c16f589 commit 4ad11bd
Showing 3 changed files with 888 additions and 20 deletions.
pkg/sql/stats/histogram.go: 19 changes (19 additions, 0 deletions)
@@ -11,8 +11,10 @@
package stats

import (
	"fmt"
	"math"
	"sort"
	"strings"

	"github.com/cockroachdb/cockroach/pkg/settings"
	"github.com/cockroachdb/cockroach/pkg/sql/opt/cat"
@@ -420,6 +422,23 @@ func (h histogram) toHistogramData(colType *types.T) (HistogramData, error) {
	return histogramData, nil
}

// String prints a histogram to a string.
func (h histogram) String() string {
	var b strings.Builder
	b.WriteString("{[")
	for i, bucket := range h.buckets {
		if i > 0 {
			b.WriteRune(' ')
		}
		fmt.Fprintf(
			&b, "{%v %v %v %v}",
			bucket.NumEq, bucket.NumRange, bucket.DistinctRange, bucket.UpperBound.String(),
		)
	}
	b.WriteString("]}")
	return b.String()
}

// estimatedDistinctValuesInRange returns the estimated number of distinct
// values in the range [lowerBound, upperBound), given that the total number
// of values is numRange.
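To show the output format of the new `String` method, here is a self-contained sketch that reproduces its layout. It assumes a hypothetical `histBucket` type whose `UpperBound` is a plain string, standing in for the datum type whose `String()` the real method calls; `formatBuckets` is likewise a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// histBucket mirrors the fields printed by histogram.String, with
// UpperBound simplified to a string for this sketch.
type histBucket struct {
	NumEq, NumRange, DistinctRange float64
	UpperBound                     string
}

// formatBuckets writes each bucket as {NumEq NumRange DistinctRange UpperBound},
// separated by spaces and wrapped in "{[" and "]}".
func formatBuckets(buckets []histBucket) string {
	var b strings.Builder
	b.WriteString("{[")
	for i, bucket := range buckets {
		if i > 0 {
			b.WriteRune(' ')
		}
		fmt.Fprintf(&b, "{%v %v %v %v}",
			bucket.NumEq, bucket.NumRange, bucket.DistinctRange, bucket.UpperBound)
	}
	b.WriteString("]}")
	return b.String()
}

func main() {
	fmt.Println(formatBuckets([]histBucket{{1, 0, 0, "0"}, {2, 5, 4.5, "10"}}))
}
```

With these two buckets the program prints `{[{1 0 0 0} {2 5 4.5 10}]}`; because `%v` formats float64 counts in their shortest form, whole-number counts print without a decimal point.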