API usability improvements in sampling package #31940
Conversation
…tor-contrib into jmacd/tvaluesampler
…ype similar to configcompression.CompressionType
```
//
//	return fixedThreshold.ShouldSample(rnd)
// }
// func MakeDecision(tracestate string, tid TraceID) bool {
```
Note: the Go toolchain is reformatting this comment block for me. 🤷
```
// Calculate the amount of precision needed to encode the
// threshold with reasonable precision. Here, we count the
// number of leading `0` or `f` characters and automatically
// add precision to preserve relative error near the extremes.
//
// Frexp() normalizes both the fraction and one-minus the
// fraction, because more digits of precision are needed if
// either value is near zero. Frexp returns an exponent <= 0.
//
// If `exp <= -4`, there will be a leading hex `0` or `f`.
// For every multiple of -4, another leading `0` or `f`
// appears, so this raises precision accordingly.
_, expF := math.Frexp(fraction)
_, expR := math.Frexp(1 - fraction)
precision = min(NumHexDigits, max(precision+expF/-hexBits, precision+expR/-hexBits))

// Compute the threshold
scaled := uint64(math.Round(fraction * float64(MaxAdjustedCount)))
threshold := MaxAdjustedCount - scaled

// Round to the specified precision, if less than the maximum.
if shift := hexBits * (NumHexDigits - precision); shift != 0 {
	half := uint64(1) << (shift - 1)
	threshold += half
	threshold >>= shift
	threshold <<= shift
}
```
Note: I removed `ErrPrecisionUnderflow` because it was not a meaningful error condition -- I would otherwise just work around the problem. While investigating the head-sampler changes for OTel-Go, I found a much nicer solution, copied here (with new testing).
Reviewers: the new API surface is exercised in #31946.
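To make the precision-widening step quoted above concrete, here is a minimal standalone sketch of the same Frexp arithmetic. The constants `numHexDigits = 14` and `hexBits = 4` are assumptions matching the 56-bit (14 hex digit) thresholds discussed in this thread, the function name is invented, and like the quoted code it relies on Go 1.21's built-in `min`/`max`; this is not the package's actual code.

```
package main

import (
	"fmt"
	"math"
)

// Assumed constants: 56-bit thresholds are 14 hex digits of 4 bits each.
const (
	numHexDigits = 14
	hexBits      = 4
)

// widenPrecision is a hypothetical name for the logic quoted above:
// leading `0` or `f` hex digits near the extremes cost one digit of
// relative precision per multiple of 4 in the Frexp exponent.
func widenPrecision(fraction float64, precision int) int {
	_, expF := math.Frexp(fraction)
	_, expR := math.Frexp(1 - fraction)
	return min(numHexDigits, max(precision+expF/-hexBits, precision+expR/-hexBits))
}

func main() {
	// fraction = 0.5: both Frexp exponents are 0, so no widening.
	fmt.Println(widenPrecision(0.5, 3)) // 3

	// fraction = 1 - 2^-17: Frexp(1-fraction) returns exponent -16,
	// adding 16/4 = 4 extra hex digits to preserve relative error.
	fmt.Println(widenPrecision(1-1.0/131072, 3)) // 7
}
```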
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Sorry @jmacd for the long wait. Just rerunning the build, and hopefully we should be all good to merge this in.
…ation, prepare for OTEP 235 support (#31946)

**Description:** Refactors the probabilistic sampling processor to prepare it for more OTEP 235 support. This clarifies existing inconsistencies between the tracing and logging samplers; see the updated README. The tracing priority mechanism applies a 0% or 100% sampling override (e.g., "1" implies 100% sampling), whereas the logging sampling priority mechanism supports a variable-probability override (e.g., "1" implies 1% sampling). This pins down cases where no randomness is available, and organizes the code to improve readability.

A new type called `randomnessNamer` carries the randomness information (from the sampling package) and the name of the policy that derived it. When sampling priority causes the effective sampling probability to change, the value "sampling.priority" replaces the source of randomness, which is currently limited to "trace_id_hash" or the name of the randomness-source attribute, for logs.

While working on #31894, I discovered that some inputs fall through to the hash function with zero bytes of input randomness. The hash function, computed on an empty input (for logs) or on 16 bytes of zeros (which OTel calls an invalid trace ID), would produce a fixed random value. So, for example, when logs are sampled and there is no TraceID and no randomness attribute value, the result will be sampled at approximately 82.9% and above.

In the refactored code, an error is returned when there is no input randomness. A new boolean configuration field determines the outcome when there is an error extracting randomness from an item of telemetry. By default, items of telemetry with errors will not pass through the sampler. When `FailClosed` is set to false, items of telemetry with errors will pass through the sampler.

The original hash function, which uses 14 bits of information, is structured as an "acceptance threshold": ultimately, the test for sampling translated into a positive decision when `Randomness < AcceptThreshold`. In the OTEP 235 scheme, thresholds are rejection thresholds--this PR modifies the original 14-bit accept threshold into a 56-bit reject threshold, using the Threshold and Randomness types from the sampling package. Reframed in this way, the subsequent PR (i.e., #31894) can seamlessly convey the effective sampling probability using OTEP 235 semantic conventions.

Note: both the traces and logs processors are now reduced to a function like this:

```
return commonSamplingLogic(
	ctx,
	l,
	lsp.sampler,
	lsp.failClosed,
	lsp.sampler.randomnessFromLogRecord,
	lsp.priorityFunc,
	"logs sampler",
	lsp.logger,
)
```

which is a generic function that handles the common logic on a per-item basis and ends in a single metric event. This structure makes it clear how traces and logs are currently processed differently, with different prioritization schemes. It also makes it easy to introduce new sampler modes, as shown in #31894. After this and #31940 merge, the changes in #31894 will be relatively simple to review as the third part in a series.

**Link to tracking Issue:** Depends on #31940. Part of #31918.

**Testing:** Added. Existing tests already cover the exact random behavior of the current hashing mechanism. Even more testing will be introduced with the last step of this series. Note that #32360 was added ahead of this change to ensure refactoring does not change results.

**Documentation:** Added.

---------

Co-authored-by: Kent Quirk <[email protected]>
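The `FailClosed` behavior described in the merged description above can be illustrated with a short, self-contained sketch. Every identifier in it is invented for illustration and does not match the processor's real code; it only demonstrates the fail-open vs. fail-closed decision and the OTEP 235 convention that an item is kept when its randomness is at or above the rejection threshold.

```
package main

import (
	"errors"
	"fmt"
)

var errMissingRandomness = errors.New("missing randomness")

// item is a stand-in for an item of telemetry (all fields invented).
type item struct {
	randomness uint64 // 56-bit randomness value, when available
	hasRand    bool
}

func randomnessFromItem(it item) (uint64, error) {
	if !it.hasRand {
		// e.g., a log record with no TraceID and no randomness attribute
		return 0, errMissingRandomness
	}
	return it.randomness, nil
}

// shouldPass sketches the FailClosed behavior: items whose randomness
// cannot be determined are dropped by default and passed only when
// failClosed is false. Per OTEP 235, rejectThreshold is a rejection
// threshold: an item is kept when randomness >= threshold.
func shouldPass(it item, rejectThreshold uint64, failClosed bool) bool {
	rnd, err := randomnessFromItem(it)
	if err != nil {
		return !failClosed
	}
	return rnd >= rejectThreshold
}

func main() {
	missing := item{}
	fmt.Println(shouldPass(missing, 1<<55, true))  // false: fail closed (default)
	fmt.Println(shouldPass(missing, 1<<55, false)) // true: fail open
}
```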
Description:
Several usability issues were ironed out while working on #31894. This PR contains the pkg/sampling changes from that PR.
Highlights:
- Adds a `NeverSampleThreshold` value, which is the exclusive upper boundary of threshold values. This makes negative sampling decisions easier to manage, as shown in [probabilistic sampling processor] encoded sampling probability (support OTEP 235) #31894.
- Adds an `AllProbabilitiesRandomness` value, which is the inclusive upper boundary of Randomness values. This makes error handling more natural, as shown in #31894. All thresholds except `NeverSampleThreshold` will be sampled at `AllProbabilitiesRandomness`.
- Adds an `UnsignedToThreshold` constructor for explicit threshold construction. This is useful in #31894 because it constructs a 14-bit threshold value. (Both new constructors are sketched in the example after this list.)
- Adds an `UnsignedToRandomness` constructor for explicit randomness construction. This is useful in #31894 because it constructs randomness values from log records without use of TraceState.
- Changes `UpdateTValueWithSampling` to avoid the potential for an inconsistent update through misuse (as identified in #31894; there is less optimization potential because sampling.threshold modifies thresholds in all modes).
- Removes the `ErrPrecisionUnderflow` error condition and automatically corrects the problem by extending precision near 0.0 and 1.0, where there are obligatory leading `f` or `0` digits.
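A hedged usage sketch of the two new constructors follows. The import path below and the assumption that both constructors return an error for out-of-range (wider than 56-bit) inputs should be checked against the package docs; the values are chosen purely for illustration.

```
package main

import (
	"fmt"

	"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/sampling"
)

func main() {
	// A rejection threshold of 2^55 out of 2^56 corresponds to 50%
	// sampling under OTEP 235 semantics (value chosen for illustration).
	th, err := sampling.UnsignedToThreshold(1 << 55)
	if err != nil {
		panic(err)
	}

	// Randomness constructed from an external source (e.g., a log
	// record attribute) rather than from TraceState.
	rnd, err := sampling.UnsignedToRandomness(0x0abcdef0123456)
	if err != nil {
		panic(err)
	}

	// Kept when randomness >= threshold; this value falls below the
	// rejection threshold, so it is not sampled.
	fmt.Println(th.ShouldSample(rnd)) // false
}
```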
Link to tracking Issue: #31918
Testing: New tests added for coverage.
Documentation: New comments added to explain the changes.