feat: optimize calculateSampleRates when there's one bucket #78

cartermp · 2024-07-13T00:41:56Z

Friday afternoon fun, don't take too seriously.

It's not uncommon to have a dynamic sampler with a single field in refinery configs, like so:

          - Name: Catchall rule
            Sampler:
              EMAThroughputSampler:
                GoalThroughputPerSec: 500 # This is spans per second for the entire cluster
                UseClusterSize: true # Ensures GoalThroughputPerSec is for the full refinery cluster and not per node
                FieldList:
                  - service.name

This change seems to run faster when I benchmark it locally. Example output:

λ ~/repos/dynsampler-go/ cartermp/maybe-optimize go test -run=^$ -bench=BenchmarkCalculateSampleRates -benchmem
goos: darwin
goarch: arm64
pkg: github.com/honeycombio/dynsampler-go
BenchmarkCalculateSampleRates/One_Optimized-8            8908784               116.1 ns/op           256 B/op          2 allocs/op
BenchmarkCalculateSampleRates/One_Original-8             7840144               152.0 ns/op           272 B/op          3 allocs/op
BenchmarkCalculateSampleRates/TwoOptimized-8             5492028               218.4 ns/op           288 B/op          3 allocs/op
BenchmarkCalculateSampleRates/TwoOriginal-8              5421360               219.9 ns/op           288 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Three_Optimized-8          4284099               279.2 ns/op           304 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Three_Original-8           4278279               280.4 ns/op           304 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Four_Optimized-8           3393247               349.5 ns/op           320 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Four_Original-8            3416569               351.9 ns/op           320 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Five_Optimized-8           2422359               499.4 ns/op           336 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Five_Original-8            2399866               499.8 ns/op           336 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Ten_Optimized-8             925006              1235 ns/op             836 B/op          4 allocs/op
BenchmarkCalculateSampleRates/Ten_Original-8              916884              1254 ns/op             836 B/op          4 allocs/op
PASS
ok      github.com/honeycombio/dynsampler-go    17.381s

I also played with some other optimizations that make the Ten_ case faster too (i.e., someone specified ten fields), but it involved replacing the math.IsNan call and I...had some tingly spidey senses regarding messing with that.

keyCalculation_test.go

MikeGoldsmith

This looks good and makes sense if we regularly see a single field entry.

We should remove the original func before merging but keep the benchmark.

kentquirk · 2024-07-15T15:15:00Z

I'd be interested to know how you know "it's not uncommon" to have a single entry -- I don't think I've ever seen it in any of the sample files I've worked with. Far more common is way too many keys.

I don't object to this optimization (other than the generic "it adds complexity", but it's really isolated here), but I'm not sure if it's actually going to be seen in practice.

cartermp · 2024-07-15T20:19:09Z

@kentquirk It's actually a way that we onboard customers now, the final dynamic sampling rule we give people is the one I mention in the PR title.

Far more common is way too many keys.

I did think about this one too and couldn't come up with anything that was meaningfully better. I would also guess that when there's a lot of key fields listed, maybe the most time is spent in creating the key rather than rebalancing.

codeboten · 2024-12-19T16:31:28Z

Closing this for now,

cartermp added 2 commits July 12, 2024 17:33

optimize calculateSampleRates when there's one bucket

8f9f441

lol

55e2616

cartermp requested a review from a team as a code owner July 13, 2024 00:41

cartermp commented Jul 13, 2024

View reviewed changes

keyCalculation_test.go Show resolved Hide resolved

slightly more realistic fake benchmark data

744a8d0

MikeGoldsmith approved these changes Jul 15, 2024

View reviewed changes

robbkidd self-assigned this Aug 20, 2024

codeboten closed this Dec 19, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: optimize calculateSampleRates when there's one bucket #78

feat: optimize calculateSampleRates when there's one bucket #78

cartermp commented Jul 13, 2024

MikeGoldsmith left a comment

kentquirk commented Jul 15, 2024

cartermp commented Jul 15, 2024

codeboten commented Dec 19, 2024

feat: optimize calculateSampleRates when there's one bucket #78

feat: optimize calculateSampleRates when there's one bucket #78

Conversation

cartermp commented Jul 13, 2024

MikeGoldsmith left a comment

Choose a reason for hiding this comment

kentquirk commented Jul 15, 2024

cartermp commented Jul 15, 2024

codeboten commented Dec 19, 2024