Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimize calculateSampleRates when there's one bucket #78

Closed
wants to merge 3 commits into from

Conversation

cartermp
Copy link
Member

Friday afternoon fun, don't take too seriously.

It's not uncommon to have a dynamic sampler with a single field in refinery configs, like so:

          - Name: Catchall rule
            Sampler:
              EMAThroughputSampler:
                GoalThroughputPerSec: 500 # This is spans per second for the entire cluster
                UseClusterSize: true # Ensures GoalThroughputPerSec is for the full refinery cluster and not per node
                FieldList:
                  - service.name

This change seems to run faster when I benchmark it locally. Example output:

λ ~/repos/dynsampler-go/ cartermp/maybe-optimize go test -run=^$ -bench=BenchmarkCalculateSampleRates -benchmem
goos: darwin
goarch: arm64
pkg: github.com/honeycombio/dynsampler-go
BenchmarkCalculateSampleRates/One_Optimized-8            8908784               116.1 ns/op           256 B/op          2 allocs/op
BenchmarkCalculateSampleRates/One_Original-8             7840144               152.0 ns/op           272 B/op          3 allocs/op
BenchmarkCalculateSampleRates/TwoOptimized-8             5492028               218.4 ns/op           288 B/op          3 allocs/op
BenchmarkCalculateSampleRates/TwoOriginal-8              5421360               219.9 ns/op           288 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Three_Optimized-8          4284099               279.2 ns/op           304 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Three_Original-8           4278279               280.4 ns/op           304 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Four_Optimized-8           3393247               349.5 ns/op           320 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Four_Original-8            3416569               351.9 ns/op           320 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Five_Optimized-8           2422359               499.4 ns/op           336 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Five_Original-8            2399866               499.8 ns/op           336 B/op          3 allocs/op
BenchmarkCalculateSampleRates/Ten_Optimized-8             925006              1235 ns/op             836 B/op          4 allocs/op
BenchmarkCalculateSampleRates/Ten_Original-8              916884              1254 ns/op             836 B/op          4 allocs/op
PASS
ok      github.com/honeycombio/dynsampler-go    17.381s

I also played with some other optimizations that make the Ten_ case faster too (i.e., someone specified ten fields), but it involved replacing the math.IsNan call and I...had some tingly spidey senses regarding messing with that.

@cartermp cartermp requested a review from a team as a code owner July 13, 2024 00:41
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and makes sense if we regularly see a single field entry.

We should remove the original func before merging but keep the benchmark.

@kentquirk
Copy link
Contributor

I'd be interested to know how you know "it's not uncommon" to have a single entry -- I don't think I've ever seen it in any of the sample files I've worked with. Far more common is way too many keys.

I don't object to this optimization (other than the generic "it adds complexity", but it's really isolated here), but I'm not sure if it's actually going to be seen in practice.

@cartermp
Copy link
Member Author

@kentquirk It's actually a way that we onboard customers now, the final dynamic sampling rule we give people is the one I mention in the PR title.

Far more common is way too many keys.

I did think about this one too and couldn't come up with anything that was meaningfully better. I would also guess that when there's a lot of key fields listed, maybe the most time is spent in creating the key rather than rebalancing.

@robbkidd robbkidd self-assigned this Aug 20, 2024
@codeboten
Copy link

Closing this for now,

@codeboten codeboten closed this Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants