Adding mapping cache #198
There are two unrelated changes in this PR: caching mapping lookups and changing the way config file reloads are handled. IMO, these should be two PRs, one for each behavioral change.
I think that's a fair point. I went down this path after a prior PR where it sounded like the preference was to group like changes together, but the context was different (grouping a behavioral change with an observability improvement vs. grouping two behavioral changes). I'll split the config reloading off into a new PR.
Using the LRU cache took me from 70 ns/op to ~118 ns/op, which I consider acceptable in exchange for having a maximum size. Adding a hit/miss counter, however, pushed it to 500 ns/op with a goroutine-based approach and to ~220 ns/op with a single-threaded approach; neither seemed worth it to me. With that in mind, I'll track the size of the cache, but not a counter of cache requests.
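The trade-off described above (a cache bounded by entry count with least-recently-used eviction, whose current size is tracked but whose hits and misses are not counted) can be sketched with the standard library's `container/list`. This is a minimal sketch, not the PR's implementation, which may well use a third-party LRU package; `lruCache`, `newLRUCache`, and the plain `string` value type are hypothetical stand-ins for the real mapping result.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is stored in each list element: the metric name and its cached mapping.
type entry struct {
	key   string
	value string // hypothetical: stands in for the real mapping result type
}

// lruCache is a minimal LRU cache bounded by the number of entries.
// It is NOT safe for concurrent use; a real exporter would guard it with a mutex.
type lruCache struct {
	maxSize int
	ll      *list.List               // front = most recently used
	items   map[string]*list.Element // key -> element in ll
}

func newLRUCache(maxSize int) *lruCache {
	return &lruCache{maxSize: maxSize, ll: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks the entry as most recently used.
func (c *lruCache) Get(key string) (string, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).value, true
	}
	return "", false
}

// Add inserts or updates an entry, evicting the least recently used one when full.
func (c *lruCache) Add(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.ll.MoveToFront(el)
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key: key, value: value})
	if c.ll.Len() > c.maxSize {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Len reports the current number of entries: the "size" worth exporting as a gauge.
func (c *lruCache) Len() int { return c.ll.Len() }

func main() {
	c := newLRUCache(2)
	c.Add("a", "mapped_a")
	c.Add("b", "mapped_b")
	c.Get("a")             // "a" becomes the most recently used entry
	c.Add("c", "mapped_c") // cache is over capacity, so "b" is evicted
	_, hasB := c.Get("b")
	fmt.Println(c.Len(), hasB) // 2 false
}
```

Reading the size is just `c.Len()`, so exporting it costs nothing per lookup, unlike incrementing a hit/miss counter on every request.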
main.go
@@ -144,6 +144,8 @@ func main() {
  mappingConfig = kingpin.Flag("statsd.mapping-config", "Metric mapping configuration file name.").String()
  readBuffer = kingpin.Flag("statsd.read-buffer", "Size (in bytes) of the operating system's transmit read buffer associated with the UDP connection. Please make sure the kernel parameters net.core.rmem_max is set to a value greater than the value specified.").Int()
  dumpFSMPath = kingpin.Flag("debug.dump-fsm", "The path to dump internal FSM generated for glob matching as Dot file.").Default("").String()
+
+ cacheSize = kingpin.Flag("statsd.cache-size", "Maximum size of your metric mapping cache. Relies on least recently used replacement policy if max size is reached.").Default("1000").Int()
I would keep the LRU detail out of the flag's usage text and maybe document it in the README instead?
The unit is not immediately clear from this: "size" suggests bytes, but I think it's the number of cached mappings?
Suggested change:
- cacheSize = kingpin.Flag("statsd.cache-size", "Maximum size of your metric mapping cache. Relies on least recently used replacement policy if max size is reached.").Default("1000").Int()
+ cacheSize = kingpin.Flag("statsd.cache-size", "Maximum number of cached mappings.").Default("1000").Int()
You've got it!
pkg/mapper/mapper.go
} else {
  cache, err := NewMetricMapperCache(cacheSize)
  if err != nil {
    log.Warnf("Unable to setup metric cache. Performance may be negatively impacted. Caused by: %s", err)
Following the breadcrumbs, the only error that could happen here would be an invalid configuration. I would prefer to:
- prevent negative cache sizes ourselves
- if we fail to initialize the cache for another reason, fail hard
I'm not a fan of silent failures, even if it's something "optional" like a cache.
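A minimal sketch of that suggestion: reject negative sizes before constructing the cache, and treat any remaining constructor error as fatal instead of logging a warning and carrying on. `validateCacheSize` and `newCache` are hypothetical names (the latter stands in for a constructor such as `NewMetricMapperCache`), and the plain map is only a placeholder for the real cache.

```go
package main

import (
	"fmt"
	"os"
)

// validateCacheSize rejects impossible sizes up front, so that any error the
// cache constructor returns afterwards can be treated as a hard failure.
func validateCacheSize(size int) error {
	if size < 0 {
		return fmt.Errorf("invalid cache size %d: must be >= 0", size)
	}
	return nil
}

// newCache is a hypothetical stand-in for the real cache constructor.
// Note that make() with a negative size hint would panic, so validation
// must come first.
func newCache(size int) (map[string]string, error) {
	if err := validateCacheSize(size); err != nil {
		return nil, err
	}
	return make(map[string]string, size), nil
}

func main() {
	cache, err := newCache(1000)
	if err != nil {
		// Fail hard instead of silently running without a cache.
		fmt.Fprintln(os.Stderr, "unable to set up metric mapping cache:", err)
		os.Exit(1)
	}
	fmt.Println("cache ready, entries:", len(cache))
}
```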
Aside from that, this is purely a matter of taste (so no need to change it if you disagree): I'm not a fan of "if use $feature …"; could we install a no-op cache instead?
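The no-op cache idea is the null object pattern: the mapper always talks to a cache interface, and "caching disabled" is just an implementation that never stores anything, so no `if useCache` branch is needed at lookup time. This is a sketch under assumptions: the `MappingCache` interface, both implementations, and the choice of size 0 as the "disabled" value are hypothetical, not taken from the PR.

```go
package main

import "fmt"

// MappingCache is a hypothetical interface the mapper could depend on.
type MappingCache interface {
	Get(metricName string) (string, bool)
	Add(metricName, mapped string)
}

// noopCache satisfies MappingCache but never stores anything: every Get misses.
type noopCache struct{}

func (noopCache) Get(string) (string, bool) { return "", false }
func (noopCache) Add(string, string)         {}

// mapCache is a trivial unbounded implementation, used here only for contrast.
type mapCache struct{ m map[string]string }

func (c *mapCache) Get(k string) (string, bool) {
	v, ok := c.m[k]
	return v, ok
}

func (c *mapCache) Add(k, v string) { c.m[k] = v }

// newMappingCache chooses the implementation once, at construction time.
// It assumes size has already been validated as non-negative.
func newMappingCache(size int) MappingCache {
	if size == 0 {
		return noopCache{}
	}
	return &mapCache{m: make(map[string]string, size)}
}

func main() {
	for _, size := range []int{0, 10} {
		c := newMappingCache(size)
		c.Add("foo", "mapped_foo")
		_, hit := c.Get("foo")
		fmt.Printf("size=%d hit=%v\n", size, hit)
	}
	// Output:
	// size=0 hit=false
	// size=10 hit=true
}
```

The branch moves from every lookup to a single decision at startup, which keeps the hot path uniform.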
Happy to do so! I should have followed the breadcrumbs more carefully myself.
Signed-off-by: SpencerMalone <[email protected]>
I think this should be good to go!
Beautiful, thanks a lot for pushing this through!
Signed-off-by: Matthias Rampke <[email protected]>
* [CHANGE] Do not run as root in the Docker container by default ([#202](#202))
* [FEATURE] Add metric for count of events by action ([#193](#193))
* [FEATURE] Add metric for count of distinct metric names ([#200](#200))
* [FEATURE] Add UNIX socket listener support ([#199](#199))
* [FEATURE] Accept Datadog [distributions](https://docs.datadoghq.com/graphing/metrics/distributions/) ([#211](#211))
* [ENHANCEMENT] Add a health check to the Docker container ([#182](#182))
* [ENHANCEMENT] Allow inconsistent label sets ([#194](#194))
* [ENHANCEMENT] Speed up sanitization of metric names ([#197](#197))
* [ENHANCEMENT] Enable pprof endpoints ([#205](#205))
* [ENHANCEMENT] DogStatsD tag parsing is faster ([#210](#210))
* [ENHANCEMENT] Cache mapped metrics ([#198](#198))
* [BUGFIX] Fix panic if a mapping resulted in an empty name ([#192](#192))
* [BUGFIX] Ensure that there are always default quantiles if using summaries ([#212](#212))
* [BUGFIX] Prevent ingesting conflicting metric types that would make scraping fail ([#213](#213))

With #192, the count of events rejected because of negative counter increments has moved into the `statsd_exporter_events_error_total` metric, instead of being lumped in with the different kinds of successful events.

Signed-off-by: Matthias Rampke <[email protected]>
Mapping is one of the most expensive operations the exporter performs, but putting a cache in front of it is an easy way to reduce that cost.
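Putting a cache in front of an expensive lookup is plain memoization. The sketch below wraps a mapping function in a read/write-locked map and exposes a `Reset` for when the config changes. `cachedMapper` and the dot-to-underscore `mapFunc` are hypothetical stand-ins, not the exporter's API. The cache here is deliberately unbounded, which is exactly the problem a maximum size with LRU eviction addresses.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// cachedMapper memoizes an expensive mapping function.
type cachedMapper struct {
	mu      sync.RWMutex
	cache   map[string]string
	mapFunc func(string) string // hypothetical stand-in for the real mapping step
}

func newCachedMapper(f func(string) string) *cachedMapper {
	return &cachedMapper{cache: make(map[string]string), mapFunc: f}
}

// Map returns a cached result if present; otherwise it computes, stores and returns it.
func (m *cachedMapper) Map(name string) string {
	m.mu.RLock()
	v, ok := m.cache[name]
	m.mu.RUnlock()
	if ok {
		return v
	}
	// Computed outside the lock, so two concurrent misses may both compute it;
	// that is harmless here because the result is deterministic.
	v = m.mapFunc(name)
	m.mu.Lock()
	m.cache[name] = v
	m.mu.Unlock()
	return v
}

// Reset drops all cached results, e.g. after the mapping config is reloaded.
func (m *cachedMapper) Reset() {
	m.mu.Lock()
	m.cache = make(map[string]string)
	m.mu.Unlock()
}

func main() {
	calls := 0
	m := newCachedMapper(func(s string) string {
		calls++
		return strings.ReplaceAll(s, ".", "_")
	})
	a := m.Map("app.requests")
	b := m.Map("app.requests") // served from the cache
	fmt.Println(a, b, calls)   // app_requests app_requests 1
}
```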
Alongside that, updating the config acquires a lock and wipes the cache. On Kubernetes, if your config is a ConfigMap, it is regularly synced into the container even when nothing has changed, which triggers needless config reloads (each of which now wipes the cache, just to be safe). Reloading only when the file's SHA changes is a good way to keep the cache around as long as possible.
Here are the benchmark results: