
Adding mapping cache #198

Merged 1 commit into prometheus:master on May 15, 2019

Conversation

@SpencerMalone (Contributor) commented Apr 3, 2019

Mapping is one of the most expensive operations we do, but putting a cache in front of it is an easy way to reduce that pain.

Alongside that, updating the config takes a lock and wipes the cache. On k8s, if your config is a ConfigMap, it is regularly synced into the container (even with no changes), at which point you get needless config reloads, each of which now wipes the cache just to be safe. Only reloading when the file's SHA changes is a good way to keep the cache around as long as possible.

Here's the benchmark results:

BenchmarkGlob10Rules-12                                 	 5000000	       258 ns/op
BenchmarkGlob10RulesCached-12                           	20000000	        71.8 ns/op
BenchmarkRegex10RulesAverage-12                         	 1000000	      1134 ns/op
BenchmarkRegex10RulesAverageCached-12                   	20000000	        71.9 ns/op
BenchmarkGlob100Rules-12                                	 5000000	       260 ns/op
BenchmarkGlob100RulesCached-12                          	20000000	        72.3 ns/op
BenchmarkGlob100RulesMultipleCaptures-12                	 1000000	      1908 ns/op
BenchmarkGlob100RulesMultipleCapturesCached-12          	20000000	        73.6 ns/op
BenchmarkRegex100RulesMultipleCapturesWorst-12          	  100000	     11116 ns/op
BenchmarkRegex100RulesMultipleCapturesWorstCached-12    	20000000	        74.7 ns/op

@SpencerMalone SpencerMalone force-pushed the mapping-cache branch 2 times, most recently from 72875fd to c6a635a Compare April 3, 2019 16:03
@SpencerMalone SpencerMalone changed the title Adding mapping cache Adding mapping cache, only refresh config if file contents are changed Apr 3, 2019
pkg/mapper/mapper_cache.go: outdated review thread (resolved)
@bakins (Contributor) commented Apr 3, 2019

There are two unrelated changes in this PR: caching mapping lookups and changing the way config file reloads are handled. IMO, these should be two PRs - one for each behavioral change.

@SpencerMalone (Contributor, Author) commented Apr 3, 2019

I think that's a fair point. I went down this path after a prior PR where it sounded like the preference was to group like changes together, but the context was different there (grouping a behavioral change with an observability increase, versus grouping two behavioral changes). I'll split the config reloading off into a new PR.

@SpencerMalone SpencerMalone force-pushed the mapping-cache branch 3 times, most recently from 74f4bbf to a9b901f Compare April 4, 2019 14:29
@SpencerMalone SpencerMalone changed the title Adding mapping cache, only refresh config if file contents are changed Adding mapping cache Apr 4, 2019
@SpencerMalone SpencerMalone force-pushed the mapping-cache branch 6 times, most recently from 7ed8341 to 2fd8a00 Compare April 15, 2019 14:32
@SpencerMalone (Contributor, Author) commented

So, using the LRU cache took me from ~70 ns/op to ~118 ns/op, which to me is acceptable for the feature gain of a max size. But adding a hit/miss counter bumped me up to ~500 ns/op with a goroutine-based approach, and to ~220 ns/op single-threaded, neither of which seemed worth it to me.

With that in mind, I'll be tracking the size of the cache, but not a counter of cache requests.
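The shape of such a bounded LRU can be sketched with the standard library alone (the PR itself uses a different implementation; types and names here are illustrative): bounded size, least-recently-used eviction, and a readable length to back a size gauge, but no hit/miss counters, per the comment above.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	key   string
	value string // in the exporter this would be the mapping result
}

// lruCache is a minimal, non-goroutine-safe LRU; the exporter would need
// a mutex around it.
type lruCache struct {
	max   int
	order *list.List // front = most recently used
	items map[string]*list.Element
}

func newLRUCache(max int) *lruCache {
	return &lruCache{max: max, order: list.New(), items: map[string]*list.Element{}}
}

func (c *lruCache) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

func (c *lruCache) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.max {
		oldest := c.order.Back() // least recently used
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
}

// Len backs a cache-size gauge, the one cache metric kept in the PR.
func (c *lruCache) Len() int { return c.order.Len() }

func main() {
	c := newLRUCache(2)
	c.Put("a.b.c", "mapped_a")
	c.Put("d.e.f", "mapped_d")
	c.Get("a.b.c")             // touch a.b.c so it is most recent
	c.Put("g.h.i", "mapped_g") // evicts d.e.f, the least recently used
	_, ok := c.Get("d.e.f")
	fmt.Println(c.Len(), ok) // 2 false
}
```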

main.go Outdated
@@ -144,6 +144,8 @@ func main() {
mappingConfig = kingpin.Flag("statsd.mapping-config", "Metric mapping configuration file name.").String()
readBuffer = kingpin.Flag("statsd.read-buffer", "Size (in bytes) of the operating system's transmit read buffer associated with the UDP connection. Please make sure the kernel parameters net.core.rmem_max is set to a value greater than the value specified.").Int()
dumpFSMPath = kingpin.Flag("debug.dump-fsm", "The path to dump internal FSM generated for glob matching as Dot file.").Default("").String()

cacheSize = kingpin.Flag("statsd.cache-size", "Maximum size of your metric mapping cache. Relies on least recently used replacement policy if max size is reached.").Default("1000").Int()
Contributor:

I would keep the LRU out of the usage and document it in the README maybe?

Contributor:

The unit is not immediately clear from this: "size" suggests bytes, but I think it's the number of cached mappings?

Suggested change
cacheSize = kingpin.Flag("statsd.cache-size", "Maximum size of your metric mapping cache. Relies on least recently used replacement policy if max size is reached.").Default("1000").Int()
cacheSize = kingpin.Flag("statsd.cache-size", "Maximum number of cached mappings.").Default("1000").Int()

Contributor (Author):

You've got it!

} else {
	cache, err := NewMetricMapperCache(cacheSize)
	if err != nil {
		log.Warnf("Unable to setup metric cache. Performance may be negatively impacted. Caused by: %s", err)
Contributor:

Following the breadcrumbs, the only error that can happen here is an invalid configuration. I would prefer to

  • prevent negative cache sizes ourselves
  • if we fail to initialize the cache for another reason, fail hard

I'm not a fan of silent failures, even if it's something "optional" like a cache.

Contributor:

Aside from that, this is purely a matter of taste (so no need to change it if you disagree): I'm not a fan of "if $feature is enabled …" branches – could we install a NOOP cache instead?
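One way to realize the NOOP-cache suggestion (interface and names are hypothetical, not the exporter's actual types): callers always talk to a cache through an interface, and disabling caching just swaps in a do-nothing implementation, so the mapping path needs no enabled/disabled branch.

```go
package main

import "fmt"

type MetricMapperCache interface {
	Get(key string) (string, bool)
	Add(key, value string)
}

// noopCache satisfies the interface but never stores anything, so every
// lookup falls through to the real mapping.
type noopCache struct{}

func (noopCache) Get(string) (string, bool) { return "", false }
func (noopCache) Add(string, string)        {}

// lookup is branch-free: it neither knows nor cares whether the cache
// behind the interface actually caches.
func lookup(c MetricMapperCache, metric string) string {
	if v, ok := c.Get(metric); ok {
		return v
	}
	v := "mapped:" + metric // stand-in for the real (expensive) mapping
	c.Add(metric, v)
	return v
}

func main() {
	var c MetricMapperCache = noopCache{}
	fmt.Println(lookup(c, "a.b.c")) // mapped:a.b.c, always recomputed
}
```

An LRU implementation of the same interface would then be installed when caching is enabled, with no other code changes.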

Contributor (Author):

Happy to do so! I should have followed the breadcrumbs more carefully myself.

pkg/mapper/mapper_cache.go: review thread (resolved)
Signed-off-by: SpencerMalone <[email protected]>
@SpencerMalone (Contributor, Author) commented

I think this should be good to go!

@matthiasr (Contributor) commented

Beautiful, thanks a lot for pushing this through!

@matthiasr matthiasr merged commit 698bcdf into prometheus:master May 15, 2019
matthiasr pushed a commit that referenced this pull request May 15, 2019
Signed-off-by: Matthias Rampke <[email protected]>
matthiasr pushed a commit that referenced this pull request May 15, 2019
* [CHANGE] Do not run as root in the Docker container by default ([#202](#202))
* [FEATURE] Add metric for count of events by action ([#193](#193))
* [FEATURE] Add metric for count of distinct metric names ([#200](#200))
* [FEATURE] Add UNIX socket listener support ([#199](#199))
* [FEATURE] Accept Datadog [distributions](https://docs.datadoghq.com/graphing/metrics/distributions/) ([#211](#211))
* [ENHANCEMENT] Add a health check to the Docker container ([#182](#182))
* [ENHANCEMENT] Allow inconsistent label sets ([#194](#194))
* [ENHANCEMENT] Speed up sanitization of metric names ([#197](#197))
* [ENHANCEMENT] Enable pprof endpoints ([#205](#205))
* [ENHANCEMENT] DogStatsD tag parsing is faster ([#210](#210))
* [ENHANCEMENT] Cache mapped metrics ([#198](#198))
* [BUGFIX] Fix panic if a mapping resulted in an empty name ([#192](#192))
* [BUGFIX] Ensure that there are always default quantiles if using summaries ([#212](#212))
* [BUGFIX] Prevent ingesting conflicting metric types that would make scraping fail ([#213](#213))

With #192, the count of events rejected because of negative counter increments has moved into the `statsd_exporter_events_error_total` metric, instead of being lumped in with the different kinds of successful events.

Signed-off-by: Matthias Rampke <[email protected]>
matthiasr pushed a commit that referenced this pull request May 15, 2019
In #198 the signature for setting up the exporter changed slightly, a
change that #212 and #213 didn't have. This doesn't change anything
substantial.

Signed-off-by: Matthias Rampke <[email protected]>