Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster glob matching using a finite state machine #157

Merged
merged 23 commits into from
Oct 10, 2018

Conversation

fffonion
Copy link
Contributor

@fffonion fffonion commented Sep 10, 2018

This patch proposes a fast and efficient matching algorithm that as a drop in replacement to existing glob matching. On our production environment, the go regexp engine which glob matching uses takes a lot more CPU time than system calls under heavy workloads. We have deployed this patch on our environment and seeing a great performance boost. See this blog if you are interested with the background : )

By default, a general string takes higher priority than * when matching. Backtrack will run under the following circumstances:

  • The rules needing backtrack to be mapped correctly. A warning log like WARN[0000] backtracking required because of match "test2.*.*",matching performance may be degraded source="mapper.go:418" will be printed.
  • glob_disable_ordering is set to false or unset.

Full changelog:

  • Add a new cli argument statsd.dump-fsm to dump the current state machine to a GNU dot file. This can help to debug the behaviour of internal state machine.
  • Add a new config glob_disable_ordering to defaults section to indicate that the rules are not strictly in the order of it's occurrence in the mapping config. Setting this to on will boost matching performance under certain large amount of sets of rules.

Signed-off-by: Wangchong Zhou <[email protected]>
@matthiasr
Copy link
Contributor

Hmm hmm, this is very intriguing. I am not sure about adding another match type that is so similar to glob … is it theoretically possible to make the FSM method fully compatible with the current behavior of the glob type? What would be the performance impact of that be? Where fallbacks are necessary, can this be automatically detected (e.g. whether backtracking is necessary?)

@matthiasr
Copy link
Contributor

… what I am thinking is: can we turn this into a full-featured replacement of the current glob implementation? And then, if necessary, add specific settings for further optimizations like treating * with lower priority?

@matthiasr
Copy link
Contributor

and where ever we can, auto-detect and warn instead of requiring an explicit setting, like

WARN: backtracking required for match client.*.request.size.*, falling back to slower glob method

(but I would prefer if we could sensibly do the backtracking in the FSM implementation instead of indefinitely carrying around the regex-based-glob implementation)

@fffonion
Copy link
Contributor Author

fffonion commented Sep 10, 2018

Yes it's possible to implement the backtrack. Basically it will traverse over all possible transitions until it get one match. The performance impact added will be very little as I can see, since each lookup of transitions is in a map. I like this approach better too, optionally we can detect it when loading as well and also give user an option to do backtracking or not. (And we don't need the fsm_fallback anymore)

The ordering of rules can be implemented by replacing the map with array, so that we can keep the order when iterating over it. The performance impact will depend on the types of rules, the more transitions it has for each state, the longer the array will be, and the more the performance impact it will have. Having multiple captures in the label is similar to that and we can basically have a array to store the captureIdx'es and iterate it when formatting. One thing is that once we go this approach, the performance impact on doing backtrack will be magnified.

How does this sound to you:

  • By default this fsm matcher replaces glob, it uses array to find transitions, having multiple captures and supports backtrack and should pass all the tests for glob matching
  • Have a option that satisfy simpler use cases, it uses map to find transitions, can have only one capture per label

@matthiasr
Copy link
Contributor

matthiasr commented Sep 10, 2018 via email

@fffonion
Copy link
Contributor Author

That sounds perfect! what would be a good name for the latter option?
Should it be one, or a separate toggle for each?

Hmm it's hard to name it separately but I also feel it's better to separate them from end-user's view. Do you have any suggestions on this?

I would like to pull the debug "dump FSM" option out of the mapping
configuration into a flag

Yep this sounds good to me.

@matthiasr
Copy link
Contributor

matthiasr commented Sep 10, 2018 via email

@fffonion
Copy link
Contributor Author

Yep let me try to implement this : ) Thanks for discussion!

Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
@fffonion
Copy link
Contributor Author

Hi @matthiasr, I've finished developing fsm as a drop-in replacement for glob matching. Could you take another look?

Ordering is implemented with minimum performance penalty, still using a map. And we do backtrack only we detect that rules need backtracking. Also I added a test TestRPS and shows that the new algorithm is about 33% faster than using regex for glob, when ordering is disabled and no backtrack is needed.

Signed-off-by: Wangchong Zhou <[email protected]>
Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Functionality wise, this is the best solution. However, I would like to work on the code organization a bit:

With all the FSM traversal and regex matching in the GetMapping function, it becomes really hard to follow that. Can you break that out into one or more functions?

Would it be possible to collect everything FSM related (except for method calls from the initialization and GetMapping) into a separate file, or even a separate package? I would prefer the latter, because it forces us to think about a sensible interface for it.

I think for dumping the FSM state, there is still too much responsibility in the mapper – I don't want it to handle opening and closing a file. Can you change it so that there is a method to dump the FSM? This method accept a writer, and then main.go should decide when to call this method (after loading the configuration? or only after the initial configuration load) and what to write to – the FSM specific code should not have to care whether that's a bufio.Writer, a string writer, or stdout.

Finally, since the motivation for this change is performance, could you please add benchmarks that demonstrate the different cases (regex/glob, ordering/no ordering, backtracking necessary / no backtracking)? Ideally with different numbers of mappings. That will also give us something to point to in the README to help users understand the choices and their consequences.

And finally, all this needs to be documented in the README, especially the user-facing configuration and flags.

Thank you for working through this with me, and sorry for the delay in reviewing!

@fffonion
Copy link
Contributor Author

I'll address the comment soon! Thank you!

Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
@fffonion
Copy link
Contributor Author

@matthiasr I've finished the dev part of your comment.

GetMapping func in mapper is cleaned up(no gotos 😄) also more comments are added.

FSM parts are moved to a sub package fsm inside mapper package, because I feel like they are not in a equal level and mapper should include fsm .

The function dumpFSM is also changed to accept a io.Writer instance.

For the performance part, the test func TestPerformance is refined and charged with more cases. In short, fsm will have best performance if the rules amount is not so less and captures (use of *) is not to much in a single rule. Ordering and backtracking won't hurt performance much.

I'll start working on the doc. Would you like that to be a separate PR or inside this one with separate commits?

Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
@matthiasr matthiasr changed the title Implement "fsm" match type Faster glob matching using a finite state machine Sep 26, 2018
@matthiasr
Copy link
Contributor

I renamed the PR to reflect the direction it has taken

@fffonion
Copy link
Contributor Author

I'll reflect those in the doc : ) Awesome, thanks!

Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really starting to take on shape!

I am wondering about all the places where the FSM needs to understand the statsd name format and does split("."), and conversely does all the formatting into "names" and "labels". I feel like we are really close to not having this entanglement – what if we promote both into the mapper package? (or break them into another sub-package but that may be excessive)

For example, what if instead of func (f *FSM) GetMapping(statsdMetric string, statsdMetricType string) (interface{}, string, prometheus.Labels, bool) we had func (f *FSM) Get(fragments []string) (interface{} result, bool matched)? The caller (the mapping package) would have to take care of splitting the metric name, and prepending the mapping type since it's essentially a match before we even start matching names. The FSM in turn could be completely agnostic to these – it would simply feed a slice of strings into the state machine and spit out the result, without knowing the meaning of either.

@@ -118,6 +120,20 @@ func watchConfig(fileName string, mapper *mapper.MetricMapper) {
}
}

func dumpFSM(mapper *mapper.MetricMapper, dumpFilename string) error {
f, err := os.Create(dumpFilename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the file already exists? What should the behavior be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will rewrite the existing file, I feel this is acceptable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sounds good!

main.go Outdated
@@ -126,6 +142,7 @@ func main() {
statsdListenTCP = kingpin.Flag("statsd.listen-tcp", "The TCP address on which to receive statsd metric lines. \"\" disables it.").Default(":9125").String()
mappingConfig = kingpin.Flag("statsd.mapping-config", "Metric mapping configuration file name.").String()
readBuffer = kingpin.Flag("statsd.read-buffer", "Size (in bytes) of the operating system's transmit read buffer associated with the UDP connection. Please make sure the kernel parameters net.core.rmem_max is set to a value greater than the value specified.").Int()
dumpFSMPath = kingpin.Flag("statsd.dump-fsm", "The path to dump internal FSM generated for glob matching as Dot file.").Default("").String()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now I'm wondering if the whole --statsd.* flag namespace makes sense. Maybe there should be a "mapping" namespace? (then --statsd.mapping-config should also move into that, but that should be a separate change)

I'm really not sure about this, so it's fine to leave as is. just putting the idea here …

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or for this particular one, it could be --debug.dump-fsm? since this is really only a debugging feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah i second on the debug.dump-fsm, let me change it.

main.go Outdated
if *dumpFSMPath != "" {
err := dumpFSM(mapper, *dumpFSMPath)
if err != nil {
log.Fatal("Error dumpping FSM:", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: dumpping -> dumping

@@ -31,18 +31,24 @@ var (
metricLineRE = regexp.MustCompile(`^(\*\.|` + statsdMetricRE + `\.)+(\*|` + statsdMetricRE + `)$`)
metricNameRE = regexp.MustCompile(`^([a-zA-Z_]|` + templateReplaceRE + `)([a-zA-Z0-9_]|` + templateReplaceRE + `)*$`)
labelNameRE = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]+$`)

templateReplaceCaptureRE = regexp.MustCompile(`\$\{?([a-zA-Z0-9_\$]+)\}?`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this still needed here? the only reference seems to be in the fsm package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it's not needed, will cleanup

)

var (
templateReplaceCaptureRE = regexp.MustCompile(`\$\{?([a-zA-Z0-9_\$]+)\}?`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to formatter.go please, since it's only used there

mappings = append(mappings, mapping.Match)
}
}
n.FSM.TestIfNeedBacktracking(mappings)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens to the value of this? as it is, isn't this a NOOP?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, the method also mutates the FSM, I would prefer not to do that – see my comment there.

if i2 != i1 && len(re1.FindStringSubmatchIndex(r2)) > 0 {
// log if we care about ordering and the superset occurs before
if !f.disableOrdering && i1 < i2 {
log.Warnf("match \"%s\" is a super set of match \"%s\" but in a lower order, "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for clarity: match \"%s\" is a super set of match \"%s\" but comes first, it will never be matched

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need \n when logging

// backtracking will always be needed if ordering of rules is not disabled
// since transistions are stored in (unordered) map
// note: don't move this branch to the beginning of this function
// since we need logs for superset rules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only log if !f.disableOrdering, and we only honor the result of the computation otherwise. would it make sense to split this into two methods:

  • one that, assuming ordered matches, logs warnings for dead rules
  • one that, assuming unordered matches, calculates whether backtracking is needed

then call one or the other, depending on the disableOrdering configuration?

// since transistions are stored in (unordered) map
// note: don't move this branch to the beginning of this function
// since we need logs for superset rules
f.needsBacktracking = !f.disableOrdering || needBacktrack
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please either change f (imperative style) or return something (functional style) and set in the caller. I prefer the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I can see, if you split "log warnings for ordered mappings" from "calculate backtracking need for unordered mappings", then neither method needs the FSM itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i'll fix this along with the other comment


}

// DumpFSM accepts a io.writer and write the current FSM into dot file format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move this method to its own file? it's fairly extra from everything else

Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the documentation, I think I would like to have two for different audiences:

in the existing README, explain the ordering configuration option and why it matters.

in a new README in the mapper package explain how the FSM works, how mappings are translated into a state machine, and how a metric is matched against these. This can reference existing documents (Wikipedia or a some other intro). I don't have a CS background, and it took me a bit to get what is happening here. The FSM method is quite a bit more complex than the naive regexing that was happening before, and I would like to give me-in-half-a-year something to hold on to and understand what is happening.

In general, the comments before exported functions (what gets turned into godoc) could also be more extensive – describe what the inputs and outputs of each method mean, at least. The comments should form full sentences and end in a period.

@@ -75,6 +81,20 @@ var defaultQuantiles = []metricObjective{
{Quantile: 0.99, Error: 0.001},
}

func min(x, y int) int {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are moved into the FSM now too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah, sorry 😅

return rules
}

func TestPerformance(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write this in terms of a Go benchmark – it handles the number of iterations automatically for us. Also move it to a separate file, just because this one is so terribly long already.

@matthiasr
Copy link
Contributor

Thank you for continuing to work on this! I am sorry for all the comments and back and forth – this is a pretty massive change and I am a bit scared that we won't be able to understand and improve it later unless we get it really well structured and documented now.

@fffonion
Copy link
Contributor Author

No problem, I totally understand that.: ) I'll address the comments and finish the docs, it will take some time but I'll let you know when I get there.

Signed-off-by: Wangchong Zhou <[email protected]>
@matthiasr
Copy link
Contributor

almost forgot: please also add go test -bench . to the TravisCI build so we can continuously see the results there. thank you!

Signed-off-by: Wangchong Zhou <[email protected]>
Signed-off-by: Wangchong Zhou <[email protected]>
@fffonion fffonion force-pushed the feat/fsm-matcher branch 3 times, most recently from ea07aca to 581ea4e Compare October 3, 2018 23:54
Signed-off-by: Wangchong Zhou <[email protected]>
@fffonion
Copy link
Contributor Author

fffonion commented Oct 4, 2018

Hi @matthiasr, I've finished the doc and addressed your comments! Could you take another look please?

Matthias Rampke added 5 commits October 10, 2018 21:18
straightening out some wording, and fixing unquoted `*`.

Signed-off-by: Matthias Rampke <[email protected]>
Format quote as such, capitalize FSM.

Signed-off-by: Matthias Rampke <[email protected]>
Fixing a typo and language.

Signed-off-by: Matthias Rampke <[email protected]>
we should not change the vendored Makefile.common here.

Signed-off-by: Matthias Rampke <[email protected]>
Copy link
Contributor

@matthiasr matthiasr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a few minor tweaks, and it's good to go now! Thank you very much for getting this together, the performance improvements are significant, and now everybody using the glob matching can benefit from them.

@matthiasr matthiasr merged commit 00e2c3f into prometheus:master Oct 10, 2018
@fffonion
Copy link
Contributor Author

Thank you @matthiasr for reviewing and taking care of this!

@hbagdi hbagdi deleted the feat/fsm-matcher branch October 10, 2018 22:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants