
Basic cache implementation for query conversion #1386

Closed · wants to merge 1 commit

Conversation

arnikola
Collaborator

Initial basic cache for query conversion. Results are definitely promising:

Without cache:

[benchmark profile screenshot]

With cache:

[benchmark profile screenshot]

This should ideally be some sort of LRU cache or similar, possibly with passed-in configuration for size.

@robskillington
Collaborator

Hm, it kinda sucks though that the first query will be slow. Is there any way we could speed up the linear cost of building the DFA itself? (I know, probably not, but if you have 20 query instances, the cache is going to take 20 queries to even build up across the different nodes; since it's not shared, that's kind of tough to swallow.)

Also, do we have tracing to know how much this actually slows down the request in wall clock time? CPU time is one thing, but if we're waiting on the network for most of the request anyway, then CPU time doesn't matter much. I guess I'm saying we need to focus on wall clock time vs. CPU time for individual queries (until we start starving the CPU entirely).

@arnikola
Collaborator Author

arnikola commented Feb 20, 2019

> Hm, kinda sucks though that the first query will be slow. Is there any way we could speed up just the linear speed of building the DFA? (I know probably not, but if you have 20 query instances, the cache is going to take 20 queries to even build up across the different nodes, so since it's not shared it's kind of tough to swallow)

So this is just for query conversion itself. To be fair, there's probably a lot of skew in the posted benchmarks that's specific to how I was generating load: I was spamming refresh on the dashboards generated by the start_m3 script, all of which have regex matchers, so the over-indexing on regex interpolation in the benchmarks makes sense. That being said, a productionized version of this diff would make sense regardless, since even more realistic workloads have to build m3-style queries out of the same Prom queries pretty often, considering most of the load would come from alert monitoring and saved dashboards.

> Also do we have tracing to know how much it actually slows down the request by wall clock time? CPU time is one thing, but if we're waiting for network most of the time during a request anyway, then CPU time doesn't matter much. I guess, I'm saying, we need to focus on wall clock time vs CPU time for individual queries (until we hit just starving CPU entirely).

We should hopefully have tracing in main soon; Andrew has a PR up for Jaeger integration that's 95% there, so we should be able to get some decent stats out of that soon too :)

All that being said, this is definitely a nice-to-have rather than a necessary performance improvement.

@arnikola
Collaborator Author

Resolved by #1398

@arnikola arnikola closed this Feb 26, 2019
@justinjc justinjc deleted the query_conversion_cache branch June 17, 2019 21:57