
Basic cache implementation for query conversion #1386

Closed · wants to merge 1 commit

Conversation

arnikola
Collaborator

Initial basic cache for query conversion. Results are definitely promising:

Without cache:

[benchmark profile screenshot]

With cache:

[benchmark profile screenshot]

This should ideally be some sort of LRU cache or similar, possibly with passed-in configuration for size.

@robskillington
Collaborator

Hm, it kinda sucks though that the first query will be slow. Is there any way we could speed up the linear cost of building the DFA itself? (I know, probably not, but if you have 20 query instances, the cache is going to take 20 queries to even build up across the different nodes; since it's not shared, that's kind of tough to swallow.)

Also, do we have tracing to know how much this actually slows down the request in wall clock time? CPU time is one thing, but if we're waiting on the network for most of the request anyway, then CPU time doesn't matter much. I guess I'm saying we need to focus on wall clock time vs. CPU time for individual queries (until we start starving the CPU entirely).

@arnikola
Collaborator Author

arnikola commented Feb 20, 2019

> Hm, kinda sucks though that the first query will be slow. Is there any way we could speed up just the linear speed of building the DFA? (I know probably not, but if you have 20 query instances, the cache is going to take 20 queries to even build up across the different nodes, so since it's not shared it's kind of tough to swallow)

So this is just for query conversion itself. To be fair, there's probably a lot of skew in the posted benchmarks that's specific to how I was generating load: I was spamming refresh on the dashboards generated by the start_m3 script, all of which have regex matchers, so the over-indexing on regex interpolation in the benchmarks makes sense. That being said, a productionized version of this diff would make sense regardless, since even more realistic workloads have to build m3-style queries out of the same Prom queries pretty often, considering most of the load would come from alert monitoring and saved dashboards.

> Also do we have tracing to know how much it actually slows down the request by wall clock time? CPU time is one thing, but if we're waiting for network most of the time during a request anyway, then CPU time doesn't matter much. I guess, I'm saying, we need to focus on wall clock time vs CPU time for individual queries (until we hit just starving CPU entirely).

We should hopefully have tracing in main soon; Andrew has a PR up for Jaeger integration that's 95% there, so we should be able to get some decent stats out of that soon too :)

All that being said, this is definitely a nice-to-have rather than a necessary performance improvement.

@arnikola
Collaborator Author

Resolved by #1398

@arnikola arnikola closed this Feb 26, 2019
@justinjc justinjc deleted the query_conversion_cache branch June 17, 2019 21:57