Logql/parallel binop #5317
LGTM! Happy that all my ratio queries `(exp1)/(exp2)` are going to run faster now :)
Left one minor clarification question, though.
```diff
@@ -18,7 +18,7 @@ import (
 )
 
 const (
-	DefaultDownstreamConcurrency = 32
+	DefaultDownstreamConcurrency = 128
```
Q: I'm failing to understand why this was increased (even after reading the comment on `Downstreamer()` below).
Any way we can line it up with `max_query_parallelism`? 128 is half of our biggest tenant's setting (256), so that tenant will get limited by this.
Or maybe cap it higher?
This was historically used to limit how many downstream queries (query-frontend -> querier) could be dispatched in parallel by a single time-split LogQL query. Now that this part is controlled by the `LimitedRoundTripper` instead, we only want to use the `Downstreamer` concurrency to prevent us from creating unbounded goroutines. Increasing the limit to 128 still seems reasonable to that effect but is also high enough to not pre-limit anything the `LimitedRoundTripper` would limit anyway. Basically, this is a crude attempt to prevent goroutine blowups from malicious queries without introducing a bottleneck in our query path.
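The goroutine-bounding role described above can be sketched in Go. This is a minimal illustration, not Loki's `Downstreamer`: `boundedDownstreamer`, `runAll`, and `peakConcurrency` are hypothetical names, and only `DefaultDownstreamConcurrency` comes from the diff. A buffered channel acts as a counting semaphore. Because a slot is acquired *before* each goroutine is spawned, the number of live goroutines stays bounded by roughly the limit instead of growing with the number of downstream queries one LogQL query fans out to.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// DefaultDownstreamConcurrency mirrors the constant from the diff;
// everything else in this file is a hypothetical illustration.
const DefaultDownstreamConcurrency = 128

// boundedDownstreamer caps how many downstream queries execute at once.
type boundedDownstreamer struct {
	sem chan struct{} // buffered channel used as a counting semaphore
}

func newBoundedDownstreamer(limit int) *boundedDownstreamer {
	return &boundedDownstreamer{sem: make(chan struct{}, limit)}
}

// runAll executes every query via exec and returns results in input order.
// A slot is acquired before each goroutine starts, so at most cap(d.sem)
// queries execute concurrently and goroutine count does not scale with len(queries).
func (d *boundedDownstreamer) runAll(queries []string, exec func(string) string) []string {
	results := make([]string, len(queries))
	var wg sync.WaitGroup
	for i, q := range queries {
		d.sem <- struct{}{} // blocks while the limit is reached
		wg.Add(1)
		go func(i int, q string) {
			defer wg.Done()
			defer func() { <-d.sem }() // release the slot when done
			results[i] = exec(q)
		}(i, q)
	}
	wg.Wait()
	return results
}

// peakConcurrency pushes n fake queries through a downstreamer with the given
// limit and reports the highest number observed executing at once.
func peakConcurrency(n, limit int) (peak int64, completed int) {
	d := newBoundedDownstreamer(limit)
	queries := make([]string, n)
	for i := range queries {
		queries[i] = fmt.Sprintf("shard_%d", i)
	}
	var inFlight int64
	results := d.runAll(queries, func(q string) string {
		cur := atomic.AddInt64(&inFlight, 1)
		for { // record the maximum with a compare-and-swap loop
			old := atomic.LoadInt64(&peak)
			if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
				break
			}
		}
		time.Sleep(time.Millisecond) // simulated querier latency
		atomic.AddInt64(&inFlight, -1)
		return q
	})
	for _, r := range results {
		if r != "" {
			completed++
		}
	}
	return peak, completed
}

func main() {
	peak, done := peakConcurrency(1000, DefaultDownstreamConcurrency)
	fmt.Printf("completed=%d peak in-flight=%d limit=%d\n", done, peak, DefaultDownstreamConcurrency)
}
```

Acquiring the slot in the dispatching loop is what makes this a guard against goroutine explosions; acquiring it inside each goroutine would still cap concurrent work, but every goroutine would be created up front.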
Thanks for clarifying, Owen. 👍
Two things:
- Can we add this

  > Increasing the limit to 128 still seems reasonable to that effect but is also high enough to not pre-limit anything the LimitedRoundTripper would limit anyway.

  as a comment on the const itself?
- Also +1 to making this value the same as `max_query_parallelism`. A `max_query_parallelism` of 256 will be happily allowed by the `LimitedRoundTripper`, but it can still get limited by this `Downstreamer`, right?
> Any way we can line it up with the max_query_parallelism? Because 128 is half the biggest tenant we have at 256 so he will get limited by this.
Yes, we could, but this is also post-split code, meaning that each split would need to schedule more than 128 queries for this limit to kick in. We may ultimately want to thread it into the `MaxQueryParallelism` code, but I felt that was overengineering for the moment; it could be done in another PR if needed.
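The "post-split" argument can be checked with a back-of-the-envelope Go model. It is hypothetical, not Loki code, and assumes that all time splits of a query are in flight together, that each split's shard fan-out is capped by the per-split downstreamer limit, and that the `LimitedRoundTripper` caps the query-wide total at the tenant's `max_query_parallelism`. `peakQuerierRequests` and `minInt` are illustrative names.

```go
package main

import "fmt"

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// peakQuerierRequests models the peak number of in-flight querier requests for
// one query and reports whether the per-split downstreamer limit is the binding cap.
// Assumption (hypothetical model): every split runs at once, each split is capped
// at downstreamLimit, and the round tripper caps the total at maxQueryParallelism.
func peakQuerierRequests(splits, shardsPerSplit, downstreamLimit, maxQueryParallelism int) (peak int, downstreamerBinds bool) {
	perSplit := minInt(shardsPerSplit, downstreamLimit)
	total := minInt(splits*perSplit, maxQueryParallelism)
	// What the round tripper alone would have allowed, ignoring the downstreamer.
	uncapped := minInt(splits*shardsPerSplit, maxQueryParallelism)
	return total, total < uncapped
}

func main() {
	const limit, tenantParallelism = 128, 256
	cases := []struct{ splits, shards int }{
		{24, 16}, // many splits, modest sharding
		{1, 64},  // one split, fan-out under the limit
		{1, 512}, // one split fanning out past the limit
	}
	for _, c := range cases {
		peak, binds := peakQuerierRequests(c.splits, c.shards, limit, tenantParallelism)
		fmt.Printf("splits=%d shards/split=%d -> peak=%d downstreamer binds=%v\n",
			c.splits, c.shards, peak, binds)
	}
}
```

Under these assumptions, a tenant with `max_query_parallelism` of 256 is only held back by the 128 cap in the last case, where a single split fans out to more than 128 shard queries.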
Sounds good!
* adds justification for keeping Downstreamer parallelism
* loads binop legs in parallel
* increases downstreamer default concurrency
* astmapper spanlogger
* always clone expr during mapping to prevent mutability bugs
* Revert "astmapper spanlogger"

  This reverts commit 23f6b55.
* cleanup + use errgroup
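The "always clone expr during mapping" change guards against aliasing. If two parts of a query tree share a node, a mapper that annotates one of them in place silently changes the other, and mapping both binop legs concurrently turns that into a data race. A minimal Go sketch follows. It assumes heavily simplified, hypothetical `Expr`, `RangeAggExpr`, `BinOpExpr`, and `shardLeaf` definitions; Loki's real LogQL AST and its `Clone()` implementation differ.

```go
package main

import "fmt"

// Expr is a hypothetical, heavily simplified stand-in for a LogQL AST node.
type Expr interface {
	String() string
	Clone() Expr
}

// RangeAggExpr models a leaf like `rate({app="foo"}[1m])`; Shard is set by a mapper.
type RangeAggExpr struct {
	Selector string
	Shard    string
}

func (e *RangeAggExpr) String() string {
	if e.Shard == "" {
		return fmt.Sprintf("rate(%s[1m])", e.Selector)
	}
	return fmt.Sprintf("rate(%s[1m]) shard=%s", e.Selector, e.Shard)
}

// Clone returns a copy that shares no pointers with the receiver.
func (e *RangeAggExpr) Clone() Expr {
	c := *e
	return &c
}

// BinOpExpr models `lhs <op> rhs`.
type BinOpExpr struct {
	Op       string
	LHS, RHS Expr
}

func (e *BinOpExpr) String() string {
	return fmt.Sprintf("(%s) %s (%s)", e.LHS, e.Op, e.RHS)
}

// Clone deep-copies both legs so mutating the copy never touches the original.
func (e *BinOpExpr) Clone() Expr {
	return &BinOpExpr{Op: e.Op, LHS: e.LHS.Clone(), RHS: e.RHS.Clone()}
}

// shardLeaf is a hypothetical mapper step that annotates a leaf in place.
func shardLeaf(e Expr, shard string) {
	if leaf, ok := e.(*RangeAggExpr); ok {
		leaf.Shard = shard
	}
}

func main() {
	leaf := &RangeAggExpr{Selector: `{app="foo"}`}
	orig := &BinOpExpr{Op: "/", LHS: leaf, RHS: leaf} // both legs share one node

	// Without Clone, shardLeaf(orig.LHS, ...) would also change orig.RHS,
	// because both legs point at the same node.
	mapped := orig.Clone().(*BinOpExpr)
	shardLeaf(mapped.LHS, "0_of_2")

	fmt.Println(orig)   // original is untouched
	fmt.Println(mapped) // only the cloned LHS carries the shard annotation
}
```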
This PR does a few things:
- Loads binary operation legs in parallel.
- Adds a `Clone()` method for our `Expr`s to ensure we run into mutability bugs less often.
- Increases `Downstreamer` concurrency. Now that we have the `LimitedRoundTripper`, this structure is largely used just to prevent goroutine explosions via malicious queries, so it feels safe to increase this limit, as it no longer needs to limit access to our tenant queues.

Running this in one of our clusters resulted in sharded binary operations running ~10x faster 🎉