-
Notifications
You must be signed in to change notification settings - Fork 545
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
distributor: do inflight checks before reading request (for httpgrpc requests) #6300
Conversation
pkg/mimir/grpc_push_check.go
Outdated
httpUrl := getSingleMetadata(md, httpgrpc.MetadataURL) | ||
msgSize := getMessageSize(md, grpcutil.MetadataMessageSize) | ||
|
||
if httpMethod == http.MethodPost && (httpUrl == api.PrometheusPushEndpoint || httpUrl == api.OTLPPushEndpoint) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure checking the URL for equality will work. Reason is that the endpoint is exposed under a configurable prefix (and I think the prefix is still the URL received here). What if we check if the URL ends with the requested endpoint? That's what we do in the query-frontend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done in 5cd002d.
pkg/distributor/distributor.go
Outdated
inflight := d.inflightPushRequests.Inc() | ||
defer func() { | ||
if cleanupInDefer { | ||
d.inflightPushRequests.Dec() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if to keep the code more future proof, we should decrease d.inflightPushRequestsBytes
here. My take is that in in the future we'll add more checks after d.checkRequestSize()
then we'll miss subtracting it. The check here could be done on rs.requestSize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually we could just change the defer function into a call to cleanupAfterPushFinished()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in adfcca0.
dbb1f77
to
25e9f1f
Compare
I think this is now ready for review. Compared to previous version of the PR, we now check size of Extension of this (for future PR) could be to check for OTLP Request size too in OTLP Push handler. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job Peter! I can't see any issue. I left a couple of nits and questions (don't block on me re-reviewing it).
@@ -49,6 +49,7 @@ | |||
* [ENHANCEMENT] Querier: reduce memory consumed for queries that hit store-gateways. #6348 | |||
* [ENHANCEMENT] Ruler: include corresponding trace ID with log messages associated with rule evaluation. #6379 | |||
* [ENHANCEMENT] Querier: clarify log messages and span events emitted while querying ingesters, and include both ingester name and address when relevant. #6381 | |||
* [ENHANCEMENT] Ingester, Distributor: added experimental support for rejecting push requests received via gRPC before reading them into memory, if ingester or distributor is unable to accept the request. This is activated by using `-ingester.limit-inflight-requests-using-grpc-method-limiter` for ingester, and `-distributor.limit-inflight-requests-using-grpc-method-limiter` for distributor. #5976 #6300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention the experimental flags in docs/sources/mimir/configure/about-versioning.md
too, please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/mimir/grpc_push_check.go
Outdated
// Let's check httpgrpc metadata, if present. | ||
httpMethod := getSingleMetadata(md, httpgrpc.MetadataMethod) | ||
httpUrl := getSingleMetadata(md, httpgrpc.MetadataURL) | ||
msgSize := getMessageSize(md, grpcutil.MetadataMessageSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] We could move this inside the if
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pkg/mimir/grpc_push_check_test.go
Outdated
|
||
ctx, err = l.RPCCallStarting(context.Background(), httpgrpcHandleMethod, metadata.New(map[string]string{ | ||
httpgrpc.MetadataMethod: "POST", | ||
httpgrpc.MetadataURL: api.PrometheusPushEndpoint, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] Would be nice to add a prefix to ensure it matches anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
il := d.getInstanceLimits() | ||
if il.MaxInflightPushRequests > 0 && inflight > int64(il.MaxInflightPushRequests) { | ||
d.rejectedRequests.WithLabelValues(reasonDistributorMaxInflightPushRequests).Inc() | ||
return ctx, nil, errMaxInflightRequestsReached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm the reason why we removed the wrapping to util_log.DoNotLogError{}
is because all errors returned by this function will not be logged anyway? If so, I would add a comment to clarify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment in 00e660f. Wrapping is still done in limitsMiddleware
.
|
||
func (d *Distributor) checkWriteRequestSize(rs *requestState, writeRequestSize int64) error { | ||
// If writeRequestSize was already checked, don't check it again. | ||
if rs.writeRequestSize > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I'm good with this check, but could it ever happen that we call checkWriteRequestSize()
twice for the same request? If yes, in which case? I can't see it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how that would happen. It's only called from limitsMiddleware
, and not from grpc method check. (We only call checkHttpgrpcRequestSize
from there).
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
… requests. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
acc3661
to
304de04
Compare
Use prefix in tests. Signed-off-by: Peter Štibraný <[email protected]>
…otLogError. Signed-off-by: Peter Štibraný <[email protected]>
Signed-off-by: Peter Štibraný <[email protected]>
Thanks for review. I've addressed your feedback, setting auto-merge. 🎉 |
What this PR does
This PR implements inflight checks in Distributor by using gRPC method limiter introduced in grafana/dskit#377. This PR relies on unmerged dskit PR grafana/dskit#392 (waiting for new gRPC-go release).
This PR also relies on clients passing the correct metadata headers for httpgrpc requests (using methods introduced in grafana/dskit#391).
To be rebased on top of #6301, after it's merged.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]