-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Add ingester handler for shutdown and forget tokens #6179
Conversation
e3bd8f0
to
5211bb5
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.2%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
- loki -0.1% |
cddb5c9
to
cc4695e
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.2%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
- loki -0.1% |
cc4695e
to
4016250
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.2%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
- loki -0.1% |
4016250
to
18c75e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look great. Just left some comments.
docs/sources/api/_index.md
Outdated
@@ -47,6 +47,7 @@ These endpoints are exposed by the ingester: | |||
|
|||
- [`POST /flush`](#post-flush) | |||
- [`POST /ingester/flush_shutdown`](#post-ingesterflush_shutdown) | |||
- [`POST /ingester/shutdown_and_forget`](#post-ingestershutdown_and_forget) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shutdown_and_forget
doesn't make it clear what it would forget. What do you think about calling it shutdown_and_forget_ring_tokens
or shutdown_and_remove_ring_tokens
?
We can also add a query param to flush_shutdown
like /ingester/flush_shutdown?forget_ring_tokens=true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the one hand I like the idea of a single endpoint, but on the other hand, I find it weird to have query parameters in URLs that only allow POST
requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense then to change the URL to /ingester/shutdown
and make both flush
and delete_tokens
URL parameters.
/ingester/shutdown?flush=true
/ingester/shutdown?delete_tokens=true
(Of course keep the existing URL and deprecate it.)
What do you think @sandeepsukhani ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about /ingester/shutdown/?unregister=true
?
Keep in mind as well that documentation can clarify the purpose and mechanism for an API endpoint, if need be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it weird to have query parameters in URLs that only allow POST requests
I agree it looks ugly but it helps us keep it tidy. If we keep adding different paths for various use cases, it would start getting messier.
I think delete_ring_tokens
or reset_ring_tokens
or something similar would be a bit self-descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the endpoint only accepts POSTs, why not include these parameters in a request body? I do like the idea of a single endpoint with parameters.
// ingester service as "failed", so Loki will shut down entirely. | ||
// The module manager logs the failure `modules.ErrStopProcess` in a special way. | ||
if i.lifecycler.ClearTokensOnShutdown() && errs.Err() == nil { | ||
return modules.ErrStopProcess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we just rely on nil
error? We can return internal server error from ShutdownAndForgetHandler
handler when error is non-nil or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we need to be able to distinguish between stopping of the ingester service that happened through the signal handler, and stopping of the service that was triggered by the shutdown handler. In case of the former, the error will be nil
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both cases, we are returning w.WriteHeader(http.StatusNoContent)
so I don't see any special handling in ShutdownAndForgetHandler
for either of the cases. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see what you mean, but the returned error modules.ErrStopProcess
is not only used by the shutdown handler, but also by the service manager in the Run()
function, which determines whether to stop Loki when one of its services failed.
Line 385 in 3b3fcf6
level.Info(util_log.Logger).Log("msg", "received stop signal via return error", "module", m, "error", service.FailureCase()) |
18c75e2
to
f2339b0
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0.1% |
@sandeepsukhani @dannykopping I opted for |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
b836d0d
to
d4ff4d2
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
I'll squash commits once approved. |
No need, we squash on merge |
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick review, but looks good!
`/ingester/shutdown` is similar to the [`/ingester/flush_shutdown`](#post-ingesterflush_shutdown) | ||
endpoint, but accepts three URL query parameters `flush`, `delete_ring_tokens`, and `terminate`. | ||
|
||
**URL query parameters:** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is a POST, curious why we opt for url params intead of a request body?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Query params are I guess easier to write when you execute the request using curl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think we do not follow any strict, restful HTTP API design guides in our endpoint design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work!
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
42b9b81
to
3cea8d7
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0.1% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: Christian Haudum <[email protected]>
This handler replaces the deprecated /ingester/flush_shutdown handler and can be used to gracefully shut down a Loki instance and delete the file that persists the tokens of the ingester ring. In production environments you usually want to persist ring tokens so that during a restart of an ingester instance, or during rollout, the tokens from that instance are not re-distributed to other instances, but instead kept so that the same streams end up on the same instance once it is up and running again. For that, the tokens are written to a file that can be specified via the `-ingester.tokens-file-path` argument. In certain cases, however, you want to forget the tokens and re-distribute them when shutting down an ingester instance. This was already possible by calling `/ingester/flush_shutdown`, deleting the tokens file and terminating the process. The new handler `/ingester/shutdown` combines these manual steps into a single handler. Signed-off-by: Christian Haudum <[email protected]>
Signed-off-by: Christian Haudum <[email protected]>
3cea8d7
to
095653c
Compare
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. - ingester -0.4%
+ distributor 0.3%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0.1% |
What this PR does / why we need it:
This handler can be used to gracefully shut down a Loki instance and
delete the file that persists the tokens of the ingester ring.
In production environments you usually want to persist ring tokens so
that during a restart of an ingester instance, or during rollout, the
tokens from that instance are not re-distributed to other instances, but
instead kept so that the same streams end up on the same instance once
it is up and running again. For that, the tokens are written to a file
that can be specified via the
-ingester.tokens-file-path
argument.In certain cases, however, you want to forget the tokens and
re-distribute them when shutting down an ingester instance. This was
already possible by calling
/ingester/flush_shutdown
, deleting thetokens file and terminating the instance. The new handler
/ingester/shutdown_and_forget
combines these manual steps into asingle handler.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
The first commit with the vendor changes can be removed once the PR against grafana/dskit has been merged.
Checklist
CHANGELOG.md
.docs/sources/upgrading/_index.md