receive: Add liveness and readiness probe #1537
cc @FUSAKLA
LGTM, Thanks!
cmd/thanos/receive.go (outdated)
@@ -278,6 +284,7 @@ func runReceive(
s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

level.Info(logger).Log("msg", "listening for StoreAPI gRPC", "address", grpcBindAddr)
statusProber.SetReady()
I think the receiver should probably not be ready until the TSDB is ready. I'm also not sure about the hashring, and the receive interface is not guaranteed to be up at this point either.
Maybe this will require a more complex condition for the ready state 🤔
@FUSAKLA The TSDB is already ready at this stage: if you check line 270, this code runs after the TSDB is opened.
For the receive interface, I thought that if something goes wrong it would change the liveness state, so a readiness check wouldn't be needed.
I guess I need to double-check the hashring readiness. I'll have another look at it.
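The "more complex condition for the ready state" raised above could be modelled as a gate that flips to ready only once every component has reported in. This is a minimal sketch, not Thanos code: `readinessGate`, `newReadinessGate`, `MarkReady`, `Ready`, and the component names `tsdb`, `hashring`, and `receive-http` are all hypothetical, and it assumes each component can signal its own readiness exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

// readinessGate is a hypothetical helper (not part of Thanos) that reports
// ready only after every registered component has signalled readiness.
type readinessGate struct {
	mu      sync.Mutex
	pending map[string]struct{} // components that have not signalled yet
}

// newReadinessGate registers the components that must all be ready.
func newReadinessGate(components ...string) *readinessGate {
	g := &readinessGate{pending: make(map[string]struct{}, len(components))}
	for _, c := range components {
		g.pending[c] = struct{}{}
	}
	return g
}

// MarkReady records that one component is ready; unknown names are a no-op.
func (g *readinessGate) MarkReady(component string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.pending, component)
}

// Ready reports whether no registered component is still pending.
func (g *readinessGate) Ready() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return len(g.pending) == 0
}

func main() {
	// Component names are illustrative only.
	g := newReadinessGate("tsdb", "hashring", "receive-http")
	g.MarkReady("tsdb")
	fmt.Println(g.Ready()) // false: hashring and receive-http are still pending
	g.MarkReady("hashring")
	g.MarkReady("receive-http")
	fmt.Println(g.Ready()) // true
}
```

With this shape, each startup path calls `MarkReady` independently and a `/-/ready` handler would only consult `Ready()`, so no single call site has to know about every dependency.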
Signed-off-by: Kemal Akkoyun <[email protected]>
@@ -277,6 +290,8 @@ func runReceive(
}
s := newStoreGRPCServer(logger, reg, tracer, tsdbStore, opts)

// Wait hashring to be ready before start serving metrics
<-hashringReady
Why are we waiting for the hashring to be ready before serving metrics from the store? These things are entirely independent, IMO.
* Add prober to receive
* Add changelog entries
* Update README
* Remove default
* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <[email protected]>
This PR adds:
* `/-/healthy` endpoint for liveness checks.
* `/-/ready` endpoint for readiness checks.

Changes
* `/-/healthy` endpoint for liveness checks.
* `/-/ready` endpoint for readiness checks.
* `prober.Prober` for readiness and liveness endpoints.

Verification
* `make test`
* Started `thanos receive` and made a request to the related endpoints.