Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat store: added readiness and livenes prober #1460

Merged
merged 1 commit into from
Sep 23, 2019

Conversation

FUSAKLA
Copy link
Member

@FUSAKLA FUSAKLA commented Aug 26, 2019

Added the prober to yet another component. Store this time.

The store does the bucket initialization and sync synchronously, so the default http server will start up only once it is finished. Then it gets healthy(live) and once the gRPC server starts up the store gets also ready.

Personally I'd much rather see the liveness probe to answer right from the start during the bucket init since this way we leave the bucket sync duration to some initial back-off of an orchestrator.

  • CHANGELOG entry if change is relevant to the end user.

Changes

  • added prober to the thanos store

Verification

Started it and verified it gets ready end healthy as expected.

@FUSAKLA FUSAKLA force-pushed the fus-store-prober branch 2 times, most recently from 0027f18 to e44d882 Compare August 26, 2019 21:05
@FUSAKLA FUSAKLA force-pushed the fus-store-prober branch 2 times, most recently from 8457aec to f15ba68 Compare August 29, 2019 19:37
Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @FUSAKLA, thanks a lot for this work, we desperately need it. Let's try to push it to finish line.
I have left a small comment, could you please check it?

And maybe you can also rebase it.

cmd/thanos/store.go Show resolved Hide resolved
@kakkoyun
Copy link
Member

Hey @FUSAKLA, I realized that we miss liveness and healthiness checks for various components when I was working on https://github.com/metalmatze/kube-thanos/pull/42 and then I have seen your work #656, #1395 and #1297.

We need probes in place for all other components as well. I'd like to push this effort further as fast as possible. I'm happy to help. I could create PRs for components which you haven't touch yet if it is also ok for you? I'd be happy to hear your opinion on those PRs as well.

What do you think?

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 17, 2019

Hi @kakkoyun ! Yeah I didn't want to make more PRs at once since even this way it takes it's time to get these reviewed. Every one of them raises big discussion on when to report as ready and when as healthy. Thats also the reason it got eventually splitted up but help is always welcomed so feel free to add those PRS! I'd love to see this done finally :D
Thanks for the initiative!

@kakkoyun
Copy link
Member

kakkoyun commented Sep 17, 2019

@FUSAKLA Alright, great! Let's do this then. I've already sent the first one and pinged you on #1534, PTAL.

It'd be great if we can push this one too.

CHANGELOG.md Outdated Show resolved Hide resolved
cmd/thanos/store.go Outdated Show resolved Hide resolved
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some minor nits only (:

cmd/thanos/store.go Show resolved Hide resolved
cmd/thanos/store.go Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 18, 2019

Thanks @bwplotka ! Should be resolved and rebased on master.

if err != nil {
return errors.Wrap(err, "create bucket client")
runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove if err !=nil, as it is run in defer, err will be overwritten by non related code which happens after this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it was meant to close the client if the runStore ended up with an error.
Otherwise it should not close the client since others are still using it.
Maybe that's why the if err != nil is there and the fact err will be overwritten by the following code was the intention?

But I did not touch this code actually, just removed the unneeded code block wrapping it.

cancel()
ctx, cancel := context.WithCancel(context.Background())
g.Add(func() error {
defer runutil.CloseWithLogOnErr(logger, bkt, "bucket client")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit confusing, so we were closing bkt earlier, why are we closing here again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented here #1460 (comment)

cmd/thanos/store.go Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented Sep 22, 2019

@kakkoyun PTAL I removed the metricHTTPListenGroup since it is not used anymore and unified the changelog

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @FUSAKLA. LGTM. I believe, we've covered all components with this PR.

@brancz brancz merged commit 7a2f54e into thanos-io:master Sep 23, 2019
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
ivan-kiselev pushed a commit to ivan-kiselev/thanos that referenced this pull request Sep 26, 2019
brancz pushed a commit that referenced this pull request Sep 26, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <[email protected]>

* some formatting

Signed-off-by: Ivan Kiselev <[email protected]>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* resolve comments

Signed-off-by: Ivan Kiselev <[email protected]>

* resolve last comment

Signed-off-by: Ivan Kiselev <[email protected]>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update README

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Set ready

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <[email protected]>

* Add full-stop

Signed-off-by: Jamie Poole <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* remove duplicate

Signed-off-by: Ivan Kiselev <[email protected]>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
Signed-off-by: Martin Chodur <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
GiedriusS pushed a commit that referenced this pull request Oct 28, 2019
* Some updates to compact docs

Signed-off-by: Ivan Kiselev <[email protected]>

* some formatting

Signed-off-by: Ivan Kiselev <[email protected]>

* Update docs/components/compact.md

accept PR suggestions

Co-Authored-By: Bartlomiej Plotka <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add metalmatze to list of maintainers (#1547)

Signed-off-by: Matthias Loibl <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* resolve comments

Signed-off-by: Ivan Kiselev <[email protected]>

* resolve last comment

Signed-off-by: Ivan Kiselev <[email protected]>

* receive: Add liveness and readiness probe (#1537)

* Add prober to receive

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entries

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update README

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Wait hashring to be ready

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* downsample: Add liveness and readiness probe (#1540)

* Add readiness and liveness probes for downsampler

Signed-off-by: Kemal Akkoyun <[email protected]>

* Add changelog entry

Signed-off-by: Kemal Akkoyun <[email protected]>

* Remove default

Signed-off-by: Kemal Akkoyun <[email protected]>

* Set ready

Signed-off-by: Kemal Akkoyun <[email protected]>

* Update CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>

* Clean CHANGELOG

Signed-off-by: Kemal Akkoyun <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Document the dnssrvnoa option (#1551)

Signed-off-by: Antonio Santos <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* feat store: added readiness and livenes prober (#1460)

Signed-off-by: Martin Chodur <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add Hotstar to adopters. (#1553)

It's the largest streaming service in India that does cricket and GoT
for India. They have insane scale and are using Thanos to scale their
Prometheus.

Spoke to them offline about adding the logo and will get a signoff here
too.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix hotstar logo in the adoptor's list (#1558)

Signed-off-by: Karthik Vijayaraju <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Fix typos, including 'fomrat' -> 'format' in tracing.config-file help text. (#1552)

Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Compactor: Fix for #844 - Ignore object if it is the current directory (#1544)

* Ignore object if it is the current directory

Signed-off-by: Jamie Poole <[email protected]>

* Add full-stop

Signed-off-by: Jamie Poole <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Adding doc explaining the importance of groups for compactor (#1555)

Signed-off-by: Leo Meira Vital <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Add blank line for list (#1566)

The format of these files is wrong in the web.

Signed-off-by: dongwenjuan <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* Refactor compactor constants, fix bucket column (#1561)

* compact: unify different time constants

Use downsample.* constants where possible. Move the downsampling time
ranges into constants and use them as well.

Signed-off-by: Giedrius Statkevičius <[email protected]>

* bucket: refactor column calculation into compact

Fix the column's name and name it UNTIL-DOWN because that is what it
actually shows - time until the next downsampling.

Move out the calculation into a separate function into the compact
package. Ideally we could use the retention policies in this calculation
as well but the `bucket` subcommand knows nothing about them :-(

Signed-off-by: Giedrius Statkevičius <[email protected]>

* compact: fix issues with naming

Reorder the constants and fix mistakes.

Signed-off-by: Giedrius Statkevičius <[email protected]>
Signed-off-by: Ivan Kiselev <[email protected]>

* remove duplicate

Signed-off-by: Ivan Kiselev <[email protected]>
Signed-off-by: Giedrius Statkevičius <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants