Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added thanos check rules command #1097

Merged
merged 6 commits into from
Jun 14, 2019

Conversation

FUSAKLA
Copy link
Member

@FUSAKLA FUSAKLA commented Apr 30, 2019

Resolves #1096
This adds thanos check rules command to check Prometheus rules.

We cannot use the promtool check rules since Thanos-rule allows the additional field partial_response_strategy

Changes

  • added the check rules cmd
  • changelog entry TBD
  • docs update?

Verification

Tested it on different types of invalid rule files even valid. Works fine.

Copy link
Member

@povilasv povilasv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff 🥇 . Some nits around coding style. Also tests shouldn't be hard to do :)

Also please add this to CHANGELOG

cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 1, 2019

Just one general thought. Should we run this check in the thanos-rule also?
Would it be more safe to validate the rules before startup to provide better output and and avoid crashing on invalid data? Possibly as a new PR.

Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome generally, some comments.

cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 1, 2019

@povilasv Added tests and resolved all your comments PTAL if that's ok with you.

@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 2, 2019

So I managed to update even the docs and the genflags script.
Please check those also.

Also the output is now using logger so it looks like this:

$ ./thanos check rules cmd/thanos/testdata/rules-files/*.yaml ; echo "$?"
level=info ts=2019-05-02T05:54:37.584375498Z caller=check.go:62 msg=Checking filename=cmd/thanos/testdata/rules-files/invalid-rules-data.yaml
level=error ts=2019-05-02T05:54:37.585238171Z caller=check.go:45 result=FAILED
level=error ts=2019-05-02T05:54:37.585292302Z caller=check.go:47 error="Groupname should not be empty"
level=info ts=2019-05-02T05:54:37.585335002Z caller=check.go:50
level=info ts=2019-05-02T05:54:37.585365002Z caller=check.go:62 msg=Checking filename=cmd/thanos/testdata/rules-files/invalid-yaml-format.yaml
level=error ts=2019-05-02T05:54:37.585543024Z caller=check.go:45 result=FAILED
level=error ts=2019-05-02T05:54:37.585591893Z caller=check.go:47 error="yaml: line 4: could not find expected ':'"
level=info ts=2019-05-02T05:54:37.585625624Z caller=check.go:50
level=info ts=2019-05-02T05:54:37.585653444Z caller=check.go:62 msg=Checking filename=cmd/thanos/testdata/rules-files/valid.yaml
level=info ts=2019-05-02T05:54:37.586356768Z caller=check.go:53 result=SUCCESS rulesfound=2
check rules command failed: 2 errors: Groupname should not be empty; yaml: line 4: could not find expected ':'
1

cmd/thanos/check.go Outdated Show resolved Hide resolved
docs/components/check.md Outdated Show resolved Hide resolved
docs/components/check.md Outdated Show resolved Hide resolved
docs/components/check.md Outdated Show resolved Hide resolved
docs/components/check.md Outdated Show resolved Hide resolved
@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 2, 2019

Thanks @povilasv ! all applied.

cmd/thanos/check.go Outdated Show resolved Hide resolved
@povilasv
Copy link
Member

povilasv commented May 2, 2019

@FUSAKLA you just forced pushed which removed the fixes :P could reapply again?

@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 2, 2019

Oh my.. sorry it's been a long day. Thanks for noticing!

facepalm

Copy link
Member

@povilasv povilasv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bwplotka do you want take another look?

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides those few minor nits. Thank you a lot! 👍

cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
cmd/thanos/check.go Outdated Show resolved Hide resolved
@FUSAKLA FUSAKLA force-pushed the fus-rules-lint branch 2 times, most recently from e072530 to f25f441 Compare May 3, 2019 08:55
@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 3, 2019

Thanks @GiedriusS, should be fixed and rebased on master now

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you rebase on master again? The CHANGELOG.md file has changed and we cannot merge this 😛

@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 9, 2019

Thanks! Should be up to date now.

cmd/thanos/check.go Outdated Show resolved Hide resolved
@FUSAKLA FUSAKLA force-pushed the fus-rules-lint branch 2 times, most recently from 864a74c to 9505f39 Compare May 11, 2019 07:17
@GiedriusS
Copy link
Member

Could you rebase it one more time? Then I'll give it one last review and I think we can merge this. Thank you for this work and the persistence, it's not easy

@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 26, 2019

Sorry for the inactivity, I'm back.

@GiedriusS rebased so PTAL if you find a time 🙏, thanks!

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found another small issue:

./thanos check rules --log.format=json ./cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml
{"caller":"check.go:69","filename":"./cmd/thanos/testdata/rules-files/invalid-unknown-field.yaml","level":"info","msg":"checking","ts":"2019-05-27T12:45:15.957753106Z"}
{"caller":"check.go:42","level":"error","result":"FAILED","ts":"2019-05-27T12:45:15.958307088Z"}
{"caller":"check.go:44","error":"yaml: unmarshal errors:\n  line 4: field interrrrrrrrval not found in type main.ThanosRuleGroup","level":"error","ts":"2019-05-27T12:45:15.958366609Z"}
{"caller":"check.go:47","level":"info","ts":"2019-05-27T12:45:15.958406713Z"}
check rules command failed: yaml: unmarshal errors:
  line 4: field interrrrrrrrval not found in type main.ThanosRuleGroup

It seems like the JSON format gets broken a bit when errors happen. Perhaps we should remove or rework cmd/thanos/check.go:47? 🤔

@FUSAKLA
Copy link
Member Author

FUSAKLA commented May 27, 2019

@GiedriusS not sure if this is a bug. It looks like the cmds return error and if the whole process returns any it's printed out by the and this is not using the logger.

For example if you run:

$ ./thanos rule --log.format=json
{"caller":"flags.go:87","level":"info","msg":"gossip is disabled","ts":"2019-05-27T21:12:00.055497911Z"}
rule command failed: Gossip is disabled and no --query parameter was given.

Yep, it's here I believe?
https://github.com/improbable-eng/thanos/blob/34ad98eda29fc4977462e3e4beaa023186fecdf3/cmd/thanos/main.go#L167

I can print it out using the logger if you want but that would affect all other thanos components but it works as expected so it up to you.

$ ./thanos rule --log.format=json
{"caller":"flags.go:87","level":"info","msg":"gossip is disabled","ts":"2019-05-27T21:21:19.432487894Z"}
{"caller":"main.go:168","err":"rule command failed: Gossip is disabled and no --query parameter was given.","level":"error","ts":"2019-05-27T21:21:19.432716717Z"}

@GiedriusS
Copy link
Member

GiedriusS commented May 28, 2019

@GiedriusS not sure if this is a bug. It looks like the cmds return error and if the whole process returns any it's printed out by the and this is not using the logger.

For example if you run:

$ ./thanos rule --log.format=json
{"caller":"flags.go:87","level":"info","msg":"gossip is disabled","ts":"2019-05-27T21:12:00.055497911Z"}
rule command failed: Gossip is disabled and no --query parameter was given.

Yep, it's here I believe?

https://github.com/improbable-eng/thanos/blob/34ad98eda29fc4977462e3e4beaa023186fecdf3/cmd/thanos/main.go#L167

I can print it out using the logger if you want but that would affect all other thanos components but it works as expected so it up to you.

$ ./thanos rule --log.format=json
{"caller":"flags.go:87","level":"info","msg":"gossip is disabled","ts":"2019-05-27T21:21:19.432487894Z"}
{"caller":"main.go:168","err":"rule command failed: Gossip is disabled and no --query parameter was given.","level":"error","ts":"2019-05-27T21:21:19.432716717Z"}

Sorry, I haven't looked into this completely when commenting originally (didn't see that you haven't introduced this yourself) but yes, I think that should be removed and the proper logger should be used. I mean the user asks for a JSON formatted log so it should be printed like that, no? 😄 Could you add that change?

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jun 3, 2019

@GiedriusS sorry for the delay. I changed the log and rebased on the recent master if that is ok with you now? I agree this way it's more consistent :)

@FUSAKLA
Copy link
Member Author

FUSAKLA commented Jun 8, 2019

I rebased it once again, fixed issues with MultiError from TSDB which moved meantime to different package and updated the docs cmd flags which changed also. CI is green 🎉

Copy link
Member

@GiedriusS GiedriusS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! 😄Will let someone else look through this again before merging.

@GiedriusS GiedriusS merged commit 9e3ee18 into thanos-io:master Jun 14, 2019
Copy link
Member

@bwplotka bwplotka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Awesome. Thanks 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rule: Add linting command for prometheus rules
4 participants