-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: add prometheus/grafana monitoring #83004
Conversation
31cbc6a
to
fef4dd6
Compare
fef4dd6
to
0c5cc7e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty slick! I have one medium-sized suggestion and a few minor ones.
Medium: We ought to avoid de-duplicating prometheus.go
as we've done so here. Looking at diff --unified pkg/cmd/roachtest/prometheus/prometheus.go pkg/roachprod/prometheus/prometheus.go
, the differences look very minor, so I'd recommend making whatever changes are needed here to work for both use cases (roachtests, roachprod). It would make it (a) easier to review and (b) reduce likelihood of introducing minor differences in the two versions, reducing maintenance overhead.
Minor suggestions are in the comments below. When reviewing this PR and trying it out, I found it helpful to type out some changes myself (reflecting the comments below so perhaps this is helpful to you too): https://gist.github.com/irfansharif/c4d6c2e5873fcbeb40cdcba535bfe123.
@msbutler meta question, are you planning on pushing this over the finish line? There is a bit of dependency on this PR now, with #81516 and #80724, so this shouldn't linger for too long. As per our comments above, roachtest should be able to do what #81516 does even once roachprod learns its new tricks, so I would be inclined to merge that first (or rather, split out the prometheus-roachtest part into a PR and merge that) and then refactor as necessary to let roachprod do its thing (i.e. rebase this PR on top and take it from there). Is that okay for you? |
@tbg happy to push this over the finish line, hopefully by the end of the week (barring a crazy L2 rotation this week). Just to clarify the order of operations:
|
0c5cc7e
to
a334918
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the feedback! Before rebasing on #83148 and refactoring the roachtests, I've tried to address the roachprod internals and UX feedback. Could y'all take a look at my diff and give the following commands a spin before I rebase? prom-start
,grafana-url
, and prom-stop
7f8b7fc
to
4e4ae1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're pretty close. I imagine we're planning on rebasing on top of #83148 now that it merged? Looking forward to seeing prometheus.go deduplicated from this PR in this current form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tpcc is now hooked into the the new roachprod package! Currently, no preconfigured dashboard is passed to tpcc.
Also, note that I simplified prometheus config: a scrapdeNode now represents a single node. Take a look at how that changed all the With...()
helper functions.
8426c74
to
1d3c8cb
Compare
Going to leave this review to Irfan as one cook in the kitchen is enough. Looking forward to trying this out myself when it has landed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Michael, I look forward to not having to look at admin UI metrics ever again.
hope you don't mind my random chime but this is niccceee :D |
a484ba7
to
a2867e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, do squash the commits and maybe add a bit of text to the commit message + PR body itself for future readers.
Previously, only roachtests could spin up prom/grafana servers that lasted the lifetime of the roachtest. This PR introduces new roachprod cmds that allow a roachprod user to easily spin up/down their own prom/grafana instances. The PR also hooks up roachtests that rely on prom/grafana into this new infrastructure. Release note: none
a2867e2
to
d1d3c42
Compare
bors r=irfansharif |
Build succeeded: |
Previously, only roachtests could spin up prom/grafana servers that lasted the
lifetime of the roachtest. This PR introduces new roachprod cmds that allow
a roachprod user to easily spin up/down their own prom/grafana instances. The PR
also hooks up roachtests that rely on prom/grafana into this new infrastructure.
Release note: none