roachprod: add prometheus/grafana monitoring #83004

Merged: 1 commit into cockroachdb:master from butler-roachprod-grafana on Jun 29, 2022

msbutler
Collaborator

@msbutler msbutler commented Jun 16, 2022

Previously, only roachtests could spin up prom/grafana servers that lasted the
lifetime of the roachtest. This PR introduces new roachprod cmds that allow
a roachprod user to easily spin up/down their own prom/grafana instances. The PR
also hooks up roachtests that rely on prom/grafana into this new infrastructure.

Release note: none

@msbutler msbutler requested review from tbg and irfansharif June 16, 2022 18:14
@msbutler msbutler self-assigned this Jun 16, 2022
@cockroach-teamcity
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-roachprod-grafana branch 2 times, most recently from 31cbc6a to fef4dd6 Compare June 16, 2022 20:15
@irfansharif irfansharif changed the title roachprod: add promethius/grafana monitoring roachprod: add prometheus/grafana monitoring Jun 16, 2022
@msbutler msbutler force-pushed the butler-roachprod-grafana branch from fef4dd6 to 0c5cc7e Compare June 16, 2022 21:45
Contributor

@irfansharif irfansharif left a comment

This is pretty slick! I have one medium-sized suggestion and a few minor ones.

Medium: We ought to avoid duplicating prometheus.go as we've done here. Looking at diff --unified pkg/cmd/roachtest/prometheus/prometheus.go pkg/roachprod/prometheus/prometheus.go, the differences look very minor, so I'd recommend making whatever changes are needed here to work for both use cases (roachtests, roachprod). That would (a) make this easier to review and (b) reduce the likelihood of the two versions drifting apart, which lowers maintenance overhead.

Minor suggestions are in the comments below. When reviewing this PR and trying it out, I found it helpful to type out some changes myself (reflecting the comments below so perhaps this is helpful to you too): https://gist.github.com/irfansharif/c4d6c2e5873fcbeb40cdcba535bfe123.

pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/install/cluster_synced.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/roachprod.go (outdated, resolved)
@tbg
Member

tbg commented Jun 21, 2022

@msbutler meta question, are you planning on pushing this over the finish line? There is a bit of dependency on this PR now, with #81516 and #80724, so this shouldn't linger for too long. As per our comments above, roachtest should be able to do what #81516 does even once roachprod learns its new tricks, so I would be inclined to merge that first (or rather, split out the prometheus-roachtest part into a PR and merge that) and then refactor as necessary to let roachprod do its thing (i.e. rebase this PR on top and take it from there). Is that okay for you?

@msbutler
Collaborator Author

msbutler commented Jun 21, 2022

@tbg happy to push this over the finish line, hopefully by the end of the week (barring a crazy L2 rotation this week). Just to clarify the order of operations:

  1. Toby will open a PR that moves roachtest/prometheus to roachprod/prometheus, incorporating his changes in roachtest: add admission/follower-overload #81516
  2. I will rebase on that PR and address all feedback above, including teaching all roachtests that use roachtest/prometheus semantics to use my new roachprod/prometheus semantics.
  3. Merge this PR

@msbutler msbutler force-pushed the butler-roachprod-grafana branch from 0c5cc7e to a334918 Compare June 22, 2022 20:15
@blathers-crl blathers-crl bot requested a review from irfansharif June 22, 2022 20:15
Collaborator Author

@msbutler msbutler left a comment

Thanks for all the feedback! Before rebasing on #83148 and refactoring the roachtests, I've tried to address the roachprod internals and UX feedback. Could y'all take a look at my diff and give the following commands a spin before I rebase? prom-start, grafana-url, and prom-stop.

pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/install/cluster_synced.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (resolved)
@msbutler msbutler force-pushed the butler-roachprod-grafana branch 2 times, most recently from 7f8b7fc to 4e4ae1b Compare June 22, 2022 21:46
Contributor

@irfansharif irfansharif left a comment

I think we're pretty close. I imagine we're planning on rebasing on top of #83148 now that it merged? Looking forward to seeing prometheus.go deduplicated from this PR in this current form.

pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/roachprod/roachprod.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/roachprod.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
@blathers-crl blathers-crl bot requested a review from irfansharif June 22, 2022 22:48
Collaborator Author

@msbutler msbutler left a comment

tpcc is now hooked into the new roachprod package! Currently, no preconfigured dashboard is passed to tpcc.

Also, note that I simplified the prometheus config: a scrapeNode now represents a single node. Take a look at how that changed all the With...() helper functions.

pkg/roachprod/prometheus/prometheus_test.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
pkg/roachprod/prometheus/prometheus.go (outdated, resolved)
@msbutler msbutler force-pushed the butler-roachprod-grafana branch 2 times, most recently from 8426c74 to 1d3c8cb Compare June 24, 2022 18:20
@msbutler msbutler marked this pull request as ready for review June 24, 2022 18:21
@msbutler msbutler requested review from a team as code owners June 24, 2022 18:21
@tbg
Member

tbg commented Jun 27, 2022

Going to leave this review to Irfan as one cook in the kitchen is enough. Looking forward to trying this out myself when it has landed!

Contributor

@irfansharif irfansharif left a comment

Thanks for working on this Michael, I look forward to not having to look at admin UI metrics ever again.

pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
pkg/cmd/roachprod/main.go (outdated, resolved)
@otan
Contributor

otan commented Jun 28, 2022

hope you don't mind my random chime but this is niccceee :D

@msbutler msbutler force-pushed the butler-roachprod-grafana branch 2 times, most recently from a484ba7 to a2867e2 Compare June 28, 2022 18:28
Contributor

@irfansharif irfansharif left a comment

LGTM, do squash the commits and maybe add a bit of text to the commit message + PR body itself for future readers.

pkg/roachprod/prometheus/prometheus.go (resolved)
@msbutler msbutler force-pushed the butler-roachprod-grafana branch from a2867e2 to d1d3c42 Compare June 28, 2022 19:44
@msbutler
Collaborator Author

bors r=irfansharif

@craig craig bot merged commit 0917fdc into cockroachdb:master Jun 29, 2022
@craig
Contributor

craig bot commented Jun 29, 2022

Build succeeded:

@msbutler msbutler deleted the butler-roachprod-grafana branch June 29, 2022 20:31
5 participants