This repository has been archived by the owner on Feb 9, 2022. It is now read-only.

Add prometheus and friends #27

Merged: @anguslees merged 4 commits into vmware-archive:master on Apr 20, 2018

Conversation

@anguslees (Contributor) commented Apr 17, 2018

  • prometheus
  • alertmanager
  • node-exporter
  • kube-state-metrics
  • heapster

Also includes various improvements to the oauth2-proxy setup (in particular, switches to same-domain oauth2 redirects)

Fixes #24

@anguslees added the enhancement (New feature or request) label on Apr 17, 2018
@anguslees self-assigned this on Apr 17, 2018
@arapulido requested a review from jjo on April 17, 2018 11:01
@arapulido (Contributor) commented

@jjo Could you review this, please?

@jjo (Contributor) left a comment

SGTM in general, some nits.

local kube = import "kube.libsonnet";

local arch = "amd64";
local version = "v1.4.3";
@jjo commented Apr 18, 2018

Why not the latest, v1.5.2?
If there's an explicit reason (AKS compatibility or similar), I'd suggest adding a comment explaining why.

@anguslees (Author) commented Apr 18, 2018

No known reason (I copied a lot of this from some existing jsonnet files). Will update.
Done.
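
For illustration, a minimal sketch of the pattern being discussed (the registry path is an assumption on my part, guessed from the heapster-like version numbers; it is not copied from this PR). Keeping arch and version as locals makes the suggested bump a one-line change:

local arch = "amd64";
local version = "v1.5.2";  // bumped from v1.4.3, per the review comment above

{
  // Hypothetical image string; the real manifest may format this differently.
  image: "k8s.gcr.io/heapster-%s:%s" % [arch, version],
}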

@@ -191,7 +177,7 @@ local kube = import "kube.libsonnet";
terminationGracePeriodSeconds: 60,
containers_+: {
default: kube.Container("nginx") {
image: "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.9.0",
image: "quay.io/kubernetes-ingress-controller/nginx-ingress-controller:0.12.0",
@jjo commented

Latest is 0.13.0, released 2 days ago :), which has a relevant addition re: kcm.

@anguslees (Author) commented

Done.

spec+: {
containers_+: {
proxy: kube.Container("oauth2-proxy") {
image: "a5huynh/oauth2_proxy:2.2.1",
@jjo commented Apr 18, 2018

Given how security-critical this image is, we're internally using one multi-stage built from sources. It's fine to keep it as-is while testing, but let's work on adding it to our kube-prod-containers for release; I'll submit that PR.

FWIW, that 2.2.1 version is also misleading: the sources only have v2.2, with the extra .1 added by whoever built the image, AFAICT.

@anguslees (Author) commented

Agreed on all points. The latest oauth2-proxy release was quite a while ago too, so most images I could find also haven't been updated in ages (i.e. the base OS layers of the image haven't been updated since the oauth2-proxy release either). I went with this image over the alternatives solely because it is the same image used by the helm chart.

"cookie-refresh": "3h",
"set-xauthrequest": true,
"tls-cert": "",
upstream: "file:///dev/null",
@jjo commented

Expose upstream at the object root to make it easier to override; see e.g. the sample run output below, which shows a somewhat nonsensical /null URL path mapping:

$ docker run -e OAUTH2_PROXY_CLIENT_ID=x -e OAUTH2_PROXY_CLIENT_SECRET=x -e OAUTH2_PROXY_COOKIE_SECRET=x -it a5huynh/oauth2_proxy:2.1 --email-domain="" --upstream="file://dev/null"
2018/04/18 15:38:39 oauthproxy.go:143: mapping path "/null" => file system "/null"
2018/04/18 15:38:39 oauthproxy.go:157: OAuthProxy configured for Google Client ID: x
2018/04/18 15:38:39 oauthproxy.go:167: Cookie settings: name:_oauth2_proxy secure(https):true httponly:true expiry:168h0m0s domain:<default> refresh:disabled
2018/04/18 15:38:39 http.go:49: HTTP: listening on 127.0.0.1:4180
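
A minimal sketch of that suggestion (the field names below are assumptions, not the PR's actual structure): hoist upstream into a root-level field that the args consume, so a consumer can override it without reaching into the container definition.

local oauth2_proxy = {
  local this = self,

  // Dummy value; in this setup oauth2-proxy is only an auth handler,
  // so nothing is actually proxied to an upstream.
  upstream:: "file:///dev/null",

  args: {
    "cookie-refresh": "3h",
    "set-xauthrequest": true,
    "tls-cert": "",
    upstream: this.upstream,
  },
};

// A consumer can then override just the root-level field:
oauth2_proxy { upstream: "http://some-service.monitoring.svc:8080" }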

@anguslees (Author) commented Apr 19, 2018

Yes, I found this odd too - there doesn't seem to be a way to "properly" turn off upstream, AFAICS. The chart (for example) just uses --upstream=file:///dev/null as I have, without worrying about the fact that it actually creates a "/null" path on the oauth2 server.

Note that as I have it set up here, I'm using it as an auth handler for nginx: oauth2-proxy is exposed to the internet, but upstream HTTP requests/replies do not actually get proxied through oauth2-proxy. As a practical result, there's only one instance of oauth2-proxy running, regardless of how many actual k8s services use that oauth2 configuration.
Consequently, I don't think of "upstream" as an exposed/configurable parameter here, any more than the other options like --tls-cert.
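
For reference, a rough sketch of that auth-handler wiring (the annotation names follow the usual nginx-ingress external-auth setup and are my assumption, not copied from this PR): each protected Ingress points nginx's auth subrequest and sign-in redirect at the single oauth2-proxy instance, on the same domain as the protected service.

// Hypothetical fragment merged into a protected Ingress's metadata.
{
  metadata+: {
    annotations+: {
      // nginx issues an auth subrequest here before serving the real backend.
      "nginx.ingress.kubernetes.io/auth-url": "https://$host/oauth2/auth",
      // Unauthenticated users are redirected to the same-domain sign-in path.
      "nginx.ingress.kubernetes.io/auth-signin": "https://$host/oauth2/start",
    },
  },
}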

@jjo commented

Ok, let's revisit when we start actually plumbing services behind it.

},
{
target_label: "__address__",
replacement: "blackbox-exporter.example.com:9115",
@jjo commented

Leave it as blackbox:9115, with a TODO() note to later add a blackbox-exporter deployment in the same namespace.

The other, likely more correct approach in general, would be to have a well-known field, e.g. kubeprod_enable: true, that we could set to false and then massage to nullify from the containing object (scrape_configs in this case), plus std.prune(), so that we can more easily stage new entries (see the sketch below).
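
A minimal sketch of that second idea (the kubeprod_enable field and the job entries are placeholders, not from this PR): keep staged entries in the object, and let std.prune() drop the ones that are flagged off.

local scrape_configs_ = {
  "kubernetes-apiservers": {
    kubeprod_enable: true,
    job_name: "kubernetes-apiservers",
    // ... the usual kubernetes_sd_configs, relabel_configs, etc.
  },
  // Staged, but not active until a blackbox-exporter deployment exists:
  "blackbox-http": {
    kubeprod_enable: false,
    job_name: "blackbox-http",
  },
};

{
  scrape_configs: std.prune([
    // Hide the flag from the output and null out disabled entries;
    // std.prune() then removes the nulls.
    local cfg = scrape_configs_[name];
    if cfg.kubeprod_enable then cfg { kubeprod_enable:: true } else null
    for name in std.objectFields(scrape_configs_)
  ]),
}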

@anguslees (Author) commented

Done. I changed it to "blackbox-exporter:9115", and left a TODO in the top-level prometheus.jsonnet.

},
{
target_label: "__address__",
replacement: "blackbox-exporter.example.com:9115",
@jjo commented

Ditto above.

@anguslees (Author) commented

Done.

// they're no-ops).
// In particular annotations={} is apparently a "change",
// since the comparison is ignorant of defaults.
std.prune($.PersistentVolumeClaim($.hyphenate(kv[0])) + {apiVersion:: null, kind:: null} + kv[1])
@jjo commented

👍
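
As a small aside on why the std.prune() helps (my own toy example, not from this PR): an empty annotations map survives a plain merge, but std.prune strips it, so the generated template no longer spuriously differs from what the API server reports.

std.prune({ metadata: { name: "data", annotations: {} } })
// evaluates to: { metadata: { name: "data" } }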

@jjo (Contributor) left a comment

LGTM, thanks!
fwiw jenkins is tossing on this PR, didn't dig into why tho.

@jjo (Contributor) commented Apr 19, 2018

BTW, a general note about (ab)using the kube-system namespace for resources that obviously don't belong there:

  • IMO it's wrong; maybe use kube-prod-system or similar? We could also go per-type (ingress, logging, monitoring), which would be my personal preference in general, but given the kinda "all-in-one(-shot)" kubeprod approach, a single namespace would be better (a minimal sketch follows below).
  • It does have an operational impact: for example, kops upgrade checks for kube-system to be ready/settled (essentially all pods Running) before replacing each node. At Bitnami it's already a PITA for us having elasticsearch and kibana there, as these heavyweight components can slow upgrades down and/or make them fail on timeout (forcing manual retries).
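
A minimal sketch of the dedicated-namespace idea (the name "kubeprod" is a placeholder; nothing in this PR decides it):

{
  namespace: {
    apiVersion: "v1",
    kind: "Namespace",
    metadata: { name: "kubeprod" },
  },

  // Each component would then set metadata.namespace to
  // $.namespace.metadata.name instead of "kube-system".
}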

@anguslees (Author) commented

fwiw jenkins is tossing on this PR, didn't dig into why tho.

I'm dealing with some OOM issues on the Jenkins slaves, made worse (for me) by the default LimitRange. I think I've worked out why my earlier attempts to set explicit resource limits weren't working.

BTW, a general note about ab-using kube-system namespace for resources that are obviously nope

Yes. My thinking was that a few of the things we need to modify have to be in kube-system (for example, fixing up kube-dashboard for installs that don't do that correctly), so it would be confusing to pretend we were all contained somewhere else. My intention was to always use custom serviceAccounts, so essentially every "app" is isolated uniquely, but (as we know) there are a number of places where isolation becomes effectively namespace-level - and I completely agree that it's messy to include big components like elasticsearch in kube-system. Very open to changing to a dedicated namespace (or group of namespaces) in the near future.

I didn't know kops checked for kube-system. It's a pity it doesn't just do a drain and rely on PodDisruptionBudget, since there's nothing special about kube-system that couldn't also apply equally strongly to other business-critical namespaces...
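
For reference, a rough sketch of the PodDisruptionBudget approach mentioned above, in plain jsonnet with placeholder names and counts:

{
  apiVersion: "policy/v1beta1",
  kind: "PodDisruptionBudget",
  metadata: { name: "elasticsearch-logging", namespace: "kube-system" },
  spec: {
    // Keep at least this many replicas up while nodes are being drained.
    minAvailable: 2,
    selector: { matchLabels: { name: "elasticsearch-logging" } },
  },
}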

Once upon a time, jsonnet `{foo+: {a: 'b'}}` required `super.foo` to
exist.  That hasn't been the case for several versions now.
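
A tiny illustration of what that means in practice (my own example, not from the commits):

// With older jsonnet this errored because `super.foo` did not exist;
// current versions simply treat `foo+:` as a plain definition.
{} + { foo+: { a: "b" } }   // evaluates to { foo: { a: "b" } }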

These empty `annotations: {}` upset StatefulSet's buggy change
detection, which we can mostly avoid by just removing them.
- prometheus
- alertmanager
- node-exporter
- kube-state-metrics
- heapster
@anguslees merged commit 4d209e0 into vmware-archive:master on Apr 20, 2018