Helm chart for Kubernetes metrics quickstart #562
Comments
I've seen O(tens) of requests for this on the OpenTelemetry Slack channels. Having it in the community would be great, as we could promote its adoption more widely. |
I am certainly interested in this if users are interested in this. A couple questions:
|
Thanks for your questions :)
|
I really like this idea, but I have a question: is there a plan to move away from kube-state-metrics, node-exporter, etc. in favour of the collector's native receivers (k8sclusterreceiver and hostmetrics)? I think in general we should strive to collect all the Prometheus metrics from k8s components without using any of the Prometheus ecosystem components, relying on the Collector's native features instead :) |
@jaronoff97 I'm also curious if your chart handles the installation of the operator and the OpenTelemetryCollector object like discussed here: #69 |
I have been using this chart for 3 weeks. It works out of the box, but it will need to be improved (of course). It brings almost the same functionality as "Prometheus Operator with kube-prometheus-stack chart". It is much more lightweight, as you only deploy "agents" to scrape your logs/metrics/traces. I am using it to send metrics to AWS AMP (managed Prometheus). Here are the main issues I have encountered so far:
Thanks for the good work. |
updates/context setting: @TylerHelmuth I still want to donate this if that's still okay. I've validated with a few other people that this would be a great thing for the community to have. The only blocker for this work is to figure out if we can install the operator in the same chart which would make for a better experience. My team is going to be investigating this. |
@jaronoff97 sounds good. @open-telemetry/helm-approvers please add your thoughts. |
I approve. Thanks @jaronoff97 |
I don't think I agree that we need another chart for this. I'd rather go with adding the TA option to the collector chart. Also, why do we promote using Prometheus for scraping kubernetes/kubelet metrics instead of using specialized collector receivers that collect metrics compliant with OTel semantic conventions without additional transformations? |
I think this would provide a bridge for existing kps users that otherwise would not care to switch (afaik Prometheus is still used in ~ 99.x% of Kubernetes deployments for cluster monitoring). |
I also see value in a "transition" chart. Long term (like long long term), I think a need for a chart like this diminishes, but for users today who have extensive Prometheus setups but want to try out OTel or start transitioning to OTel I think this chart fits their needs. |
Ok, I'm not blocking it. If most @open-telemetry/helm-approvers think it's a good addition, let's add it |
The chart's name should somehow reflect the Prometheus bridge/transition. |
Could also be cool to include somewhere how to grab the same telemetry using the collector and its components. |
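As a rough illustration of what "the same telemetry using the collector and its components" could look like, here is a minimal sketch that builds a Collector configuration using the `k8s_cluster` and `hostmetrics` receivers mentioned earlier in the thread. The function name `native_metrics_config`, the choice of scrapers, and the OTLP endpoint are assumptions made for this example; nothing here is taken from the chart under discussion.

```python
import json


def native_metrics_config(otlp_endpoint: str, interval: str = "30s") -> dict:
    """Build a Collector config dict that relies on native receivers.

    Hypothetical helper for illustration; the endpoint and scraper list
    are assumptions, not values from the kube-otel-stack chart.
    """
    return {
        "receivers": {
            # Cluster-level metrics, roughly the role of kube-state-metrics.
            "k8s_cluster": {"collection_interval": interval},
            # Node-level metrics, roughly the role of node-exporter.
            "hostmetrics": {
                "collection_interval": interval,
                "scrapers": {"cpu": {}, "memory": {}, "filesystem": {}},
            },
        },
        "exporters": {"otlp": {"endpoint": otlp_endpoint}},
        "service": {
            "pipelines": {
                "metrics": {
                    "receivers": ["k8s_cluster", "hostmetrics"],
                    "exporters": ["otlp"],
                }
            }
        },
    }


if __name__ == "__main__":
    # Placeholder endpoint; replace with a real OTLP backend.
    print(json.dumps(native_metrics_config("otel-gateway.example:4317"), indent=2))
```

The printed JSON can be converted to YAML for a Collector config file. A real deployment would also need RBAC for the cluster receiver and host mounts for `hostmetrics`, which this sketch leaves out.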
I'm not sure how this transitioning chart would work? Should we assume that user installed kube-prometheus-stack and we try to somehow migrate it from that to this chart? I was thinking having |
We should probably assume that the majority of admins scrape their k8s api endpoints with Prometheus via prometheus-operator objects like |
I would also see this as a 'transition' chart, but the migration path to me is something like...
In the (admittedly, kinda far?) future, I can see the operator using native OpenTelemetry components and monitoring CRDs to perform the same basic functions as this stack, but in the short-to-medium term, having this in the org will give us a pat answer for "how should I monitor k8s with OpenTelemetry?" |
Hi, quick bump on this issue. One pretty common piece of feedback we got at KubeCon EU was the number of people who didn't know the operator existed. I believe getting this chart brought in would help a lot with that, as we could then signpost this from the docs as a "how to get started with kubernetes". |
@dmitryax is there anything else we're waiting on before accepting PRs adding this chart? |
@TylerHelmuth I think this issue is still a blocker. I'm going to run some tests right now to track this down and solve it. |
Okay after a little mish-moshing of things... I was able to get a chart that installs cert-manager (a requirement of the operator), the operator, and a collector together in a single chart. The problem is that it doesn't all install at once for a few reasons.
Option where we install cert-manager with the chart
TL;DR there are some race conditions and annoyances here
First installation
In order for the first installation to work for the chart, you need to set the operator's admission webhook to false. This is because helm installs resources in a particular order (here) and if you attempt to install cert-manager and the operator simultaneously with the webhook enabled you get the following error:
This is fine, because we can just initially disable the webhook on otel-operator installation so the otel-operator can come up healthy after the CRDs for cert-manager are installed.
Second installation
Now we have to re-enable the webhook; applying that again will get you another fun group of errors.
These occur because the pods are not yet ready when the webhooks are called.
Third installation
After waiting maybe ten seconds, instead of being impatient like me... you are able to successfully install the chart in its entirety
Option where we assume cert-manager is pre-installed
Given most clusters will already have cert-manager installed, here's what the installation process would look like... A bit smoother, but still the same webhook race condition at the end
First installation
Trying again after a few seconds...
Proposed remediations
The operator and collector installed together successfully! An end user using this chart could just as easily enable the mutating webhook post-install as well, but that's not an ideal experience IMO. I would love to hear thoughts on this, and see if there's anything I missed in my findings here. cc @open-telemetry/helm-maintainers |
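The "wait a few seconds and apply again" step described above could be automated with a small retry wrapper. This is only a sketch of the idea, not part of the chart: `retry` and `helm_upgrade` are hypothetical helpers, and the default attempt count and delay are assumptions. The only external command used is `helm upgrade --install <release> <chart> -f <values>`.

```python
import subprocess
import time
from typing import Callable


def retry(
    action: Callable[[], bool],
    attempts: int = 5,
    delay_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `action` until it returns True.

    Returns the number of attempts used, or raises RuntimeError if every
    attempt fails. `sleep` is injectable so the loop can be tested quickly.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt < attempts:
            # Give the webhook pods time to become ready before retrying.
            sleep(delay_seconds)
    raise RuntimeError(f"action did not succeed after {attempts} attempts")


def helm_upgrade(release: str, chart: str, values_file: str) -> bool:
    """Run `helm upgrade --install`; True when helm exits with code 0."""
    result = subprocess.run(
        ["helm", "upgrade", "--install", release, chart, "-f", values_file]
    )
    return result.returncode == 0
```

With a values file that re-enables the webhook (the file name here is a placeholder), the second installation step would be `retry(lambda: helm_upgrade("kube-otel-stack", "./kube-otel-stack", "values-webhook.yaml"))`. This hides the race condition rather than fixing it, so it is a workaround at best.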
For the cert manager my preference would be to copy whatever pattern kube-prometheus-stack is using. If we can't install the cert manager as part of the chart install that will at least follow our existing pattern for the operator, although there is an issue opened about that friction: #550
When I investigated this a while ago this is the solution I stumbled upon and I believe it is the solution that kube-prometheus-stack uses. |
Looking at what the kube-prometheus-stack does right now. |
It looks like it's configurable (obv). Its default behavior is empty and enabled, which means the policy is going to be set to … They also recommend pre-installing cert-manager on a cluster to use these webhooks. |
Seeing as the chart is trying to follow the same pattern for value I think it makes sense to follow the same technical patterns as well. |
Agreed. I can work on it this week and next week to match those expectations. I'll include some docs about these decisions as well. |
Yes, Indeed |
Is this something someone is still working on? Given how complex the whole ecosystem was for me to grasp when starting out, what would make the most sense from my perspective is to have some way to add presets to the OpenTelemetry Operator. IMO, if someone wants to plug OTel into their cluster, most likely they'll want to have the ability to get:
It would be ideal if the default setup of the operator easily allowed you to get a setup like the one Honeycomb suggests in their getting started |
@ferrucc-io yes I'm still working on this, I've had a whole slew of other priorities that keep taking precedence. |
Hello all! All of the PRs required to get the core functionality for the chart have been merged.
My team and I will be testing this chart thoroughly (it's already been tested a lot!) and adding lots of documentation in the coming weeks/months. If you have any issues with the chart, please open a new issue in the repo tagging it with chart:kube-stack . Thank you! |
Many Prometheus and Kubernetes users are familiar with the kube-prometheus-stack chart, which aims to quickly set up and manage a Prometheus and Grafana installation that collects nearly all of the Kubernetes metrics available. It achieves this using the Prometheus operator and ServiceMonitor and PodMonitor custom resources that configure a user's Prometheus scrape config. We have the ability to do the same using the OpenTelemetry Operator and the Target Allocator. In order to provide an easy and familiar migration path to existing (or new) Prometheus and Kubernetes users, I created the kube-otel-stack chart, which installs a pre-configured collector and target allocator to dynamically discover ServiceMonitor and PodMonitor custom resources to scrape various Kubernetes metrics. You can see below some of the metrics this collector is scraping.
This has since become a requested feature across the OTel Slack from what I can tell, as I've DM'ed this chart to at least 3 different people at this point. I was wondering if it would be welcome for me to clean up this slightly opinionated helm chart, make it more generic, and donate it to the repository.
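For readers who have not used these custom resources, the sketch below builds a minimal ServiceMonitor manifest of the kind the Prometheus operator, or a target allocator watching the same CRDs, would pick up. The helper name `service_monitor` and the example labels, port name, and namespace are made up for illustration; only the `monitoring.coreos.com/v1` fields shown are standard.

```python
import json


def service_monitor(
    name: str,
    namespace: str,
    match_labels: dict,
    port: str,
    interval: str = "30s",
) -> dict:
    """Return a minimal ServiceMonitor manifest as a dict.

    Hypothetical helper; the arguments are placeholders chosen by the caller.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Select Services whose labels match.
            "selector": {"matchLabels": match_labels},
            # Scrape the named Service port at the given interval.
            "endpoints": [{"port": port, "interval": interval}],
        },
    }


if __name__ == "__main__":
    manifest = service_monitor(
        "example-app", "default", {"app": "example-app"}, "metrics"
    )
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(manifest, indent=2))
```

The appeal of the migration path is that a manifest like this would not need to change: the same object that configures Prometheus scraping today would be read by the target allocator instead.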
Other options considered
TODO