-
Notifications
You must be signed in to change notification settings - Fork 459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: Opinionated Operator CRDs #1477
Comments
cc @puckpuck |
I like the approach that the Collector's Helm chart has taken with presets. When combined with a sensible default, with just a few lines I can configure a collector to do multiple common scenarios.
Using presets, I could have the above spec, which delivers a collector configured as a Daemonset with a default sensible configuration, including functionality to collector logs from all pods on the node, kubelet stats metrics, and enriches all telemetry with Kubernetes attributes. |
I see three distinct problems here:
I'd expect that 2 and 3 would eventually need their own separate issues, although they're all related to some extent. For now I'm going to focus on 1. Use casesI'm aware of three major common use cases for users wanting to collect telemetry data in K8s: Container logsThis is pretty obvious and intuitive from a user perspective - they want to ingest logs from all of the containers running in their cluster, with the right metadata. Right now, actually doing this requires an understanding of how the kubelet and the container runtime manage logs, the respective log formats, mounting Node filesystem directories as volumes into a DaemonSet with the right permissions, and a somewhat gnarly filelog receiver configuration. Prometheus metricsUnlike the logs case, this is quite complicated and has additional dependencies depending on what the user wants. The metrics we could collect here may include:
Intuitively, I think it should be easy to collect the "infrastructure" metrics, in the same way that the hostmetrics receiver makes this easy on a single host. When it comes to replacing prometheus-operator, I think the choice for users with existing CRDs is straightforward, but I'm not 100% sure if we should also recommend this for others. Right now, doing any of this requires installing additional dependencies, adding (fairly complex) Prometheus config directly to the receiver, enabling target allocator and giving it the right RBAC permissions, and manually translating from Prometheus metric schema to otel schema. Traces from instrumentationI think this is reasonably well-covered right now. We might benefit from better documentation, and maybe some way to directly link We could later consider a higher-level CRD for a tail sampling setup, where you need to make sure all spans for a given trace are delivered to the same collector. SolutionsYour proposed new CRDs are roughly the way to go @frzifus, though it does feel a bit backwards that the highest-level resource is really just a receiver + mode, and the rest of the pipeline sits in a subresource. Intuitively it'd make more sense to me if the receiver + mode is the sub resource, kind of like: ---
apiVersion: opentelemetry.io/v1alpha1
kind: Source
metadata:
name: source1
spec:
type: metrics
replicas: 2
destination:
- dst1
prometheusCR:
serviceMonitorSelector:
abc: def
static_configs: ...
---
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: metrics-collector
spec:
source: source1
processors:
batch:
...
resourcedetection/test:
detector:
...
exporters:
... I don't like overloading the Collector resource even more, so maybe this should be a new CR, but composing things this way - replacing the receiver with a receiver + mode pair, and keeping everything roughly the same - makes most sense to me as an approach. |
I've been thinking about how to solve this usability problem as well and I'm curious about leveraging the ongoing work to add a Template converter to the Collector core to help solve part of this problem: open-telemetry/opentelemetry-collector#8507 The Operator could provide OOTB templates for container logs and the common metrics that need to be collected. That said, there's more to this issue than just simplifying the collector configuration, since we also need to automate the So the total solution is probably some combination of templated collector configuration and operator-level logic for selecting the right deployment mode of a collector. This is somewhat similar to the Helm chart's presets option mentioned above, but I think an advantage of using templates is that the user could override the template or chain templates together. For example, in combination with #1906, a user could define a log processing template for containers that match specific metadata, and the collectors deployed by the operator for log collection could be updated by the operator to route logs from these containers through a pipeline that applies this processing. |
Based on feedback we received at FOSDEM, the barrier initially use the Open Telemetry operator seems to be quite high.
Therefore, I would like to discuss the current status of the v1alpha1 API here. The OpenTelemetryCollectorSpec spec provides a lot of configuration options, but it is from my point of view quite easy to misconfigure it.
I would be interested in a few things:
An example of what i think is quite "challenging".
In order to use the target-allocator, an image must be specified that contains the prometheus receiver (this is not present in the default image).[Thats wrong, see here] There are 4 modes available, but only one supports the desired function. The collector configuration is specified as a string and we can not ensure that the configuration works with the selected image. Finally we build something like this.Depending on the use of the collector, I thought it would be useful to provide certain easy-to-configure crds for certain use cases such as metrics collection.
Here is an example of what this could look like. (I am still a little unsatisfied that the problem here is partly moved to the destination resource).
Update
This topic has been discussed during the 2023/02/16, 9 AM PT/6 PM CET SIG-Call. [Notes]
tldr
Agreed on
collector config is hard
.We need better docs with examples.
Outline common usecases and start with an opinionated helm chart. [example]
Config as a string isnt nice - values can not be modified with
kustomize
.The text was updated successfully, but these errors were encountered: