[targetallocator] PrometheusOperator CRD MVC #653
Conversation
😻
Splitting this out here, as otherwise the PR would get too big.
@pavolloffay, @VineethReddy02, is one of you interested in reviewing this?
Any chance we get this reviewed by a maintainer? I would like to propose that we use the operator/collector, but we lean heavily on Prometheus CRDs to configure our scrape targets.
I took a quick pass through the PR. Looks good to me on the first pass. Can we add e2e tests for the same?
No, the Collectors would have to be redeployed when CRs are changed. That would definitely be the cleanest solution. I agree; I'm going to create an issue if this PR is accepted.
@pavolloffay, @VineethReddy02 you've made a couple of comments on this PR; do you think it is ready to be merged?
LGTM
@@ -120,11 +120,21 @@ type OpenTelemetryTargetAllocator struct {
	// +optional
	Enabled bool `json:"enabled,omitempty"`

	// PrometheusCR defines the configuration for the retrieval of PrometheusOperator CRDs (servicemonitor.monitoring.coreos.com/v1 and podmonitor.monitoring.coreos.com/v1).
Question as before: from which namespaces are these objects queried?
I have added a line clearing this up.
cmd/otel-allocator/README.md
Watches the Prometheus service discovery for new targets and sets targets to the Allocator

### Allocator
Shards the received targets based on the discovered Collector instances
based on the discovered
Is this correct? Strictly speaking the TA does not discover collectors. The collectors should be known from the collector CR.
It does watch the associated stateful set to observe scaling events and reallocate targets as necessary when the size of the set of collectors changes.
Currently the TargetAllocator watches the API for pods with the expected labels. Because of this I have described it as "discovered"; would you prefer "watch"?
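To make the "shards the received targets" wording concrete, here is a minimal sketch of hash-based sharding across a known collector set; the function shape and the FNV hash choice are illustrative assumptions, not the allocator's actual algorithm:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard assigns each scrape target to one of the known collectors using a
// stable hash, so repeated calls with the same inputs give the same split.
func shard(targets []string, collectors []string) map[string][]string {
	assignment := make(map[string][]string, len(collectors))
	for _, t := range targets {
		h := fnv.New32a()
		h.Write([]byte(t))
		c := collectors[int(h.Sum32()%uint32(len(collectors)))]
		assignment[c] = append(assignment[c], t)
	}
	return assignment
}

func main() {
	targets := []string{"10.0.0.1:9090", "10.0.0.2:9090", "10.0.0.3:9090"}
	assignment := shard(targets, []string{"collector-0", "collector-1"})
	total := 0
	for _, ts := range assignment {
		total += len(ts)
	}
	fmt.Println("assigned targets:", total) // prints assigned targets: 3
}
```

When the watched collector set changes (e.g. a scaling event on the stateful set), recomputing this map reallocates targets, which matches the behavior described above.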
cmd/otel-allocator/README.md
### Allocator
Shards the received targets based on the discovered Collector instances

### Collector
Is this OTEL collector? Maybe we should find a different name if it is not.
The headlines are referring to the packages. The collector package watches for new collector instances. I have rearranged it a bit to clear this up.
	"github.com/go-kit/log"
	"github.com/go-logr/logr"
	allocatorconfig "github.com/otel-allocator/config"
is this import from this repository? Then it should be moved to a 3rd group. Why doesn't it have the full path?
Yes, this is referencing the local module. Why this module name was chosen, I cannot say.
I think this import path is incorrect and should be changed. Probably best to change it in a separate PR, though.
# Design

If the Allocator is activated, all Prometheus configurations will be transferred to a separate ConfigMap which is in
all Prometheus configurations
Is it the OTEL p8s receiver configuration, or the p8s CRs as well?
This applies to the configuration of the prometheus receiver in the collector configuration. The Prometheus operator CRs are observed separately after the target allocator has been started, at which point it further modifies the service discovery configuration it is using.
This is only referencing the OTEL p8s configuration.
cmd/otel-allocator/README.md
turn mounted to the Allocator.
This configuration will be resolved to target configurations and then split across all OpenTelemetryCollector instances.

The TargetAllocator exposes the results as [HTTP_SD endpoints](https://prometheus.io/docs/prometheus/latest/http_sd/)
How does the OTEL collector get the final config (resolved targets)?
Does the OTEL operator configure the collector to use the http_sd to get the targets?
Yes, the OTEL operator rewrites the collector prometheus receiver config to use http_sd for all configured jobs, pointing at the target allocator instance for that collector.
There is still a gap that @secustor has proposed to close with open-telemetry/opentelemetry-collector-contrib#8055. This addresses the fact that the Prometheus operator CRs can cause new jobs to be exposed at the target allocator, but the collector configuration is static and does not currently have a mechanism for discovering those jobs. The target allocator exposes a job list resource that should suffice to discover new/ended jobs.
Yes, that's correct. The operator configures the collector to use http_sd endpoints which are provided by the TargetAllocator.
Currently the Operator defines the p8s jobs which the collector should query, therefore no jobs defined by p8s operator CRs are added to the collector. Later on the collector should query the jobs directly from the TargetAllocator.
See following issues:
Sorry for the super long delay in reviewing this. I had to take some time to learn about the target allocator.
@pavolloffay it appears your comments have been addressed, can you confirm whether there are still outstanding items to address?
Love the approach here; one non-blocking question. I see you're adding the changes to the end-to-end tests; would it be possible to add a test involving a Prometheus CRD to gut-check that the expected configuration gets generated?
	PromCRWatcherConf PrometheusCRWatcherConfig
}

func Load(file string) (Config, error) {
	var cfg Config
	if err := unmarshal(&cfg, file); err != nil {
question: It seems previously the allocator would load a default config file if no file was provided; that was because we would always pass in an empty string in main.go. Should that functionality remain in case there are any consumers of this function?
Are you referring to users of the TargetAllocator or devs who import this package?
In the first case the behavior has not changed. It is still loading the same default file as before. This is implemented using pflags now. https://github.com/open-telemetry/opentelemetry-operator/pull/653/files#diff-d292758014333553bcd82dfdf75743ef82b0b32f6414e2dc345777fc342e4480R77
I don't think it is possible right now to test API responses with KUTTL.
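The default-file behavior discussed above can be sketched as follows; this uses the stdlib flag package rather than pflag, and the default path shown is a hypothetical stand-in, not the allocator's actual default:

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical default path; the real default lives in the allocator's main.go.
const defaultConfigFile = "/conf/targetallocator.yaml"

// configPath falls back to the default when the caller passed an empty
// string, mirroring the pre-pflag behavior described above.
func configPath(flagValue string) string {
	if flagValue == "" {
		return defaultConfigFile
	}
	return flagValue
}

func main() {
	// Putting the default on the flag itself means callers that pass
	// nothing still get the historical default config file.
	file := flag.String("config-file", defaultConfigFile, "path to the allocator config")
	flag.Parse()
	fmt.Println("loading config from:", configPath(*file))
}
```

With the default baked into the flag definition, Load no longer needs to special-case an empty string, which is the simplification the reviewer was asking about.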
I am approving this PR; however, I am still not familiar with the target-allocator and it is not an area where I focus.
I am fine merging this if there will be people maintaining it.
# Conflicts:
#	cmd/otel-allocator/Dockerfile
#	cmd/otel-allocator/go.sum
@pavolloffay Can you merge this? All checks are successful.
Merging based on the support discussion in https://cloud-native.slack.com/archives/C033BJ8BASU/p1650372898413199. @secustor @Aneurysm9 (AWS) are willing to support the target-allocator.
* feat(target-allocator): allow custom config file path
* feat(target-allocator): move CLI config options to config package
* feat(target-allocator): allow running outside of cluster for debugging
* introduce meta watcher
* add event source
* fix: log panics
* fix: race condition
* fixup! fix: log panics
* feat: implement promCR retrieval
* feat: functioning
* refactor: some cleanup
* feat(target-allocator): escape job names in query parameters
* feat(target-allocator): make prometheusCR lookup optional and allow using kubernetes_sd outside of cluster
* refactor(target-allocator): improve memory usage and comments
* chore(target-allocator): update PromOperator and Kubernetes deps
* refactor(target-allocator): use exclude instead of replace directive
* ci: add Makefile targets for target allocator
* tests: add kuttl tests for PrometheusCR feature of target allocator
* docs(targetAllocator): add README.md
* fixup CRD docs
* fix(Makefile): add missing PHONY tags
* implement change requests
* go mod tidy and fix linting
Context: #383
Currently implemented:
* ServiceMonitor and PodMonitor
TBD:
Needs discussion:
cc @Aneurysm9