-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add servicemonitor for operands #1275
add servicemonitor for operands #1275
Conversation
be0a96e
to
7b05db0
Compare
@kevinearls, what do you think? In general, I would say yes to e2e tests, but more e2e tests means more waiting times for CI results. |
@jpkrohling @sschne I'd say e2e tests are always a good thing. If CI starts taking too much time we can decide which need to be run on a per PR basis and run others nightly or per release or on some other schedule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great so far, and you even made me realize that we are using a deprecated flag all over our code base (admin-http-port
). Thanks!
There are a few minor comments, but the direction looks good.
0ab7c91
to
b2dc17c
Compare
Thanks for your review. I have added all your requested changes and also added a e2e-test suite for the servicemonitors. |
Codecov Report
@@ Coverage Diff @@
## master #1275 +/- ##
==========================================
+ Coverage 87.59% 87.90% +0.30%
==========================================
Files 94 98 +4
Lines 5956 6232 +276
==========================================
+ Hits 5217 5478 +261
- Misses 562 571 +9
- Partials 177 183 +6
Continue to review full report at Codecov.
|
return jaeger, nil | ||
} | ||
|
||
if jaeger.Spec.ServiceMonitor.Enabled != nil && *jaeger.Spec.ServiceMonitor.Enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might need a safe-guard here. When the service monitor isn't available but the CR sets it as "enabled=true", we see this:
WARN[0060] failed to reconcile servicemonitor error="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" instance=simple-prod namespace=default
We do detect already whether the cluster has support for this API:
$ make run
...
INFO[0001] Install prometheus-operator in your cluster to create ServiceMonitor objects error="no ServiceMonitor registered with the API"
This is probably done during the bootstrap, so, we could just add to viper whether we should attempt to create service monitors if they are enabled.
Once we have that, it might make sense to set the default value of the "Enabled" flag to true
in case the Prometheus Operator is available, and to false
otherwise. This way, the flag is only needed in exceptional cases.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'm not sure if we should enable the ServiceMonitor by default, since it is changing default behavior, but i would leave this decision up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevinearls, @rubenvp8510, @objectiser, what do you think? One one hand, having two defaults might be confusing. On the other hand, it kinda makes sense to create a service monitor by default if the API for it exists. I think we have a similar behavior with the OAuth Proxy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling if by two defaults you mean "If the prometheus operator is installed we create a service monitor by default, otherwise we don't" I think that would be fine, especially if we already have this behavior elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling I think maybe for 1.x we should keep the same behaviour, but possibly revisit the defaults for 2.x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry misunderstood - yes I think it is ok to create if API exists, as long as we document that as the behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sschne, looks like we have a decision. Do you know what's needed to implement this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling i see there has been done similar things with Elasticsearch Operator. So i detect whether the API is available via the autodetect package and add it to viper.
I think we can also remove the serviceMonitor.Enabled attribute from the jaeger spec again, as we then can enable/disable this on a per operator, rather than on a jaeger basis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still keep the CR flag in place: people should be able to individually turn it off on an individual basis, perhaps because the instance is a dev/staging one that doesn't need to be monitored. Like with ES, we'd then have the following options:
- CLI to enable/disable (force) the auto-provision of the
ServiceMonitor
- CR field to enable/disable the provisioning of the
ServiceMonitor
- when the operator-wide config is set to "disable" but a CR is seen with "enable", we need to log a warn to the logs with this fact (as we are not honoring what we were requested to do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CLI flag is now in place with an default auto-detection of the prometheus-operator CRDs.
If the CR field is set to "enable" but the CRDs are not available or the global flag is explicitly disabled, it will issue a warning. This is handled in the jaeger-controller.
The admin services however, which are now dedicated Service objects, will always be created if the CR field is set to "enable", despite the status of the CLI flag.
pkg/service/query.go
Outdated
// NewQueryAdminService returns a new Kubernetes service for Jaeger Query with admin port enabled | ||
func NewQueryAdminService(jaeger *v1.Jaeger, selector map[string]string) *corev1.Service { | ||
service := NewQueryService(jaeger, selector) | ||
service.Spec.Ports = append(service.Spec.Ports, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not having caught this before, but I think we might need an extra service. I'm not sure it's a good idea to add one extra port to the services here, as people might be exposing these services currently and expect only one (or a specific set) of ports to be open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same applies to the other components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collector, ingester and query component now create an additional Service with the admin port enabled
@sschne are you still interested in this one? |
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
- import formatting - function names - error handling - tests Signed-off-by: Simon Schneider <[email protected]>
…add example Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
bedb97b
to
b974e92
Compare
Hi @jpkrohling , sorry for not looking into this for quite a while. I will try to get this PR back to life and resolve your comments |
Let me know if you need anything! |
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
…erands Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
refactor service creation Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
@jpkrohling i think this is now ready for review again |
First pass looks good I'll do another detailed review of this tomorrow morning |
hi @sschne, i know its quite a while. But now there are quite a few conflicting files. do you want to update your branch again? 🐎 |
lets reopen this pr when the work continues. |
Closes #1156
This PR adds servicemonitor used by prometheus-operator for the jaeger operands.
It can currently scrape metrics from the all-in-one jaeger installation, as well as the production and streaming installation, except the agent, as it is deployed as sidecar with the query deployment and does not bring its own Kubernetes Service.
Should I also add e2e-tests for this feature?