Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add servicemonitor for operands #1275

Conversation

sschne
Copy link
Contributor

@sschne sschne commented Oct 27, 2020

Closes #1156
This PR adds servicemonitor used by prometheus-operator for the jaeger operands.

It can currently scrape metrics from the all-in-one jaeger installation, as well as the production and streaming installation, except the agent, as it is deployed as sidecar with the query deployment and does not bring its own Kubernetes Service.

Should I also add e2e-tests for this feature?

@mergify mergify bot requested review from jpkrohling and objectiser October 27, 2020 08:24
@sschne sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from be0a96e to 7b05db0 Compare October 27, 2020 08:26
@jpkrohling
Copy link
Contributor

Should I also add e2e-tests for this feature?

@kevinearls, what do you think? In general, I would say yes to e2e tests, but more e2e tests means more waiting times for CI results.

@kevinearls
Copy link
Contributor

@jpkrohling @sschne I'd say e2e tests are always a good thing. If CI starts taking too much time we can decide which need to be run on a per PR basis and run others nightly or per release or on some other schedule.

Copy link
Contributor

@jpkrohling jpkrohling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far, and you even made me realize that we are using a deprecated flag all over our code base (admin-http-port). Thanks!

There are a few minor comments, but the direction looks good.

pkg/controller/jaeger/jaeger_controller.go Outdated Show resolved Hide resolved
pkg/controller/jaeger/servicemonitor.go Outdated Show resolved Hide resolved
pkg/controller/jaeger/servicemonitor_test.go Outdated Show resolved Hide resolved
pkg/controller/jaeger/servicemonitor_test.go Show resolved Hide resolved
pkg/controller/jaeger/servicemonitor_test.go Outdated Show resolved Hide resolved
pkg/deployment/query.go Outdated Show resolved Hide resolved
pkg/deployment/collector.go Outdated Show resolved Hide resolved
pkg/deployment/ingester.go Outdated Show resolved Hide resolved
pkg/service/collector.go Outdated Show resolved Hide resolved
pkg/service/query.go Outdated Show resolved Hide resolved
@sschne sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from 0ab7c91 to b2dc17c Compare November 2, 2020 22:48
@sschne
Copy link
Contributor Author

sschne commented Nov 2, 2020

Looks great so far, and you even made me realize that we are using a deprecated flag all over our code base (admin-http-port). Thanks!

There are a few minor comments, but the direction looks good.

Thanks for your review. I have added all your requested changes and also added a e2e-test suite for the servicemonitors.

@codecov
Copy link

codecov bot commented Nov 2, 2020

Codecov Report

Merging #1275 (4fe7e28) into master (93eb3c4) will increase coverage by 0.30%.
The diff coverage is 94.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1275      +/-   ##
==========================================
+ Coverage   87.59%   87.90%   +0.30%     
==========================================
  Files          94       98       +4     
  Lines        5956     6232     +276     
==========================================
+ Hits         5217     5478     +261     
- Misses        562      571       +9     
- Partials      177      183       +6     
Impacted Files Coverage Δ
pkg/apis/jaegertracing/v1/jaeger_types.go 87.50% <ø> (ø)
pkg/controller/jaeger/jaeger_controller.go 38.00% <20.00%> (-0.75%) ⬇️
pkg/controller/jaeger/servicemonitor.go 77.77% <77.77%> (ø)
pkg/autodetect/main.go 88.46% <100.00%> (+1.50%) ⬆️
pkg/deployment/all_in_one.go 100.00% <100.00%> (ø)
pkg/deployment/collector.go 100.00% <100.00%> (ø)
pkg/deployment/ingester.go 100.00% <100.00%> (ø)
pkg/deployment/query.go 100.00% <100.00%> (ø)
pkg/inventory/servicemonitor.go 100.00% <100.00%> (ø)
pkg/service/collector.go 100.00% <100.00%> (ø)
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 93eb3c4...4fe7e28. Read the comment docs.

Makefile Outdated Show resolved Hide resolved
pkg/controller/jaeger/servicemonitor_test.go Show resolved Hide resolved
return jaeger, nil
}

if jaeger.Spec.ServiceMonitor.Enabled != nil && *jaeger.Spec.ServiceMonitor.Enabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need a safe-guard here. When the service monitor isn't available but the CR sets it as "enabled=true", we see this:

WARN[0060] failed to reconcile servicemonitor            error="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" instance=simple-prod namespace=default

We do detect already whether the cluster has support for this API:

$ make run
...
INFO[0001] Install prometheus-operator in your cluster to create ServiceMonitor objects  error="no ServiceMonitor registered with the API"

This is probably done during the bootstrap, so, we could just add to viper whether we should attempt to create service monitors if they are enabled.

Once we have that, it might make sense to set the default value of the "Enabled" flag to true in case the Prometheus Operator is available, and to false otherwise. This way, the flag is only needed in exceptional cases.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'm not sure if we should enable the ServiceMonitor by default, since it is changing default behavior, but i would leave this decision up to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinearls, @rubenvp8510, @objectiser, what do you think? One one hand, having two defaults might be confusing. On the other hand, it kinda makes sense to create a service monitor by default if the API for it exists. I think we have a similar behavior with the OAuth Proxy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling if by two defaults you mean "If the prometheus operator is installed we create a service monitor by default, otherwise we don't" I think that would be fine, especially if we already have this behavior elsewhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling I think maybe for 1.x we should keep the same behaviour, but possibly revisit the defaults for 2.x?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry misunderstood - yes I think it is ok to create if API exists, as long as we document that as the behaviour

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sschne, looks like we have a decision. Do you know what's needed to implement this?

Copy link
Contributor Author

@sschne sschne Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling i see there has been done similar things with Elasticsearch Operator. So i detect whether the API is available via the autodetect package and add it to viper.
I think we can also remove the serviceMonitor.Enabled attribute from the jaeger spec again, as we then can enable/disable this on a per operator, rather than on a jaeger basis.

Copy link
Contributor

@jpkrohling jpkrohling Nov 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still keep the CR flag in place: people should be able to individually turn it off on an individual basis, perhaps because the instance is a dev/staging one that doesn't need to be monitored. Like with ES, we'd then have the following options:

  1. CLI to enable/disable (force) the auto-provision of the ServiceMonitor
  2. CR field to enable/disable the provisioning of the ServiceMonitor
  3. when the operator-wide config is set to "disable" but a CR is seen with "enable", we need to log a warn to the logs with this fact (as we are not honoring what we were requested to do)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CLI flag is now in place with an default auto-detection of the prometheus-operator CRDs.

If the CR field is set to "enable" but the CRDs are not available or the global flag is explicitly disabled, it will issue a warning. This is handled in the jaeger-controller.

The admin services however, which are now dedicated Service objects, will always be created if the CR field is set to "enable", despite the status of the CLI flag.

// NewQueryAdminService returns a new Kubernetes service for Jaeger Query with admin port enabled
func NewQueryAdminService(jaeger *v1.Jaeger, selector map[string]string) *corev1.Service {
service := NewQueryService(jaeger, selector)
service.Spec.Ports = append(service.Spec.Ports,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not having caught this before, but I think we might need an extra service. I'm not sure it's a good idea to add one extra port to the services here, as people might be exposing these services currently and expect only one (or a specific set) of ports to be open.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies to the other components.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collector, ingester and query component now create an additional Service with the admin port enabled

@jpkrohling
Copy link
Contributor

@sschne are you still interested in this one?

sschne added 6 commits June 14, 2021 20:17
Signed-off-by: Simon Schneider <[email protected]>
- import formatting
- function names
- error handling
- tests

Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
@sschne sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from bedb97b to b974e92 Compare June 14, 2021 18:26
@sschne
Copy link
Contributor Author

sschne commented Jun 14, 2021

Hi @jpkrohling , sorry for not looking into this for quite a while. I will try to get this PR back to life and resolve your comments

@jpkrohling
Copy link
Contributor

Let me know if you need anything!

Signed-off-by: Simon Schneider <[email protected]>
refactor service creation

Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
Signed-off-by: Simon Schneider <[email protected]>
@sschne
Copy link
Contributor Author

sschne commented Oct 9, 2021

@jpkrohling i think this is now ready for review again

@rubenvp8510
Copy link
Collaborator

First pass looks good I'll do another detailed review of this tomorrow morning

@frzifus
Copy link
Member

frzifus commented Jun 28, 2022

hi @sschne, i know its quite a while. But now there are quite a few conflicting files. do you want to update your branch again? 🐎

@frzifus
Copy link
Member

frzifus commented Jul 12, 2022

lets reopen this pr when the work continues.

@frzifus frzifus closed this Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create ServiceMonitor for operands
6 participants