add servicemonitor for operands #1275

sschne · 2020-10-27T08:23:09Z

Closes #1156
This PR adds a ServiceMonitor, as used by the prometheus-operator, for the Jaeger operands.

It can currently scrape metrics from the all-in-one, production, and streaming installations. The agent is the exception: it is deployed as a sidecar with the query deployment and does not bring its own Kubernetes Service.

Should I also add e2e-tests for this feature?

jpkrohling · 2020-10-27T08:40:01Z

> Should I also add e2e-tests for this feature?

@kevinearls, what do you think? In general, I would say yes to e2e tests, but more e2e tests mean longer waits for CI results.

kevinearls · 2020-10-27T08:49:35Z

@jpkrohling @sschne I'd say e2e tests are always a good thing. If CI starts taking too much time, we can decide which tests need to run on a per-PR basis and run the others nightly, per release, or on some other schedule.

jpkrohling

Looks great so far, and you even made me realize that we are using a deprecated flag all over our code base (admin-http-port). Thanks!

There are a few minor comments, but the direction looks good.

pkg/controller/jaeger/jaeger_controller.go

pkg/controller/jaeger/servicemonitor.go

pkg/controller/jaeger/servicemonitor_test.go

pkg/deployment/query.go

pkg/deployment/collector.go

pkg/deployment/ingester.go

pkg/service/collector.go

pkg/service/query.go

sschne · 2020-11-02T23:02:22Z

> Looks great so far, and you even made me realize that we are using a deprecated flag all over our code base (admin-http-port). Thanks!
>
> There are a few minor comments, but the direction looks good.

Thanks for your review. I have made all the changes you requested and also added an e2e test suite for the ServiceMonitors.

codecov · 2020-11-02T23:03:07Z

Codecov Report

Merging #1275 (4fe7e28) into master (93eb3c4) will increase coverage by 0.30%.
The diff coverage is 94.38%.

@@            Coverage Diff             @@
##           master    #1275      +/-   ##
==========================================
+ Coverage   87.59%   87.90%   +0.30%     
==========================================
  Files          94       98       +4     
  Lines        5956     6232     +276     
==========================================
+ Hits         5217     5478     +261     
- Misses        562      571       +9     
- Partials      177      183       +6

Impacted Files	Coverage Δ
pkg/apis/jaegertracing/v1/jaeger_types.go	`87.50% <ø> (ø)`
pkg/controller/jaeger/jaeger_controller.go	`38.00% <20.00%> (-0.75%)`	⬇️
pkg/controller/jaeger/servicemonitor.go	`77.77% <77.77%> (ø)`
pkg/autodetect/main.go	`88.46% <100.00%> (+1.50%)`	⬆️
pkg/deployment/all_in_one.go	`100.00% <100.00%> (ø)`
pkg/deployment/collector.go	`100.00% <100.00%> (ø)`
pkg/deployment/ingester.go	`100.00% <100.00%> (ø)`
pkg/deployment/query.go	`100.00% <100.00%> (ø)`
pkg/inventory/servicemonitor.go	`100.00% <100.00%> (ø)`
pkg/service/collector.go	`100.00% <100.00%> (ø)`
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Makefile

pkg/controller/jaeger/servicemonitor_test.go

jpkrohling · 2020-11-03T09:41:36Z

pkg/controller/jaeger/jaeger_controller.go

-		return jaeger, nil
+	}
+
+	if jaeger.Spec.ServiceMonitor.Enabled != nil && *jaeger.Spec.ServiceMonitor.Enabled {


I think we might need a safe-guard here. When the service monitor isn't available but the CR sets it as "enabled=true", we see this:

WARN[0060] failed to reconcile servicemonitor error="no matches for kind \"ServiceMonitor\" in version \"monitoring.coreos.com/v1\"" instance=simple-prod namespace=default

We do detect already whether the cluster has support for this API:

$ make run
...
INFO[0001] Install prometheus-operator in your cluster to create ServiceMonitor objects error="no ServiceMonitor registered with the API"

This is probably done during bootstrap, so we could just store in viper whether we should attempt to create service monitors when they are enabled.

Once we have that, it might make sense to set the default value of the "Enabled" flag to true in case the Prometheus Operator is available, and to false otherwise. This way, the flag is only needed in exceptional cases.

What do you think?
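A minimal sketch of the safe-guard described above, assuming the check runs once at bootstrap. The `apiResourceList` type, the `hasKind` helper, and the `servicemonitor-available` key are hypothetical stand-ins: the real operator would query the cluster's discovery API and keep the result in viper rather than in a plain map.

```go
package main

import "fmt"

// apiResourceList is a simplified, hypothetical stand-in for what a
// discovery client reports for one group/version.
type apiResourceList struct {
	GroupVersion string
	Kinds        []string
}

// hasKind reports whether kind is served under groupVersion.
func hasKind(lists []apiResourceList, groupVersion, kind string) bool {
	for _, l := range lists {
		if l.GroupVersion != groupVersion {
			continue
		}
		for _, k := range l.Kinds {
			if k == kind {
				return true
			}
		}
	}
	return false
}

func main() {
	// Illustrative discovery result; a real cluster returns many more entries.
	discovered := []apiResourceList{
		{GroupVersion: "apps/v1", Kinds: []string{"Deployment"}},
		{GroupVersion: "monitoring.coreos.com/v1", Kinds: []string{"ServiceMonitor"}},
	}

	// Hypothetical operator-wide settings store (viper in the discussion).
	settings := map[string]bool{}
	settings["servicemonitor-available"] = hasKind(discovered, "monitoring.coreos.com/v1", "ServiceMonitor")

	fmt.Println("ServiceMonitor API available:", settings["servicemonitor-available"])
}
```

The reconcile loop could then consult the stored value before attempting to create the object, instead of failing with the "no matches for kind" error on every reconciliation.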

Sounds good. I'm not sure we should enable the ServiceMonitor by default, since that changes the default behavior, but I would leave this decision up to you.

@kevinearls, @rubenvp8510, @objectiser, what do you think? On one hand, having two defaults might be confusing. On the other hand, it kind of makes sense to create a service monitor by default if the API for it exists. I think we have a similar behavior with the OAuth Proxy.

@jpkrohling if by two defaults you mean "If the prometheus operator is installed we create a service monitor by default, otherwise we don't" I think that would be fine, especially if we already have this behavior elsewhere.

@jpkrohling I think maybe for 1.x we should keep the same behaviour, but possibly revisit the defaults for 2.x?

Sorry, I misunderstood. Yes, I think it is OK to create it if the API exists, as long as we document that as the behaviour.

@sschne, looks like we have a decision. Do you know what's needed to implement this?

@jpkrohling I see that something similar has been done for the Elasticsearch Operator. So I would detect whether the API is available via the autodetect package and add the result to viper.
I think we can also remove the serviceMonitor.Enabled attribute from the Jaeger spec again, as we can then enable/disable this per operator rather than per Jaeger instance.

I'd still keep the CR flag in place: people should be able to turn it off for an individual instance, perhaps because it is a dev/staging one that doesn't need to be monitored. Like with ES, we'd then have the following options:

CLI to enable/disable (force) the auto-provision of the ServiceMonitor

CR field to enable/disable the provisioning of the ServiceMonitor

when the operator-wide config is set to "disable" but a CR is seen with "enable", we need to log a warning stating this fact (as we are not honoring what we were requested to do)

The CLI flag is now in place, with a default auto-detection of the prometheus-operator CRDs.

If the CR field is set to "enable" but the CRDs are not available or the global flag is explicitly disabled, it will issue a warning. This is handled in the jaeger-controller.

The admin services, however, which are now dedicated Service objects, will always be created if the CR field is set to "enable", regardless of the status of the CLI flag.
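The decision reached in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `shouldCreate`, `resolveProvisioning`, and `boolPtr` are hypothetical names, and a nil global flag is assumed to mean "auto-detect from the available CRDs".

```go
package main

import "fmt"

// resolveProvisioning decides whether ServiceMonitors may be provisioned
// operator-wide. A nil globalFlag is assumed to mean auto-detection.
func resolveProvisioning(globalFlag *bool, apiAvailable bool) bool {
	if globalFlag != nil {
		return *globalFlag
	}
	return apiAvailable
}

// shouldCreate returns whether to create the ServiceMonitor for one Jaeger
// instance, plus a warning when the CR asks for one but it is not honored.
func shouldCreate(crEnabled *bool, globalFlag *bool, apiAvailable bool) (bool, string) {
	if crEnabled == nil || !*crEnabled {
		// The CR does not ask for a ServiceMonitor: nothing to do, nothing to warn about.
		return false, ""
	}
	if !resolveProvisioning(globalFlag, apiAvailable) {
		return false, "ServiceMonitor enabled in the CR, but provisioning is disabled or the API is unavailable"
	}
	return true, ""
}

// boolPtr is a small helper for building optional boolean values.
func boolPtr(b bool) *bool { return &b }

func main() {
	// CR requests a ServiceMonitor, flag left on auto, CRDs missing.
	create, warning := shouldCreate(boolPtr(true), nil, false)
	fmt.Println(create, warning)
}
```

Keeping the warning as a return value lets the controller log it with the instance name and namespace, as the existing reconcile warnings do.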

jpkrohling · 2020-11-03T09:45:58Z

pkg/service/query.go

+// NewQueryAdminService returns a new Kubernetes service for Jaeger Query with admin port enabled
+func NewQueryAdminService(jaeger *v1.Jaeger, selector map[string]string) *corev1.Service {
+	service := NewQueryService(jaeger, selector)
+	service.Spec.Ports = append(service.Spec.Ports,


Sorry for not having caught this before, but I think we might need an extra service. I'm not sure it's a good idea to add one extra port to the services here, as people might be exposing these services currently and expect only one (or a specific set) of ports to be open.

The same applies to the other components.

The collector, ingester, and query components now each create an additional Service with the admin port enabled.
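To illustrate the difference between appending a port and creating a dedicated Service, here is a simplified sketch. The `service` and `servicePort` types are hypothetical stand-ins for the Kubernetes `corev1` types, and the names, suffixes, and port numbers are illustrative assumptions rather than values taken from the PR.

```go
package main

import "fmt"

// servicePort and service are simplified stand-ins for the Kubernetes types.
type servicePort struct {
	Name string
	Port int32
}

type service struct {
	Name     string
	Selector map[string]string
	Ports    []servicePort
}

// newQueryService builds the existing user-facing Service, left unchanged.
func newQueryService(instance string, selector map[string]string, queryPort int32) *service {
	return &service{
		Name:     instance + "-query",
		Selector: selector,
		Ports:    []servicePort{{Name: "http-query", Port: queryPort}},
	}
}

// newQueryAdminService builds a separate Service exposing only the admin port,
// so the ports of the user-facing Service stay exactly as they were.
func newQueryAdminService(instance string, selector map[string]string, adminPort int32) *service {
	return &service{
		Name:     instance + "-query-admin",
		Selector: selector,
		Ports:    []servicePort{{Name: "admin-http", Port: adminPort}},
	}
}

func main() {
	selector := map[string]string{"app": "jaeger", "component": "query"}
	query := newQueryService("simple-prod", selector, 16686)
	admin := newQueryAdminService("simple-prod", selector, 16687)
	fmt.Println(query.Name, query.Ports)
	fmt.Println(admin.Name, admin.Ports)
}
```

Both Services select the same pods, so a ServiceMonitor can target the admin Service without changing what existing users of the query Service see.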

jpkrohling · 2020-12-08T10:52:09Z

@sschne are you still interested in this one?

sschne · 2021-06-14T18:55:21Z

Hi @jpkrohling, sorry for not looking into this for quite a while. I will try to bring this PR back to life and resolve your comments.

jpkrohling · 2021-06-24T13:54:34Z

Let me know if you need anything!

sschne · 2021-10-09T15:18:54Z

@jpkrohling I think this is now ready for review again.

rubenvp8510 · 2021-10-11T03:22:14Z

First pass looks good. I'll do another, more detailed review of this tomorrow morning.

frzifus · 2022-06-28T08:15:38Z

Hi @sschne, I know it's been quite a while, but there are now quite a few conflicting files. Do you want to update your branch again? 🐎

frzifus · 2022-07-12T13:07:16Z

Let's reopen this PR when the work continues.

mergify bot assigned objectiser Oct 27, 2020

mergify bot requested review from jpkrohling and objectiser October 27, 2020 08:24

sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from be0a96e to 7b05db0 Compare October 27, 2020 08:26

jpkrohling reviewed Oct 27, 2020

jpkrohling mentioned this pull request Oct 27, 2020

Usage of deprecated 'admin-http-port' flag in the code #1276

Closed

johanavril mentioned this pull request Oct 28, 2020

Use New Admin Port Flag #1281

Merged

sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from 0ab7c91 to b2dc17c Compare November 2, 2020 22:48

jpkrohling reviewed Nov 3, 2020

jpkrohling mentioned this pull request Nov 6, 2020

Could you please help me to understand or can you provide more details/reference that I can refer? #1295

Closed

sschne added 6 commits June 14, 2021 20:17

add servicemonitor for operands

2d58a57

Signed-off-by: Simon Schneider <[email protected]>

add generate and fix api rule violation

6527e5b

Signed-off-by: Simon Schneider <[email protected]>

minor fixes

7c00e84

- import formatting - function names - error handling - tests Signed-off-by: Simon Schneider <[email protected]>

add e2e-tests for servicemonitor, update prometheus-operator bundle, …

d17e736

…add example Signed-off-by: Simon Schneider <[email protected]>

fix linting errors

400f260

Signed-off-by: Simon Schneider <[email protected]>

add missing require.NoError, change all from assert to require.NoError

b974e92

Signed-off-by: Simon Schneider <[email protected]>

sschne force-pushed the feature/1156-create-service-monitor-for-operands branch from bedb97b to b974e92 Compare June 14, 2021 18:26

sschne added 6 commits September 25, 2021 16:35

Merge branch 'master' into HEAD

ebfd05e

Signed-off-by: Simon Schneider <[email protected]>

update prometheus-operator and minor fixes

5da32c1

Signed-off-by: Simon Schneider <[email protected]>

expose admin ports in dedicated service, add streaming e2e test

28a14fc

Signed-off-by: Simon Schneider <[email protected]>

downgrade to prometheus-operator 0.50.0

e9e28b6

Signed-off-by: Simon Schneider <[email protected]>

Merge branch 'master' into feature/1156-create-service-monitor-for-op…

cc3fca5

…erands Signed-off-by: Simon Schneider <[email protected]>

downgrade jaeger-client-go again

6f4195d

Signed-off-by: Simon Schneider <[email protected]>

sschne added 9 commits September 30, 2021 20:33

fix function comments

0ea41d6

Signed-off-by: Simon Schneider <[email protected]>

add global prometheus-provision flag

9224b3b

refactor service creation Signed-off-by: Simon Schneider <[email protected]>

fix comments

9d2b17d

Signed-off-by: Simon Schneider <[email protected]>

fix query admin services test

b64acec

Signed-off-by: Simon Schneider <[email protected]>

added tests for strategy

9a5ce07

Signed-off-by: Simon Schneider <[email protected]>

add typemeta

496879a

Signed-off-by: Simon Schneider <[email protected]>

add the serviceMonitor only if enabled in CR

40e9f80

Signed-off-by: Simon Schneider <[email protected]>

revert mistaken change in collectorService

f80e38b

Signed-off-by: Simon Schneider <[email protected]>

fix name in camelCase

56c76c8

Signed-off-by: Simon Schneider <[email protected]>

Merge branch 'master' into feature/1156-create-service-monitor-for-op…

4fe7e28

…erands

frzifus closed this Jul 12, 2022
