-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add trusted CA bundle support for OpenShift #1079
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1079 +/- ##
==========================================
+ Coverage 87.81% 88.01% +0.19%
==========================================
Files 85 86 +1
Lines 5172 5256 +84
==========================================
+ Hits 4542 4626 +84
Misses 466 466
Partials 164 164
Continue to review full report at Codecov.
|
@kevinearls Would it be possible to kick off a test of this PR on OCP? 4.4 and 4.2 would be good. In addition to any automated tests, can you just check the Jaeger UI is available via the route. |
@objectiser When I ran the CI tests on OCP 4.4 I got two failures. The operator log contained a number of these messages: https://www.google.com/url?q=https://primetime.bluejeans.com/a2m/live-event/pahjkwua&sa=D&ust=1591558608965000&usg=AOvVaw3tHOqaVaXzpqQt1V4l6Iq1 The tests create a Jaeger instance for each test case, so maybe we need to delete the configmap when the jaeger instance is deleted. |
@kevinearls I've updated the PR, as its possible a user would have already configured the volumeMount with a trusted CA bundle (i.e. Service Mesh will be doing that). I tried to manually deploy to reproduce the problem you were experiencing, and the configmap seems to be correctly deleted when the Jaeger instance CR is removed. Would you be able to retest - and if necessary update the tests to wait for the configmap to be undeployed (if that is the issue)? |
@objectiser The ES index tests are still failing on me, although I found out that most of the errors in the log occurred immediately after the Jaeger instance was created: https://github.com/jaegertracing/jaeger-operator/blob/master/test/e2e/elasticsearch_test.go#L140-L146 That doesn't make much difference initially, as the smoke test step passes. However, the cron job that is supposed to get created when we enable the index cleaner never appears. https://github.com/jaegertracing/jaeger-operator/blob/master/test/e2e/elasticsearch_test.go#L256-L265. I don't see anything in the operator log about this. I will try to get more info tomorrow if possible. |
@kevinearls Would it also be worth trying the same tests against |
@objectiser It's definitely due to the PR, tests pass when I run against master. |
Is this related to #1043? |
@jpkrohling No this relates to TRACING-1208 - first attempt to implement was just mounting the voume on the oauth proxy container, which is the original problem - but when that failed in OCP I thought it may need the volume mounted in all components (which is also the same solution that Service Mesh would be implementing by adding the volume/mounts in their Jaeger templates). So we need to find a solution to TRACING-1208 and decide whether volume/mounts required across all components or just for oauth-proxy. Ideally we need the problem reproduced in a OCP cluster to properly test with custom CA bundle. |
Running this PR locally, I noticed that the volume/volumemount is added to the query container, not to the OAuth container: spec:
containers:
- args:
- --collector.grpc.tls.cert=/etc/tls-config/tls.crt
- --collector.grpc.tls.enabled=true
- --collector.grpc.tls.key=/etc/tls-config/tls.key
- --query.ui-config=/etc/config/ui.json
- --reporter.grpc.tls.ca=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
- --reporter.grpc.tls.enabled=true
- --reporter.grpc.tls.server-name=simplest-collector-headless.default.svc.cluster.local
- --reporter.type=grpc
- --sampling.strategies-file=/etc/jaeger/sampling/sampling.json
env:
- name: SPAN_STORAGE_TYPE
value: memory
- name: COLLECTOR_ZIPKIN_HTTP_PORT
value: "9411"
image: jaegertracing/all-in-one:1.18.0
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /
port: 14269
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 15
successThreshold: 1
timeoutSeconds: 1
name: jaeger
ports:
- containerPort: 5775
name: zk-compact-trft
protocol: UDP
- containerPort: 5778
name: config-rest
protocol: TCP
- containerPort: 6831
name: jg-compact-trft
protocol: UDP
- containerPort: 6832
name: jg-binary-trft
protocol: UDP
- containerPort: 9411
name: zipkin
protocol: TCP
- containerPort: 14267
name: c-tchan-trft
protocol: TCP
- containerPort: 14268
name: c-binary-trft
protocol: TCP
- containerPort: 16686
name: query
protocol: TCP
- containerPort: 14269
name: admin-http
protocol: TCP
- containerPort: 14250
name: grpc
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /
port: 14269
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/config
name: simplest-ui-configuration-volume
readOnly: true
- mountPath: /etc/jaeger/sampling
name: simplest-sampling-configuration-volume
readOnly: true
- mountPath: /etc/tls-config
name: simplest-collector-tls-config-volume
readOnly: true
- mountPath: /etc/pki/ca-trust/extracted/pem
name: simplest-trusted-ca
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: simplest-ui-proxy-token-bkm5j
readOnly: true
- args:
- --cookie-secret=7+6bwAmtQOK6z5otl0X3RQ==
- --https-address=:8443
- --openshift-service-account=simplest-ui-proxy
- --provider=openshift
- --tls-cert=/etc/tls/private/tls.crt
- --tls-key=/etc/tls/private/tls.key
- --upstream=http://localhost:16686
image: openshift/oauth-proxy:latest
imagePullPolicy: Always
name: oauth-proxy
ports:
- containerPort: 8443
name: public
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/tls/private
name: simplest-ui-oauth-proxy-tls
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: simplest-ui-proxy-token-bkm5j
readOnly: true Your description seems to indicate that it should be added to the OAuth Proxy (which is also what makes sense to me). Once @jkandasa and/or @kevinearls are able to reproduce this in a test cluster, I'll update this PR to get the volume/volumemount added to the OAuth sidecar. |
Signed-off-by: Gary Brown <[email protected]>
…te a Jaeger instance specific trust CA configmap or volume/mount Signed-off-by: Gary Brown <[email protected]>
Signed-off-by: Gary Brown <[email protected]>
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
@jpkrohling Must have made the mistake when changing from oauth proxy only to all components. However might be worth testing this PR yourself on OCP to see if you get the es-index-cleaner issue found by @kevinearls? |
The last state of this PR is passing the e2e tests in OpenShift 4.4:
(edit: output changed to show the results for tests on 4.4) |
|
||
func deployTrustedCA(jaeger *v1.Jaeger) bool { | ||
for _, vm := range jaeger.Spec.JaegerCommonSpec.VolumeMounts { | ||
if strings.HasPrefix(vm.MountPath, "/etc/pki/ca-trust/extracted/pem") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling Is this a reason approach to disable adding the custom CA bundle/volumes/mounts, if (for example) Service Mesh explicitly adds them into the Jaeger CR? Or do we need a better approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll think about it some more, but I think it's good enough. In any case, there's no way to be 100% sure that there won't be a clash, as the path inside the volumes themselves might end up conflicting (/etc/pki
with ca-trust
inside vs. /etc/pki/ca-trust
).
Edit: actually, we could compare the paths, and see if the CA bundle volume mount would clash with an existing volume mount (/etc/pki
in the example above).
@jpkrohling If you are happy with the PR as is, can you approve and merge? If any other issues are found in subsequent QE testing they can be fixed in a followup PR. |
@objectiser @jpkrohling I verified this PR via LGTM 👍 |
viper.Set("platform", "other") | ||
defer viper.Reset() | ||
|
||
jaeger := v1.NewJaeger(types.NamespacedName{Name: "TestGetWithoutTrustedCA"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we started using a generic name some time ago. Was it changed recently, to use individual names for the instances in each test? If not, there's no need to change this now.
}}, | ||
}, | ||
Data: map[string]string{ | ||
"ca-bundle.crt": "", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are showing that this is fine, but I wonder if there's a reason to have this as empty first? Also, would the reconciliation phase revert this to empty? If so, OpenShift seems to be refilling this quite fast (possibly with a webhook, before changes are actually applied).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be an idea to check with kubectl get -w configmap -o yaml
- could be a potencial race?
No description provided.