This repository has been archived by the owner on May 18, 2020. It is now read-only.

Closes #74 - Changed Kubernetes Template to use DaemonSet #75

Merged

Conversation

jpkrohling
Collaborator

Signed-off-by: Juraci Paixão Kröhling [email protected]

@jpkrohling
Collaborator Author

The tests are failing locally, due to this error:

[ERROR] io.jaegertracing.kubernetes.CassandraETest  Time elapsed: 6.276 s  <<< ERROR!
java.lang.RuntimeException: io.fabric8.kubernetes.clnt.v3_1.KubernetesClientException: Failure executing: POST at: https://192.168.39.71:8443/apis/extensions/v1/namespaces/itest-cc9d5c47/daemonsets. Message: the server could not find the requested resource. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=null, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=the server could not find the requested resource, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).
Caused by: io.fabric8.kubernetes.clnt.v3_1.KubernetesClientException: Failure executing: POST at: https://192.168.39.71:8443/apis/extensions/v1/namespaces/itest-cc9d5c47/daemonsets. Message: the server could not find the requested resource. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=null, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=the server could not find the requested resource, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).

I assume this is some incompatibility between the plugin and the Kubernetes version. I did some manual tests with both Cassandra and Elasticsearch, and both work.


The application I used for this sample was an adaptation of OpenShift's hello-openshift and can be found here:

https://github.com/jpkrohling/origin/tree/JPK-AddedJaegerTracingToHelloWorld/examples/hello-openshift

To manually test using this application, follow the installation instructions in the readme, replacing the remote URLs with local paths (e.g. kubectl create -f production/configmap.yml instead of kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-kubernetes/master/production/configmap.yml).

Once Jaeger is installed, add an application, such as:

kubectl create -f https://raw.githubusercontent.com/jpkrohling/origin/JPK-AddedJaegerTracingToHelloWorld/examples/hello-openshift/hello-openshift.yaml

A pod like hello-openshift-deployment-6bb7f5c687-d54lx should be created. Its logs should look like this:

2018/03/19 17:37:35 Initializing logging reporter
serving on 8080
2018/03/19 17:37:35 Jaeger tracer initialized
serving on 8888
Servicing request.
2018/03/19 17:37:37 Reporting span 38ab873eb8f039a8:38ab873eb8f039a8:0:1
Servicing request.
2018/03/19 17:37:47 Reporting span 6ff4ff825093f3db:6ff4ff825093f3db:0:1
Servicing request.

At this point, you should see traces on Jaeger.

@yurishkuro
Member

what is "SWS-326"?

@jpkrohling force-pushed the SWS-326-SwitchToDaemonSets branch from abccf95 to ccedc6d on March 20, 2018 07:43
@jpkrohling changed the title from "SWS-326 - Changed Kubernetes Template to use DaemonSet" to "Closes #74 - Changed Kubernetes Template to use DaemonSet" on March 20, 2018
@jpkrohling
Collaborator Author

jpkrohling commented Mar 20, 2018

Sorry, my bad. SWS-326 is the JIRA issue tracking my activity. I changed the commit and the PR title to refer to the issue in this repo instead.

@pieterlange left a comment

This needs a reference to the host IP (localhost is local to the pod scope)

env:
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

@@ -64,18 +65,24 @@ Once everything is ready, `kubectl get service jaeger-query` tells you where to

### Deploying the agent as sidecar


No longer deploying as sidecar by default

Collaborator Author

Not by default, but providing instructions on how to deploy the agent as a sidecar is still useful.
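For illustration, a minimal sketch of what such a sidecar setup could look like (the application name, image, and port are placeholders; the collector address follows the jaeger-collector:14267 convention seen elsewhere in this thread, and agent flag names may differ between versions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: example/myapp:latest        # placeholder application image
        ports:
        - containerPort: 8080
      # the agent runs in the same pod, so the app reaches it on localhost
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent
        args: ["--collector.host-port=jaeger-collector:14267"]
        ports:
        - containerPort: 6831
          protocol: UDP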


Ah right.

There should be some note for correct agent discovery from the apps though, as localhost (the node) is not available on the pod scope.

Collaborator Author

I'm not quite sure I understand what you mean, but when using a sidecar, localhost is indeed correct.

labels:
app: jaeger
jaeger-infra: agent-instance
spec:


Needs a hostNetwork: true in here somewhere.

Collaborator Author

Why's that? Do we want the agent to receive spans from outside of the Kubernetes cluster?


Also need to make sure the agent listens on the interface IP, not localhost (again, because you can't route to localhost from the pod)

Collaborator Author

I'm missing something here. I'm under the assumption that the target applications are not going to send spans to the agent using localhost as its address (as opposed to how it happens with a sidecar or bare metal deployment). Rather, they would send spans to a known address, like this:

https://github.com/jpkrohling/origin/blob/JPK-AddedJaegerTracingToHelloWorld/examples/hello-openshift/hello_openshift.go#L43

- name: jaeger-configuration-volume
mountPath: /conf
ports:
- containerPort: 5775


For clarity, you can also explicitly add hostPorts in here.
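For example, roughly like this (a sketch only; the list reflects the standard agent ports and should match whatever the template actually exposes):

ports:
- containerPort: 5775
  protocol: UDP
  hostPort: 5775
- containerPort: 6831
  protocol: UDP
  hostPort: 6831
- containerPort: 6832
  protocol: UDP
  hostPort: 6832
- containerPort: 5778
  protocol: TCP
  hostPort: 5778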

- key: agent
path: agent.yaml
name: jaeger-configuration-volume
- apiVersion: v1


This service is unnecessary (as the pods on each node send their UDP reports to the host IP)

Collaborator Author

True, but one benefit of having a service is that the instrumented application can refer to the agent via the hostname jaeger-agent and not care about whether it's being deployed as a daemonset or regular pod.
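For context, the service in question would look roughly like this (a sketch; the selector and port are inferred from the labels and ports discussed in this thread, not copied from the template):

apiVersion: v1
kind: Service
metadata:
  name: jaeger-agent
spec:
  selector:
    app: jaeger
    jaeger-infra: agent-instance
  ports:
  - name: agent-compact
    port: 6831
    protocol: UDP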


This means traffic may be routed to different nodes; the whole point of running the DaemonSet is that each pod can route to the (single) jaeger-agent on the node it's scheduled on.

Member

To reinforce @pieterlange's comment: this means traffic will mostly be routed to different nodes.

@pieterlange

Collapsing the discussion into a single comment for clarity: the discussion in #74 centers on (UDP) traffic routing in the case of daemonset deployments. The goal here is to make sure each pod that submits jaeger reports can submit them to the jaeger agent running on the same node.

In the PR you use a service for routing the UDP traffic to the jaeger-agents. That works, is convenient, and is briefly described in the configuration (we can use a "well-known" name for agent service discovery), but it does not achieve the desired effect (local-node routing).

As such, we have to work around this by binding the daemonset pods to the host network (exposing the jaeger-agent on the node IP) and adding an environment variable containing the host IP to apps that want to submit Jaeger reports (see the snippet above).

@jpkrohling
Collaborator Author

The goal here is to make sure each pod that submits jaeger reports can submit them to the jaeger agent running on the same node.

Would the client need to know the IP where the DaemonSet is running? Or would this act as localhost from the perspective of the client? If the client needs to know the IP, I'd then rather use the service name, as it would provide a way for the client to connect to a more distant agent, in case the local agent is down. Also, I would have guessed that Kubernetes would route connections to the "closest" DaemonSet: is that not the case?

@pieterlange

pieterlange commented Mar 20, 2018

Would the client need to know the IP where the DaemonSet is running?

Yes, but the IP can be dynamically added to the environment using this stanza:

env:
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP

Or would this act as localhost from the perspective of the client?

In practice it is the local host, but we need to connect to the host IP address because localhost means something else in the pod's context (there it means the pod itself; that is why it works for sidecars).

If the client needs to know the IP, I'd then rather use the service name, as it would provide a way for the client to connect to a more distant agent, in case the local agent is down.

Giving the IP is relatively easy; we just need to make sure the environment variable is used by whatever submits data over UDP to the agent. Since we can't trust delivery over UDP, we should make sure this data goes to the agent on the same node, as every node is running an agent anyway.
You're right that Kubernetes would make sure the Service only fronts working instances of the agent, but I think this is the wrong tradeoff to make here.
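Concretely, the consuming application's container spec would carry the host IP roughly like this (a sketch; the container name and image are placeholders, and it assumes the application reads JAEGER_AGENT_HOST when configuring its tracer):

containers:
- name: hello-openshift
  image: example/hello-openshift:latest   # placeholder image
  env:
  - name: JAEGER_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  # the application is expected to read JAEGER_AGENT_HOST when setting up its tracer/reporter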

Also, I would have guessed that Kubernetes would route connections to the "closest" DaemonSet: is that not the case?

Maybe in the future, but not right now.

@jpkrohling
Collaborator Author

I just tried removing the service and adding hostNetwork: true to the DaemonSet, but then it has problems accessing the jaeger-collector:

{"level":"info","ts":1521556402.228892,"caller":"peerlistmgr/peer_list_mgr.go:166","msg":"Trying to connect to peer","host:port":"jaeger-collector:14267"}
{"level":"error","ts":1521556402.230131,"caller":"peerlistmgr/peer_list_mgr.go:171","msg":"Unable to connect","host:port":"jaeger-collector:14267","connCheckTimeout":0.25,"error":"dial tcp: lookup jaeger-collector on 192.168.122.1:53: no such host","stacktrace":"github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).ensureConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:171\ngithub.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr.(*PeerListManager).maintainConnections\n\t/home/travis/gopath/src/github.com/jaegertracing/jaeger/pkg/discovery/peerlistmgr/peer_list_mgr.go:101"}

I guess that's because the agent and collector are not on the same network anymore.

@ledor473
Member

ledor473 commented Mar 20, 2018

I don't think we need the hostNetwork: true when using the fieldPath: status.hostIP

@pieterlange

pieterlange commented Mar 20, 2018

You need to set either hostNetwork: true or specify hostPorts in the ports: {} section of the agent spec.

My oversight here was that if you use hostNetwork: true you should also set dnsPolicy: ClusterFirstWithHostNet on the podSpec for the agent.

On second consideration, it's maybe cleaner not to use hostNetwork: true but to expose the service on the node IP via hostPort instead. That configuration was broken for a while for a bunch of CNIs, so force of habit automatically guided me towards hostNetwork.

The environment configuration is needed to connect to the agent from the applications.
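Putting this together, a sketch of the relevant fields on the agent's pod spec (only the fields under discussion are shown; with option 1 you need the dnsPolicy setting, with option 2 you keep the default pod networking):

spec:
  template:
    spec:
      # option 1: share the node's network namespace
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # keeps cluster names like jaeger-collector resolvable
      containers:
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent
        ports:
        - containerPort: 6831
          protocol: UDP
          hostPort: 6831                   # option 2: expose the port on the node IP instead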

@jpkrohling force-pushed the SWS-326-SwitchToDaemonSets branch from ccedc6d to 35a7cb7 on March 20, 2018 15:49
@jpkrohling
Collaborator Author

Thanks, adding dnsPolicy: ClusterFirstWithHostNet did the trick. I just updated this PR to incorporate the changes you mentioned (hostNetwork/dnsPolicy). The sample I used for testing was also updated: https://github.com/jpkrohling/origin/blob/JPK-AddedJaegerTracingToHelloWorld/examples/hello-openshift/hello_openshift.go

@jpkrohling
Collaborator Author

@pieterlange, @ledor473, if this looks good to you, I'll prepare to merge this by Friday.

@pavolloffay
Member

I have a meta request. We include all objects in one file, whereas most projects define objects in separate files. Separate files have several advantages, one being that the agent can be deployed as a sidecar; when it's all defined in one file, users have to remove it from there first.

Could we do the same?

@pieterlange

pieterlange commented Mar 21, 2018

@pavolloffay Let's not dump that into this PR. Ideally this should be deployed through a Helm chart so you can just tweak a parameter to switch between deployment modes (a different beast altogether).

@jpkrohling LGTM

@jpkrohling
Collaborator Author

jpkrohling commented Mar 21, 2018

@pavolloffay, as the "owner" of the tests, do you know why the test is failing?

Caused by: io.fabric8.kubernetes.clnt.v3_1.KubernetesClientException: Failure executing: POST at: https://192.168.39.234:8443/apis/extensions/v1/namespaces/itest-23f563c3/daemonsets. Message: the server could not find the requested resource. Received status: Status(apiVersion=v1, code=404, details=StatusDetails(causes=[], group=null, kind=null, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=the server could not find the requested resource, metadata=ListMeta(resourceVersion=null, selfLink=null, additionalProperties={}), reason=NotFound, status=Failure, additionalProperties={}).

Do I need to adjust the test somehow, or is it a fabric8 problem?

@pavolloffay
Member

@jpkrohling I don't know why it is failing.

@pieterlange

I don't know the test environment but it's quite possible that daemonsets aren't allowed/available in this type of environment.

@pavolloffay mentioned this pull request on Mar 22, 2018
@@ -24,8 +24,10 @@ items:
jaeger-infra: collector-deployment
spec:
replicas: 1
strategy:
type: Recreate
selector:
Member

  • Is this selector mandatory?
  • What does it do?

Collaborator Author

It is. Without the selector, this happens:

The Deployment "jaeger-collector" is invalid: 
* spec.selector: Required value
* spec.template.metadata.labels: Invalid value: map[string]string{"jaeger-infra":"collector-pod", "app":"jaeger"}: `selector` does not match template `labels`

It looks like it's not required when the version is set to extensions/v1beta1, so I assume this became a requirement when it moved out of beta.
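In other words, with apps/v1 the selector has to be spelled out and must match the pod template's labels, roughly like this (a sketch of only the relevant fields, using the labels from the error message above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jaeger
      jaeger-infra: collector-pod
  template:
    metadata:
      labels:
        app: jaeger
        jaeger-infra: collector-pod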

Collaborator Author

As this works with the beta version, I'll keep the template using that, as the test framework seems to require it.

Member

I also had a question about backwards compatibility. Will apps/v1 work on older k8s versions?

Collaborator Author

Probably not, but similarly, are we guaranteeing backwards compatibility? If so, up to which version?

I'd rather break it "now" (if anything), given that this feature has moved out of beta, than commit to keeping backwards compatibility with a beta version.

Collaborator Author

By the way: as the test framework seems to require this older notation, I'm reverting this part of the change, but the backwards compatibility question is a good one.

Member

I don't think we want to commit to anything right now, but k8s itself provides backwards compatibility for v1beta1, so we could leverage that without doing anything special.

Member

If we can support more versions basically for free, all the better.

@@ -24,8 +24,10 @@ items:
jaeger-infra: collector-deployment
spec:
replicas: 1
strategy:
type: Recreate
Member

Why did you remove recreate? Why do you keep it for query deployment then?

@pavolloffay
Member

Another thing on my mind: do we want to add a daemonset to the all-in-one template? At the moment there is an agent service there.

@jpkrohling
Collaborator Author

Do we want to add daemonset to all-in-one template?

Do you mean replacing the Deployment by a DaemonSet?

@pavolloffay
Member

Do you mean replacing the Deployment by a DaemonSet?

Maybe. For back compatibility we can keep the deployment for some time

@jpkrohling force-pushed the SWS-326-SwitchToDaemonSets branch from 35a7cb7 to ec38c01 on March 22, 2018 10:44
@jpkrohling force-pushed the SWS-326-SwitchToDaemonSets branch from ec38c01 to 68b1434 on March 22, 2018 10:45
@jpkrohling
Collaborator Author

Maybe. For back compatibility we can keep the deployment for some time

Let's discuss this in a new issue. I'd also like to get feedback from @pieterlange and @ledor473 before making this change.

@pavolloffay
Member

@pieterlange

I think it's OK to just support the latest stable release

@jpkrohling merged commit 68b1434 into jaegertracing:master on Mar 22, 2018