Closes #74 - Changed Kubernetes Template to use DaemonSet #75
Conversation
The tests are failing locally, due to this error:
I assume this is some incompatibility between the plugin and the Kubernetes version. I did some manual tests with both Cassandra and Elasticsearch, and it works. The application I used for this sample was an adaptation of OpenShift's. To manually test using this application, follow the installation instructions on the readme, replacing the remote URLs with local paths. Once Jaeger is installed, add an application, such as:
A pod like
At this point, you should see traces on Jaeger.
What is "SWS-326"?
Force-pushed from abccf95 to ccedc6d
Sorry, my bad. SWS-236 is the JIRA issue tracking my activity. I changed the commit and the PR title to refer to the issue in this repo.
This needs a reference to the host IP (localhost is local to the pod scope)
env:
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
@@ -64,18 +65,24 @@ Once everything is ready, `kubectl get service jaeger-query` tells you where to
### Deploying the agent as sidecar |
No longer deploying as sidecar by default
Not by default, but providing instructions on how to deploy the agent as a sidecar is still useful.
Ah right.
There should be some note about correct agent discovery from the apps though, as localhost (i.e. the node) is not available in the pod scope.
I'm not quite sure I understand what you mean, but when using a sidecar, localhost
is indeed correct.
labels:
  app: jaeger
  jaeger-infra: agent-instance
spec:
Needs a `hostNetwork: true` in here somewhere.
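A minimal sketch of where that field could go in the DaemonSet's pod template (standard Kubernetes pod-spec field names; the surrounding template and image name are assumed for illustration):

```yaml
spec:
  template:
    spec:
      # Run the agent in the node's network namespace so its UDP ports
      # are reachable from app pods via the node IP (status.hostIP).
      hostNetwork: true
      # Keep cluster DNS resolution working while on the host network.
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: jaeger-agent
        image: jaegertracing/jaeger-agent
```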
Why's that? Do we want the agent to receive spans from outside of the Kubernetes cluster?
Also need to make sure the agent listens on the interface IP, not localhost (again, because you can't route to localhost
from the pod)
I'm missing something here. I'm under the assumption that the target applications are not going to send spans to the agent using localhost
as its address (as opposed to how it happens with a sidecar or bare metal deployment). Rather, they would send spans to a known address, like this:
- name: jaeger-configuration-volume
  mountPath: /conf
ports:
- containerPort: 5775
For clarity you can also explicitly add `hostPort`s in here.
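As a sketch, that would mean pairing each `containerPort` with a matching `hostPort` (5775 is the port from the template above; 6831 is the agent's usual compact-thrift default, shown only for illustration):

```yaml
ports:
# Explicit hostPort makes the node-level exposure visible in the spec,
# even though hostNetwork: true already binds to the node's interfaces.
- containerPort: 5775
  hostPort: 5775
  protocol: UDP
- containerPort: 6831
  hostPort: 6831
  protocol: UDP
```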
jaeger-production-template.yml (outdated)
- key: agent
  path: agent.yaml
name: jaeger-configuration-volume
- apiVersion: v1
This service is unnecessary (as the pods on each node send their UDP reports to the host IP)
True, but one benefit of having a service is that the instrumented application can refer to the agent via the hostname jaeger-agent
and not care about whether it's being deployed as a daemonset or regular pod.
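For illustration, such a Service could look like the sketch below. The selector labels are the agent labels shown in this diff; the port name is an assumption:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-agent
spec:
  # Matches the agent pods, whether they come from a DaemonSet or a regular pod.
  selector:
    app: jaeger
    jaeger-infra: agent-instance
  ports:
  - name: agent-compact
    port: 5775
    protocol: UDP
```

Note the caveat raised below: a Service like this load-balances across nodes, so it does not guarantee local-node routing.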
This means traffic may route into different nodes - the whole point of running the daemonset was that each pod would be able to route to the (single) jaeger-agent on the node it's scheduled on.
To reinforce @pieterlange's comment: this means traffic will mostly be routed to different nodes.
Collapsing the discussion into a single comment for clarity: the discussion in #74 centers on (UDP) traffic routing in the case of DaemonSet deployments. The goal here is to make sure each pod that submits Jaeger reports can submit them to the jaeger-agent running on the same node. In the PR, a Service is used for routing the UDP traffic to the jaeger-agents. That works, is convenient, and is briefly described in the configuration, since we can use a "well known" name for agent service discovery, but it does not achieve the desired effect (local-node routing). As such, we have to work around this by binding the DaemonSet pods to the host network (exposing the jaeger-agent on the node IP) and adding an environment variable containing the host IP to the apps that want to submit Jaeger reports (see snippet above).
Would the client need to know the IP where the DaemonSet is running? Or would this act as
Yes, but the IP can be dynamically added to the environment using this stanza:
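That stanza, as quoted earlier in this review, uses the downward API to inject the node IP into the application container:

```yaml
env:
- name: JAEGER_AGENT_HOST
  valueFrom:
    fieldRef:
      # Resolves to the IP of the node the pod is scheduled on,
      # which is where the local jaeger-agent listens.
      fieldPath: status.hostIP
```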
In practice, it is the local host, but we need to connect via the host IP address, because localhost inside a pod refers to the pod itself, not the node.
Giving the IP is relatively easy; we just need to make sure the environment variable is used by whatever is submitting data over UDP to the agent. Since we can't trust delivery over UDP, we should make sure this data goes to the agent on the same node, as the node is running an agent anyway.
Maybe in the future, but not right now. |
I just tried removing the service and adding
I guess that's because the agent and collector are not on the same network anymore.
I don't think we need the
You need to set either
My oversight here was that if you use
On second consideration, it's maybe cleaner to not use
The environment configuration is needed to connect to the agent from the applications.
Force-pushed from ccedc6d to 35a7cb7
Thanks, adding
@pieterlange, @ledor473, if this looks good to you, I'll prepare to merge this by Friday.
I have a meta request. We include all objects in one file, but most projects define objects in separate files, which has several advantages. One is that the agent can be deployed as a sidecar; when it's all defined in one file, users have to remove it from there. Could we do the same?
@pavolloffay Let's not dump that into this PR. Ideally this should be deployed through a Helm chart, so you can just tweak a parameter to switch between deployment modes (a different beast altogether). @jpkrohling LGTM
@pavolloffay , as the "owner" of the tests, do you know why the test is failing?
Do I need to adjust the test somehow, or is it a fabric8 problem?
@jpkrohling I don't know why it is failing.
I don't know the test environment, but it's quite possible that daemonsets aren't allowed/available in this type of environment.
jaeger-production-template.yml (outdated)
@@ -24,8 +24,10 @@ items:
    jaeger-infra: collector-deployment
  spec:
    replicas: 1
    strategy:
      type: Recreate
    selector:
- Is this selector mandatory?
- What does it do?
It is. Without the selector, this happens:
The Deployment "jaeger-collector" is invalid:
* spec.selector: Required value
* spec.template.metadata.labels: Invalid value: map[string]string{"jaeger-infra":"collector-pod", "app":"jaeger"}: `selector` does not match template `labels`
It looks like it's not required when the version is set to `extensions/v1beta1`, so I assume this became a requirement when it was moved out of beta.
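For reference, under `apps/v1` a matching selector would look something like this (labels taken from the error message above):

```yaml
spec:
  # apps/v1 requires an explicit selector...
  selector:
    matchLabels:
      app: jaeger
      jaeger-infra: collector-pod
  # ...that matches the pod template's labels exactly.
  template:
    metadata:
      labels:
        app: jaeger
        jaeger-infra: collector-pod
```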
As this works with the beta version, I'll keep the template using that, as the test framework seems to require it.
I also had a question about backwards compatibility: will `apps/v1` work on older k8s versions?
Probably not, but similarly, are we guaranteeing backwards compatibility? If so, up to which version? I'd rather break it "now" (if anything), given that this feature moved out of beta, than commit to keeping backwards compatibility with a beta version.
By the way: as the test framework seems to require this older notation, I'm reverting this part of the change, but the backwards compatibility question is a good one.
I don't think we want to commit to anything right now, but k8s itself provides backwards compatibility for `extensions/v1beta1`, so we could leverage that without doing anything special.
If we can support more versions basically for free, all the better.
jaeger-production-template.yml (outdated)
@@ -24,8 +24,10 @@ items:
    jaeger-infra: collector-deployment
  spec:
    replicas: 1
    strategy:
      type: Recreate
Why did you remove `Recreate`? And why do you keep it for the query deployment, then?
Another thing on my mind: do we want to add a DaemonSet to the all-in-one template? At the moment there is an agent service
Do you mean replacing the
Maybe. For backwards compatibility, we can keep the deployment for some time.
Force-pushed from 35a7cb7 to ec38c01
Signed-off-by: Juraci Paixão Kröhling <[email protected]>
Force-pushed from ec38c01 to 68b1434
Let's discuss this in a new issue. I'd also like to get feedback from @pieterlange and @ledor473 on this before making the change.
I think it's OK to just support the latest stable release.
Signed-off-by: Juraci Paixão Kröhling [email protected]