Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include k8s.pod.hostname in semantic conventions #1072

Closed

Conversation

pmm-sumo
Copy link
Contributor

@pmm-sumo pmm-sumo commented Oct 7, 2020

Include k8s.pod.hostname in k8s semantic conventions.

Since this is a relatively small change (and it follows the naming scheme used right now), I didn't open a separate issue. Will be happy to update the upcoming YAML definitions if this gets accepted.

Related issues #
open-telemetry/opentelemetry-collector-contrib#141

@pmm-sumo pmm-sumo changed the title Include k8s.pod.hostname and k8s.container.id and image Include k8s.pod.hostname in semantic conventions Oct 7, 2020
@pmm-sumo
Copy link
Contributor Author

pmm-sumo commented Oct 7, 2020

On the second thought, this got me wondering if maybe host.hostname should be used instead? It's not clearly documented (missing from the host semantic conventions) though mentioned in e.g. Logs Data Model

@dmitryax
Copy link
Member

dmitryax commented Oct 7, 2020

@pmm-sumo the host.hostname was recently removed from the conventions #838

@pmm-sumo
Copy link
Contributor Author

pmm-sumo commented Oct 7, 2020

Thanks for the clarification @dmitryax

I have opened a draft update for the collector (which is ready to be converted into a normal PR after this one gets merged). I am not sure how is it impacted by the focus on GA, though then it's a rather minimal change

@Oberon00
Copy link
Member

Oberon00 commented Oct 8, 2020

What is the difference of that attribute to host.name?

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because I think this is redundant with host.name (https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/resource/semantic_conventions/host.md) and I think this needs a proper explanation of what the difference would be.

@@ -49,6 +49,7 @@ containers on your cluster.
|---|---|---|
| k8s.pod.uid | The uid of the Pod. | `275ecb36-5aa8-4c2a-9c47-d8bb681b9aff` |
| k8s.pod.name | The name of the Pod. | `opentelemetry-pod-autoconf` |
| k8s.pod.hostname | The hostname of the Pod, if set, or [name, otherwise](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields). | `opentelemetry-pod-autoconf` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why default it to the pod name which is already stored in pod.name? As a result you can never be sure if you actually got a hostname, or just the pod name (since the two are probably often but not always equal)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this becomes the effective hostname. Which leads to the conclusion that host.name is probably a better fit for it, as suggested. What do you think @dmitryax ?

Copy link
Member

@tigrannajaryan tigrannajaryan Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Oberon00 if I am reading the K8s documentation correctly then that's how K8s behaves, it is not our choice. By default Pod's hostname fetched from K8s will be equal to the Pod's name unless the hostname field is explicitly specified in the Pod's configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not what the semantic convention says. If I read this, I would have expected to implement it by

  1. Use gethostname to find my hostname
  2. If it's empty, use the same value as set for k8s.pod.name

Nowhere does it say that we should ask k8s for that hostname.

Copy link
Contributor Author

@pmm-sumo pmm-sumo Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I introduced this PR, following the discussion I am thinking that simply using host.name might work better... The only concern I have is how the hostname value of Pod maps to compute instance. E.g. when the Kubernetes Node runs on, say, EC2, each of the pods running there would have a different host.name value. Should the latter refer to EC2 instance hostname or Pod hostname?

Copy link
Member

@Oberon00 Oberon00 Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I would say the "inner" hostname. With nested containerization and possibly nested VMs, it is hard to determine a "real" host. But maybe we should duplicate the host resource to a physical_host, which is the app's best guess at the "real" host. Or outer_host which is the app's best guess at the outer-next host. But that is something worth clarifying in the spec in any case.

Copy link
Member

@tigrannajaryan tigrannajaryan Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node's host vs Pod's host is a general problem with how our resources/attributes are defined (not just for "hostname"). We commingle them without specifying which entity the attributes apply to. I deliberately avoided raising this topic close to GA and this is something I plan to address after GA since it may require conceptual changes in how we understand resources in 2.0.

For now the best we can do is to avoid the problem. "Avoiding" here means not to record ambiguous attributes. "hostname" is ambiguous in Kubernetes context.

I would advise to do one of the following:

  • Don't record the host name at all in the Collector's k8s processor. This would be my preference unless there are other arguments.
  • If we absolutely must record the host name then either accept k8s.pod.hostname+k8s.node.hostname as a semantic convention or just record it in the Collector as an experimental capability without corresponding semantic convention. Time will tell if it needs to stay and become a semantic convention in the spec.

Copy link
Contributor Author

@pmm-sumo pmm-sumo Oct 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, the k8s.node.name frequently holds the hostname of the machine where the node resides.

In either case - I wouldn't want this problem to take the focus out of the GA. I fully agree with @tigrannajaryan note. If nobody objects I am going to close this PR and skip adding the capability until after-GA (and with a solid discussion).

@Oberon00 Oberon00 added area:semantic-conventions Related to semantic conventions spec:resource Related to the specification/resource directory labels Oct 8, 2020
@pmm-sumo
Copy link
Contributor Author

pmm-sumo commented Oct 9, 2020

Closing this issue per the conversation. Lets go back to this after GA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions spec:resource Related to the specification/resource directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants