[Discussion] Every (log?) integration should deal with Kubernetes fields like the Kubernetes integration (or vice-versa) #9808
Thanks for the detailed write-up @herrBez. For me the target is that all data going into the same field gets the same mapping, e.g. `keyword` with a `keyword.text` multi-field. Few thoughts:

- **Kubernetes fields:** The k8s fields are not aligned, as ECS does not have k8s fields but `orchestrator.*` fields. Also, the orchestrator fields do not cover the full list of fields from k8s (fields can be found here). That we end up with different mappings for these fields is not acceptable.
- It seems the list of fields that causes issues is a very limited set, and we need to pick one of the two behaviours. Is there a use for non-keyword fields for labels and annotations? If not, `keyword` seems like the right default.
- There is also a problem with data views in Kibana. Even though, in the above scenario, different mappings should not exist, there are cases where conflicts show up across all `logs-*` data.
- Side note: the dedotting is not required anymore, as we have introduced `subobjects: false`. This is a very recent addition.
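For reference, a minimal sketch of what the `subobjects: false` mapping parameter mentioned above looks like (the field layout here is illustrative, not taken from an actual stack template). With it, dotted label keys such as `app.kubernetes.io/name` can be stored under `kubernetes.labels` as leaf fields, without Beats-side dedotting:

```json
{
  "mappings": {
    "properties": {
      "kubernetes": {
        "properties": {
          "labels": {
            "type": "object",
            "subobjects": false
          }
        }
      }
    }
  }
}
```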
I never noticed it, but the change is pretty huge 😅. Having labels and annotations in the form of

I kind of like the idea of the orchestrator fields because:

TL;DR: I agree, `keyword` is the right call!

Long answer: according to the Kubernetes documentation, annotation values need to be strings. For labels we don't have a clear warning like for annotations, but I feel like they should be strings as well. This issue seems to confirm it: kubernetes/kubernetes#57509. Trying to create an object with non-string labels will fail.

While I appreciate the fact that having more precise datasets will avoid this kind of issue, I somehow feel it's kind of a betrayal of the original promise of ECS: being allowed to correlate different sources.
Is it also in use by the different integrations? Last time I checked, it was not enabled.
+1 to the orchestrator fields. About
While possible, this doesn't seem like a good approach to me: if the
I'm talking here about fields which are not part of ECS. ECS is the foundation, but users always have their own fields. It is OK if these are different in different datasets. Kibana should only complain if the ECS fields are not the same.
I wonder if most users know the difference between querying / filtering on a `text` vs. a `keyword` field? In quite a few cases both are needed, and we should support users in making the right choice. What are our next steps here? I expect the kubernetes fields problem to stick around for a long time, even if we recommend moving to orchestrator fields. One idea: have a special mapping for labels to ensure these are ALWAYS keywords (and only keywords) in the ECS template?
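A sketch of what such a rule could look like as a dynamic template (the rule name and `path_match` pattern are illustrative, not an existing stack mapping; an analogous rule could cover annotations):

```json
{
  "dynamic_templates": [
    {
      "kubernetes_labels_as_keyword": {
        "path_match": "kubernetes.labels.*",
        "mapping": {
          "type": "keyword",
          "ignore_above": 1024
        }
      }
    }
  ]
}
```

Because dynamic templates are evaluated in order, such a rule would take precedence over a later catch-all rule like `ecs_path_match_keyword_and_match_only_text` for the matching paths.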
You mean keyword-only for labels? Another idea would be to always add `logs@custom`.
Pick the fields where it is required to have keyword only, and ensure either flattened or not-flattened is aligned with the kubernetes integration, or swap it. For

About `logs@custom`: I assume it is currently there for all non-integration datasets, but for integrations it is not included? I agree, it should be there, and not only for this case.
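For reference, a sketch of the `flattened` alternative being compared here (field layout illustrative): a `flattened` field keeps every leaf value as a keyword inside a single mapped field, instead of relying on a dynamic template per leaf:

```json
{
  "properties": {
    "kubernetes": {
      "properties": {
        "labels": {
          "type": "flattened"
        }
      }
    }
  }
}
```

With `flattened`, queries can still reference leaves such as `kubernetes.labels.app`, but the leaves do not appear as individual fields in the mapping, which trades away per-field control for immunity to mapping explosions.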
I assume priority-wise it would sit after the
This makes sense to me, the new
General methodology here as I understand it is "stack-defined mappings" -> "package-defined mappings" -> "user-defined mappings" -> "ecs mappings" -> "fleet metadata". Am I understanding the intent correctly here?
the twist around
I took a look at the code in Kibana to see if this would be a quick change, and it's a little more involved than I initially thought. We need to do a bit of a refactor, because the
## Problem Statement
It is not uncommon to read logs from Kubernetes and send them into different integrations (e.g., if I am reading data from an Nginx Pod, I want to use the NGINX integration, etc.). To accomplish this goal we have at least two alternatives:
Autodiscovery works perfectly fine for Filebeat, where all kubernetes metadata fields are specified in the template; in the integration case it can lead to unwanted consequences, i.e., the creation of unexpected fields and potentially mapping differences between `kubernetes.container_logs` and other integration datasets.
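For context, hint-based autodiscovery is driven by Pod annotations; a minimal Pod manifest using Filebeat's `co.elastic.logs/*` hints might look like this (names and values are illustrative):

```json
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {
    "name": "nginx",
    "annotations": {
      "co.elastic.logs/module": "nginx",
      "co.elastic.logs/fileset": "access"
    }
  },
  "spec": {
    "containers": [
      { "name": "nginx", "image": "nginx:1.25" }
    ]
  }
}
```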
Indeed, since the kubernetes dynamic template rules and static fields are not defined, we may end up having different mappings in different data streams. A couple of examples:

- `kubernetes.pod.name`: it has always been a `keyword`. However, if the data is rerouted to another integration, it will result in the creation of a `kubernetes.pod.name.text` field because of the dynamic template rule `ecs_path_match_keyword_and_match_only_text`.
- The `kubernetes.container_logs` integration contains the `kubernetes.node.labels`, `kubernetes.node.annotations`, `kubernetes.namespace_labels`, `kubernetes.labels`, `kubernetes.namespace_annotations` and `kubernetes.selectors` dynamic template rules, which match every type and map them as `keyword` (note that nested fields like `kubernetes.labels.test.foo` are not accepted; the fields are dedotted by Beats, see https://www.elastic.co/guide/en/fleet/current/kubernetes-provider.html).

## Potential Impact
- Creating unnecessary fields is a potential waste of resources and can slow down ingestion.
- If users accidentally start using the field `kubernetes.pod.name.text` to define their alerts or to query data, they may unwillingly filter out the `kubernetes.container_logs` data.
- If for some reason we have a label of type integer (not sure it can actually happen), we may end up with different mappings between the NGINX and Kubernetes integrations. Labels are used as identifiers in Kubernetes and can be used to group together all logs for a given application. Having different mappings can make a field not usable in Kibana, or not viable for certain types of queries.
Example:
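A hypothetical illustration of such a conflict (the label name `replicas` and the data stream names are made up): if one data stream dynamically maps a label value as `long` while `kubernetes.container_logs` maps it as `keyword`, the field capabilities API reports two incompatible types for the same field, which Kibana surfaces as a conflict:

```json
{
  "indices": ["logs-nginx.access-default", "logs-kubernetes.container_logs-default"],
  "fields": {
    "kubernetes.labels.replicas": {
      "long": {
        "type": "long",
        "searchable": true,
        "aggregatable": true,
        "indices": ["logs-nginx.access-default"]
      },
      "keyword": {
        "type": "keyword",
        "searchable": true,
        "aggregatable": true,
        "indices": ["logs-kubernetes.container_logs-default"]
      }
    }
  }
}
```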
## How similar fields are treated elsewhere
If we take the example of `container` and `cloud` metadata, these are defined in every single integration, because in the end these are metadata that identify the source object sending the data.

## Potential Solutions Identified so far
1. Add the kubernetes dynamic template rules and core kubernetes fields like `kubernetes.pod.name` and `kubernetes.namespace` in the template of each and every integration, like the `container` and `cloud` metadata.
2. Rely everywhere on the `ecs@mappings` component template to avoid mismatches between kubernetes and the other datasets (and hopefully find a way to map `kubernetes.pod.name` and similar fields to `keyword`). As of now, the `ecs@mappings` component template would also map `host.name` as text and keyword.
3. Add an `infra` component template containing all infrastructure fields and their dynamic template rule counterparts (e.g., `host`, `cloud`, `kubernetes`).
4. Somehow reuse/nest the `logs-kubernetes.container_logs` template into another component template.

## Current Workaround
The current workaround would be to define the fields in the `@custom` template of each and every integration. The workaround is viable if the number of integrations is small. Ideally, having `global@custom` and (especially) `logs@custom` / `logs-dataset@custom` component templates, as described in elastic/kibana#149484, would make the workaround viable even when using many integrations.

CC @ruflin, @philippkahr, @flash1293. Please chime in if I forgot something.
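As a concrete sketch of the workaround (the data stream name `logs-nginx.access` and the field set are illustrative), the `@custom` component template, e.g. created via `PUT _component_template/logs-nginx.access@custom`, could pin the core kubernetes fields to `keyword`:

```json
{
  "template": {
    "mappings": {
      "properties": {
        "kubernetes": {
          "properties": {
            "namespace": { "type": "keyword", "ignore_above": 1024 },
            "pod": {
              "properties": {
                "name": { "type": "keyword", "ignore_above": 1024 },
                "uid": { "type": "keyword", "ignore_above": 1024 }
              }
            }
          }
        }
      }
    }
  }
}
```

Because `@custom` component templates take precedence over the integration's dynamic templates for explicitly mapped fields, this prevents the unwanted `kubernetes.pod.name.text` multi-field, but it has to be repeated per data stream, which is exactly why the `logs@custom` proposal above matters.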