Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce service.type for all Metricbeat modules #8965

Merged
merged 6 commits into from
Nov 29, 2018

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented Nov 7, 2018

The service.type describes which service a dataset is connecting to. This is not necessarily the same for all datasets in one module. An example here is the K8s module which reaches out to 3 different services.

For now the default is just set to the module name if none is set. Unfortunately changing K8s module was not as easy as I though as it still has the old Fetch methods so the variables can not be set.

Questions:

  • As in most cases for use service.name and service.type are the same, should we also set service.name by default?

@ruflin ruflin added in progress Pull request is currently in progress. Metricbeat Metricbeat ecs labels Nov 7, 2018
@ruflin
Copy link
Contributor Author

ruflin commented Nov 7, 2018

@exekias @jsoriano I think we should update the k8s module to use the new Fetch methods. Especially it's a module that we actively develop.

@ruflin ruflin changed the title Introduce service.type for all Metricbeat modules [WIP] Introduce service.type for all Metricbeat modules Nov 7, 2018
@ruflin ruflin force-pushed the metricbeat-service-type branch from 48bac4e to ff7a7f5 Compare November 21, 2018 06:53
@ruflin ruflin mentioned this pull request Nov 21, 2018
@ruflin ruflin changed the title [WIP] Introduce service.type for all Metricbeat modules Introduce service.type for all Metricbeat modules Nov 21, 2018
@ruflin ruflin added review and removed in progress Pull request is currently in progress. labels Nov 21, 2018
@ruflin
Copy link
Contributor Author

ruflin commented Nov 21, 2018

This PR should be ready for review. I created a follow up issue here: #9190 @exekias @jsoriano @sayden It would be nice if one of you could pick it up.

One open question is still if we should by default always populate service.name too?

@jsoriano
Copy link
Member

Regarding your comment about K8s, does it mean that modules not using Reporter methods won't have the service.type field? Could we do something to cover also these modules?

And, should we also add service.type for filebeat modules?

One open question is still if we should by default always populate service.name too?

I don't see any major arguments in favour of this position or the opposite. What would be the use of service.name after this change?

@ruflin ruflin force-pushed the metricbeat-service-type branch from 2928cf8 to 34fd2f5 Compare November 21, 2018 11:40
@ruflin
Copy link
Contributor Author

ruflin commented Nov 21, 2018

@jsoriano All modules will have a service.type which defaults to the module name. The problem with k8s is that this module contains multiple services. Setting the service with the reporter is very easy but needs more hacks to do it with the old fetch method.

Add service.type for Filebeat modules? Yes

An example for the service.name: Assuming you have 2 Elasticsearch clusters. All the documents from both cluster will have service.type: elasticsearch, but the service.name will contain in this case potentially the cluster name which makes it possible to differentiate the data from the two. Same is true assming you have 3 mysql instances running. How do you differentiate them? service.name is the solution here. We should also offer a simple way in the module config to set / overwrite service.name I think.

@jsoriano
Copy link
Member

jsoriano commented Nov 21, 2018

Ok, got it.

For the use case for service.name I think that better if we don't fill service.name by default (with the same value as service.type)

@ruflin
Copy link
Contributor Author

ruflin commented Nov 21, 2018

Test failures seem to be related, need to investigate.

@ruflin ruflin force-pushed the metricbeat-service-type branch 2 times, most recently from 125b9bc to 9b53f47 Compare November 23, 2018 08:21
The service.type describes which service a dataset is connecting to. This is not necessarily the same for all datasets in one module. An example here is the K8s module which reaches out to 3 different services.

For now the default is just set to the module name if none is set. Unfortunately changing K8s module was not as easy as I though as it still has the old Fetch methods so the variables can not be set.

Questions:
* As in most cases for use service.name and service.type are the same, should we also set service.name by default?
@ruflin ruflin force-pushed the metricbeat-service-type branch from 9b53f47 to 4b1b2cb Compare November 26, 2018 06:36
@ruflin
Copy link
Contributor Author

ruflin commented Nov 26, 2018

jenkins, test this

@ruflin ruflin added the Team:Integrations Label for the Integrations team label Nov 27, 2018
@sayden
Copy link
Contributor

sayden commented Nov 28, 2018

Please, can you elaborate what's a service and a dataset in this context?

In our jargon, a module == service and here service.type might be dataset which might be elasticsearch or mysql (because, we are talking about a service) which makes me think that the type is the destination of some event and that in service.name you will have the source of the document, like cluster_1.

As a general rule, I would avoid naming things with very generic words or a conjunction of generic words. Example: type is very generic, the same with service so service.type might still be too generic for users, specially if they aren't used to ES jargon which will raise the entry barrier. Sorry, I don't have a better alternative because it's even difficult for me to understand the purpose of those fields.

@ruflin ruflin merged commit 377c9aa into elastic:master Nov 29, 2018
@ruflin
Copy link
Contributor Author

ruflin commented Nov 29, 2018

@sayden Service in this context is the service we connected to. Example is mysql, elasticsearch etc. The dataset describes a unique set of fields like the ones from system load or system cpu. Both are separate datasets. At the moment the value in event.dataset is cpu or load which I think is wrong as a dataset should be a unique identifier for it. So this will become system.cpu and system.load.

All these names are based on ECS. So definitions can be looked up here: https://github.com/elastic/ecs

@ruflin ruflin deleted the metricbeat-service-type branch November 29, 2018 11:05
@sayden
Copy link
Contributor

sayden commented Nov 29, 2018

Ok, understood. Thank you very much 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ecs Metricbeat Metricbeat review Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants