Not always send host.name and have host metadata processor enabled #10698

Closed
ruflin opened this issue Feb 12, 2019 · 14 comments
Labels
discuss, ecs, libbeat, Team:obs-ds-hosted-services

Comments

@ruflin
Collaborator

ruflin commented Feb 12, 2019

Beats always sets the field host.name: https://github.com/elastic/beats/blob/master/libbeat/publisher/pipeline/module.go#L88 The reason this was introduced was to always have host data available as an object. Also, the add_host_metadata processor is enabled in the default config. This makes sense as long as the Beat collects data locally. But in cases like heartbeat or apm-server, where the event is initiated on a different machine, having the host.name of the local machine does not make much sense. Likewise, if metricbeat fetches metrics from a remote host, the host.* data should be populated with the info from that host and not the host metricbeat is running on.

In the above cases, instead of being the agent, the role of the Beats is the observer: https://github.com/elastic/ecs#-observer-fields

To allow more flexibility, libbeat should stop always populating host.name. In addition, it should be configurable whether a beat / event acts as an agent or an observer. The host metadata processor could then decide, based on that role, whether the event should be enriched with host metadata or not.

Metricbeat or Filebeat can have different roles depending on the input. If Filebeat reads data from a file, it's the agent; if it opens a syslog input, it becomes an observer.
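To make the proposal concrete, here is a minimal Go sketch of role-dependent enrichment. Nothing in it is libbeat API: the `Role` type, `roleForInput`, `enrich`, and the input type names `log` and `syslog` as used here are assumptions chosen only to illustrate the idea that host.name is added when the Beat acts as an agent and left out when it acts as an observer.

```go
package main

import "fmt"

// Role describes how a Beat relates to the event it ships.
// Hypothetical type, not part of libbeat.
type Role int

const (
	RoleAgent    Role = iota // the event originates on the Beat's own host
	RoleObserver             // the event originates on a different machine
)

// roleForInput is an assumed mapping from input type to role:
// reading a local file makes the Beat an agent, a syslog input
// makes it an observer.
func roleForInput(inputType string) Role {
	switch inputType {
	case "syslog":
		return RoleObserver
	default:
		return RoleAgent
	}
}

// enrich sets host.name only when the Beat acts as an agent, so
// observer events never carry the local machine's host name.
func enrich(fields map[string]interface{}, role Role, localHost string) {
	if role != RoleAgent {
		return
	}
	fields["host"] = map[string]interface{}{"name": localHost}
}

func main() {
	for _, input := range []string{"log", "syslog"} {
		fields := map[string]interface{}{}
		enrich(fields, roleForInput(input), "beat-host-1")
		fmt.Printf("%s: %v\n", input, fields)
	}
}
```

In this sketch the role is derived per input rather than per Beat, which matches the observation that the same Filebeat can be an agent for one input and an observer for another.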

@ruflin added the discuss, libbeat, and ecs labels Feb 12, 2019
@ruflin
Collaborator Author

ruflin commented Feb 12, 2019

@roncohen @andrewvc @webmat I created this issue to start a discussion around this.

@simitt
Contributor

simitt commented Feb 14, 2019

While aligning fields to meet ECS criteria, it became clear that a beat can act as an agent, an observer, or both when handling an event.
Excerpt from ECS:

The agent fields contain the data about the software entity, if any, that collects, detects, or observes events on a host, or takes measurements on a host. Examples include Beats. Agents may also run on observers.

When acting as an agent, enriching the event with meta information related to the agent itself might make sense, e.g. adding host and agent information.

In situations where the beat acts as observer for an event, enriching it with this information does not make sense and can even be misleading, as the host info should concern the host where the event is created or measured (as defined by ECS).

Right now we have introduced SkipXYZ methods to decide in which cases event data should be enriched with the host name or with agent-related information. Maybe it would be useful to review the current global and default processors and determine which processors make sense depending on whether the beat is acting as agent or observer.

cc @urso @andrewkroh

@andrewvc added the Team:obs-ds-hosted-services label Feb 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/uptime

@andrewvc
Contributor

Having read @simitt's comment and thought about this more, I think that, for heartbeat at least, the best answer is to stick with agent.* since beats already uses that. The distinction between agent and host seems like a fine line, as @simitt notes.

Also salient to us is the notion of a target, and having a target.ip. The source, destination, and host fields don't seem appropriate to Heartbeat's use case.

@webmat
Contributor

webmat commented Feb 15, 2019

Here's an idea:

- add_host_metadata:
    target: [root|agent|observer]
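
One possible reading of this idea, sketched in Go: the `target` setting chooses where the host metadata lands in the event. The function `addHostMetadata` below is a hypothetical stand-in, not the real processor, and nesting the data as `agent.host.*` or `observer.host.*` is an assumption, since the comment does not say which exact fields each target would produce.

```go
package main

import "fmt"

// addHostMetadata is a hypothetical stand-in for the processor. It writes
// meta under the key selected by target: "root" gives host.*, while "agent"
// and "observer" give agent.host.* and observer.host.* (assumed layout).
func addHostMetadata(event, meta map[string]interface{}, target string) error {
	switch target {
	case "root", "":
		event["host"] = meta
	case "agent", "observer":
		parent, ok := event[target].(map[string]interface{})
		if !ok {
			// Refuse to overwrite an existing non-object value.
			if _, exists := event[target]; exists {
				return fmt.Errorf("field %q exists and is not an object", target)
			}
			parent = map[string]interface{}{}
			event[target] = parent
		}
		parent["host"] = meta
	default:
		return fmt.Errorf("unknown target %q", target)
	}
	return nil
}

func main() {
	meta := map[string]interface{}{"name": "beat-host-1"}
	for _, target := range []string{"root", "agent", "observer"} {
		event := map[string]interface{}{}
		if err := addHostMetadata(event, meta, target); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %v\n", target, event)
	}
}
```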

@webmat
Contributor

webmat commented Feb 15, 2019

@andrewvc I see the need for you to have your monitored endpoint IPs in a field that makes more sense than src/dst or cli/srv. At this point I'm not yet sure ECS should cover that; I'd need to see other use cases for it.

For now, you can simply keep track of it in your events in a custom field. I would try to avoid a generic nesting such as target.*, however, because it's the kind of word that may be used in the future by ECS (think target.user in the case of user management, user A affects target user B).

A place that would be safer for Heartbeat to have custom fields would be heartbeat.*, e.g. heartbeat.target.ip.

WDYT?

No stress if it's too late to adjust to this idea for 7.0. Let's deal with conflicts only if they materialize.

@andrewvc
Contributor

@webmat I think for 7.0 the best thing we can do is just remove the host metadata for now.

I spoke to @ruflin about this, and the conclusion was that most people don't use this data now.

In the meantime users can manually edit the config to add in extra metadata if need be, and it will make adding the right fields later easier.

That said, in terms of what to do ultimately, are you saying that all non-ECS fields should be under heartbeat.*?

@webmat
Contributor

webmat commented Feb 15, 2019

Not necessarily. I'd say it depends on the situation. Some situations have a very low risk of conflict, or a very high risk of getting into ECS as is (#10760 is an example of the latter situation). In situations like this, I think it's fine to mix fields directly inside the ECS field space.

In many other situations (e.g. if one needs a whole field set space named like a general concept, like target.*), the fields should be nested in a place that is not expected to conflict with ECS. Ways of doing this are: nest under a company name, a product name, or an application name. The biggest risk here is the potential for mapping conflict exceptions. For example: you define target.user as a username specific to Heartbeat, and later ECS introduces target.user as a nesting of the user field set, for situations like user management. You have a keyword vs object situation, just like the Filebeat 6.x source field. 💥 :-)
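
The keyword vs object clash described above can be illustrated with a small Go sketch. It compares two event shapes and reports every dotted path where one side holds a scalar and the other an object. The helper names `kind` and `conflicts` and the two sample documents are made up for illustration; this is not how Elasticsearch checks mappings, only a way to see the collision on target.user.

```go
package main

import "fmt"

// kind reports whether a value would need an object mapping or a scalar one.
func kind(v interface{}) string {
	if _, ok := v.(map[string]interface{}); ok {
		return "object"
	}
	return "scalar"
}

// conflicts returns the dotted paths present in both a and b where one side
// is a scalar and the other an object.
func conflicts(prefix string, a, b map[string]interface{}) []string {
	var out []string
	for key, av := range a {
		bv, ok := b[key]
		if !ok {
			continue
		}
		path := prefix + key
		if kind(av) != kind(bv) {
			out = append(out, path)
			continue
		}
		// Both sides are objects: keep comparing one level down.
		if am, ok := av.(map[string]interface{}); ok {
			out = append(out, conflicts(path+".", am, bv.(map[string]interface{}))...)
		}
	}
	return out
}

func main() {
	// Hypothetical custom Heartbeat field: target.user as a plain username.
	custom := map[string]interface{}{
		"target": map[string]interface{}{"user": "alice"},
	}
	// Hypothetical future ECS shape: target.user as a nested user field set.
	future := map[string]interface{}{
		"target": map[string]interface{}{
			"user": map[string]interface{}{"name": "alice"},
		},
	}
	fmt.Println(conflicts("", custom, future)) // prints [target.user]
}
```

Nesting the custom field under a product namespace, e.g. heartbeat.target.user, would make the two shapes share no path, so the sketch would report no conflicts.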

@webmat
Contributor

webmat commented Feb 15, 2019

Coming up with a good documentation page on how to think about integrating custom fields in an ECS index is very high in my mind.

@andrewvc
Contributor

andrewvc commented May 7, 2019

Closing the loop here for heartbeat, we wound up using observer.* and added the add_observer_metadata processor.

@andrewvc
Contributor

@ruflin Heartbeat has fixed this in #14140. Since there is a workaround, is this issue still necessary? I assume its purpose is to track the implementation in other beats?

@ruflin
Collaborator Author

ruflin commented Oct 23, 2019

Yes, we still need the issue to solve it in "all" Beats.

@urso

urso commented Oct 23, 2019

We got too many open issues for this same issue :)

I did lay out a plan for handling this in #13920. I will follow up to create a meta-issue + per-task issue. I think we can close this one.

@ruflin
Collaborator Author

ruflin commented Oct 24, 2019

Closing in favor of #13920. Newest one wins 😆

6 participants