Improve Monitoring Of Filebeat With New Metrics #33250

aveuiller · 2022-10-04T16:37:28Z

Hello,

Describe the enhancement

We currently rely on custom tooling to collect several metrics that are important for assessing the stability of Filebeat, as I mentioned in #33206.

We would like to see those metrics integrated natively. This would greatly simplify our workflow and standardize data collection for Filebeat instances, both on bare metal and in Kubernetes pods.

The proposed enhancement consists of 3 features that improve visibility into the state of Filebeat. The main goal is to be able to tell whether Filebeat is working as expected.

Describe a specific use case for the enhancement or feature:

In this section I will describe each metric and how we would like it integrated. The end goal is to feed these new metrics into our alerting systems so that we can react quickly to any bad state.

New Feature: Heartbeat

First of all, we currently have a cron job that writes a message to a log file every x minutes. This log file is tailed by Filebeat, and the resulting events are sent to our infrastructure.
This gives us a good overview of the log collection status by ensuring that logs flow continuously. However, it currently requires external components.

We would love to see this handled directly by Filebeat, for instance enabled through the configuration.
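To make the current workaround concrete, here is a minimal sketch of the kind of writer our cron job runs. The function name, the JSON field names, and the log path are all hypothetical, chosen only for illustration; this is not an existing Filebeat feature.

```python
import json
import time
from pathlib import Path


def write_heartbeat(log_path, now=None):
    """Append one JSON heartbeat line to log_path and return the line written.

    The "type" and "timestamp" fields are illustrative assumptions, not a
    format that Filebeat defines.
    """
    ts = int(time.time() if now is None else now)
    line = json.dumps({"type": "heartbeat", "timestamp": ts})
    path = Path(log_path)
    # Create the parent directory on first use so the cron job cannot fail on it.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
    return line


if __name__ == "__main__":
    # Hypothetical path; Filebeat would be configured to tail this file.
    write_heartbeat("/tmp/filebeat-heartbeat/heartbeat.log")
```

On the receiving side, an alert would fire when no heartbeat event has arrived for longer than a few cron intervals.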

New Metric: Last Registry Update Time

Following an incident in which a stalled Filebeat was still attempting to send data, a registry that is no longer being updated seems to be a good indicator of a bad state that should be investigated ASAP.

We currently retrieve the last update time with the command stat -c %Z /var/lib/filebeat/registry/filebeat/log.json, and export it, once again, with custom tools.

Once again, having this data available directly in Filebeat would be great. For instance, integrated into the /stats results, it could look like the following:

{
  "beat": {
    "info": {
      "ephemeral_id": "62e0e489-14c5-4cbd-a87a-f2ebf4643a7a",
      "name": "filebeat",
      "uptime": {
        "ms": 205465136
      },
      "version": "8.3.3",
      "registry_update": {
        "timestamp": 1664896065
      }
    }
  }
}
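For reference, the external check we run today can be sketched roughly as follows. It reads the same value as stat -c %Z (the inode change time on Linux) and wraps it in the shape proposed above. The function names and the staleness threshold are assumptions made for this example; note that on Windows st_ctime has a different meaning (creation time).

```python
import os
import time

# Registry path taken from the stat command quoted in this issue.
REGISTRY_LOG = "/var/lib/filebeat/registry/filebeat/log.json"


def registry_update_info(path=REGISTRY_LOG):
    """Return the last registry change time in the proposed /stats shape."""
    # st_ctime is the inode change time on Linux, i.e. what `stat -c %Z` prints.
    ctime = int(os.stat(path).st_ctime)
    return {"registry_update": {"timestamp": ctime}}


def is_registry_stale(path, max_age_seconds, now=None):
    """Return True if the registry has not changed for max_age_seconds."""
    now = time.time() if now is None else now
    last = registry_update_info(path)["registry_update"]["timestamp"]
    return (now - last) > max_age_seconds
```

An alerting rule would then call something like is_registry_stale(REGISTRY_LOG, 600), where the 600 second threshold is purely illustrative and would depend on how often new log lines are expected.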

New Metric: Kafka Connectivity Status

In the same vein, we currently monitor connectivity by parsing the output of filebeat -e -c /etc/filebeat/filebeat.yml test output in order to ensure that all Kafka brokers can be contacted.

It would help tremendously either to have this kind of periodic check built into Filebeat, or simply to have Filebeat keep track of the number of brokers in each state, independently of the configuration.

As before, integrated in the /stats results, this could look like the following:

{
  "libbeat": {
    "output": {
      "events": {},
      "read": {},
      "type": "kafka",
      "write": {},
      "brokers": {
        "pending": 1,
        "failed": 0,
        "connected": 2
      }
    }
  }
}
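As a rough illustration of the per-state counting, the sketch below tallies brokers by whether a plain TCP connection can be opened. This is an assumption-laden stand-in: it does not parse the output of the test output command, it does not perform a Kafka handshake, and it has no notion of the "pending" state, which only Filebeat itself could report. The function names and broker addresses are hypothetical.

```python
import socket


def tcp_dial(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def count_broker_states(brokers, dial=tcp_dial):
    """Count reachable and unreachable brokers given "host:port" strings.

    `dial` is injectable so the counting logic can be exercised without
    a network.
    """
    counts = {"connected": 0, "failed": 0}
    for broker in brokers:
        # Split on the last colon so the port is always the final component.
        host, _, port = broker.rpartition(":")
        ok = dial(host, int(port))
        counts["connected" if ok else "failed"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical broker list; in practice it would mirror the Kafka output hosts.
    print(count_broker_states(["kafka-1.example:9092", "kafka-2.example:9092"]))
```

TCP reachability is a weaker signal than what Filebeat knows internally, which is part of why exposing the counts natively would be preferable.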

Let me know if you need more details.

Best regards,
Antoine.


botelastic · 2022-10-04T16:38:03Z

This issue doesn't have a Team:<team> label.

botelastic · 2023-10-04T17:04:10Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

aveuiller · 2023-10-06T12:01:45Z

👍

botelastic · 2024-10-05T12:49:57Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

botelastic bot added the needs_team label Oct 4, 2022

botelastic bot added the Stalled label Oct 4, 2023

botelastic bot removed the Stalled label Oct 6, 2023

botelastic bot added the Stalled label Oct 5, 2024
