
[metricbeat][zookeeper][mntr] Metricbeat Zookeeper MNTR metricset should surface when mntr probe fails with instance is not currently serving requests #22846

Closed
geekpete opened this issue Dec 2, 2020 · 5 comments
Labels
enhancement Metricbeat Metricbeat module Stalled Team:Service-Integrations Label for the Service Integrations team

Comments

@geekpete
Member

geekpete commented Dec 2, 2020

Describe the enhancement:

The Zookeeper MNTR metricset should surface the probe response in the resulting event when the Zookeeper MNTR probe returns:

This ZooKeeper instance is not currently serving requests

which occurs when the Zookeeper node itself is still up but quorum has been lost.

Currently, the event produced by such a probe contains no response and carries almost no useful data about what the node is doing:

{
  "_index": "metricbeat-7.10.0-2020.11.30-000001",
  "_type": "_doc",
  "_id": "CpNDInYBMqdaVa4Rh--S",
  "_version": 1,
  "_score": null,
  "_source": {
    "container": {
      "image": {
        "name": "docker.elastic.co/cloud-enterprise/elastic-cloud-enterprise:2.5.1"
      },
      "name": "frc-zookeeper-servers-zookeeper",
      "id": "9b556946c060ed14af599c328d733fd8124e29ca8a1cfe2e05509dc2d91d93bb"
    },
    "agent": {
      "hostname": "128f00d03842",
      "name": "128f00d03842",
      "id": "22726ebc-438f-4517-a1af-7ea0db17ab65",
      "ephemeral_id": "6e22bfef-2e88-4b5f-b2f4-3cd498ca5593",
      "type": "metricbeat",
      "version": "7.10.0"
    },
    "zookeeper": {
      "mntr": {
        "latency": {},
        "packets": {}
      }
    },
    "docker": {
      "container": {
        "labels": {
          "org_label-schema_name": "elastic-cloud-enterprise",
          "org_label-schema_build-date": "2020-05-25T20:05:49Z",
          "org_label-schema_schema-version": "1.0",
          "org_label-schema_vcs-ref": "6227683",
          "org_label-schema_version": "2.5.1",
          "co_elastic_ci_worker": "cloud-ci-immutable-ubuntu-1604-1590436694814909094",
          "author": "Cloud Enterprise Developers, [email protected]",
          "org_label-schema_vendor": "Elastic",
          "co_elastic_ci_build-tag": "jenkins-cloud-pipeline-2.5.1-BC_4-1",
          "co_elastic_vcs-branch": "2.5.1-BC_4",
          "org_label-schema_license": ""
        }
      }
    },
    "cloud": {
      "availability_zone": "asia-southeast1-a",
      "instance": {
        "name": "sl-geekpete-895831-host1",
        "id": "8177120513981258077"
      },
      "provider": "gcp",
      "machine": {
        "type": "n1-standard-8"
      },
      "project": {
        "id": "elastic-support"
      }
    },
    "@timestamp": "2020-12-02T07:03:38.663Z",
    "ecs": {
      "version": "1.6.0"
    },
    "service": {
      "address": "172.17.0.9:2191",
      "type": "zookeeper"
    },
    "host": {
      "name": "128f00d03842"
    },
    "metricset": {
      "period": 10000,
      "name": "mntr"
    },
    "event": {
      "duration": 1129955,
      "module": "zookeeper",
      "dataset": "zookeeper.mntr"
    }
  },
  "fields": {
    "@timestamp": [
      "2020-12-02T07:03:38.663Z"
    ]
  },
  "sort": [
    1606892618663
  ]
}
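
One plausible reason the `latency` and `packets` objects above come back empty is that a healthy `mntr` reply is a list of tab-separated `key<TAB>value` lines, while the not-serving reply is a single sentence with no such pairs. The sketch below is a hypothetical, naive parser written only to illustrate that effect; it is not Metricbeat's actual implementation, and the sample metric values are made up.

```python
# Hypothetical illustration only: this is not Metricbeat's parser.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def parse_mntr(response: str) -> dict:
    """Naively parse tab-separated `mntr` output into a dict of raw strings."""
    metrics = {}
    for line in response.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # keep only "key<TAB>value" lines
            metrics[key.strip()] = value.strip()
    return metrics


# Made-up sample of a healthy reply.
healthy = "zk_avg_latency\t0\nzk_packets_received\t42\nzk_server_state\tfollower\n"

print(parse_mntr(healthy))
# The not-serving reply contains no tab-separated pairs, so nothing is parsed.
print(parse_mntr(NOT_SERVING + "\n"))  # → {}
```

A parser of this shape silently drops the one line that explains the node's state, which would match the nearly empty event shown above.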

Compare this with the event produced when the Zookeeper mntr probe fails because the port is closed, where we at least get an informative error message:

    ...
    "error": {
      "message": "mntr command failed: read failed: read tcp 172.17.0.16:35210->172.17.0.4:2193: i/o timeout"
    },
    ...

The fix for this might be as easy as putting the "This ZooKeeper instance is not currently serving requests" message into the error.message field to surface it.
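
A minimal sketch of that idea follows, assuming the raw probe reply is available as a string. The function name `build_event`, the event layout, and the reuse of the "mntr command failed:" prefix from the timeout example are illustrative assumptions, not the module's real code.

```python
# Hypothetical sketch of the proposed behaviour; names and layout are assumptions.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def build_event(response: str) -> dict:
    """Return an event dict, surfacing the not-serving reply as error.message."""
    text = response.strip()
    if text.startswith(NOT_SERVING):
        # Report the reply instead of emitting an empty metrics object.
        return {"error": {"message": f"mntr command failed: {text}"}}

    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # keep only "key<TAB>value" lines
            metrics[key] = value
    return {"zookeeper": {"mntr": metrics}}


print(build_event(NOT_SERVING + "\n"))
print(build_event("zk_avg_latency\t0\n"))
```

With this shape, a node that is up but out of quorum yields an event with an `error.message`, in the same way the closed-port case already does.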

Describe a specific use case for the enhancement or feature:

When monitoring Zookeeper with MNTR, a node that has lost quorum is something you really want to know about.

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 2, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 2, 2020
@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Dec 2, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic

botelastic bot commented Jan 27, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled label Jan 27, 2022
@geekpete
Member Author

Would this issue still be relevant based on work going on with #30066 ?

@botelastic botelastic bot removed the Stalled label Jan 27, 2022
@pfcoperez
Contributor

> Would this issue still be relevant based on work going on with #30066 ?

I think it would; the problems reported in #30066 do not include this issue.

@jlind23 jlind23 added Team:Service-Integrations Label for the Service Integrations team and removed Team:Services (Deprecated) Label for the former Integrations-Services team labels Mar 31, 2022
@botelastic

botelastic bot commented Mar 31, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Mar 31, 2023
@botelastic botelastic bot closed this as completed Sep 27, 2023