Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat][elasticsearch] Use replica number in doc ID for shard metricset #33457

Merged
merged 5 commits into from
Nov 16, 2022

Conversation

miltonhultgren
Copy link
Contributor

@miltonhultgren miltonhultgren commented Oct 25, 2022

Summary

In the shard metricset, we want to report one document per shard allocation status. If you have an index with 3 shards (primaries) and 3 replicas (per primary, 9 total) you should have 12 documents that describe if the shards are assigned or not.

To achieve this we query the routing_table of the cluster state at /_cluster/state/version,nodes,master_node,routing_table and loop over each index, where each shard is an array of the primary and all the replicas for that primary:

[
 [primary, replica 1, replica 2, replica 3],
 [primary, replica 1, replica 2, replica 3],
 [primary, replica 1, replica 2, replica 3]
]

We create a custom document ID for each shard allocation with the function generateHashForEvent but in main this function doesn't account for the "id" of the replica so it only reports the first replica (ignoring the other 2 in this example).

This PR adds the array iterator to the ID if it's a replica, making sure that we generate a document for each replica of the shard.

How to test

Get the changes: git fetch [email protected]:miltonhultgren/beats.git sm-shard-allocations:sm-shard-allocations && git switch sm-shard-allocations

Start Elasticsearch, create an index like this:

PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 3
    }
  }
}

Run Metricbeat with xpack.enabled: true.

Run the following query to verify that you get 12 documents back, 3 for the assigned primaries and 9 for the unassigned replicas:

GET .monitoring-*/_search
{
  "size": 20,
  "_source": [
    "_id"
  ],
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "metricset.name": "shard"
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "elasticsearch.index.name": "some-index"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Such as:

    [
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:YehnOz5cS2StYQ5C8ssG1Q:some-index:s0:p",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s0:r1",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s0:r2",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s0:r3",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:YehnOz5cS2StYQ5C8ssG1Q:some-index:s1:p",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s1:r1",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s1:r2",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s1:r3",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:YehnOz5cS2StYQ5C8ssG1Q:some-index:s2:p",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s2:r1",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s2:r2",
        "_score": 0,
        "_source": {}
      },
      {
        "_index": ".monitoring-es-8-2022.10.27",
        "_id": "TyApCSpKTZWwFWCqjmTxRg:_na:some-index:s2:r3",
        "_score": 0,
        "_source": {}
      }
    ]

@miltonhultgren miltonhultgren added Metricbeat Metricbeat Module:elasticsearch Elasticsearch Beats modules Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team backport-v8.5.0 Automated backport with mergify labels Oct 25, 2022
@miltonhultgren miltonhultgren requested a review from a team as a code owner October 25, 2022 15:38
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Oct 25, 2022
@elasticmachine
Copy link
Collaborator

elasticmachine commented Oct 25, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-11-10T11:36:45.419+0000

  • Duration: 48 min 11 sec

Test stats 🧪

Test Results
Failed 0
Passed 3882
Skipped 884
Total 4766

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@crespocarlos
Copy link
Contributor

/test

Copy link
Contributor

@crespocarlos crespocarlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I just don't understand why the CI has failed.

image

@miltonhultgren
Copy link
Contributor Author

@crespocarlos It's a legit lint error, I was also confused but you can see the errors in the code diff (or at least I can), and it's because of rules added but they only do something like "lint-staged" so this is the first time this file is changed since then so I get to fix them yay

@miltonhultgren miltonhultgren reopened this Nov 7, 2022
@miltonhultgren miltonhultgren marked this pull request as draft November 7, 2022 15:42
@miltonhultgren miltonhultgren marked this pull request as ready for review November 10, 2022 16:56
Copy link
Contributor

@crespocarlos crespocarlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

@miltonhultgren miltonhultgren merged commit a5ffacc into elastic:main Nov 16, 2022
mergify bot pushed a commit that referenced this pull request Nov 16, 2022
…tricset (#33457)

* [Metricbeat][elasticsearch] Use replica number in doc ID for shard metricset

* "Fix" lint warnings

* Fix lint errors

(cherry picked from commit a5ffacc)
miltonhultgren added a commit that referenced this pull request Nov 16, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
…tricset (#33457) (#33695)

* [Metricbeat][elasticsearch] Use replica number in doc ID for shard metricset

* "Fix" lint warnings

* Fix lint errors

(cherry picked from commit a5ffacc)

Co-authored-by: Milton Hultgren <[email protected]>
miltonhultgren added a commit to elastic/kibana that referenced this pull request Nov 17, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
## Summary

This PR depends on Metricbeat changes introduced in this PR
elastic/beats#33457 or the Internal collection
changes introduced in
elastic/elasticsearch#91153.

The above PR fixes a bug that makes Metricbeat properly report on all
the shards in an index, but our shard legend had the same kind of bug,
it only displayed the first replica because it didn't account for the
replica "id" in the function that tries to deduplicate the results.

Since the above PR ensures we report each shard with a unique document
ID, we change the Kibana code to use the document ID instead of trying
to regenerate a unique ID.

### How to test
Get the changes: `git fetch [email protected]:miltonhultgren/kibana.git
sm-shard-allocations:sm-shard-allocations && git switch
sm-shard-allocations`

Start Elasticsearch, create an index like this:
```
PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 3
    }
  }
}
```

Run Metricbeat with `xpack.enabled: true`. 

Visit the Index details page for `some-index` in the Stack Monitoring
app and verify that on a single node cluster you have 12 shards, 9 of
which are unassigned.

<img width="753" alt="Screenshot 2022-10-25 at 17 19 25"
src="https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png">

Co-authored-by: Kibana Machine <[email protected]>
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 17, 2022
…ic#143963)

## Summary

This PR depends on Metricbeat changes introduced in this PR
elastic/beats#33457 or the Internal collection
changes introduced in
elastic/elasticsearch#91153.

The above PR fixes a bug that makes Metricbeat properly report on all
the shards in an index, but our shard legend had the same kind of bug,
it only displayed the first replica because it didn't account for the
replica "id" in the function that tries to deduplicate the results.

Since the above PR ensures we report each shard with a unique document
ID, we change the Kibana code to use the document ID instead of trying
to regenerate a unique ID.

### How to test
Get the changes: `git fetch [email protected]:miltonhultgren/kibana.git
sm-shard-allocations:sm-shard-allocations && git switch
sm-shard-allocations`

Start Elasticsearch, create an index like this:
```
PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 3
    }
  }
}
```

Run Metricbeat with `xpack.enabled: true`.

Visit the Index details page for `some-index` in the Stack Monitoring
app and verify that on a single node cluster you have 12 shards, 9 of
which are unassigned.

<img width="753" alt="Screenshot 2022-10-25 at 17 19 25"
src="https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png">

Co-authored-by: Kibana Machine <[email protected]>
(cherry picked from commit b7debc8)
kibanamachine added a commit to elastic/kibana that referenced this pull request Nov 17, 2022

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
…143963) (#145520)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Stack Monitoring] Use doc ID to deduplicate shard allocations
(#143963)](#143963)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Milton
Hultgren","email":"[email protected]"},"sourceCommit":{"committedDate":"2022-11-17T09:44:58Z","message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
[email protected]:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<[email protected]>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca","branchLabelMapping":{"^v8.7.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Team:Infra
Monitoring UI","release_note:skip","Feature:Stack
Monitoring","backport:prev-minor","v8.7.0"],"number":143963,"url":"https://github.com/elastic/kibana/pull/143963","mergeCommit":{"message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
[email protected]:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<[email protected]>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.7.0","labelRegex":"^v8.7.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/143963","number":143963,"mergeCommit":{"message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
[email protected]:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<[email protected]>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca"}}]}]
BACKPORT-->

Co-authored-by: Milton Hultgren <[email protected]>
chrisberkhout pushed a commit that referenced this pull request Jun 1, 2023
…tricset (#33457)

* [Metricbeat][elasticsearch] Use replica number in doc ID for shard metricset

* "Fix" lint warnings

* Fix lint errors
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.5.0 Automated backport with mergify Metricbeat Metricbeat Module:elasticsearch Elasticsearch Beats modules Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants