Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metricbeat] gcp: fix instance machineType reporting #27363

Merged
merged 3 commits into from
Aug 17, 2021

Conversation

endorama
Copy link
Member

@endorama endorama commented Aug 13, 2021

What does this PR do?

Refresh instance metadata for each incoming response by updating the s.computeMetadata field in the metadataCollector struct.

Why is it important?

The code was applying a sort of caching to the computeMetadata, but in the wrong place. Most probably it was a leftover, as a proper caching for instance metadata is implemented in the instance function on line 146.
Due to this check computeMetadata were not refreshed for each response.
metadataCollector is instanciated only once for the entire module.
Thus by not overriding computeMetadata the metadata of the first response handled was used for all subsequent events.

Removing the check make the code update the s.computeMetadata property for each new response. With this update the correct metadata based on the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same machineType.

NOTE: the code update the same struct field at each iteration; this is most probably not safe in a parallel execution context; this code is not running multiple times in different threads so for the moment this change should not have negative side effects.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
    I would like to add an integration test (as I have been able to reproduce it in a pristine environment)
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

@endorama endorama added bug Team:Integrations Label for the Integrations team backport-v7.14.0 Automated backport with mergify labels Aug 13, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 13, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Aug 13, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-08-17T08:55:07.286+0000

  • Duration: 68 min 56 sec

  • Commit: 5df3115

Test stats 🧪

Test Results
Failed 0
Passed 2648
Skipped 232
Total 2880

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 2648
Skipped 232
Total 2880

@endorama endorama requested a review from sayden August 13, 2021 13:37
@endorama endorama self-assigned this Aug 13, 2021
@endorama endorama added the backport-v7.15.0 Automated backport with mergify label Aug 16, 2021
@mergify
Copy link
Contributor

mergify bot commented Aug 16, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b endorama/gcp-machineType-issue upstream/endorama/gcp-machineType-issue
git merge upstream/master
git push upstream endorama/gcp-machineType-issue

Refresh instance metadata for each incoming response by updating the
`s.computeMetadata` field in the `metadataCollector` struct.

The code was applying a sort of caching to the computeMetadata, but in
the wrong place. Most probably it was a leftover, as a proper caching
for instance metadata is implemented in the `instance` function on line
146.
Due to this check `computeMetadata` were not refreshed for each
response.
`metadataCollector` is instanciated only once for the entire module.
Thus by not overriding `computeMetadata` the metadata of the first
response handled was used for all subsequent events.

Removing the check make the code update the `s.computeMetadata` property
for each new response. With this update the correct metadata based on
the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same
`machineType`.

NOTE: the code update the same struct field at each iteration; this is
most probably not safe in a parallel execution context; this code is not
running multiple times in different threads so for the moment this
change should not have negative side effects.
@endorama endorama force-pushed the endorama/gcp-machineType-issue branch from 5cb8a59 to 5df3115 Compare August 17, 2021 08:54
Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, just modify the Changelog in case any user reads it looking for reasons to update their metricbeat

@@ -407,6 +407,7 @@ https://github.com/elastic/beats/compare/v7.0.0-alpha2...master[Check the HEAD d
- Fix cloudwatch metricset collecting duplicate data points. {pull}27248[27248]
- Fix flaky test TestAddCounterInvalidArgWhenQueryClosed. {issue}27312[27312] {pull}27313[27313]
- Add percent formatters to system/process {pull}27374[27374]
- Fix instance machineType reporting {pull}27363[27363]
Copy link
Contributor

@sayden sayden Aug 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add here "In compute metricset of GCP module" (for example)

@endorama
Copy link
Member Author

Merging without waiting for CI as I've only updated the Changelog description and CI was green at previous commit.

@endorama endorama merged commit 5339327 into elastic:master Aug 17, 2021
@endorama endorama deleted the endorama/gcp-machineType-issue branch August 17, 2021 11:41
mergify bot pushed a commit that referenced this pull request Aug 17, 2021
Refresh instance metadata for each incoming response by updating the
`s.computeMetadata` field in the `metadataCollector` struct.

The code was applying a sort of caching to the computeMetadata, but in
the wrong place. Most probably it was a leftover, as a proper caching
for instance metadata is implemented in the `instance` function on line
146.
Due to this check `computeMetadata` were not refreshed for each
response.
`metadataCollector` is instanciated only once for the entire module.
Thus by not overriding `computeMetadata` the metadata of the first
response handled was used for all subsequent events.

Removing the check make the code update the `s.computeMetadata` property
for each new response. With this update the correct metadata based on
the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same
`machineType`.

NOTE: the code update the same struct field at each iteration; this is
most probably not safe in a parallel execution context; this code is not
running multiple times in different threads so for the moment this
change should not have negative side effects.

(cherry picked from commit 5339327)
@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-08-17T11:40:20.032+0000

  • Duration: 3 min 39 sec

  • Commit: 1eeee7a

Trends 🧪

Image of Build Times

Steps errors 2

Expand to view the steps failures

Git fetch
  • Took 0 min 3 sec . View more details on here
  • Description: git fetch https://${GIT_USERNAME}:${GIT_PASSWORD}@github.com/elastic/beats.git +refs/pull/*/head:refs/remotes/origin/pr/* > fetch.log 2>&1
Archive the artifacts
  • Took 0 min 1 sec . View more details on here
  • Description: fetch.log

Log output

Expand to view the last 100 lines of log output

[2021-08-17T11:40:22.363Z]  > git --version # 'git version 2.17.1'
[2021-08-17T11:40:22.363Z]  > git config --get remote.origin.url # timeout=10
[2021-08-17T11:40:22.369Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-08-17T11:40:22.377Z]  > git fetch --tags --progress -- origin +refs/heads/*:refs/remotes/origin/* # timeout=10
[2021-08-17T11:40:23.031Z]  > git rev-parse current^{commit} # timeout=10
[2021-08-17T11:40:23.038Z]  > git branch -a -v --no-abbrev --contains d11dc1f0af4927b4f3a9d92a4bff0584b7734b6b # timeout=10
[2021-08-17T11:40:23.079Z] Selected match: update-go-version-20210809032825-master revision d11dc1f0af4927b4f3a9d92a4bff0584b7734b6b
[2021-08-17T11:40:23.079Z] The recommended git tool is: git
[2021-08-17T11:40:23.080Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2021-08-17T11:40:23.087Z]  > git rev-parse --resolve-git-dir /var/lib/jenkins/workspace/Beats_beats_PR-27363@libs/apm/.git # timeout=10
[2021-08-17T11:40:23.094Z] Fetching changes from the remote Git repository
[2021-08-17T11:40:23.094Z]  > git config remote.origin.url [email protected]:elastic/apm-pipeline-library.git # timeout=10
[2021-08-17T11:40:23.102Z] Fetching without tags
[2021-08-17T11:40:23.102Z] Fetching upstream changes from [email protected]:elastic/apm-pipeline-library.git
[2021-08-17T11:40:23.103Z]  > git --version # timeout=10
[2021-08-17T11:40:23.109Z]  > git --version # 'git version 2.17.1'
[2021-08-17T11:40:23.109Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-08-17T11:40:23.116Z]  > git fetch --no-tags --progress -- [email protected]:elastic/apm-pipeline-library.git +refs/heads/*:refs/remotes/origin/* # timeout=10
[2021-08-17T11:40:23.672Z] Checking out Revision d11dc1f0af4927b4f3a9d92a4bff0584b7734b6b (update-go-version-20210809032825-master)
[2021-08-17T11:40:23.672Z]  > git config core.sparsecheckout # timeout=10
[2021-08-17T11:40:23.679Z]  > git checkout -f d11dc1f0af4927b4f3a9d92a4bff0584b7734b6b # timeout=10
[2021-08-17T11:40:23.705Z] Commit message: "docs: update CHANGELOG.md"
[2021-08-17T11:40:24.270Z] Excluding src/test/ from checkout of git [email protected]:elastic/apm-pipeline-library.git so that shared library test code cannot be accessed by Pipelines.
[2021-08-17T11:40:24.270Z] To remove this log message, move the test code outside of src/. To restore the previous behavior that allowed access to files in src/test/, pass -Dorg.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.INCLUDE_SRC_TEST_IN_LIBRARIES=true to the java command used to start Jenkins.
[2021-08-17T11:40:55.757Z] Still waiting to schedule task
[2021-08-17T11:40:55.757Z] All nodes of label ‘ubuntu-18&&immutable’ are offline
[2021-08-17T11:41:49.545Z] Running on beats-ci-immutable-ubuntu-1804-1629200448440720247 in /var/lib/jenkins/workspace/Beats_beats_PR-27363
[2021-08-17T11:41:55.685Z] �[39;49m[INFO] Override default checkout�[0m
[2021-08-17T11:41:55.739Z] Sleeping for 10 sec
[2021-08-17T11:42:05.910Z] The recommended git tool is: git
[2021-08-17T11:42:11.162Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2021-08-17T11:42:11.169Z] Wiping out workspace first.
[2021-08-17T11:42:11.236Z] Cloning the remote Git repository
[2021-08-17T11:42:11.237Z] Using shallow clone with depth 10
[2021-08-17T11:42:11.237Z] Avoid fetching tags
[2021-08-17T11:42:11.296Z] Cloning repository [email protected]:elastic/beats.git
[2021-08-17T11:42:11.343Z]  > git init /var/lib/jenkins/workspace/Beats_beats_PR-27363 # timeout=10
[2021-08-17T11:42:11.464Z] Fetching upstream changes from [email protected]:elastic/beats.git
[2021-08-17T11:42:11.464Z]  > git --version # timeout=10
[2021-08-17T11:42:11.468Z]  > git --version # 'git version 2.17.1'
[2021-08-17T11:42:11.469Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-08-17T11:42:11.537Z]  > git fetch --no-tags --progress -- [email protected]:elastic/beats.git +refs/heads/*:refs/remotes/origin/* # timeout=15
[2021-08-17T11:42:34.146Z] Cleaning workspace
[2021-08-17T11:42:34.161Z] Using shallow fetch with depth 10
[2021-08-17T11:42:34.161Z] Pruning obsolete local branches
[2021-08-17T11:42:34.127Z]  > git config remote.origin.url [email protected]:elastic/beats.git # timeout=10
[2021-08-17T11:42:34.131Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2021-08-17T11:42:34.140Z]  > git config remote.origin.url [email protected]:elastic/beats.git # timeout=10
[2021-08-17T11:42:34.148Z]  > git rev-parse --verify HEAD # timeout=10
[2021-08-17T11:42:34.152Z] No valid HEAD. Skipping the resetting
[2021-08-17T11:42:34.152Z]  > git clean -fdx # timeout=10
[2021-08-17T11:42:34.165Z] Fetching upstream changes from [email protected]:elastic/beats.git
[2021-08-17T11:42:34.165Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-08-17T11:42:34.169Z]  > git fetch --no-tags --progress --prune -- [email protected]:elastic/beats.git +refs/pull/27363/head:refs/remotes/origin/PR-27363 +refs/heads/master:refs/remotes/origin/master # timeout=15
[2021-08-17T11:42:35.073Z] Merging remotes/origin/master commit 928f9c549dfc69bc44cba66ae56fbc0827853316 into PR head commit 1eeee7a46148bfb0dd15a921a12b1cec794d77bb
[2021-08-17T11:42:36.834Z] Merge succeeded, producing 1eeee7a46148bfb0dd15a921a12b1cec794d77bb
[2021-08-17T11:42:36.834Z] Checking out Revision 1eeee7a46148bfb0dd15a921a12b1cec794d77bb (PR-27363)
[2021-08-17T11:42:35.079Z]  > git config core.sparsecheckout # timeout=10
[2021-08-17T11:42:35.083Z]  > git checkout -f 1eeee7a46148bfb0dd15a921a12b1cec794d77bb # timeout=15
[2021-08-17T11:42:36.801Z]  > git remote # timeout=10
[2021-08-17T11:42:36.808Z]  > git config --get remote.origin.url # timeout=10
[2021-08-17T11:42:36.812Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2021-08-17T11:42:36.816Z]  > git merge 928f9c549dfc69bc44cba66ae56fbc0827853316 # timeout=10
[2021-08-17T11:42:36.827Z]  > git rev-parse HEAD^{commit} # timeout=10
[2021-08-17T11:42:36.836Z]  > git config core.sparsecheckout # timeout=10
[2021-08-17T11:42:36.840Z]  > git checkout -f 1eeee7a46148bfb0dd15a921a12b1cec794d77bb # timeout=15
[2021-08-17T11:42:42.455Z] Commit message: "improve changelog description"
[2021-08-17T11:42:42.458Z]  > git rev-list --no-walk 5df311564b4a7d1453dc13cffd0e127f44f352f8 # timeout=10
[2021-08-17T11:42:42.526Z] Cleaning workspace
[2021-08-17T11:42:42.796Z] Timeout set to expire in 4 hr 0 min
[2021-08-17T11:42:42.814Z] The timestamps step is unnecessary when timestamps are enabled for all Pipeline builds.
[2021-08-17T11:42:43.051Z] [INFO] Number of builds to be searched 10
[2021-08-17T11:42:43.821Z] [INFO] 'shallow' is forced to be disabled when running on PullRequests
[2021-08-17T11:42:43.837Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-27363@tmp
[2021-08-17T11:42:43.854Z] [INFO] gitCheckout: Checkout SCM PR-27363 with default customisation from the Item.
[2021-08-17T11:42:43.887Z] [INFO] Override default checkout
[2021-08-17T11:42:43.926Z] Sleeping for 10 sec
[2021-08-17T11:42:42.527Z]  > git rev-parse --verify HEAD # timeout=10
[2021-08-17T11:42:42.530Z] Resetting working tree
[2021-08-17T11:42:42.530Z]  > git reset --hard # timeout=10
[2021-08-17T11:42:42.615Z]  > git clean -fdx # timeout=10
[2021-08-17T11:42:54.395Z] Masking supported pattern matches of $GIT_USERNAME or $GIT_PASSWORD
[2021-08-17T11:42:57.038Z] + git fetch https://****:****@github.com/elastic/beats.git +refs/pull/*/head:refs/remotes/origin/pr/*
[2021-08-17T11:42:57.175Z] [WARN] gitCmd failed, further details in the archived file 'fetch.log'
[2021-08-17T11:42:57.288Z] Archiving artifacts
[2021-08-17T11:42:58.109Z] Stage "Lint" skipped due to earlier failure(s)
[2021-08-17T11:42:58.138Z] Stage "Build&Test" skipped due to earlier failure(s)
[2021-08-17T11:42:58.171Z] Stage "Extended" skipped due to earlier failure(s)
[2021-08-17T11:42:58.199Z] Stage "Packaging" skipped due to earlier failure(s)
[2021-08-17T11:42:58.228Z] Stage "Packaging-Pipeline" skipped due to earlier failure(s)
[2021-08-17T11:42:58.277Z] Running in /var/lib/jenkins/workspace/Beats_beats_PR-27363/src/github.com/elastic/beats
[2021-08-17T11:42:58.629Z] Running on Jenkins in /var/lib/jenkins/workspace/Beats_beats_PR-27363
[2021-08-17T11:42:58.851Z] [INFO] getVaultSecret: Getting secrets
[2021-08-17T11:42:58.890Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2021-08-17T11:42:59.740Z] + chmod 755 generate-build-data.sh
[2021-08-17T11:42:59.740Z] + ./generate-build-data.sh https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-27363/ https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-27363/runs/4 FAILURE 159446
[2021-08-17T11:42:59.740Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-27363/runs/4/steps/?limit=10000 -o steps-info.json
[2021-08-17T11:42:59.740Z] INFO: curl https://beats-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/Beats/beats/PR-27363/runs/4/tests/?status=FAILED -o tests-errors.json
[2021-08-17T11:42:59.740Z] Retry 1/3 exited 22, retrying in 1 seconds...
[2021-08-17T11:43:01.084Z] Retry 2/3 exited 22, retrying in 2 seconds...

❕ Flaky test report

No test was executed to be analysed.

mergify bot pushed a commit that referenced this pull request Aug 17, 2021
Refresh instance metadata for each incoming response by updating the
`s.computeMetadata` field in the `metadataCollector` struct.

The code was applying a sort of caching to the computeMetadata, but in
the wrong place. Most probably it was a leftover, as a proper caching
for instance metadata is implemented in the `instance` function on line
146.
Due to this check `computeMetadata` were not refreshed for each
response.
`metadataCollector` is instanciated only once for the entire module.
Thus by not overriding `computeMetadata` the metadata of the first
response handled was used for all subsequent events.

Removing the check make the code update the `s.computeMetadata` property
for each new response. With this update the correct metadata based on
the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same
`machineType`.

NOTE: the code update the same struct field at each iteration; this is
most probably not safe in a parallel execution context; this code is not
running multiple times in different threads so for the moment this
change should not have negative side effects.

(cherry picked from commit 5339327)
endorama added a commit that referenced this pull request Aug 17, 2021
…orting (#27419)

Refresh instance metadata for each incoming response by updating the
`s.computeMetadata` field in the `metadataCollector` struct.

The code was applying a sort of caching to the computeMetadata, but in
the wrong place. Most probably it was a leftover, as a proper caching
for instance metadata is implemented in the `instance` function on line
146.
Due to this check `computeMetadata` were not refreshed for each
response.
`metadataCollector` is instanciated only once for the entire module.
Thus by not overriding `computeMetadata` the metadata of the first
response handled was used for all subsequent events.

Removing the check make the code update the `s.computeMetadata` property
for each new response. With this update the correct metadata based on
the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same
`machineType`.

NOTE: the code update the same struct field at each iteration; this is
most probably not safe in a parallel execution context; this code is not
running multiple times in different threads so for the moment this
change should not have negative side effects.

(cherry picked from commit 5339327)

Co-authored-by: endorama <[email protected]>
endorama added a commit that referenced this pull request Aug 17, 2021
…porting (#27420)

Refresh instance metadata for each incoming response by updating the
`s.computeMetadata` field in the `metadataCollector` struct.

The code was applying a sort of caching to the computeMetadata, but in
the wrong place. Most probably it was a leftover, as a proper caching
for instance metadata is implemented in the `instance` function on line
146.
Due to this check `computeMetadata` were not refreshed for each
response.
`metadataCollector` is instanciated only once for the entire module.
Thus by not overriding `computeMetadata` the metadata of the first
response handled was used for all subsequent events.

Removing the check make the code update the `s.computeMetadata` property
for each new response. With this update the correct metadata based on
the instance ID reported in the response are used.

Fixes a bug were all metrics were being reported with the same
`machineType`.

NOTE: the code update the same struct field at each iteration; this is
most probably not safe in a parallel execution context; this code is not
running multiple times in different threads so for the moment this
change should not have negative side effects.

(cherry picked from commit 5339327)

Co-authored-by: endorama <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.14.0 Automated backport with mergify backport-v7.15.0 Automated backport with mergify bug Team:Integrations Label for the Integrations team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants