
Fix unintended skip in metric collection on Azure Monitor #37203

Merged

Conversation

@zmoog (Contributor) commented Nov 26, 2023

Proposed commit message

Because the reference time used to decide whether to collect a metric's value was set with time.Now(), the metricset could unintentionally skip a metric collection.

Made two changes to the Azure Monitor metricset:

  • Moved referenceTime into Fetch() and truncated its value to seconds, for a more predictable reference time in comparisons.
  • Updated the MetricRegistry.NeedsUpdate() method to compare the time grain duration against referenceTime instead of "now" (see the sketch after this list).
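
Here's a minimal, standalone sketch of that decision under a fixed reference time; the type, field, and function names are illustrative assumptions, not the actual metricset code:

```go
package main

import (
	"fmt"
	"time"
)

// metricRecord is a stand-in for the registry's per-metric bookkeeping
// (field names are assumptions, not the actual Metricbeat types).
type metricRecord struct {
	lastCollectionAt time.Time     // when the metric was last collected
	timeGrain        time.Duration // e.g. 5 * time.Minute for PT5M
}

// needsUpdate reports whether a full time grain has elapsed since the last
// collection. Comparing every metric against one fixed reference time
// avoids the jitter of calling time.Now() separately per comparison,
// which is what caused the unintended skips.
func needsUpdate(referenceTime time.Time, r metricRecord) bool {
	return referenceTime.Sub(r.lastCollectionAt) >= r.timeGrain
}

func main() {
	// Computed once per Fetch() and truncated to the second, so every
	// metric in the same run is compared against the same instant.
	referenceTime := time.Now().UTC().Truncate(time.Second)

	r := metricRecord{
		lastCollectionAt: referenceTime.Add(-5 * time.Minute),
		timeGrain:        5 * time.Minute,
	}
	fmt.Println(needsUpdate(referenceTime, r)) // true: one full PT5M grain elapsed
}
```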

Current tests seem fine with a PT5M time grain and collection periods of 300s and 60s.

I am also adding some structured logging messages to track registry decisions at the debug log level.
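
Here's a standalone sketch of the kind of call that produces those entries, using Beats' logp package; the message text, call site, and example values are assumptions, though the field names match the jq filter below:

```go
package main

import (
	"time"

	"github.com/elastic/elastic-agent-libs/logp"
)

func main() {
	// Standalone setup so debug output is visible; inside Metricbeat the
	// logger configuration comes from the beat itself.
	_ = logp.DevelopmentSetup()
	logger := logp.NewLogger("azure monitor")

	referenceTime := time.Now().UTC().Truncate(time.Second)

	// Debugw attaches key/value pairs that show up as JSON fields in
	// metricbeat.log.json.
	logger.Debugw("MetricRegistry needs_update decision",
		"key", "fdd3a07a3cabd90233c083950a4bc30c",
		"needs_update", true,
		"reference_time", referenceTime,
		"now", time.Now().UTC(),
		"time_grain_start_time", referenceTime.Add(-5*time.Minute),
		"last_collection_at", referenceTime.Add(-5*time.Minute),
	)
}
```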

Here's how to parse the structured logs to get a nice table view:

```shell
$ cat metricbeat.log.json | grep "MetricRegistry" | jq -r '[.key, .needs_update, .reference_time, .now, .time_grain_start_time//"n/a", .last_collection_at//"n/a"] | @tsv'
fdd3a07a3cabd90233c083950a4bc30c        true    2023-11-26T15:51:30.000Z        2023-11-26T15:51:30.967Z        2023-11-26T15:46:30.000Z        2023-11-26T15:46:30.000Z
6ee8809577a906538473e3e5e98dc893        true    2023-11-26T15:51:30.000Z        2023-11-26T15:51:35.257Z        2023-11-26T15:46:30.000Z        2023-11-26T15:46:30.000Z
6aedb7dffafbfe9ca19e0aa01436d30a        false   2023-11-26T15:51:30.000Z        2023-11-26T15:51:35.757Z        2023-11-26T15:46:30.000Z        2023-11-26T15:48:30.000Z
```
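
Reading the sample output: the first two metrics were last collected at 15:46:30, a full PT5M time grain before the 15:51:30 reference time, so needs_update is true; the third was collected at 15:48:30, only three minutes earlier, so the registry reports false and the metric is not collected again yet.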

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Here's the Compute VM dashboard running a custom agent build from this PR:

[Screenshot: Compute VM metrics dashboard, 2023-11-26]

Logs

@botelastic botelastic bot added the needs_team label (Indicates that the issue/PR needs a Team:* label) on Nov 26, 2023
@zmoog zmoog changed the title from "Fix unintended skip in metric collection" to "Fix unintended skip in metric collection on Azure Monitor" on Nov 26, 2023
@mergify mergify bot assigned zmoog Nov 26, 2023
mergify bot commented Nov 26, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @zmoog? 🙏.
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, add the backport labels for the needed branches, such as:

  • backport-v8.\d.0 is the label to automatically backport to the 8.\d branch (\d is the digit)

@zmoog zmoog added the Team:Cloud-Monitoring label (Label for the Cloud Monitoring team) on Nov 26, 2023
@botelastic botelastic bot removed the needs_team label (Indicates that the issue/PR needs a Team:* label) on Nov 26, 2023
@elasticmachine (Collaborator)

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-26T16:11:54.386+0000

  • Duration: 56 min 39 sec

Test stats 🧪

  • Failed: 0
  • Passed: 1518
  • Skipped: 96
  • Total: 1614

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine (Collaborator)

❕ Build Aborted

A new build is ongoing, so the previous builds have been aborted.


Build stats

  • Start Time: 2023-11-26T17:28:27.428+0000

  • Duration: 11 min 1 sec

Test stats 🧪

  • Failed: 0
  • Passed: 3
  • Skipped: 0
  • Total: 3


@elasticmachine (Collaborator)

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-26T17:35:12.068+0000

  • Duration: 52 min 39 sec

Test stats 🧪

  • Failed: 0
  • Passed: 1518
  • Skipped: 96
  • Total: 1614

💚 Flaky test report

Tests succeeded.


@tommyers-elastic (Contributor) left a comment

LGTM

@zmoog zmoog marked this pull request as ready for review November 28, 2023 09:50
@zmoog zmoog requested a review from a team as a code owner November 28, 2023 09:50
@lucian-ioan (Contributor) left a comment

Changes LGTM.

Two review threads on x-pack/metricbeat/module/azure/client.go (outdated, resolved)
@elasticmachine (Collaborator)

❕ Build Aborted

Either there was a build timeout or someone aborted the build.


Build stats

  • Duration: 7 min 12 sec


@elasticmachine (Collaborator)

❕ Build Aborted

Either there was a build timeout or someone aborted the build.


Build stats

  • Duration: 7 min 25 sec


@elasticmachine (Collaborator)

❕ Build Aborted

Either there was a build timeout or someone aborted the build.


Build stats

  • Duration: 11 min 33 sec


@elasticmachine (Collaborator)

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-28T17:49:27.159+0000

  • Duration: 53 min 36 sec

Test stats 🧪

  • Failed: 0
  • Passed: 1518
  • Skipped: 96
  • Total: 1614

💚 Flaky test report

Tests succeeded.


@zmoog zmoog merged commit 110cc31 into elastic:main Nov 28, 2023
21 checks passed
@zmoog zmoog deleted the zmoog/fix-unintended-skip-collection-on-azure-metrics branch November 28, 2023 21:16
@zmoog zmoog added the backport-v8.11.0 label (Automated backport with mergify) on Nov 28, 2023
mergify bot pushed a commit that referenced this pull request Nov 28, 2023

* Fix unintended skip in metric collection

Co-authored-by: Richa Talwar <[email protected]>
(cherry picked from commit 110cc31)
zmoog added a commit that referenced this pull request Nov 28, 2023
…37222)

* Fix unintended skip in metric collection

Co-authored-by: Richa Talwar <[email protected]>
(cherry picked from commit 110cc31)

Co-authored-by: Maurizio Branca <[email protected]>
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024

* Fix unintended skip in metric collection

Co-authored-by: Richa Talwar <[email protected]>
Labels

backport-v8.11.0 (Automated backport with mergify)
Team:Cloud-Monitoring (Label for the Cloud Monitoring team)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure Monitor may skip a metric collection depending on the timing
5 participants