Fix unintended skip in metric collection on Azure Monitor #37203
Because `time.Now()` was used to set the reference time that decides whether to collect a metric's value, the metricset may skip a metric collection.

Made two changes to the Azure Monitor metricset:

- Moved `referenceTime` into `Fetch()` and truncated its value to seconds to get a more predictable reference time for comparison.
- Updated the `MetricRegistry.NeedsUpdate()` method to use `referenceTime`, instead of "now", when comparing with the time grain duration.

Current tests seem fine, with PT5M time grain and collection periods of 300s and 60s.

I am also adding some structured logging messages to track registry decisions at the debug log level.

Here's how to parse the structured logs to get a nice table view:

```shell
$ cat metricbeat.log.json | grep "MetricRegistry" | jq -r '[.key, .needs_update, .reference_time, .now, .time_grain_start_time//"n/a", .last_collection_at//"n/a"] | @tsv'
fdd3a07a3cabd90233c083950a4bc30c  true   2023-11-26T15:51:30.000Z  2023-11-26T15:51:30.967Z  2023-11-26T15:46:30.000Z  2023-11-26T15:46:30.000Z
6ee8809577a906538473e3e5e98dc893  true   2023-11-26T15:51:30.000Z  2023-11-26T15:51:35.257Z  2023-11-26T15:46:30.000Z  2023-11-26T15:46:30.000Z
6aedb7dffafbfe9ca19e0aa01436d30a  false  2023-11-26T15:51:30.000Z  2023-11-26T15:51:35.757Z  2023-11-26T15:46:30.000Z  2023-11-26T15:48:30.000Z
```
LGTM
Changes LGTM.
Commits referencing this pull request (each repeats the commit message from the description, with these trailers):

- Co-authored-by: Richa Talwar <[email protected]> (cherry picked from commit 110cc31)
- …37222) Co-authored-by: Richa Talwar <[email protected]> (cherry picked from commit 110cc31) Co-authored-by: Maurizio Branca <[email protected]>
- ) Co-authored-by: Richa Talwar <[email protected]>
Checklist
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
Author's Checklist
Screenshots
Here's the compute vm dashboard running a custom agent build from this PR: