Metricbeat GCP Module panics on nil Metadata in Metricset #22494
Pinging @elastic/integrations-platforms (Team:Platforms)
I also think this is a bug in the GCP module, I'm just not sure about the proper fix for it without losing accuracy. Metricbeat checks the sample period of the incoming metrics because Metricbeat's own sample period is usually not the same. Generally speaking, any MB module sets up a 30-second fetch period, fetches the metrics from the service, and stores them in Elasticsearch, so the user gets a full picture of the service state every 30 seconds. In GCP, however, the sample period can be, for example, 5 minutes while Metricbeat's sample period is 1 minute, so Metricbeat might fetch the exact same full picture 5 times, giving incorrect aggregation results later. On the other hand, if Metricbeat ignores metrics without a …
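To make the arithmetic above concrete, here is a minimal sketch (not the module's actual code; the name `duplicateFetches` is illustrative) of how a 5-minute provider sample period combined with a 1-minute collection period leads to the same sample being fetched 5 times:

```go
package main

import (
	"fmt"
	"time"
)

// duplicateFetches estimates how many times the same provider sample would be
// re-read if every collection cycle naively asked for the most recent data
// point, ignoring the provider's sample period.
func duplicateFetches(samplePeriod, collectionPeriod time.Duration) int {
	if samplePeriod <= collectionPeriod {
		return 1
	}
	return int(samplePeriod / collectionPeriod)
}

func main() {
	// GCP samples every 5 minutes, Metricbeat collects every minute:
	fmt.Println(duplicateFetches(5*time.Minute, time.Minute)) // 5
}
```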
Just to give more info: it seems that the sample period is not mandatory for every Stackdriver metric: https://cloud.google.com/monitoring/api/metrics#metadata ("The description for each metric type might include additional information, called metadata, about the metric.")
@sayden so, interestingly, if you take a look at the file, there are default …

```go
if out.Metadata != nil { // <-- this is new
	if out.Metadata.SamplePeriod != nil {
		...
	}
	if out.Metadata.IngestDelay != nil {
		...
	}
}
```
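Building on the snippet above, a self-contained sketch of the guarded lookup with fallback values might look like the following. It uses simplified stand-in types and hypothetical defaults (`defaultSample`, `defaultDelay`), not the real Stackdriver descriptor types or the module's actual defaults:

```go
package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the optional metadata on a metric descriptor.
type metadata struct {
	SamplePeriod *time.Duration
	IngestDelay  *time.Duration
}

type descriptor struct {
	Metadata *metadata // optional: may be nil for some metric types
}

// periods returns the sample period and ingest delay to use, falling back to
// the provided defaults when the descriptor carries no metadata.
func periods(out *descriptor, defaultSample, defaultDelay time.Duration) (time.Duration, time.Duration) {
	sample, delay := defaultSample, defaultDelay
	if out.Metadata != nil { // guard against the nil dereference from the issue
		if out.Metadata.SamplePeriod != nil {
			sample = *out.Metadata.SamplePeriod
		}
		if out.Metadata.IngestDelay != nil {
			delay = *out.Metadata.IngestDelay
		}
	}
	return sample, delay
}

func main() {
	// No metadata on the descriptor: both values fall back to the defaults.
	fmt.Println(periods(&descriptor{}, 60*time.Second, 240*time.Second))
}
```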
@andrewstucki maybe it's not that simple. My suggestion is to add the nil checks but then double-check that we are not receiving duplicate events from the metrics that don't have metadata and that we aren't ignoring metrics. In other words, my worry is that the fetch operation will bring duplicates or simply omit those metrics. We have 2 scenarios:
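Regarding the duplicate-events worry, a minimal sketch of one possible check is below: remember the last point timestamp seen per time-series key and drop repeats. The point and key shapes here are hypothetical, not the module's real types:

```go
package main

import (
	"fmt"
	"time"
)

type point struct {
	SeriesKey string    // e.g. metric type plus resource labels
	EndTime   time.Time // end of the sampled interval
	Value     float64
}

type deduper struct {
	lastSeen map[string]time.Time
}

func newDeduper() *deduper { return &deduper{lastSeen: map[string]time.Time{}} }

// keep reports whether the point is new for its series; repeated fetches of
// the same 5m sample at a 1m collection period would return false here.
func (d *deduper) keep(p point) bool {
	if t, ok := d.lastSeen[p.SeriesKey]; ok && !p.EndTime.After(t) {
		return false
	}
	d.lastSeen[p.SeriesKey] = p.EndTime
	return true
}

func main() {
	d := newDeduper()
	p := point{SeriesKey: "compute/instance/cpu", EndTime: time.Now(), Value: 0.4}
	fmt.Println(d.keep(p), d.keep(p)) // true false
}
```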
@masci At least I think we should try to prioritise some mitigation. Given the conversation, it seems that the "right" solution, if any, may require some discussion, but a panic is a pretty bad thing. Given that we are able to detect the situation, we should do something "safe" (e.g. trying not to ship incorrect data), like skipping the event (ideally with a "throttled" warning), which will always be better than a panic.
I think the only quick workaround is the one proposed by @andrewstucki - in the end, whether we're panicking or dropping, the metric wouldn't make it through to Elasticsearch. Do we have any facility to implement a throttled warning?
I agree, I prefer dropping to panicking, as we might be collecting other things with the same instance. @urso do we have something to help implement some throttled warning or should we just start with something simpler?
No, we have no support for throttling warnings/errors to logs. Something "simpler" would be better.
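For reference, a "simpler" throttled warning could look roughly like the sketch below, using only the standard library; the type and method names are illustrative, not an existing Beats facility:

```go
package main

import (
	"log"
	"sync"
	"time"
)

// throttledLogger logs a keyed message at most once per interval and silently
// drops further occurrences until the interval has passed.
type throttledLogger struct {
	mu       sync.Mutex
	interval time.Duration
	last     map[string]time.Time
}

func newThrottledLogger(interval time.Duration) *throttledLogger {
	return &throttledLogger{interval: interval, last: map[string]time.Time{}}
}

// Warnf logs the message unless the same key was logged within the interval.
func (t *throttledLogger) Warnf(key, format string, args ...interface{}) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if time.Since(t.last[key]) < t.interval {
		return
	}
	t.last[key] = time.Now()
	log.Printf("WARN "+format, args...)
}

func main() {
	tl := newThrottledLogger(time.Minute)
	for i := 0; i < 3; i++ {
		tl.Warnf("missing-metadata", "skipping metric without metadata (%d)", i) // logged once
	}
}
```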
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This appears to be coming from this line:
beats/x-pack/metricbeat/module/googlecloud/metrics/metricset.go, line 256 in f0bad21
I believe we're dereferencing the `Metadata` field, which, according to the structure, is optional and can be `nil`. We should throw in a guard for the `nil` check.
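For illustration, a minimal, self-contained reproduction of this class of panic (with stand-in types, not the module's real proto types) is:

```go
package main

import "fmt"

type metricMetadata struct{ SamplePeriod string }

type metricDescriptor struct {
	Metadata *metricMetadata // optional in the API, so it may be nil
}

func main() {
	out := metricDescriptor{}               // Metadata left nil, which the API allows
	fmt.Println(out.Metadata.SamplePeriod)  // panics: nil pointer dereference
}
```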