[Metricbeat] Add Google Cloud Platform module #14829
First pass through the code, I haven't found anything serious but I think that some things would need to be polished. Thanks!
SERVICE_COMPUTE = "compute"
SERVICE_PUBSUB = "pubsub"
SERVICE_FIRESTORE = "firestore"
SERVICE_STORAGE = "storage"
+1, please use camel case for these constants
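As a rough illustration of the naming this comment asks for, the four constants quoted above could be written as follows. The exact names are an assumption; only the string values come from the reviewed snippet.

```go
package main

import "fmt"

// Hypothetical camel-case names for the service constants under review.
// The string values are the ones quoted in the review.
const (
	ServiceCompute   = "compute"
	ServicePubsub    = "pubsub"
	ServiceFirestore = "firestore"
	ServiceStorage   = "storage"
)

func main() {
	fmt.Println(ServiceCompute, ServicePubsub, ServiceFirestore, ServiceStorage)
}
```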
x-pack/metricbeat/module/googlecloud/stackdriver/metrics_requester.go
- firebase
- storage
- loadbalancing
zone: "your zone"
can we specify more than one zone here?
Good question. We can't specify it right now. The idea was to keep the first version as simple as possible. It's actually possible to request all metrics for a project without a zone filter, or even to request several zones, but we are moving slowly for now to see how it goes, because the code that requests metrics and converts them using lightweight modules is already pretty complex.
perhaps it would be interesting to put a real zone here so things won't fail if they start the module out of the box?
I prefer that things fail explicitly, so that a user with machines in Europe who runs Metricbeat gets a specific error saying zone "your zone" not found
instead of silent errors where no events are sent because there are no machines in that zone/region, which may lead them to think that Metricbeat is not working properly (it's your fault because you didn't set the correct zone, but that's implicit)
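One way to make that failure explicit even earlier would be a local check before any request is sent. This is only a sketch and not what the PR does: the `config` type, its `Validate` method and the error messages are hypothetical, and only the placeholder value `"your zone"` comes from the reference config quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// placeholderZone is the value shipped in the reference config quoted above.
const placeholderZone = "your zone"

// config is a hypothetical, reduced module configuration.
type config struct {
	Zone string
}

// Validate rejects an empty or placeholder zone so the misconfiguration
// surfaces as an error instead of an empty result set.
func (c config) Validate() error {
	zone := strings.TrimSpace(c.Zone)
	if zone == "" {
		return errors.New("zone must be set")
	}
	if zone == placeholderZone {
		return fmt.Errorf("zone %q is a placeholder, set a real zone", zone)
	}
	return nil
}

func main() {
	fmt.Println(config{Zone: "your zone"}.Validate())
	fmt.Println(config{Zone: "europe-west1-b"}.Validate())
}
```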
"id": "elastic-metricbeat"
},
"provider": "googlecloud",
"instance": {
Is there a separate API to get more info for each compute instance? For example the machine type, status, etc.
Oh yes! And I'm actually using it already but I completely forgot to attach machine type too! Thanks for the heads up!
@sayden this will need to be backported to 7.x.
Includes Stackdriver and Compute Metricset # Conflicts: # NOTICE.txt # vendor/vendor.json
I just saw this doesn't have a changelog, could you add it in a different PR?
As the 7.6 branch was already created, could you also add another backport to that one?
Includes Stackdriver and Compute Metricset (cherry picked from commit 8be7745) # Conflicts: # NOTICE.txt # vendor/vendor.json
@sayden I just started testing this PR with the compute metricset. Curious: why do we have metrics from the same instance in different events? I also see that you have metrics separated into cpu, disk, firewall, etc. in https://github.com/elastic/beats/tree/master/x-pack/metricbeat/module/googlecloud/compute/_meta. Why are they not in the same event when they come from the same instance?
Bug found during testing: #15613
Missing exported field in documentation for compute metricset: #15776
Enhancement request (no need for 7.6) for adding regions as a config parameter: #15780
Potential sensitive data in labels.metadata: #15782
ONGOING work on docs but most code is ready to go.
Seed PR for the Google Cloud Platform module for Metricbeat.
It includes the following:
Ignore the following metricsets, which are included in this PR for testing purposes only and are not going to be merged yet (they'll be removed before merging):
Some vocabulary for people new to Google Cloud
You can find some translations for GCP services in AWS:
Labels / Metadata
You'll see lots of mentions of Metadata inside the code. This refers to two different entities within GCP: labels and metadata. For Elasticsearch purposes both can be considered metadata, so whenever you read "label" or "metadata", both are treated as the same thing at the end of the pipeline.
Grouping of events
The way GCP labels metrics makes it somewhat complex to generate "service based events". Metrics are exported individually, so you don't request "compute metrics" or "metrics of this compute instance"; instead you have to request "give me all cpu_utilization values of compute instances". A single response therefore brings one or more values per instance, for the specified timeframe, for all your instances.
For example, a request for CPU utilization can return (in pseudocode):
Then a new call must be made (in this example to the Compute API) to request instance metadata (like working group, network group, user labels or user metadata, which is associated only with the instance and not with a particular metric like CPU). Then you get data like this (again, in pseudocode):
In the end, both responses for that particular metric must be grouped into a single event that shares some common metadata. For Compute this includes instance_id and availability zone, apart from the timestamp. Each service requires a specific implementation to get non-Stackdriver metadata. The service metadata implementation is only developed for Compute at the moment and can be seen in googlecloud/stackdriver/compute; the rest of the services use only metadata provided by Stackdriver.
ECS
Metadata returned from Stackdriver is ECS compliant for Compute metadata (mainly availability zone, account id and cloud provider, instance id and instance name). Some of the metadata might be written outside the ECS fields. More deployment configurations plus testing are needed to find them all.
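The grouping described under "Grouping of events", together with the ECS-style cloud fields mentioned here, can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the `point`, `groupKey` and `event` types, the `groupPoints` function, the metric names and the flat key names are all hypothetical. Only the grouping criteria (instance id, availability zone, timestamp) and the `googlecloud` provider value come from the description.

```go
package main

import "fmt"

// point is a hypothetical, flattened view of one metric value
// taken from a per-metric response.
type point struct {
	InstanceID string
	Zone       string
	Timestamp  int64 // Unix seconds
	Metric     string
	Value      float64
}

// groupKey holds the metadata shared by every metric in one event.
type groupKey struct {
	InstanceID string
	Zone       string
	Timestamp  int64
}

// event is a hypothetical grouped event with ECS-style cloud fields.
type event struct {
	Cloud   map[string]string
	Labels  map[string]string
	Metrics map[string]float64
}

// groupPoints merges points that share instance, zone and timestamp into
// one event and attaches the per-instance metadata (labels) to it.
func groupPoints(points []point, metadata map[string]map[string]string) map[groupKey]*event {
	events := make(map[groupKey]*event)
	for _, p := range points {
		key := groupKey{p.InstanceID, p.Zone, p.Timestamp}
		ev, ok := events[key]
		if !ok {
			ev = &event{
				Cloud: map[string]string{
					"provider":          "googlecloud",
					"instance.id":       p.InstanceID,
					"availability_zone": p.Zone,
				},
				Labels:  metadata[p.InstanceID],
				Metrics: map[string]float64{},
			}
			events[key] = ev
		}
		ev.Metrics[p.Metric] = p.Value
	}
	return events
}

func main() {
	points := []point{
		{"1234", "europe-west1-b", 1577836800, "cpu.utilization", 0.42},
		{"1234", "europe-west1-b", 1577836800, "disk.read_bytes", 2048},
		{"5678", "europe-west1-b", 1577836800, "cpu.utilization", 0.10},
	}
	metadata := map[string]map[string]string{
		"1234": {"team": "observability"},
	}
	for key, ev := range groupPoints(points, metadata) {
		fmt.Println(key.InstanceID, len(ev.Metrics), ev.Labels)
	}
}
```

Here the two values for instance 1234 end up in one event, while instance 5678 gets its own event with no labels attached.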
Modules
All services from https://cloud.google.com/monitoring/api/metrics_gcp can be added through additional configuration. Tests so far show no problems, but their specific metadata must be developed separately for each of them.
Limitations
You cannot set a period under 300s (you can right now, but it won't return any metrics). I think it's some kind of Stackdriver limitation, because their metrics are sampled every 60 to 300 seconds.
Happy reviewing :)
Sorry for the big PR, it was impossible to make it smaller