Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add citadel metricset for Istio Metricbeat module #15990

Merged
merged 4 commits into from
Feb 3, 2020

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Jan 31, 2020

This PR adds citadel metricset for istio Module.

The metrics will be collected from the respective Prometheus exporter endpoint:
istio-citadel.istio-system:15014: The istio-citadel job returns all Citadel-generated metrics.

Part of: #15505

Manual testing

  1. Build or download x-pack metricbeat executable
  2. Spin up a Kubernetes cluster with Minikube
  3. Install ISTIO https://istio.io/docs/setup/install/
  4. Expose the istio-citadel.istio-system:15014 service to working machine, using kubectl port-forward, kubectl -n istio-system port-forward svc/istio-citadel 15014:15014
  5. Enable the istio module, configure it in metricbeat.yaml and start monitoring ISTIO citadel

@ChrsMark ChrsMark added enhancement review Metricbeat Metricbeat needs_backport PR is waiting to be backported to other branches. [zube]: In Review v7.7.0 labels Jan 31, 2020
@ChrsMark ChrsMark requested a review from a team as a code owner January 31, 2020 12:11
@ChrsMark ChrsMark self-assigned this Jan 31, 2020
@ChrsMark ChrsMark mentioned this pull request Jan 31, 2020
11 tasks
@ChrsMark ChrsMark added Team:Integrations Label for the Integrations team Team:Platforms Label for the Integrations - Platforms team labels Feb 3, 2020
description: >
The type of the respective grpc service

- name: secret_controller_svc_acc_created_cert_count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be secret_controller_svc_acc_created_cert.count to follow naming conventions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for mentioning. Will change it.

type: long
description: >
The number of certificates created due to service account creation.
- name: server_root_cert_expiry_timestamp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call it server_root_cert_expiry_seconds

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for suggesting. Will change it.

description: >
Total number of RPCs started on the server.

- name: grpc.server.handling.latency.ms.bucket.*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 things here:

  • Usually the unit comes at the end of the field, it's tricky in this case because of the wildcard anyways: grpc.server.handling.latency.bucket.ms.*
  • I wonder if we should store this operations instead of "doing" them in Kibana.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is common pattern so far for this kind of Metrics (Prometheus histograms). See

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thank you for explaining :)

object_type: long
description: >
The response latency (milliseconds) of gRPC that had been application-level handled by the server.
- name: grpc.server.handling.latency.ms.sum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit at the end :) grpc.server.handling.latency.sum.ms, same thing with the sum operation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above this is a prometheus spesific concern.

format: duration
description: >
The response latency of gRPC, sum of latencies in milliseconds
- name: grpc.server.handling.latency.ms.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit at the end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above this is a prometheus spesific concern.

Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. More like questions than requests. Maybe none of them apply. I suggest to create a quick dashboard with the data coming from the metricset to see how much you can visualize them. It's a good exercise when creating a metricset.

@ChrsMark
Copy link
Member Author

ChrsMark commented Feb 3, 2020

Thank you for reviewing @sayden !

Will push the adjustments soonish. Regarding the Prometheus Histogram types as I mentioned inline, this is something that we follow already for this kind of metrics. However this might be a subject of refactoring anyway in the near feature while we are working on #14843. In this case I would suggest we proceed with this currently and have all of them to be refactored with the new type introduction. wdyt?

Regarding the Dashboard, it is planned already and listed in the "TODOs" #15505 (comment).

@sayden
Copy link
Contributor

sayden commented Feb 3, 2020

Thank you for reviewing @sayden !

Will push the adjustments soonish. Regarding the Prometheus Histogram types as I mentioned inline, this is something that we follow already for this kind of metrics. However this might be a subject of refactoring anyway in the near feature while we are working on #14843. In this case I would suggest we proceed with this currently and have all of them to be refactored with the new type introduction. wdyt?

Regarding the Dashboard, it is planned already and listed in the "TODOs" #15505 (comment).

Yeap, yeap, let's proceed. I just wanted to make sure that we weren't missing anything :)

@sayden sayden self-requested a review February 3, 2020 12:03
Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ChrsMark ChrsMark merged commit e9b0be3 into elastic:master Feb 3, 2020
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Feb 4, 2020
@ChrsMark ChrsMark added test-plan Add this PR to be manual test plan and removed needs_backport PR is waiting to be backported to other branches. labels Feb 4, 2020
@andresrc andresrc added the test-plan-added This PR has been added to the test plan label Mar 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Metricbeat Metricbeat review Team:Integrations Label for the Integrations team Team:Platforms Label for the Integrations - Platforms team test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants