Add opentelemetry tracing of gRPC calls #1714

Fricounet
Contributor

Is this a bug fix or adding new feature?
This PR adds support for opentelemetry tracing in the driver. The feature is opt-in behind a feature flag --enable-otel-tracing.

What is this PR about? / Why do we need it?
Adds basic tracing instrumentation for the driver's gRPC calls, using an opentelemetry library to intercept the calls automatically. More in-depth instrumentation could be added at a later date.
The Helm chart was also updated so that users can enable this new feature.
Closes #1691

What testing is done?
This change has been running in Datadog's AWS clusters for a month without any issue. Below is a screenshot of an example of a captured trace.
[image: screenshot of an example captured trace]

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 10, 2023
@k8s-ci-robot
Contributor

Welcome @Fricounet!

It looks like this is your first PR to kubernetes-sigs/aws-ebs-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-ebs-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 10, 2023
@k8s-ci-robot
Contributor

Hi @Fricounet. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested review from gtxu and hanyuel August 10, 2023 16:34
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 10, 2023
@torredil
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 10, 2023
@Fricounet Fricounet force-pushed the fricounet/upstream/otel-tracing-grpc branch 2 times, most recently from aa9ca17 to 3e3d309 Compare August 11, 2023 09:05
@Fricounet
Contributor Author

Hi @torredil looking at the failing tests,

  • pull-aws-ebs-csi-driver-test-e2e-external-eks-windows seems to have been failing on all the recent runs. Should I ignore it?
  • pull-aws-ebs-csi-driver-test-helm-chart has the following: Error: UPGRADE FAILED: template: aws-ebs-csi-driver/templates/node.yaml:67:26: executing "aws-ebs-csi-driver/templates/node.yaml" at <.Values.node.otelTracing.enabled>: nil pointer evaluating interface {}.enabled. For some reason, the values I set in charts/aws-ebs-csi-driver/values.yaml do not seem to be picked up. Do you know what is missing? 😅

@torredil
Member

Hi @Fricounet, you may ignore the pull-aws-ebs-csi-driver-test-e2e-external-eks-windows failures; that specific job is currently configured as optional and thus will not block your PR.

As for fixing the helm error, I suggest modifying the scope using with:

{{- with .Values.foo }}
  {{- .bar }}
{{- end }}

@Fricounet
Contributor Author

@torredil thanks for your help, tests are green now :)
I created a separate commit to fix the tests but if you prefer a cleaner git history, I can squash the commit.
The PR should be ready for review now.

@torredil
Member

@Fricounet Glad to hear!

> I can squash the commit

That would be great, thank you.

@Fricounet Fricounet force-pushed the fricounet/upstream/otel-tracing-grpc branch from b676dba to cee4afb Compare August 14, 2023 13:24
cmd/main.go (outdated review thread, resolved)
@ConnorJC3
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 14, 2023
@ConnorJC3
Contributor

/cc @torredil

anything left on this one besides squashing the commits?

@k8s-ci-robot k8s-ci-robot requested a review from torredil August 14, 2023 17:24
@torredil
Member

@Fricounet This looks great. Will merge once the commits have been squashed, thanks!

The feature is disabled by default and hidden behind the flag
--enable-otel-tracing. When enabled, all the gRPC calls made by the
driver will be instrumented and can be forwarded to an
opentelemetry-compatible collector.
The configuration of opentelemetry tracing can be done in the chart with
values `controller.otelTracing` and `node.otelTracing`. The service name
and the endpoint that will be used by the sdk can be configured with
`otelServiceName` and `otelExporterEndpoint` for both components.
@Fricounet Fricounet force-pushed the fricounet/upstream/otel-tracing-grpc branch from fbd320b to 85f0103 Compare August 16, 2023 06:27
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 16, 2023
@Fricounet
Contributor Author

Squash done, thanks for your review :)

@k8s-ci-robot
Contributor

@Fricounet: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-aws-ebs-csi-driver-test-e2e-external-eks-windows | 85f0103 | link | false | /test pull-aws-ebs-csi-driver-test-e2e-external-eks-windows |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@ConnorJC3
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2023
Member

@torredil torredil left a comment

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: torredil

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2023
@k8s-ci-robot k8s-ci-robot merged commit a85fb63 into kubernetes-sigs:master Aug 17, 2023
Successfully merging this pull request may close these issues.

Add otel trace instrumentation on gRPC calls
4 participants