Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opentelemetry_propagate do not work with auth endpoints #9811

Closed
csepulveda opened this issue Mar 30, 2023 · 21 comments
Closed

opentelemetry_propagate do not work with auth endpoints #9811

csepulveda opened this issue Mar 30, 2023 · 21 comments
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@csepulveda
Copy link

What happened:

Enabling the open telemetry module, the traces are created by nginx and then propagated to the application behind.
But when the backend is the authorization endpoint, the opentracing headers are not present.
Here is an example of the auth-url configuration.

    nginx.ingress.kubernetes.io/auth-url: http://yyyy.xxx.svc.cluster.local/auth
    nginx.ingress.kubernetes.io/auth-cache-key: $http_authorization$arg_api_key$request_method
    nginx.ingress.kubernetes.io/auth-response-headers: partner_id

When the auth endpoint its called, there are no opentracing headers.

T 172.16.11.212:59146 -> 172.16.10.71:8080 [AP] #265
GET /auth HTTP/1.1.
X-Request-ID: 332a44b693c19f83fce1bdd929b2a165.
Host: yyyy.xxx.svc.cluster.local.
X-Original-URL: https://xxxx.yyyyy.zzzz/v2/admin/financings?sort=&page=1&per_page=100.
X-Original-Method: GET.
X-Sent-From: nginx-ingress-controller.
X-Real-IP: xxxxx.
X-Forwarded-For: xxxxx.
X-Auth-Request-Redirect: /v2/admin/financings?sort=&page=1&per_page=100.
Connection: close.
sec-ch-ua: "Google Chrome";v="111", "Not(A:Brand";v="8", "Chromium";v="111".
accept: application/json.
sec-ch-ua-mobile: ?0.
authorization: eyJhbGcixxxx
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36.
sec-ch-ua-platform: "macOS".
origin: https://xxxx.yyyyy.zzzz
sec-fetch-site: cross-site.
sec-fetch-mode: cors.
sec-fetch-dest: empty.
referer: https://xxxx.yyyyy.zzzz/.
accept-encoding: gzip, deflate, br.
accept-language: en-US,en;q=0.9.

What you expected to happen:
Maybe when the location its internal the header are not injected when the location its internal.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

kubectl exec -it $POD_NAME -n $POD_NAMESPACE -- /nginx-ingress-controller --version
Defaulted container "controller" out of: controller, opentelemetry (init)
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.7.0
  Build:         72ff21ed9e26cb969052c753633049ba8a87ecf9
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.6

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.10-eks-48e63af", GitCommit:"9176fb99b52f8d5ff73d67fea27f3a638f679f8a", GitTreeState:"clean", BuildDate:"2023-01-24T19:17:48Z", GoVersion:"go1.19.5", Compiler:"gc", Platform:"linux/amd64"}

Environment:
AWS EKS 1.24

  • How was the ingress-nginx-controller installed:
my-ingress-nginx                                 	kube-system  	13      	2023-03-30 14:53:21.741429 -0300 -03   	deployed	ingress-nginx-4.6.0                  	1.7.0
USER-SUPPLIED VALUES:
controller:
  config:
    allow-snippet-annotations: true
    enable-opentelemetry: "true"
    enable-underscores-in-headers: true
    opentelemetry-config: /etc/nginx/opentelemtry.toml
    opentelemetry-operation-name: HTTP $request_method $service_name $uri
    opentelemetry-trust-incoming-span: "false"
    otel-max-export-batch-size: "512"
    otel-max-queuesize: "2048"
    otel-sampler: AlwaysOn
    otel-sampler-parent-based: "false"
    otel-sampler-ratio: "1.0"
    otel-schedule-delay-millis: "5000"
    otel-service-name: nginx-proxy
    otlp-collector-host: xxxx.yyyy.svc
    otlp-collector-port: "4317"
    proxy-real-ip-cidr: 172.16.0.0/16
    use-proxy-protocol: true
  opentelemetry:
    enabled: true
  replicaCount: 2
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: tcp
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
      service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
      service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: proxy_protocol_v2.enabled=true
      service.beta.kubernetes.io/aws-load-balancer-type: nlb

How to reproduce this issue:
Install the latest version.
Enable the opentelemetry
Set any backend with /auth-url

@csepulveda csepulveda added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2023
@k8s-ci-robot
Copy link
Contributor

This issue is currently awaiting triage.

If Ingress contributors determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority labels Mar 30, 2023
@longwuyuan
Copy link
Contributor

/assign @esigo

@esigo
Copy link
Member

esigo commented Mar 30, 2023

I'm not sure what you mean with opentracing headers. If you mean the context propagation, at the moment only w3s tracing context is supported. It's possible to support b3 context propagation here by changing it to opentelemetry_propagate b3. But it will not be possible to mix these contexts.

@csepulveda
Copy link
Author

csepulveda commented Mar 30, 2023

Let me explain a little bit more.

When i make a request to the application behind nginx, if the request have to be authorized using auth-url, the opentracing headers (w3s) are propagated only to the app and not to the authorizer.

Request:

curl --location --request GET 'https://xx.yy.zz/v2/merchants' \
--header 'authorization: xxxxtoken'

headers on auth service

GET /auth HTTP/1.1.
X-Request-ID: 1988fbf01a6328935f2afb42ab0bd5d8.
Host: xxx-auth.yyy.svc.cluster.local.
X-Original-URL: https://xx.yy.zz/v2/merchants.
X-Original-Method: GET.
X-Sent-From: nginx-ingress-controller.
X-Real-IP: 11.99.99.99.
X-Forwarded-For: 11.99.99.99.
X-Auth-Request-Redirect: /v2/merchants.
Connection: close.
authorization: xxxxtoken.
User-Agent: PostmanRuntime/7.29.2.
Accept: */*.
Postman-Token: 845a33a4-6115-41bd-91da-435501c47b20.
Accept-Encoding: gzip, deflate, br.

headers on app

GET /merchants HTTP/1.1.
traceparent: 00-689d50c15cfac82080b3c70db777bc64-e9c02cfa0c878177-01.
partner_id: xxxx-xxx-xxx-xxx-xxxx.
Host: xx.yy.zz.
X-Request-ID: 1988fbf01a6328935f2afb42ab0bd5d8.
X-Real-IP: 11.99.99.99.
X-Forwarded-For: 11.99.99.99.
X-Forwarded-Host: xx.yy.zz.
X-Forwarded-Port: 443.
X-Forwarded-Proto: https.
X-Forwarded-Scheme: https.
X-Scheme: https.
authorization: xxxxtoken.
User-Agent: PostmanRuntime/7.29.2.
Accept: */*.
Postman-Token: 845a33a4-6115-41bd-91da-435501c47b20.
Accept-Encoding: gzip, deflate, br.

ingress configuration:

ingress:
    enabled: true
    className: "nginx"
    annotations:
      cert-manager.io/cluster-issuer: "letsencrypt"
      nginx.ingress.kubernetes.io/auth-url: http://xxx-auth.yyy.svc.cluster.local/auth
      nginx.ingress.kubernetes.io/auth-cache-key: $http_authorization$arg_api_key$request_method
      nginx.ingress.kubernetes.io/auth-response-headers: partner_id
      nginx.ingress.kubernetes.io/rewrite-target: /$2
      nginx.ingress.kubernetes.io/proxy-buffer-size: 256k
      nginx.ingress.kubernetes.io/proxy-body-size: 75m
      nginx.ingress.kubernetes.io/client-header-buffer-size: 128k
      nginx.ingress.kubernetes.io/opentracing-trust-incoming-span: "false"
      nginx.ingress.kubernetes.io/server-snippet: |
        location ~* "^/v2/debug/" {
            deny all;
            return 403;
          }
    hosts:
      - host: xx.zz.yy
        paths:
          - path: /v2(/|$)(.*)
            pathType: Prefix
  
    tls:
      - hosts:
        - xx.zz.yy
        secretName: xx.zz.yy

The idea to use the opentelemetry module on the ingress is to create the trace on nginx and propagate the traceparent header to both apps (the application and its authorizer) so in that way; I will be able to have all spans and timeline in the same traceId.

@csepulveda
Copy link
Author

Maybe its necessary to force the header using something like:

nginx.ingress.kubernetes.io/auth-url: http://foo.com/external-auth
nginx.ingress.kubernetes.io/auth-snippet: |
    proxy_set_header traceparent $traceparent;

Is there a way to get the traceparent as a variable?

@csepulveda
Copy link
Author

csepulveda commented Mar 30, 2023

I tried the following configuration:

    nginx.ingress.kubernetes.io/auth-snippet: |
      proxy_set_header traceparent 123;

and i'm able to see the header on the authorized service:

...
X-Auth-Request-Redirect: /v2/offers.
traceparent: 123.
Connection: close.
....

The problem is when i try to use the variable "$opentelemetry_context_traceparent" instead 123 nothing shows up.
It's like the variable opentelemetry_context_traceparent its created after the authorization 😢

@github-actions
Copy link

github-actions bot commented May 1, 2023

This is stale, but we won't close it automatically, just bare in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

@github-actions github-actions bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 1, 2023
@csepulveda
Copy link
Author

Here is an example to reproduce the issue:
https://github.com/csepulveda/nginx-example

The expected result is to see the service-c span inside the trace generated by Nginx.

Regards!

@github-actions github-actions bot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 5, 2023
@esigo
Copy link
Member

esigo commented May 20, 2023

this works with the following config:

daemon off;
error_log /dev/stdout info;

load_module /modules_mount/etc/nginx/modules/otel/otel_ngx_module.so;

events {
  worker_connections 1024;
}

http {
  opentelemetry on;
  opentelemetry_operation_name "http";
  opentelemetry_config /conf/otel-nginx.toml;
  access_log /dev/stdout;
  opentelemetry_capture_headers on;
  add_header Server-Timing "traceparent;desc=\"$opentelemetry_context_traceparent\"";
  opentelemetry_operation_name my_example_backend_root;
  server {
    listen 80;
    server_name localhost;

    location / {
      opentelemetry on;
      opentelemetry_operation_name my_example_backend;
      opentelemetry_propagate;
      opentelemetry_trust_incoming_spans on;
      proxy_pass http://service-a:80/;

      auth_request /auth;
      set $dummy_val "$opentelemetry_context_traceparent";
    }

    location = /auth {
      opentelemetry_operation_name my_example_auth;
      opentelemetry on;
      opentelemetry_propagate;
      opentelemetry_trust_incoming_spans on;
      internal;
      proxy_pass http://service-c:80/auth;
      proxy_pass_request_body off;
      proxy_set_header Content-Length "";
      proxy_set_header X-Original-URI $request_uri;
    }
  }
}

image

It should be possible to achieve the same config using annotations I think, I haven't tested though.

@eguzki
Copy link

eguzki commented May 24, 2023

@esigo from your example, I would expect my_example_auth operation to be one span added to the trace. Is my expectation correct?

@esigo
Copy link
Member

esigo commented May 24, 2023

@esigo from your example, I would expect my_example_auth operation to be one span added to the trace. Is my expectation correct?

@eguzki yes, it's service-c in the snapshot I attached. But this span is actually coming from the service not from nginx. And auth span (service-c) is a child to the span that nginx created.

@deejgregor
Copy link

I confirmed that the example posted earlier by @esigo works, in particular adding set $dummy_val "$opentelemetry_context_traceparent"; in the parent location block where auth_request is used.

I was able to do it with a snippet, as well:

      nginx.ingress.kubernetes.io/configuration-snippet: |
        set $dummy_val "$opentelemetry_context_traceparent";

Thanks, @esigo!

@deejgregor
Copy link

I'll note that there's a similar issue with propagating to gRPC backend services in #10319.

@zmitry
Copy link

zmitry commented Apr 4, 2024

I confirmed that this snipped actually works, but can someone explain why and is there any plan to fix it and make it work out of the box?

 nginx.ingress.kubernetes.io/configuration-snippet: |
        set $dummy_val "$opentelemetry_context_traceparent";

@zmitry
Copy link

zmitry commented Apr 26, 2024

Hi, do you have any updates on this?

@andrewdinunzio
Copy link

I asked Copilot to explain that workaround:

The set $dummy_val "$opentelemetry_context_traceparent"; line is a workaround to ensure that the traceparent value is correctly propagated. In Nginx, variables are evaluated lazily. This means that if a variable is not used, it won’t be evaluated. By assigning the traceparent value to a dummy variable, you’re forcing Nginx to evaluate it, ensuring that the OpenTelemetry context is correctly propagated.

Is that accurate? The workaround worked for us though, so thanks!

@longwuyuan
Copy link
Contributor

Esigo and other comments seems to be conclusive on the topic of this issue. So there is no action item on the project for this issue hence closing it.

/clsoe

@longwuyuan
Copy link
Contributor

/close

@k8s-ci-robot
Copy link
Contributor

@longwuyuan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@juissi-t
Copy link

juissi-t commented Sep 19, 2024

I tried to use the workaround today with the following results:

  1. When using nginx.ingress.kubernetes.io/auth-keepalive: "20" annotation in an ingress, the auth request is not part of the full request trace.
  2. When not using the keepalive annotation, the workaround is effective and the auth request is part of the full trace.

Any ideas how to get it working when using keepalive?

Edit: it starts to work when nginx.ingress.kubernetes.io/auth-keepalive-share-vars: "true" is set.

@moleskin-smile
Copy link

@longwuyuan: I think we have a regression (vs. opentracing) here. Distributed tracing should work out of the box for auth requests as it worked before. If some additional configuration is needed, it should be documented. The suggested workaround was expressed in plain nginx config, not as a ready-to-use solution for ingress-nginx.

We experience the same thing and we consider it a blocker. Because of this issue, we can't switch to OpenTelemetry in nginx easily, so we can't update ingress-nginx, which in turn blocks further Kubernetes updates. It's already quite big undertaking to support W3c header extraction everywhere and to adjust configuration of all ingress resources, so it would be beneficial for many people if it's not more difficult than necessary. I've attached screenshots from the same kind of request, one with opentracing and another with opentelemetry enabled.

opentracing:
opentracing

opentelemetry:
opentelemetry

This issue was also reported here: open-telemetry/opentelemetry-cpp-contrib#143, open-telemetry/opentelemetry-cpp-contrib#270.

Please consider reopening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug. needs-priority needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.
Projects
Development

No branches or pull requests

10 participants