Expose collector outside of cluster #902

Closed
pavolloffay opened this issue May 31, 2022 · 15 comments · Fixed by #1206
Labels
area:collector Issues for deploying collector enhancement New feature or request

Comments

@pavolloffay
Member

Use case: a user wants to send telemetry data to the OTEL collector from outside of the cluster - e.g. mobile clients or just a different cluster.

@pavolloffay pavolloffay added enhancement New feature or request area:collector Issues for deploying collector labels May 31, 2022
@pavolloffay
Member Author

We would like to support this for Kubernetes as well as for OpenShift via OCP routes. I did some research for OCP: to make it work, the appProtocol has to be set to h2c (HTTP/2 cleartext) with edge TLS termination or a plaintext route. This is supported only on OCP 4.10 via the new HAProxy. Older OCP versions need to use a TLS passthrough route with TLS certificates on the deployment.
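For the passthrough approach on older versions, a Route manifest might look like the following sketch (the service name, port name, and Route name are illustrative placeholders, not from this issue):

```yaml
# Sketch only: a TLS passthrough Route for OCP versions before 4.10.
# "otel-collector" and the port name "otlp-grpc" are assumptions.
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: otel-collector-grpc
spec:
  to:
    kind: Service
    name: otel-collector
  port:
    targetPort: otlp-grpc
  tls:
    termination: passthrough  # TLS terminates at the collector, not the router
```

With passthrough termination, the collector itself must serve the TLS certificates, which is why the deployment needs them mounted.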

@ringerc

ringerc commented Jun 4, 2022

Wouldn't this typically be done on k8s via deployment of a LoadBalancer endpoint?

@sergeyshevch

Another option is adding k8s Ingress support. Some ingress controllers have good support for gRPC as well.

@frzifus
Member

frzifus commented Jul 25, 2022

Hey, what would be a suitable solution for this? I have summarized my thoughts here. I would be very happy about your feedback.


Original post in duplicated issues

After creating a collector as a deployment, daemonset, or statefulset, this results in three services using the default type ClusterIP. Looking at the implementation (service.go), it seems there is no way to change this type. We could manually create or patch a service with e.g. ServiceType=LoadBalancer, but from my point of view it would be quite nice to be able to specify this directly in the CRD.

As an extension of the CRD, I was thinking of something like the ingress entry in the following code snippet (inspired by the Skupper operator[1]).

File opentelemetrycollector_types.go:

	// Mode represents how the collector should be deployed (deployment, daemonset, statefulset or sidecar)
	// +optional
	Mode Mode `json:"mode,omitempty"`

	// Ingress is used to specify how OpenTelemetry Collector is exposed. This
	// function is only available if one of the valid modes is set.
	// Valid modes are: deployment, daemonset and statefulset.
	// +optional
	Ingress struct {
		// Type default value is: none
		// Supported types are: route/loadbalancer/nodeport/nginx-ingress-v1/ingress
		Type string

		// Hostname by which the ingress proxy can be reached.
		Hostname string

		// ...
	}

What are your thoughts on this?

  • Does it make sense to you?
  • Suggestions for additional entries? (Maybe something to prevent exposing the collector without any protection, like an Insecure bool that defaults to false.)
  • Would it make more sense to set the ServiceType on the existing service, or to create an additional one like <APP>-collector-external?

Might be related:

@frzifus
Member

frzifus commented Sep 8, 2022

After playing around, I have now ended up with this CR extension. What are your thoughts? This would eliminate a few manual steps on the receiver side. In open-telemetry/opentelemetry.io#1684 I wrote down all the manual steps.

// Ingress is used to specify how OpenTelemetry Collector is exposed. This
// functionality is only available if one of the valid modes is set.
// Valid modes are: deployment, daemonset and statefulset.
type Ingress struct {
	// Type default value is: none
	// Supported types are: route/loadbalancer/nodeport/ingress
	Type string

	// IngressClassName is the name of the IngressClass cluster resource.
	ClassName string

	// Hostname by which the ingress proxy can be reached.
	Hostname string

	// Protocol used in exposed backend.
	Protocol string

	// Annotations to add to ingress separated by comma,
	// e.g. 'cert-manager.io/cluster-issuer: "letsencrypt"'
	// +optional
	Annotations string

	// TLS configuration.
	// +optional
	TLS []networkingv1.IngressTLS
}
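To illustrate, a CR using the proposed fields might look like this (hypothetical: the field names follow the draft struct above and are not a finalized API; the class name, hostname, and secret name are placeholders):

```yaml
# Hypothetical CR based on the draft Ingress struct above; not a final API.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: simplest
spec:
  mode: deployment
  ingress:
    type: ingress
    className: nginx
    hostname: collector.example.com
    protocol: grpc
    tls:
      - hosts:
          - collector.example.com
        secretName: collector-tls
```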

@yuriolisa
Contributor

@frzifus, +1 for your idea.
Would we have an implementation for Routes in the OpenShift scenario?

@sergeyshevch

sergeyshevch commented Sep 16, 2022

@frzifus Looks like Kubernetes already has all the types for ingress. Can we just reuse them in your implementation?

Also, it would be better to have annotations as map[string]string. For example, with the AWS ingress controller I have 8-10 different annotations per ingress.

@PengWang-SMARTM

Can someone summarize the current status regarding what's supported and what's not? I'm really confused. I'd appreciate a shareable success story with details to follow.

I happened to work on this subject in the past week, and came here after reading the blog, which was posted about ten days ago.

According to the blog, it sounds like Ben (@frzifus) successfully made an OTLP gRPC connection over an nginx ingress between two OTEL collectors. However, the example instructions are questionable to me. For example, in the edge-side client OTEL configuration, the endpoint should have a port number.

    exporters:
      otlp:
        endpoint: "<REPLACE: your domain endpoint, e.g.: "traces.example.com">"

Anyway, I'll admit I'm still struggling to create an OTLP connection over a traefik ingress, whether gRPC or HTTPS. I've successfully made an HTTPS connection over the traefik ingress to browse an HTTPS server. I thought otlphttp would be something similar, but I was wrong. I got some success, but still have questions and problems.

otlphttp

Client OTEL

exporters:
  otlphttp:
    endpoint: https://example.com
    auth:
      authenticator: basicauth/client
    tls:
      cert_file: /conf/certs/tls.crt
      key_file: /conf/certs/tls.key
      ca_file: /conf/certs/ca.crt

Server OTEL

receivers:
  otlp:
    protocols:
      http:
        auth:
          authenticator: basicauth/server
        tls:
          cert_file: /conf/certs/tls.crt
          key_file: /conf/certs/tls.key
          client_ca_file: /conf/certs/ca.crt

Service

  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: "https"
    traefik.ingress.kubernetes.io/service.passhostheader: "false"

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    - hosts:
        - "example.com"
      secretName: "example-ingress"
  rules:
    - host: "example.com"
      http:
        paths:
          - path: "/"
            pathType: Prefix
            backend:
              service:
                name: example-otel
                port:
                  number: 4318

I started with the configuration above. The client otel got 502 errors, and the server otel said the client did not provide a certificate.

Then I disabled the server-side client certificate verification by removing client_ca_file: /conf/certs/ca.crt. Without the certificate verification, otlphttp appears to work now.

As you may see, the ingress routing path has to be "/". If I change it (client exporter endpoint to https://example.com/otlphttp and ingress path to /otlphttp), the connection fails and the client gets Permanent error: error exporting items, request to https://example.com/otlphttp/v1/metrics responded with HTTP Status Code 404, with nothing in the server log.

traefik.ingress.kubernetes.io/service.passhostheader: "false" seems important. If it is "true", something fails after about 3 or 4 minutes. Since I'm new here, I can only record my observation rather than figure out what exactly happened. I would expect that if it works, it works always; working for only a couple of minutes is odd to me.

Anyway, I'm still having two issues with this otlphttp over ingress:

  1. how to enable the Server side client certificate verification
  2. how to add a basepath rather than "/"
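One possible approach to the first issue, untested in this thread, is to bypass traefik's TLS termination entirely so the client certificate reaches the collector. With traefik's CRDs installed, a TLS-passthrough route might look like the following sketch (entrypoint, host, and service names are assumptions):

```yaml
# Sketch, not verified here: a traefik TCP route with TLS passthrough so the
# collector performs the TLS handshake (and can verify client certificates).
apiVersion: traefik.containo.us/v1alpha1  # "traefik.io/v1alpha1" on newer versions
kind: IngressRouteTCP
metadata:
  name: example-otel-passthrough
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`example.com`)
      services:
        - name: example-otel
          port: 4318
  tls:
    passthrough: true  # traefik forwards raw TLS without terminating it
```

Note that passthrough routing happens at the TCP/SNI level, so path-based routing (the second issue) is not available in this mode.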

otlpgrpc config

I'm having much more trouble with OTLP gRPC.

Client OTEL

exporters:
  otlp:
    endpoint: example.com:443
    auth:
      authenticator: basicauth/client
    tls:
      cert_file: /conf/certs/tls.crt
      key_file: /conf/certs/tls.key
      ca_file: /conf/certs/ca.crt

Server OTEL

receivers:
  otlp:
    protocols:
      grpc:
        max_recv_msg_size_mib: 32
        auth:
          authenticator: basicauth/server
        tls:
          cert_file: /conf/certs/tls.crt
          key_file: /conf/certs/tls.key
          client_ca_file: /conf/certs/ca.crt

Service

  annotations:
    traefik.ingress.kubernetes.io/service.serversscheme: "h2c"
    traefik.ingress.kubernetes.io/service.passhostheader: "false"

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
    traefik.ingress.kubernetes.io/router.tls: "true"
spec:
  tls:
    - hosts:
        - "example.com"
      secretName: "example-ingress"
  rules:
    - host: "example.com"
      http:
        paths:
          - path: "/"
            pathType: Prefix
            backend:
              service:
                name: example-otel
                port:
                  number: 4317

Then I started the gRPC test with the configuration above. Like HTTPS, the client got Permanent error: rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); transport: received unexpected content-type \"text/plain; charset=utf-8\", and nothing on the server side.

Then I deleted client_ca_file: /conf/certs/ca.crt. The client gets a 502 (Bad Gateway) error message followed by the 500 error message seen above.

I certainly need a lot of help here.

@frzifus
Member

frzifus commented Sep 26, 2022

would we have an implementation for Routes in Openshift scenario?

@yuriolisa yes, depending on the selected type, an ingress or route entry should be created.

Looks like Kubernetes already has all types for ingress. Can we just reuse it in your implementation?

@sergeyshevch Do you mean we should embed networkingv1.Ingress in the CRD? In my opinion, this would be equivalent to having no CRD entry. It does not simplify exposing a collector. Also, the automatic setting/overwriting of e.g. service names and ports could lead to confusion.

Also, It will be better to have annotations as map[string]string. For example in AWS ingress controller i have 8-10 different annotations per ingress

Sounds legitimate. I don't have a strong opinion on that.

Can someone summarize the current status regarding what's supported and what's not?

Hi @PengWang-SMARTM, the goal is to simplify exactly this point. A desirable end result would be to configure a hostname, routing types, annotations, and TLS in the OTEL collector CRD. Then the collector should be reachable from outside the cluster.

Currently I am trying to figure out how to expose different receivers in an elegant way. One consideration would be to route the various receiver endpoints via the URL.

A translation would look like this:

receivers:
  otlp:
    protocols:
      grpc:
  otlp/2:
    protocols:
      http:
  jaeger:
    protocols:
      grpc:
      thrift_binary:
      thrift_compact:

Configured hostname: example.com, should lead to addresses like:

example.com/otlp/grpc
example.com/otlp-2/http
example.com/jaeger/grpc
example.com/jaeger/thrift_binary
example.com/jaeger/thrift_compact

This way, however, it would be difficult to allow different TLS configurations for different collectors. What are your thoughts? @yuriolisa @pavolloffay and all the others?


Regarding the issue you faced with exposing the collector:

I started with the configuration above. The client otel got 502 errors, and the server otel said the client did not provide a certificate.

I don't really have experience with traefik, but could it be that you need to enable TLS passthrough? It seems the TLS certificate is no longer present because traefik has already terminated the TLS connection. In the setup described in the blog post I had the same problem and only had a TLS connection up to the nginx proxy.


However, the example instructions are questionable to me. For example, in the edge side client otel configuration, the endpoint should have a port number.

Thanks for the info. In fact the port number was missing. I have fixed the error. Did you notice anything else?

@PengWang-SMARTM

PengWang-SMARTM commented Sep 28, 2022

@frzifus, I'm new to Kubernetes and haven't had a chance to learn how to use CRDs yet. Since I have a working otlphttp connection, I'll come back to the OTLP gRPC connection at a later time.

Regarding the grpc configuration mentioned in your blog, I suspect endpoint: "<REPLACE: your domain endpoint, e.g.: "traces.example.com:443">" won't be enough.

That's because, from the Ingress point of view, as with routing HTTP(S), it needs something to distinguish the target pod:port so it knows where to forward incoming requests from port 443. Unless only one gRPC route is allowed through the Ingress, there has to be something like the path for HTTP (or a subdomain), or each target pod:port needs a unique port number.

Regarding the certificate/502 issue, I feel it's something with certificate passthrough too. I read around and noticed nginx has an annotation to enable passthrough; however, I have not found the equivalent for traefik yet.

@frzifus
Member

frzifus commented Oct 20, 2022

That's because, from the Ingress point of view, as with routing HTTP(S), it needs something to distinguish the target pod:port so it knows where to forward incoming requests from port 443. Unless only one gRPC route is allowed through the Ingress, there has to be something like the path for HTTP (or a subdomain), or each target pod:port needs a unique port number.

In the example case it is specified in an ingress rule:

  rules:
  - host: <REPLACE: your domain endpoint>
    http:
      paths:
      - pathType: Prefix
        path: "/"
        backend:
          service:
            name: otel-collector-app-collector
            port:
              number: 4317    

There you can map different paths to different service port numbers.

Since v0.61.0 there is an Ingress entry in the CRD. Based on the specified ports and collectors, rules for routing the traffic are automatically created.

Example:

$ kubectl apply -f tests/e2e/ingress/00-install.yaml

$ kubectl describe ingress
...
Rules:
  Host         Path  Backends
  ----         ----  --------
  example.com  
               /otlp-grpc          simplest-collector:otlp-grpc (10.244.0.15:4317)
               /otlp-http          simplest-collector:otlp-http (10.244.0.15:4318)
...

Currently I am working on OpenShift route support and came across the ingress-to-route controller. What do you think: would it make sense to expose the IngressClass field and simply link to the documentation noting that configuring openshift.io/ingress-to-route as the class name does the job? It would be nice and simple. But it seems that this controller is not pre-installed.

@fredthomsen

fredthomsen commented Nov 29, 2022

How are people actually leveraging the ingress to get data into an OTEL collector running in a cluster now? I am running into similar issues as @frzifus:

  • You cannot add any path information to receiver endpoints in the CRD config, i.e. 0.0.0.0:4318 is valid, but 0.0.0.0:4318/otlp-http/ is not.
  • The ingress reconciliation process doesn't seem to account for the fact that these paths (like otlp-http in the example above) are not stripped by the ingress and require something like a URL rewrite to be set up.
  • The limited ingress configuration exposed by the CRD doesn't allow the above two issues to be resolved by changing the ingress config.

What am I missing here, and how are people dealing with the above? The way I have worked around this for now is to manually patch the ingress path to /.
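Another workaround sketch for the prefix problem, assuming the NGINX ingress controller (the ingress name, host, and service name are illustrative): a manually managed ingress whose rewrite-target annotation strips the prefix before forwarding to the collector:

```yaml
# Sketch: strip the /otlp-http prefix with the NGINX ingress controller's
# rewrite-target annotation. Service name, host, and port are assumptions.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: otel-rewrite
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
spec:
  ingressClassName: nginx
  rules:
    - host: example.com
      http:
        paths:
          - path: /otlp-http(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: simplest-collector
                port:
                  number: 4318
```

With this, a request to example.com/otlp-http/v1/traces reaches the collector as /v1/traces, which the OTLP HTTP receiver expects.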

@wizardist

@fredthomsen this is not quite clear.

By default, the operator will use ruleType: prefix and create prefix rules for the ingress. It's much more involved to support these prefixes in either the Collector or the SDKs, although it seems the Collector has received some work to support prefixes in the OTLP HTTP receiver. Not sure about gRPC.

Short of adding support to receivers, exporters and SDKs, you can utilize ruleType: subdomain if that works for you.

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otlc-main
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
    exporters:
      debug:
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: []
          exporters: [debug]
  ingress:
    type: ingress
    ruleType: subdomain
    annotations:
      cert-manager.io/cluster-issuer: lets-encrypt # Replace with your issuer
      nginx.ingress.kubernetes.io/backend-protocol: GRPC
    ingressClassName: public
    hostname: 'otlc.example.org'
    tls:
      - hosts:
          - 'otlp-grpc.otlc.example.org'
        secretName: otlc-main-ingress-tls

otlc.example.org will be used as the higher-level domain for each of the exposed ports for receivers.

$ kubectl describe service otlc-main-collector
Name:              otlc-main-collector
IP:                10.152.183.207
Port:              otlp-grpc  4317/TCP
TargetPort:        4317/TCP
Endpoints:         10.1.27.137:4317

The ingress will be created as follows:

$ kubectl describe ingress otlc-main-ingress
Name:             otlc-main-ingress
Address:          127.0.0.1
Ingress Class:    public
Default backend:  <default>
TLS:
  otlc-main-ingress-tls terminates otlp-grpc.otlc.example.org
Rules:
  Host                                            Path  Backends
  ----                                            ----  --------
  otlp-grpc.otlc.example.org
                                                  /   otlc-main-collector:otlp-grpc (10.1.27.137:4317)
Annotations:                                      cert-manager.io/cluster-issuer: lets-encrypt
                                                  nginx.ingress.kubernetes.io/backend-protocol: GRPC
Events:                                           <none>

Obviously, this is a ClusterIP service with pretty much out-of-the-box configuration, so you'd define the endpoint as https://otlp-grpc.otlc.example.org:443. It might be far from a production-worthy configuration, but it relieves you from handling custom path-prefix configuration.
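A client collector exporting to that subdomain could then be configured roughly like this (sketch; the hostname comes from the example above, everything else is an assumption):

```yaml
# Client-side exporter sketch matching the subdomain ingress above.
exporters:
  otlp:
    endpoint: otlp-grpc.otlc.example.org:443
    tls:
      insecure: false  # TLS terminates at the ingress, backed by cert-manager
```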

@a0s

a0s commented Jan 22, 2024

Hello, I am trying to expose the collector's service as a NodePort service.
I have an OpenTelemetryCollector definition like this (cdktf TypeScript):

new Manifest(this, 'opentelemetry_collector', {
      manifest: {
        apiVersion: 'opentelemetry.io/v1alpha1',
        kind: 'OpenTelemetryCollector',
        metadata: {
          name: appName,
          namespace: namespace
        },
        spec: {
          ports: [
            {
              appProtocol: "grpc",
              name: "otlp",
              nodePort: grpcReceiverNodePort,
              port: grpcReceiverPort,
              protocol: "TCP",
              targetPort: grpcReceiverPort
            },
            {
              appProtocol: "http",
              name: "otlp-http",
              nodePort: httpReceiverNodePort,
              port: httpReceiverPort,
              protocol: "TCP",
              targetPort: httpReceiverPort
            }
          ],
          config: `
            receivers:
              otlp:
                protocols:
                  grpc:
                    endpoint: "0.0.0.0:${grpcReceiverPort}"
                  http:
                    endpoint: "0.0.0.0:${httpReceiverPort}"            
            exporters:
              otlp/openobserve:                
                endpoint: ${config.grpcEndpoint}
                headers:
                  Authorization: Basic ${config.auth}
                  organization: ${config.stackName}
                  stream-name: default
                tls:
                  insecure: true
                  insecure_skip_verify: true
            service:
              pipelines:
                traces:
                  receivers: [otlp]
                  exporters: [otlp/openobserve]
                logs:
                  receivers: [otlp]                  
                  exporters: [otlp/openobserve]
            `
        }
      }
    });

It looks like the nodePort setting is totally ignored here; I'm still getting a ClusterIP service instead of a NodePort service.
I've checked the sources of the operator and found nothing about nodePort handling.

version: otel/opentelemetry-collector-contrib:0.89.0
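Until the operator supports nodePort, one workaround is a hand-written NodePort Service alongside the operator-managed one. A sketch (the selector labels follow the operator's labeling conventions but are assumptions; verify them against your generated service):

```yaml
# Workaround sketch: a manually managed NodePort Service selecting the
# collector pods. Labels, name, and ports are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-nodeport
spec:
  type: NodePort
  selector:
    app.kubernetes.io/component: opentelemetry-collector
    app.kubernetes.io/instance: default.otel-collector  # <namespace>.<name>
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      nodePort: 30317
```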

@NickLarsenNZ

It looks like the nodePort setting is totally ignored here; I'm still getting a ClusterIP service instead of a NodePort service. I've checked the sources of the operator and found nothing about nodePort handling.

version: otel/opentelemetry-collector-contrib:0.89.0

@a0s, I have the same problem in opentelemetry-operator:0.96.0 (using opentelemetry-collector-contrib:0.97.0, though the collector version shouldn't matter).

@pavolloffay, I don't think this is fixed yet. The PR you mentioned (#1206) seems to be about OpenShift.
