[Feature] Per-Store TLS configuration in Thanos Query #977

daviddyball · 2019-03-26T14:50:38Z

Request

Feature request to add support for per-store tls configuration.

Use Case

I have store running alongside query on my main "monitoring" cluster and we don't use TLS for comms between pods on the same cluster. I also want to pull metrics from my remote Prometheus instances which exist on different clusters, but these endpoints are TLS encrypted because they are cross-cluster

Suggestion

Change the --store argument to use a more definitive style for remote endpoints e.g.:

grpc://<ip-address>:<port>
grpc+tls://<ip-address>:<port>

This would allow specifying per-store TLS settings and allow a combination of secure and insecure comms for store endpoints.

Related PR from @adrien-f : #899

The text was updated successfully, but these errors were encountered:

daviddyball · 2019-03-28T14:38:01Z

Definitely going to try and throw something together for this.

Ideas so far are:

a) use JDBC-like URLs to configure individual stores (e.g. grpc+tls://<addr>:<port>/?Option1=y&Option2=n)

or

b) use file-based configuration like the object store.

@bwplotka If I go for option "b" should I re-use the object-store file config stuff, or are you happy for me to implement a query-store specific config file?

bwplotka · 2019-04-13T22:41:13Z

So I think we talked about it here: #899

I would say for this specifc use case, let's start with option B and essentially limit, by saying --store are explicitly out of this feature.

I re-use the object-store file config stuff

What? We should never mix object store stuff with discovery stuff. In fact you have ready FileSD logic which is compatible with Prometheus FileSD. It allows arbitrary labels, so we can implement easily this:

One solution would be to do that work in the FileSD config and have labels such as tls_server_name to override TLS config.

I would say this is the way to go. What do you think? (:

Thanks for proposing this and sorry for massive delay!

daviddyball · 2019-04-15T08:56:40Z

@bwplotka Apologies, I meant I can look at writing some similar code to what is in place for FileSD to implement FileStoreConfig or something :-P Definitely don't want to mix the two 😆

bwplotka · 2019-04-15T12:24:41Z

Yea, I would say no specifc config just literally special labels we document properly in our docs

povilasv · 2019-04-16T13:53:46Z

IMO people just coud just run envoy/nginx/etc proxy and handle TLS outside of Thanos. As then we don't have to work on cert reloading all the different options, etc.

@bwplotka wdyt?

mleklund · 2019-04-16T14:17:47Z

I attempted to use nginx, but it appears that the grpc call is not sending a host header. Can anyone confirm this is the case? (Or at least anyone with better knowledge of the code and go than myself)

povilasv · 2019-04-16T16:14:26Z

Why does it need that?

I think you should be able to proxy all the traffic to GPRC backend.

Something like:

http {
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent"';
 
    server {
        listen 80 http2;
 
        access_log logs/access.log main;
 
        location / {
            # Replace localhost:50051 with the address and port of your gRPC server
            # The 'grpc://' prefix is optional; unencrypted gRPC is the default
            grpc_pass grpc://localhost:50051;
        }
    }
}

mleklund · 2019-04-16T19:27:46Z

In my case I would like to park the grpc on a virtual host, instead of running a special nginx instance just for grpc.

povilasv · 2019-04-17T08:38:03Z

I don't think we pass host headers nor we have an option to configure them. So right now I don't think it will work

daviddyball · 2019-04-17T08:58:24Z

@mleklund Try getambassador.io for a nice GRPC proxy. It's what we ended up using to get around the problem of not being able to mix TLS and non-TLS stores.

@povilasv I'm not sure I understand your objection to adding this feature. Do you think it serves no purpose?

povilasv · 2019-04-17T09:43:42Z

I'ts not an objection, I just wanted to point out that it won't work right now.

Whether we want to add host header or no is a seperate discussion and it will take a bit of time.

BTW a lot of people right now run isitio / linkerd, the sidecar per service approach, so I think it's not a wide problem.

mleklund · 2019-04-17T15:40:51Z

Up to this point nginx ingress has worked for us and I would rather not complicate things any more then I need to. I have been looking at envoy and ambassador, and it may be a good fit down the road. For the moment I am using port based tcp for nginx ingress since most of our use cases are behind vpn or firewall.

I was curious if I was missing something or if there were plans to pass header info since grpc is built on top of http(2).

benjaminhuo · 2019-05-08T11:22:27Z

@daviddyball , I'm not sure if I understand your use case correctly, but does this pr #508 solve your problem?

There are flags like --grpc-server-tls-cert/--grpc-client-tls-cert added by this pr in query doc and other components' docs

daviddyball · 2019-05-08T12:11:13Z

@benjaminhuo I don't think it does solve the problem here. The issue is that I'd like query to talk to different stores in different ways (e.g. TLS vs non-TLS) and there is no way to configure per-store endpoint TLS configuration

povilasv · 2019-05-08T12:40:34Z

For now we won't be implementining Per Store TLS configuration. Our general suggestion is run sidecar proxy envoy / nginx / etc.

Hopefully we will update docs soon. We will close the ticket once the docs are there.

caarlos0 · 2019-06-27T21:21:31Z

I would also like something like this.

My idea, which may be wrong, is:

a "central" cluster which gathers metrics from all other clusters
query instances on the central cluster talk to query instances in other clusters over TLS
query instances on the central cluster also talks to local cluster's store and sidecars
- right now this needs to be over TLS as well, and needs to have one endpoint for each sidecar, an extra hop et al

ideally, I would like to define something like:

    - query.cluster1.foobar.com:443
    - query.cluster2.foobar.com:443
    - query.cluster3.foobar.com:443
    - query.cluster4.foobar.com:443
    - dnssrv+_grpc._tcp.thanos-sidecar-grpc.thanos.svc.cluster.local
    - dnssrv+_grpc._tcp.thanos-query-grpc.thanos.svc.cluster.local

so, it seems that without setting --grpc-client-tls-secure the secure endpoints won't work, and if I enable it the dnssrv records won't work :(

as far as I understand there is no way of doing this right now.

benjaminhuo · 2019-07-16T06:16:42Z

Try getambassador.io for a nice GRPC proxy. It's what we ended up using to get around the problem of not being able to mix TLS and non-TLS stores.

@daviddyball I'm considering using ambassador as an edge proxy in each cluster too.
Ambassador has tls termination ability, and we can use it to accept tls connection from central monitoring cluster.

But for the central monitoring cluster, is it possible for ambassador to originate tls connection to the remote cluster(egress grpc tls traffic)?

Would you share your config? thanks.

@bwplotka @caarlos0 , I found a blog post https://improbable.io/blog/thanos-architecture-at-improbable saying

In terms of requests to a remote cluster, Envoy has been used securely to proxy our request between many clusters; meaning that a request will go via an Envoy sidecar, an edge Envoy egress proxy, and over the public internet to an edge Envoy ingress proxy (all over a secure connection). This latter will then forward the request onto the Thanos Sidecar to retrieve the data for the time period and labels specified. All of this is done using server streaming gRPC.

Just wondering where I can find envoy config examples like this for ingress and especially egress tls traffic?

Thanks very much
Ben

daviddyball · 2019-07-16T08:53:46Z

@benjaminhuo I had thanos-query talking directly to ambassador directly, so didn't use Ambassador for outbound. You'd probably want to use Envoy directly for that part, no?

benjaminhuo · 2019-07-16T10:09:51Z

@benjaminhuo I had thanos-query talking directly to ambassador directly, so didn't use Ambassador for outbound. You'd probably want to use Envoy directly for that part, no?

Yeah, I'm considering using envoy to manage outbound tls.

Then you've the same tls client config for all the store specified in thanos query including the one in the same cluster as thanos query?

daviddyball · 2019-07-16T10:45:07Z

We're not validating TLS at the moment @benjaminhuo , so there's no TLS config on query, just a list of store endpoints 👍

benjaminhuo · 2019-07-16T11:28:29Z

We're not validating TLS at the moment @benjaminhuo , so there's no TLS config on query, just a list of store endpoints 👍

I see :)

bwplotka · 2019-09-11T14:17:59Z

It looks like there are mix of things in this feature requests.

So is it about host header or per store TLS config?

daviddyball · 2019-09-11T14:31:35Z

Per-store TLS config. The ability to specify TLS for some stores and disable it for others. We've since worked around the issue, so if it doesn't fit with the intended design of Thanos, then it can be closed as Won't Fix 👍

max77p · 2021-06-21T21:49:19Z

@jppitout i am waiting for the same. @bwplotka any update on this?

Namanl2001 · 2021-06-21T22:20:39Z

I have started working and currently designing the proposal for this issue.

At the moment we can't connect http store if grpc tls config is passed in

@max77p which store endpoints are you referring to here? endpoints on local instance/remote instances/both?

max77p · 2021-06-22T01:39:21Z

@Namanl2001 by local instance do you mean side car that is in same namespace? and by remote do you mean side car in different cluster altogether?

To clarify from my side, I can add multiple sidecars to thanos query IF I use service name in my kubernetes cluster. But the moment I add a sidecar which is behind https and the respective TLS certs, I am unable to add any other side car.

stale · 2021-09-03T02:37:43Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

Namanl2001 · 2021-09-03T20:00:20Z

Working PR #4389

stale · 2022-01-09T10:34:03Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale · 2022-03-02T19:26:19Z

Closing for now as promised, let us know if you need this to be reopened! 🤗

matej-g · 2022-03-03T08:48:47Z

Looks like this is still unresolved

stale · 2022-05-02T09:12:15Z

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

tal-ayalon · 2022-11-09T12:06:29Z

What is the status?

MysterioO · 2023-03-10T14:00:48Z

Hi all, is somebody working on this? We also have indications that this issue is not resolved.

rgarcia89 · 2023-05-23T07:06:02Z

This is a very necessary feature

brandonjwk · 2023-05-31T15:39:23Z

New to setting up Thanos and I've just spent a good amount of time trying to debug my ingress and TLS setup only to discover this issue and realising that you can only either have TLS-enabled stores with the --grpc-client-tls-secure flag or only non-TLS stores. Definitely a gotcha, probably deserving of a troubleshooting entry in the wiki.

doctorpangloss · 2023-06-13T01:55:56Z

I also got burned by this.

doctorpangloss · 2023-06-16T01:03:50Z

This is the extremely painful workaround: we're going to set up an envoyproxy proxy which forwards plaintext http/2 grpc traffic to your TLS grpc thanos-sidecar backend, no matter where it is.

In your cluster, you can create the envoy proxy this way:

# the purpose of this manifest is to convert
# https://some-address.example.com with a given Basic auth
# into a service http://some-thanos-sidecar.monitoring.svc.cluster.local with no SSL
# this works around https://github.com/thanos-io/thanos/issues/977
apiVersion: v1
kind: ConfigMap
metadata:
  name: some-thanos-sidecar-envoy
  namespace: monitoring
data:
  envoy.yaml: |
    static_resources:
      listeners:
        - name: listener_0
          address:
            socket_address: { address: 0.0.0.0, port_value: 8080 }
          filter_chains:
            - filters:
                - name: envoy.filters.network.http_connection_manager
                  typed_config:
                    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                    stat_prefix: ingress_http
                    codec_type: AUTO
                    route_config:
                      name: local_route
                      virtual_hosts:
                        - name: local_service
                          domains: [ "*" ]
                          routes:
                            - match: { prefix: "/" }
                              route:
                                cluster: service_cluster
                                timeout: 0s
                                host_rewrite_literal: some-address.example.com
                          request_headers_to_add:
                            - header:
                                key: "Authorization"
                                # use the following to encode the value after Basic 
                                # echo -n "username:_some_password_" | base64
                                value: "Basic abcdefgxyz=="
                    http_filters:
                      - name: envoy.filters.http.router
                        typed_config:
                          "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      clusters:
        - name: service_cluster
          http2_protocol_options: {}
          connect_timeout: 0.25s
          type: LOGICAL_DNS
          # Comment out the following line to test on v6 networks
          dns_lookup_family: V4_ONLY
          lb_policy: ROUND_ROBIN
          load_assignment:
            cluster_name: service_cluster
            endpoints:
              - lb_endpoints:
                  - endpoint:
                      address:
                        socket_address:
                          address: some-address.example.com
                          port_value: 443
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
              common_tls_context:
                alpn_protocols: h2
              sni: some-address.example.com
    admin:
      address:
        socket_address: { address: 0.0.0.0, port_value: 8001 }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-thanos-sidecar-envoy-proxy
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: some-thanos-sidecar-envoy-proxy
  template:
    metadata:
      labels:
        app: some-thanos-sidecar-envoy-proxy
    spec:
      containers:
        - name: envoy
          image: envoyproxy/envoy:v1.26.2
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: envoy-config
              mountPath: /etc/envoy
      volumes:
        - name: envoy-config
          configMap:
            name: some-thanos-sidecar-envoy
---
apiVersion: v1
kind: Service
metadata:
  name: some-thanos-sidecar-envoy-proxy
  namespace: monitoring
spec:
  selector:
    app: some-thanos-sidecar-envoy-proxy
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
  type: ClusterIP

Observe an Authorization header is added. This makes sense if you are protecting your thanos sidecar in a remote cluster using authentication.

Now, add the following to your query service:

        - --endpoint=some-thanos-sidecar-envoy-proxy.monitoring.svc.cluster.local:80

Here's an example from my Flux HelmRelease manifest using the bitnami thanos chart:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: thanos
  namespace: monitoring
spec:
  interval: 5m
  chart:
    spec:
      version: "12.6.2"
      chart: thanos
      sourceRef:
        kind: HelmRepository
        name: bitnami
      interval: 60m
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  # https://github.com/prometheus-community/helm-charts/blob/main/charts/kube-prometheus-stack/values.yaml
  values:
    # this is specific to your cluster
    existingObjstoreSecret: thanos-objstore-config
    query:
      enabled: true
      extraFlags:
        # works around https://github.com/thanos-io/thanos/issues/977
        - --endpoint=some-thanos-sidecar-envoy-proxy.monitoring.svc.cluster.local:80
        - --query.timeout=5m
        - --query.lookback-delta=15m
      dnsDiscovery:
        enabled: true
        sidecarsService: kube-prometheus-stack-thanos-discovery
        sidecarsNamespace: monitoring
    queryFrontend:
      enabled: true
      extraFlags:
        - --query-frontend.compress-responses
        - --query-range.split-interval=12h
        - --labels.split-interval=12h
        - --query-range.max-retries-per-request=10
        - --labels.max-retries-per-request=10
        - --query-frontend.log-queries-longer-than=10s
    storegateway:
      enabled: true
    compactor:
      enabled: true

To secure your thanos sidecar with basic auth, use an appropriate ingress configuration.

sourcehawk · 2024-04-15T22:48:29Z

I would also like something like this.

My idea, which may be wrong, is:

a "central" cluster which gathers metrics from all other clusters

query instances on the central cluster talk to query instances in other clusters over TLS

query instances on the central cluster also talks to local cluster's store and sidecars

right now this needs to be over TLS as well, and needs to have one endpoint for each sidecar, an extra hop et al

ideally, I would like to define something like:
    - query.cluster1.foobar.com:443
    - query.cluster2.foobar.com:443
    - query.cluster3.foobar.com:443
    - query.cluster4.foobar.com:443
    - dnssrv+_grpc._tcp.thanos-sidecar-grpc.thanos.svc.cluster.local
    - dnssrv+_grpc._tcp.thanos-query-grpc.thanos.svc.cluster.local
so, it seems that without setting --grpc-client-tls-secure the secure endpoints won't work, and if I enable it the dnssrv records won't work :(

as far as I understand there is no way of doing this right now.

5 years and still the exact same issue 😢

brannondorsey · 2025-01-31T22:27:32Z

I just wasted the better part of a day on this. My Query component communicates with a collection of in-cluster and out-of-cluster stores, requiring TLS for the external communication, but not the internal ones.

If this isn't going to be prioritized as a feature, could it at least be called out as a gotcha somewhere pretty obvious in the Query docs?

MichaHoffmann · 2025-02-01T07:02:50Z

I prepared this by introducing a config file where every endpoint is configured - we can add grpc options there too! By now it should be straight forward I think

bwplotka added feature request/improvement difficulty: medium priority: P0 component: query labels Mar 26, 2019

bwplotka added priority: P1 and removed priority: P0 labels Apr 13, 2019

danielmotaleite mentioned this issue Sep 11, 2019

thanos+ingress-nginx+grpc: impossible setup due missing host header #1507

Closed

fktkrt mentioned this issue Jun 24, 2021

[kube-prometheus-stack] Adding support for multi ingress for thanos prometheus-community/helm-charts#799

Closed

3 tasks

Namanl2001 mentioned this issue Jun 26, 2021

proposal: Automated, per-endpoint mTLS #4377

Merged

jppitout mentioned this issue Jun 30, 2021

Configure Thanos Rule GRPC TLS with grpcTLS the same as Thanos Query bitnami/charts#6797

Closed

stale bot added the stale label Sep 3, 2021

stale bot removed the stale label Sep 3, 2021

stale bot added the stale label Jan 9, 2022

stale bot closed this as completed Mar 2, 2022

matej-g reopened this Mar 3, 2022

stale bot removed the stale label Mar 3, 2022

stale bot added the stale label May 2, 2022

GiedriusS added dont-go-stale Label for important issues which tells the stalebot not to close them and removed stale labels May 2, 2022

[Feature] Per-Store TLS configuration in Thanos Query #977

[Feature] Per-Store TLS configuration in Thanos Query #977

Comments

daviddyball commented Mar 26, 2019

Request

Use Case

Suggestion

daviddyball commented Mar 28, 2019

bwplotka commented Apr 13, 2019

daviddyball commented Apr 15, 2019

bwplotka commented Apr 15, 2019

povilasv commented Apr 16, 2019

mleklund commented Apr 16, 2019 • edited Loading

povilasv commented Apr 16, 2019 • edited Loading

mleklund commented Apr 16, 2019

povilasv commented Apr 17, 2019

daviddyball commented Apr 17, 2019

povilasv commented Apr 17, 2019 • edited Loading

mleklund commented Apr 17, 2019 • edited Loading

benjaminhuo commented May 8, 2019 • edited Loading

daviddyball commented May 8, 2019

povilasv commented May 8, 2019

caarlos0 commented Jun 27, 2019

benjaminhuo commented Jul 16, 2019 • edited Loading

daviddyball commented Jul 16, 2019

benjaminhuo commented Jul 16, 2019 • edited Loading

daviddyball commented Jul 16, 2019

benjaminhuo commented Jul 16, 2019

bwplotka commented Sep 11, 2019

daviddyball commented Sep 11, 2019

max77p commented Jun 21, 2021

Namanl2001 commented Jun 21, 2021

max77p commented Jun 22, 2021

stale bot commented Sep 3, 2021

Namanl2001 commented Sep 3, 2021

stale bot commented Jan 9, 2022

stale bot commented Mar 2, 2022

matej-g commented Mar 3, 2022

stale bot commented May 2, 2022

tal-ayalon commented Nov 9, 2022

MysterioO commented Mar 10, 2023

rgarcia89 commented May 23, 2023

brandonjwk commented May 31, 2023

doctorpangloss commented Jun 13, 2023

doctorpangloss commented Jun 16, 2023

sourcehawk commented Apr 15, 2024 • edited Loading

brannondorsey commented Jan 31, 2025

MichaHoffmann commented Feb 1, 2025

mleklund commented Apr 16, 2019 •

edited

Loading

povilasv commented Apr 16, 2019 •

edited

Loading

povilasv commented Apr 17, 2019 •

edited

Loading

mleklund commented Apr 17, 2019 •

edited

Loading

benjaminhuo commented May 8, 2019 •

edited

Loading

benjaminhuo commented Jul 16, 2019 •

edited

Loading

benjaminhuo commented Jul 16, 2019 •

edited

Loading

sourcehawk commented Apr 15, 2024 •

edited

Loading