Consul Connect service mesh not working with Consul 1.14.4 #16186

Closed
suikast42 opened this issue Feb 14, 2023 · 6 comments

Comments

@suikast42
Contributor

Nomad version

v1.5.0-beta.1

Consul version

v1.14.4

This issue was reported in #15295 and marked as fixed, but it is still present in Nomad 1.5.0-beta.1 with Consul 1.14.4. Downgrading Consul to v1.13.3 while keeping Nomad at v1.5.0-beta.1 works as expected.

The logging output from the Consul Connect proxy (Envoy):

[2023-02-14 20:28:33.365][1][debug][connection] [source/common/network/connection_impl.cc:651] [C6] remote close
[2023-02-14 20:28:33.365][1][debug][connection] [source/common/network/connection_impl.cc:250] [C6] closing socket: 0
[2023-02-14 20:28:33.365][1][debug][client] [source/common/http/codec_client.cc:107] [C6] disconnect. resetting 1 pending requests
[2023-02-14 20:28:33.365][1][debug][client] [source/common/http/codec_client.cc:156] [C6] request reset
[2023-02-14 20:28:33.365][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C6] destroying stream: 0 remaining
[2023-02-14 20:28:33.365][1][debug][router] [source/common/router/router.cc:1212] [C0][S4014950647359578935] upstream reset: reset reason: connection termination, transport failure reason:
[2023-02-14 20:28:33.365][1][debug][http] [source/common/http/async_client_impl.cc:105] async http request response headers (end_stream=true):
':status', '200'
'content-type', 'application/grpc'
'grpc-status', '14'
'grpc-message', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination'
[2023-02-14 20:28:33.365][1][debug][config] [./source/common/config/grpc_stream.h:207] DeltaAggregatedResources gRPC config stream to local_agent closed: 14, upstream connect error or disconnect/reset before headers. reset reason: connection termination
[2023-02-14 20:28:33.365][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.listener.v3.Listener failed
[2023-02-14 20:28:33.366][1][debug][config] [source/common/config/grpc_subscription_impl.cc:115] gRPC update for type.googleapis.com/envoy.config.cluster.v3.Cluster failed
[2023-02-14 20:28:33.366][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:483] [C6] client disconnected, failure reason:
[2023-02-14 20:28:33.366][1][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2023-02-14 20:28:36.955][1][debug][main] [source/server/server.cc:251] flushing stats
[2023-02-14 20:28:36.955][1][debug][main] [source/server/server.cc:261] Envoy is not fully initialized, skipping histogram merge and flushing stats
[2023-02-14 20:28:36.960][1][warning][config] [source/common/config/grpc_subscription_impl.cc:120] gRPC config: initial fetch timed out for type.googleapis.com/envoy.config.listener.v3.Listener
[2023-02-14 20:28:36.960][1][debug][init] [source/common/init/watcher_impl.cc:14] target LDS initialized, notifying init manager Server
[2023-02-14 20:28:36.960][1][debug][init] [source/common/init/watcher_impl.cc:14] init manager Server initialized, notifying RunHelper
[2023-02-14 20:28:36.960][1][info][config] [source/server/listener_manager_impl.cc:831] all dependencies initialized. starting workers
[2023-02-14 20:28:36.960][1][debug][config] [source/server/listener_manager_impl.cc:868] starting worker 0
[2023-02-14 20:28:36.960][15][debug][main] [source/server/worker_impl.cc:124] worker entering dispatch loop
[2023-02-14 20:28:36.960][16][debug][grpc] [source/common/grpc/google_async_client_impl.cc:51] completionThread running
[2023-02-14 20:28:36.960][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1101] adding TLS cluster local_agent
[2023-02-14 20:28:36.961][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1179] membership update for TLS cluster local_agent added 1 removed 0
[2023-02-14 20:28:36.961][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1101] adding TLS cluster self_admin
[2023-02-14 20:28:36.961][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1179] membership update for TLS cluster self_admin added 1 removed 0
[2023-02-14 20:28:36.961][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1101] adding TLS cluster collector_cluster_name
[2023-02-14 20:28:36.961][15][debug][upstream] [source/common/upstream/cluster_manager_impl.cc:1179] membership update for TLS cluster collector_cluster_name added 1 removed 0
[2023-02-14 20:28:36.979][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:354] dns resolution for tempo-zipkin.service.consul started
[2023-02-14 20:28:36.982][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:275] dns resolution for tempo-zipkin.service.consul completed with status 0
[2023-02-14 20:28:36.982][1][debug][upstream] [source/common/upstream/upstream_impl.cc:351] transport socket match, socket default selected for host with address 10.21.21.42:9411
[2023-02-14 20:28:36.982][1][debug][upstream] [source/common/upstream/strict_dns_cluster.cc:180] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2023-02-14 20:28:41.961][1][debug][main] [source/server/server.cc:251] flushing stats
[2023-02-14 20:28:41.984][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:354] dns resolution for tempo-zipkin.service.consul started
[2023-02-14 20:28:41.986][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:275] dns resolution for tempo-zipkin.service.consul completed with status 0
[2023-02-14 20:28:41.986][1][debug][upstream] [source/common/upstream/upstream_impl.cc:351] transport socket match, socket default selected for host with address 10.21.21.42:9411
[2023-02-14 20:28:41.987][1][debug][upstream] [source/common/upstream/strict_dns_cluster.cc:180] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms
[2023-02-14 20:28:46.964][1][debug][main] [source/server/server.cc:251] flushing stats
[2023-02-14 20:28:46.988][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:354] dns resolution for tempo-zipkin.service.consul started
[2023-02-14 20:28:46.991][1][debug][dns] [source/extensions/network/dns_resolver/cares/dns_impl.cc:275] dns resolution for tempo-zipkin.service.consul completed with status 0
[2023-02-14 20:28:46.991][1][debug][upstream] [source/common/upstream/upstream_impl.cc:351] transport socket match, socket default selected for host with address 10.21.21.42:9411
[2023-02-14 20:28:46.991][1][debug][upstream] [source/common/upstream/strict_dns_cluster.cc:180] DNS refresh rate reset for tempo-zipkin.service.consul, refresh rate 5000 ms

@shoenig
Member

shoenig commented Feb 14, 2023

Hi @suikast42, can you make sure your Nomad client config is updated to deal with the breaking change Consul made? If you're using TLS for gRPC, you need to use port 8503 now (or configure the grpc_tls port as 8502 on the Consul side).

E.g. in #15360 (comment)
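
For illustration, a minimal sketch of the Consul-side alternative mentioned above, assuming Consul 1.14+ (where the dedicated grpc_tls port was introduced). Disabling the plaintext gRPC listener with -1 is an assumption, not something this thread prescribes:

ports {
  # plaintext gRPC disabled (assumption; drop this line if plaintext gRPC is still needed)
  grpc     = -1
  # serve TLS gRPC on the legacy port so Nomad clients pointing at 8502 keep working
  grpc_tls = 8502
}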

@suikast42
Contributor Author

> Hi @suikast42, can you make sure your Nomad client config is updated to deal with the breaking change Consul made? If you're using TLS for gRPC, you need to use port 8503 now (or configure the grpc_tls port as 8502 on the Consul side).
>
> E.g. in #15360 (comment)

That's my Nomad server and agent Consul config, but the result is still the same.

consul {
  ssl          = true
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"

  # this works only with ACL enabled
  allow_unauthenticated = true

  ca_file   = "{{cluster_intermediate_ca_bundle}}"
  cert_file = "{{nomad_cert}}"
  key_file  = "{{nomad_cert_key}}"
}

@shoenig
Member

shoenig commented Feb 14, 2023

@suikast42 I think you will also need to set grpc_ca_file (it can be the same value as ca_file)
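
For reference, a minimal sketch of that addition applied to the config posted above, reusing the same template placeholders and pointing grpc_ca_file at the same bundle as ca_file:

consul {
  ssl          = true
  address      = "127.0.0.1:8501"
  grpc_address = "127.0.0.1:8503"

  # this works only with ACL enabled
  allow_unauthenticated = true

  ca_file   = "{{cluster_intermediate_ca_bundle}}"
  # same CA bundle, now also used to verify the TLS gRPC connection
  grpc_ca_file = "{{cluster_intermediate_ca_bundle}}"

  cert_file = "{{nomad_cert}}"
  key_file  = "{{nomad_cert_key}}"
}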

@suikast42
Contributor Author

> @suikast42 I think you will also need to set grpc_ca_file (it can be the same value as ca_file)

Exactly that 👍. It seems to be working as before now. I'll close this issue and reopen it if I detect any issue related to this.
Thanks.

@timotheencl

Hello! When do you folks plan a 1.4.5 release to include the fix? Currently the config field grpc_ca_file is not recognized by Nomad 1.4.4. Thanks!

@shoenig
Copy link
Member

shoenig commented Feb 27, 2023

Soon, @timotheencl! We don't have exact dates, but we're starting the process of rolling out an RC for 1.5, then the GA release shortly after that, which will include the backport releases.
