UDP port 8301 does not work with client.exposeGossipPorts set to true #389
Comments
Hey @dschaaff, thanks for creating this issue! Is this something you're seeing on a new installation or on an upgrade? My initial thought is that it most likely depends on the specific CNI implementation, although the AWS VPC CNI just uses the portmap plugin (aws/amazon-vpc-cni-k8s#153). There is a known issue around upgrading (aws/amazon-vpc-cni-k8s#373), which is why I'm curious whether this happens on a clean install.
The issue occurs with both a clean install (brand new cluster and Helm deployment) and an upgrade of an existing deployment. I have also seen issues with upgrading that look like the VPC CNI issue you linked, but that is unrelated to what is happening here. When that issue occurs, it only affects a subset of nodes and I rotate them out.
I may switch to running with `hostNetwork: true`. Would you be open to a pull request that makes this an optional config item?
Hey @dschaaff, sorry for the delay. I'm having trouble reproducing this issue. Here is the list of things I've done:
After the install, everything looked healthy. The client agents on EKS are able to join my EC2 server instance. I have run
Let me know if I'm missing something. I saw your PR, and thank you for making a contribution 🙏 I'd like to understand the problem first if at all possible.
Let me pull in some more info on the cluster and the setup. I have 3 separate EKS clusters that all exhibit this behavior with `exposeGossipPorts: true`.

Server Setup

I have 3 consul servers running directly on EC2, outside of the Kubernetes cluster. Here is the config file:

```json
{
"acl": {
"default_policy": "deny",
"down_policy": "extend-cache",
"enabled": true,
"token_ttl": "30s",
"tokens": {
"agent": "redacted",
"default": "redacted",
"master": "redacted",
"replication": "redacted"
}
},
"addresses": {
"dns": "0.0.0.0",
"grpc": "0.0.0.0",
"http": "0.0.0.0",
"https": "0.0.0.0"
},
"advertise_addr": "10.20.202.203",
"advertise_addr_wan": "10.20.202.203",
"autopilot": {
"cleanup_dead_servers": false,
"last_contact_threshold": "200ms",
"max_trailing_logs": 250,
"server_stabilization_time": "10s"
},
"bind_addr": "10.20.202.203",
"bootstrap": false,
"bootstrap_expect": 3,
"ca_file": "/etc/consul/ssl/ca.crt",
"cert_file": "/etc/consul/ssl/server.crt",
"client_addr": "0.0.0.0",
"data_dir": "/var/consul",
"datacenter": "stg-us-west-2",
"disable_update_check": false,
"domain": "consul",
"enable_local_script_checks": false,
"enable_script_checks": false,
"encrypt": "redacted",
"key_file": "/etc/consul/ssl/server.key",
"log_file": "/var/log/consul/consul.log",
"log_level": "INFO",
"log_rotate_bytes": 0,
"log_rotate_duration": "24h",
"log_rotate_max_files": 0,
"node_name": "ip-10-20-202-203.us-west-2.compute.internal",
"performance": {
"leave_drain_time": "5s",
"raft_multiplier": 1,
"rpc_hold_timeout": "7s"
},
"ports": {
"dns": 8600,
"grpc": 8502,
"http": 8500,
"https": 8501,
"serf_lan": 8301,
"serf_wan": 8302,
"server": 8300
},
"primary_datacenter": "stg-us-west-2",
"raft_protocol": 3,
"retry_interval": "30s",
"retry_interval_wan": "30s",
"retry_join": [
"provider=aws tag_key=consul-datacenter tag_value=stg-us-west-2"
],
"retry_max": 0,
"retry_max_wan": 0,
"server": true,
"tls_min_version": "tls12",
"tls_prefer_server_cipher_suites": false,
"translate_wan_addrs": false,
"ui": true,
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true
}
```

Client Setup

Here is the content of my values file for the Helm chart:

```yaml
fullnameOverride: consul
# Available parameters and their default values for the Consul chart.
global:
# enabled is the master enabled switch. Setting this to true or false
# will enable or disable all the components within this chart by default.
# Each component can be overridden using the component-specific "enabled"
# value.
enabled: false
# Domain to register the Consul DNS server to listen for.
domain: consul
# Image is the name (and tag) of the Consul Docker image for clients and
# servers below. This can be overridden per component.
#
# Examples:
# image: "consul:1.5.0"
# image: "hashicorp/consul-enterprise:1.5.0-ent" # Enterprise Consul image
image: "consul:1.7.2"
# imageK8S is the name (and tag) of the consul-k8s Docker image that
# is used for functionality such as the catalog sync. This can be overridden
# per component below.
# Note: support for the catalog sync's liveness and readiness probes was added
# to consul-k8s v0.6.0. If using an older consul-k8s version, you may need to
# remove these checks to make the sync work.
imageK8S: "hashicorp/consul-k8s:0.12.0"
# Datacenter is the name of the datacenter that the agents should register
# as. This shouldn't be changed once the Consul cluster is up and running
# since Consul doesn't support an automatic way to change this value
# currently: https://github.com/hashicorp/consul/issues/1858
datacenter: stg-us-west-2
# enablePodSecurityPolicies is a boolean flag that controls whether pod
# security policies are created for the consul components created by this
# chart. See https://kubernetes.io/docs/concepts/policy/pod-security-policy/
enablePodSecurityPolicies: false
# Gossip encryption key. To enable gossip encryption, provide the name of
# a Kubernetes secret that contains a gossip key. You can create a gossip
# key with the "consul keygen" command.
# See https://www.consul.io/docs/commands/keygen.html
gossipEncryption:
secretName: consul-secrets
secretKey: gossip-encryption-key
# bootstrapACLs will automatically create and assign ACL tokens within
# the Consul cluster. This currently requires enabling both servers and
# clients within Kubernetes. Additionally requires Consul v1.4+ and
# consul-k8s v0.8.0+.
bootstrapACLs: false
# Server, when enabled, configures a server cluster to run. This should
# be disabled if you plan on connecting to a Consul cluster external to
# the Kube cluster.
server:
enabled: false
# Client, when enabled, configures Consul clients to run on every node
# within the Kube cluster. The current deployment model follows a traditional
# DC where a single agent is deployed per node.
client:
enabled: true
image: null
join: null
# grpc should be set to true if the gRPC listener should be enabled.
# This should be set to true if connectInject is enabled.
grpc: true
exposeGossipPorts: true
# enable host network mode see https://github.com/hashicorp/consul-helm/pull/392
enableHostNetworkMode: true
# Resource requests, limits, etc. for the client cluster placement. This
# should map directly to the value of the resources field for a PodSpec,
# formatted as a multi-line string. By default no direct resource request
# is made.
resources: |
requests:
memory: "256Mi"
cpu: "200m"
limits:
memory: "256Mi"
# extraConfig is a raw string of extra configuration to set with the
# server. This should be JSON.
extraConfig: |
{
"verify_incoming": true,
"verify_outgoing": true,
"verify_server_hostname": true,
"ca_file": "/consul/userconfig/consul-secrets/ca.crt",
"cert_file": "/consul/userconfig/consul-secrets/client.pem",
"key_file": "/consul/userconfig/consul-secrets/client-key.pem",
"ports": {
"http": 8500,
"https": 8501,
"server": 8300
},
"retry_join": [
"provider=aws tag_key=consul-datacenter tag_value=stg-us-west-2"
],
"telemetry": {
"disable_hostname": true,
"prometheus_retention_time": "6h"
}
}
# extraVolumes is a list of extra volumes to mount. These will be exposed
# to Consul in the path `/consul/userconfig/<name>/`. The value below is
# an array of objects, examples are shown below.
extraVolumes:
- type: secret
name: consul-secrets
load: false
- type: secret
name: consul-acl-config
load: true # if true, will add to `-config-dir` to load by Consul
# Toleration Settings for Client pods
# This should be a multi-line string matching the Toleration array
# in a PodSpec.
# The example below will allow Client pods to run on every node
# regardless of taints
# tolerations: |
# - operator: "Exists"
tolerations: ""
# nodeSelector labels for client pod assignment, formatted as a multi-line string.
# ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
# Example:
# nodeSelector: |
# beta.kubernetes.io/arch: amd64
nodeSelector: null
# used to assign priority to client pods
# ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/
priorityClassName: ""
# Extra annotations to attach to the client pods
# This should be a multi-line string mapping directly to a map of
# the annotations to apply to the client pods
annotations: null
# extraEnvironmentVars is a list of extra environment variables to set with the pod. These could be
# used to include proxy settings required for the cloud auto-join feature,
# in case the Kubernetes cluster is behind egress http proxies. Additionally, it could be used to configure
# custom consul parameters.
extraEnvironmentVars:
CONSUL_CACERT: /consul/userconfig/consul-secrets/ca.crt
CONSUL_HTTP_TOKEN_FILE: /consul/userconfig/consul-secrets/consul.token
CONSUL_CLIENT_CERT: /consul/userconfig/consul-secrets/client.pem
CONSUL_CLIENT_KEY: /consul/userconfig/consul-secrets/client-key.pem
# http_proxy: http://localhost:3128,
# https_proxy: http://localhost:3128,
# no_proxy: internal.domain.com
# Configuration for DNS configuration within the Kubernetes cluster.
# This creates a service that routes to all agents (client or server)
# for serving DNS requests. This DOES NOT automatically configure kube-dns
# today, so you must still manually configure a `stubDomain` with kube-dns
# for this to have any effect:
# https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#configure-stub-domain-and-upstream-dns-servers
dns:
enabled: true
ui:
# True if you want to enable the Consul UI. The UI will run only
# on the server nodes. This makes UI access via the service below (if
# enabled) predictable rather than "any node" if you're running Consul
# clients as well.
enabled: false
# syncCatalog will run the catalog sync process to sync K8S with Consul
# services. This can run bidirectionally (default) or unidirectionally (Consul
# to K8S or K8S to Consul only).
#
# This process assumes that a Consul agent is available on the host IP.
# This is done automatically if clients are enabled. If clients are not
# enabled then set the node selection so that it chooses a node with a
# Consul agent.
syncCatalog:
# True if you want to enable the catalog sync. "-" for default.
enabled: true
image: null
default: true # true will sync by default, otherwise requires annotation
# toConsul and toK8S control whether syncing is enabled to Consul or K8S
# as a destination. If both of these are disabled, the sync will do nothing.
toConsul: true
toK8S: true
# k8sPrefix is the service prefix to prepend to services before registering
# with Kubernetes. For example "consul-" will register all services
# prepended with "consul-". (Consul -> Kubernetes sync)
k8sPrefix: null
# consulPrefix is the service prefix which prepends itself
# to Kubernetes services registered within Consul.
# For example, "k8s-" will register all services prepended with "k8s-".
# (Kubernetes -> Consul sync)
consulPrefix: null
# k8sTag is an optional tag that is applied to all of the Kubernetes services
# that are synced into Consul. If nothing is set, defaults to "k8s".
# (Kubernetes -> Consul sync)
k8sTag: null
# syncClusterIPServices syncs services of the ClusterIP type, which may
# or may not be broadly accessible depending on your Kubernetes cluster.
# Set this to false to skip syncing ClusterIP services.
syncClusterIPServices: true
# nodePortSyncType configures the type of syncing that happens for NodePort
# services. The valid options are: ExternalOnly, InternalOnly, ExternalFirst.
# - ExternalOnly will only use a node's ExternalIP address for the sync
# - InternalOnly uses the node's InternalIP address
# - ExternalFirst will preferentially use the node's ExternalIP address, but
# if it doesn't exist, it will use the node's InternalIP address instead.
nodePortSyncType: ExternalFirst
# aclSyncToken refers to a Kubernetes secret that you have created that contains
# an ACL token for your Consul cluster which allows the sync process the correct
# permissions. This is only needed if ACLs are enabled on the Consul cluster.
aclSyncToken:
secretName: consul-secrets
secretKey: consul-k8s-sync.token
# nodeSelector labels for syncCatalog pod assignment, formatted as a multi-line string.
# ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
# Example:
# nodeSelector: |
# beta.kubernetes.io/arch: amd64
# ConnectInject will enable the automatic Connect sidecar injector.
connectInject:
enabled: false
# Requires Consul v1.5+ and consul-k8s v0.8.1+
centralConfig:
enabled: false
```

Security Groups

I have a consul security group that is added to all nodes participating in the consul cluster; in this case that is both the servers and the EKS nodes. I have confirmed network communication is open as expected.

```json
{
"SecurityGroups": [
{
"Description": "tf: stg consul client security group",
"GroupName": "stg-consul-client-sg",
"IpPermissions": [
{
"FromPort": 8500,
"IpProtocol": "tcp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8502,
"UserIdGroupPairs": [
{
"Description": "eks",
"GroupId": "sg-01197f8e4ab4e793d",
"UserId": "00000000000"
},
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8300,
"IpProtocol": "tcp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8300,
"UserIdGroupPairs": [
{
"Description": "eks",
"GroupId": "sg-01197f8e4ab4e793d",
"UserId": "00000000000"
},
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8301,
"IpProtocol": "udp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8302,
"UserIdGroupPairs": [
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8600,
"IpProtocol": "udp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8600,
"UserIdGroupPairs": [
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8301,
"IpProtocol": "tcp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8302,
"UserIdGroupPairs": [
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8300,
"IpProtocol": "tcp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8302,
"UserIdGroupPairs": [
{
"Description": "eks",
"GroupId": "sg-01197f8e4ab4e793d",
"UserId": "00000000000"
}
]
},
{
"FromPort": 8600,
"IpProtocol": "tcp",
"IpRanges": [],
"Ipv6Ranges": [],
"PrefixListIds": [],
"ToPort": 8600,
"UserIdGroupPairs": [
{
"GroupId": "sg-056f1f0147a614203",
"UserId": "00000000000"
}
]
}
],
"OwnerId": "00000000000",
"GroupId": "sg-056f1f0147a614203",
"IpPermissionsEgress": [],
"Tags": [
{
"Key": "Name",
"Value": "stg-consul-sg"
},
{
"Key": "environment",
"Value": "stg"
},
{
"Key": "src",
"Value": "terraform"
},
{
"Key": "terraform",
"Value": "true"
},
{
"Key": "TFManaged",
"Value": "true"
}
],
"VpcId": "vpc-00000000000"
}
]
}
```

Troubleshooting

I just updated the daemonset config to remove `hostNetwork: true`.

The logs show these warnings for each EKS host. This is interesting because netcat shows no issues connecting from the server to the client.
If I then switch back to `hostNetwork: true`, the warnings stop. I'm happy to collect any additional information that would be helpful.
Thanks for all this detailed info @dschaaff! That's super helpful. To confirm, are you experiencing any errors? Does your sync process sync services from Kube? In other words, other than those warning messages, is the Consul cluster operational? I can dig into the warning messages. It looks like I'm seeing them after a while on my cluster too.
As far as I have been able to tell, the cluster functions while in that state. The sync service is successfully registering services with Consul, and I have a number of containers pulling config items from the Consul K/V store through the local agent without issue. If I use
I just wanted to drop in and see if there are any updates on this or the corresponding PR. Thanks so much!
Quick update. I feel like a bit of a ding dong for not thinking about this earlier, but pod IPs are directly routable when using the VPC CNI in EKS. This means I don't need to use `exposeGossipPorts` or `hostNetwork` at all, since the servers can reach the client agents on their pod IPs directly.
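A rough sketch of what the simplified client values reduce to once pod IPs are reachable from the server network (illustrative only, using the same keys as the values file above):

```yaml
# Sketch only: with routable pod IPs the servers gossip with the client
# agents on their pod addresses, so nothing needs to be exposed on the host.
client:
  enabled: true
  grpc: true
  exposeGossipPorts: false   # no hostPort mapping needed
```

The server-side security group then needs to allow 8301 over both TCP and UDP from the pod CIDR rather than only from the node security group.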
Hey @dschaaff, thanks for the update! Yes, if allowing traffic between the pod network and the Consul server network is an option, it's definitely a better solution. I'll take a look at this behavior again today. It's definitely strange, and I'm curious if this is something specific to AWS or generic to other clouds too.
After looking a bit more into this, I'm fairly certain this problem is related to the portmap CNI plugin, which is the plugin that various CNIs, including the VPC CNI, use for port mapping when you're using `hostPort`. Given that you're using the pod network through the VPC CNI, do you feel like #392 is still necessary?
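One way to test that theory with Consul out of the picture is to run a bare pod that listens on a UDP `hostPort` and probe it from outside the node. This is only a sketch; it assumes the image's BusyBox `nc` supports `-u -l -p`, and the pod name and port number are arbitrary:

```yaml
# Sketch: if the portmap plugin mishandles UDP, this pod's hostPort will be
# unreachable over UDP from outside the node even though the pod is healthy.
apiVersion: v1
kind: Pod
metadata:
  name: udp-hostport-test   # hypothetical name
spec:
  containers:
    - name: listener
      image: alpine:3
      # Loop a UDP listener on 9999 inside the pod (one datagram session per
      # iteration with BusyBox nc).
      command: ["sh", "-c", "while true; do nc -u -l -p 9999; done"]
      ports:
        - name: udp-test
          containerPort: 9999
          hostPort: 9999
          protocol: UDP
```

Probing the node's IP on 9999 over UDP (for example with the same netcat used earlier) and comparing against a TCP hostPort on the same node would show whether the drop is specific to UDP port mapping.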
It's not necessary for my particular use, no. I'm not sure if it's been requested before or not, but I'm good if you'd like to close it. Thanks for all the help!
Ok, I'll close both this issue and the PR for now, but definitely let us know if this comes up again. Thanks for staying engaged on this issue 😀 💯
My consul servers run on EC2, outside of my Kubernetes cluster. I am using the Helm chart to deploy only the consul client daemonset.
In my values file I have set `exposeGossipPorts: true`.
This sets the following in the ports section of the daemonset:
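Roughly, the rendered container ports look like the sketch below (a reconstruction from the chart's client daemonset template; the port names and the full list are approximate, not copied verbatim from this cluster):

```yaml
# Approximate shape of the client container's ports with exposeGossipPorts
# enabled: the serf LAN port 8301 is published on the node via hostPort,
# once for TCP and once for UDP.
ports:
  - name: http
    containerPort: 8500
    hostPort: 8500
  - name: serflan-tcp
    containerPort: 8301
    hostPort: 8301
    protocol: TCP
  - name: serflan-udp
    containerPort: 8301
    hostPort: 8301
    protocol: UDP
```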
However, only TCP traffic is allowed (confirmed via netcat). If I edit the spec to set `hostNetwork: true`, then UDP works as expected.

I'm not sure if this is a Consul issue or a Kubernetes issue. I'm running Kubernetes 1.15 on AWS EKS with version 1.5.5 of the VPC CNI plugin. I'm happy to provide more information if it's useful.
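For completeness, the manual edit described above amounts to something like this on the daemonset's pod template (a sketch; `dnsPolicy: ClusterFirstWithHostNet` is a common companion to `hostNetwork: true` rather than something the chart requires):

```yaml
# Sketch of the workaround: run the client pods in the node's network
# namespace so gossip UDP reaches the agent without any port mapping.
spec:
  template:
    spec:
      hostNetwork: true
      # Usually paired with hostNetwork so in-cluster DNS still resolves:
      dnsPolicy: ClusterFirstWithHostNet
```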