Nodes don't route LoadBalancer traffic correctly #535
Hello @danderson, thanks for reaching out. I haven't heard of such reports, nor of any such issues. I don't have time to try deploying MetalLB on my clusters right now (furthermore, this could be complicated since they run on OpenStack, which by default doesn't really like MACs or IPs it didn't assign itself on the virtual network...), but for reference, here's the kube-proxy Pod spec as deployed by MetalK8s:
apiVersion: v1
kind: Pod
metadata:
annotations:
kubernetes.io/config.hash: 44f08526ac5cac1d20d11cebd0c92953
kubernetes.io/config.mirror: 44f08526ac5cac1d20d11cebd0c92953
kubernetes.io/config.seen: 2018-12-17T20:15:21.385349621Z
kubernetes.io/config.source: file
kubespray.kube-proxy-cert/serial: A3E727B9A6023607
creationTimestamp: null
labels:
k8s-app: kube-proxy
name: kube-proxy-metalk8s-01
selfLink: /api/v1/namespaces/kube-system/pods/kube-proxy-metalk8s-01
spec:
containers:
- command:
- /hyperkube
- proxy
- --v=2
- --kubeconfig=/etc/kubernetes/kube-proxy-kubeconfig.yaml
- --bind-address=10.200.4.185
- --cluster-cidr=10.233.64.0/18
- --proxy-mode=ipvs
- --oom-score-adj=-998
- --healthz-bind-address=127.0.0.1
- --masquerade-all
- --ipvs-min-sync-period=5s
- --ipvs-sync-period=5s
- --ipvs-scheduler=rr
image: gcr.io/google-containers/hyperkube:v1.10.11
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10256
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 15
name: kube-proxy
resources:
limits:
cpu: 500m
memory: 2G
requests:
cpu: 150m
memory: 64M
securityContext:
privileged: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/ssl/certs
name: ssl-certs-host
readOnly: true
- mountPath: /etc/kubernetes/ssl
name: etc-kube-ssl
readOnly: true
- mountPath: /etc/kubernetes/kube-proxy-kubeconfig.yaml
name: kubeconfig
readOnly: true
- mountPath: /var/run/dbus
name: var-run-dbus
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /run/xtables.lock
name: xtables-lock
dnsPolicy: ClusterFirst
hostNetwork: true
nodeName: metalk8s-01
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
operator: Exists
volumes:
- hostPath:
path: /etc/pki/tls
type: ""
name: ssl-certs-host
- hostPath:
path: /etc/kubernetes/ssl
type: ""
name: etc-kube-ssl
- hostPath:
path: /etc/kubernetes/kube-proxy-kubeconfig.yaml
type: ""
name: kubeconfig
- hostPath:
path: /var/run/dbus
type: ""
name: var-run-dbus
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
status:
phase: Pending
qosClass: Burstable
Likewise, I quickly set up a cluster using kubeadm; for comparison, here are the kube-proxy Pod spec and ConfigMap it generates:
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
generateName: kube-proxy-
labels:
controller-revision-hash: 68b57dcf7d
k8s-app: kube-proxy
pod-template-generation: "3"
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: DaemonSet
name: kube-proxy
uid: 434b9d9f-0015-11e9-96df-fa163e7325df
selfLink: /api/v1/namespaces/kube-system/pods/kube-proxy-l26r5
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- kubeadm-node01
containers:
- command:
- /usr/local/bin/kube-proxy
- --config=/var/lib/kube-proxy/config.conf
- --hostname-override=$(NODE_NAME)
env:
- name: NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
image: k8s.gcr.io/kube-proxy:v1.13.0
imagePullPolicy: IfNotPresent
name: kube-proxy
resources: {}
securityContext:
privileged: true
procMount: Default
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/lib/kube-proxy
name: kube-proxy
- mountPath: /run/xtables.lock
name: xtables-lock
- mountPath: /lib/modules
name: lib-modules
readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-proxy-token-8qnhs
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
hostNetwork: true
nodeName: kubeadm-node01
priority: 2000001000
priorityClassName: system-node-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: kube-proxy
serviceAccountName: kube-proxy
terminationGracePeriodSeconds: 30
tolerations:
- key: CriticalAddonsOnly
operator: Exists
- operator: Exists
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/disk-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/memory-pressure
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
operator: Exists
- effect: NoSchedule
key: node.kubernetes.io/network-unavailable
operator: Exists
volumes:
- configMap:
defaultMode: 420
name: kube-proxy
name: kube-proxy
- hostPath:
path: /run/xtables.lock
type: FileOrCreate
name: xtables-lock
- hostPath:
path: /lib/modules
type: ""
name: lib-modules
- name: kube-proxy-token-8qnhs
secret:
defaultMode: 420
secretName: kube-proxy-token-8qnhs
status:
phase: Pending
qosClass: BestEffort
---
apiVersion: v1
data:
config.conf: |-
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
clientConnection:
acceptContentTypes: ""
burst: 10
contentType: application/vnd.kubernetes.protobuf
kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
qps: 5
clusterCIDR: 10.22.0.0/16
configSyncPeriod: 15m0s
conntrack:
max: null
maxPerCore: 32768
min: 131072
tcpCloseWaitTimeout: 1h0m0s
tcpEstablishedTimeout: 24h0m0s
enableProfiling: false
healthzBindAddress: 0.0.0.0:10256
hostnameOverride: ""
iptables:
masqueradeAll: false
masqueradeBit: 14
minSyncPeriod: 0s
syncPeriod: 30s
ipvs:
excludeCIDRs: null
minSyncPeriod: 0s
scheduler: ""
syncPeriod: 30s
kind: KubeProxyConfiguration
metricsBindAddress: 127.0.0.1:10249
mode: ipvs
nodePortAddresses: null
oomScoreAdj: -999
portRange: ""
resourceContainer: /kube-proxy
udpIdleTimeout: 250ms
kubeconfig.conf: |-
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: https://10.18.0.12:6443
name: default
contexts:
- context:
cluster: default
namespace: default
user: default
name: default
current-context: default
users:
- name: default
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
kind: ConfigMap
metadata:
creationTimestamp: null
labels:
app: kube-proxy
name: kube-proxy
selfLink: /api/v1/namespaces/kube-system/configmaps/kube-proxy
So, some more insight. I set up a cluster in an environment where MetalLB in ARP mode can work.
First of all, MetalK8s applies a couple of node-level settings of its own. In my test environment, I disabled several of them and reverted the values to the kernel defaults. After making these changes, accessing my test service over its LoadBalancer IP still didn't work.
So, ARP is working fine (also validated by checking the ARP table on the client), and packets are correctly routed to the host, but they are then dropped before ever reaching my test pod. One Google search later I found kubernetes/kubernetes#59976, which is exactly what's going wrong on this deployment. After manually applying the fix described there on the host, traffic to the LoadBalancer IP is handled correctly.
So, I guess the core bug is a combination of MetalLB with Kubernetes 1.10 running kube-proxy in ipvs mode. Are you aware of this issue in other MetalLB deployments, @danderson?
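For anyone hitting the same symptom, a few generic inspection commands show how an ipvs-mode kube-proxy is expected to program a LoadBalancer address on a node. This is only a sketch, not MetalK8s-specific; 192.0.2.10 stands in for the assigned LoadBalancer IP, and the ipset name mentioned below is an assumption based on kube-proxy's ipvs proxier.

```sh
# In ipvs mode, kube-proxy binds service IPs (including LoadBalancer IPs) to the
# kube-ipvs0 dummy interface; the IP should show up here.
ip addr show dev kube-ipvs0

# An IPVS virtual service should exist for the IP, with the pod IPs as real servers.
ipvsadm -Ln | grep -A 3 192.0.2.10

# The ipvs proxier tracks LoadBalancer IPs in ipsets rather than per-IP iptables
# rules; the set name here is an assumption, adjust if it differs on your version.
ipset list KUBE-LOAD-BALANCER
```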
Great debugging! Sounds like that k8s bug is indeed the root cause. I did have some other people reporting that particular problem (and you can see me doing a bunch of debugging in that bug :) ). I didn't realize MetalK8s is configuring IPVS mode.
My general advice for IPVS mode right now is: don't use it unless you really need the scalability benefits. The implementation has historically had bug after bug where LoadBalancer behavior is completely broken... And because there are no conformance tests for LoadBalancer in OSS k8s (only cloud provider tests that exercise their own custom implementations), the bugs go undetected until someone tries using MetalLB (none of the main cloud providers enable IPVS - for exactly the same reason, too many bugs).
In theory, with k8s 1.13, IPVS mode should finally work correctly, but until I set up my e2e test framework, I can't guarantee it :/. Until you can positively verify that kube-proxy in ipvs mode works correctly, my advice is to revert to iptables mode, or warn MetalK8s users that load-balancers like MetalLB just won't work on MetalK8s.
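As a quick way to check which mode a node's kube-proxy actually ended up in, something like the following should work. This is a sketch: the metrics address matches the 127.0.0.1:10249 metricsBindAddress shown in the configs above, and the pod name is a placeholder.

```sh
# Should print "iptables" or "ipvs".
curl -s http://127.0.0.1:10249/proxyMode

# kube-proxy also logs which proxier it selected at startup.
kubectl -n kube-system logs kube-proxy-xxxxx | grep -i proxier
```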
Thanks for the confirmation and the extra insight. Reverting to iptables mode sounds like the way to go; can that be done on an already-deployed cluster, or does it require a redeploy? I'll also need to investigate whether the custom kube-proxy settings we apply are still required in that case.
Unfortunately no, you can't switch kube-proxy modes cleanly. The safest way I know to do that is to reconfigure kube-proxy, and then reboot the node to start from a clean state :(. Since you're deploying with Ansible, you could add some more workflow to try and clean up, but again that's a bit risky.
Did you test this with MetalK8s 1.1.0-alpha1? I see you're upgrading to k8s 1.11 in that release. I don't know if 1.11 works correctly, but it should fix at least some of the ipvs bugs that broke MetalLB in the past.
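For reference, a rough sketch of what such a manual cleanup usually involves on each node after flipping the mode from ipvs to iptables; the exact steps aren't guaranteed for every setup, and a reboot remains the safer route.

```sh
# Run as root on the node (on MetalK8s the kube-proxy binary lives inside the
# hyperkube image, so this may need to be invoked via the container).
kube-proxy --cleanup        # flush the iptables/IPVS rules kube-proxy created
ipvsadm --clear             # drop any remaining IPVS virtual services
ip link delete kube-ipvs0   # remove the dummy interface holding the service IPs
```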
I didn't try with Kubernetes 1.11. We may, however, not do a 1.1.x release, but skip it and go straight for 1.2 (K8s 1.12). I'll give that a try. At the same time, I'll check with the team whether it's possible to add a test to our suite which deploys MetalLB and validates things are (or are not) working as expected. However, given our CI runs on OpenStack, this may again require some tweaking of the current test environment to work around Neutron-enforced default network security policies.
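A minimal sketch of what such a validation test could look like, assuming MetalLB is already installed and configured with an L2 address pool; the lb-test name is just a placeholder.

```sh
#!/bin/sh
set -e

# Create a throwaway backend and expose it through a LoadBalancer Service.
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer

# Wait for MetalLB to assign an address...
for i in $(seq 1 30); do
    LB_IP=$(kubectl get svc lb-test -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    [ -n "$LB_IP" ] && break
    sleep 5
done

# ...then verify kube-proxy actually routes traffic for it.
curl --fail --max-time 10 "http://$LB_IP/"
```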
When running on OpenStack, at least for MetalLB's L2 mode, you need to disable IP spoofing protection on the VMs; otherwise the OpenStack network layer drops the ARP responses sent by MetalLB. I think BGP mode should just work out of the box, although it's more complex to set up because you then also need a BGP router.
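For reference, disabling that protection with the OpenStack CLI looks roughly like this. A sketch only: $SERVER_ID, $PORT_ID and the 192.0.2.0/24 pool are placeholders.

```sh
# Find the Neutron port attached to the Kubernetes node.
openstack port list --server "$SERVER_ID"

# Option 1: whitelist the MetalLB address range on the port (keeps port security on).
openstack port set --allowed-address ip-address=192.0.2.0/24 "$PORT_ID"

# Option 2: disable port security on the port entirely (security groups must be
# detached first).
openstack port set --no-security-group --disable-port-security "$PORT_ID"
```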
Yeah, given our systems, BGP will likely not work. I got L2 mode to work by indeed disabling spoofing protection and the like, though given our environment you need to set up custom networks etc. to be able to test this. We'll sort it out :)
Anyway, I just checked the impact of the various settings mentioned earlier. I'll try upgrading to our 1.2 development branch (Kubernetes 1.12) and see what that gives. Thanks for all the input!
Thank you for the quick and thorough response! If I can ever get my e2e testing stack set up, I'll try throwing MetalK8s into the test matrix on the MetalLB side as well :)
That'd be really cool. Let us know if there's anything we can do to help with that. Also, any chance you can share some details about the MetalLB users who tried to run on MetalK8s? Always happy to hear more user stories!
I upgraded my cluster to the 1.2 development branch (Kubernetes 1.12), and a quick test with MetalLB in L2 mode now behaves as expected. So, I guess this is kind-of sorted out... If someone wants to use MetalK8s 1.0.0, one can change the default inventory variable by applying the following patch:
diff --git a/playbooks/group_vars/k8s-cluster/10-metal-k8s.yml b/playbooks/group_vars/k8s-cluster/10-metal-k8s.yml
index b48de38cc3..cb614ab6b2 100644
--- a/playbooks/group_vars/k8s-cluster/10-metal-k8s.yml
+++ b/playbooks/group_vars/k8s-cluster/10-metal-k8s.yml
@@ -3,7 +3,7 @@ kube_basic_auth: True
kubeconfig_localhost: True
dns_mode: 'coredns'
-kube_proxy_mode: 'ipvs'
+kube_proxy_mode: 'iptables'
kube_version: 'v1.10.11'
The same may be required for the 1.1 branch. Starting with the 1.2 branch, MetalLB should work out-of-the-box on MetalK8s. Given some ideas we have w.r.t. the future directions of MetalK8s (which may include MetalLB), I'd rather not spend time on switching the default kube-proxy mode to iptables right now.
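For completeness, applying the patched variable means re-running the deployment playbook against the existing inventory. The invocation below assumes the standard MetalK8s 1.0 playbook layout and inventory path, and is only a sketch.

```sh
ansible-playbook -i inventory/mycluster -b playbooks/deploy.yml

# As noted earlier in the thread, kube-proxy modes can't be switched cleanly:
# rebooting each node afterwards is the safest way to flush the old IPVS state.
```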
Thanks to you both! By changing the kube_proxy_mode variable to 'iptables' as shown above, MetalLB now works for me as well.
@NicolasT would it be useful to have access to any bare metal resources from Packet? We support local BGP, so it might be useful for testing, etc. Happy to support the community. Let me know!
Thanks for this feedback @NicolasT, I was getting worried as I couldn't get MetalLB to work on a MetalK8s-based cluster. Integrating MetalLB into your project would really be a great feature.
Short question: is it possible to apply the update of 10-metal-k8s.yml (ipvs to iptables) to an already running cluster? I just tried running the playbook again with different arguments, but it seems the change is being ignored (also after a full restart/reboot).
Merci & kind regards.
Is this still an issue?
I will check in the next few days if I can find an issue with this. I know I had issues with MetalLB and Kubernetes, but it may have been my own setup. So far I am really enjoying MetalK8s's philosophy, and hoping very much it will fit my needs. ATM it's not required for me to use MetalLB, but it would be nice to know it can work in the future. Thanks!
That's super cool to hear, thanks for sharing. Please let us know whenever there's something missing to suit your needs, issues you run into, and so on.
Hi there! Author of MetalLB here. I've been receiving a bunch of reports of MetalLB not working correctly on MetalK8s. AFAICT, MetalLB is working correctly in these cases; the problem is that kube-proxy on the nodes is either misconfigured or otherwise outright broken, and isn't correctly handling traffic for type=LoadBalancer services.
The symptom is simply that when packets destined for a LoadBalancer service IP arrive at the node, they're not getting routed correctly to the target pod(s). From the user's perspective, the service IP just doesn't respond at all.
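A quick way to observe this symptom (a generic illustration; 192.0.2.10 stands in for the assigned LoadBalancer IP):

```sh
# From a client on the same L2 segment: the connection simply times out.
curl -v --max-time 5 http://192.0.2.10/

# On the node MetalLB elected for that IP: the packets do arrive...
tcpdump -ni any host 192.0.2.10
# ...but no reply leaves the node, which points at kube-proxy rather than MetalLB.
```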
Unfortunately I don't have time to debug in more detail right now, but I figured I'd get this filed to get it on the radar. What I would suggest as a next step is to compare your kube-proxy configuration with the one kubeadm generates, and adjust any discrepancies. You can also try installing MetalLB and using its L2 mode, to get a quick demonstration of the breakage.
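For anyone wanting to reproduce this quickly, a minimal L2-mode setup from that era plus a test Service could look like the following. This assumes the MetalLB 0.7.x ConfigMap-based configuration format, and the address range is a placeholder that must be unused on the node subnet.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.0.2.240-192.0.2.250
---
# Any Service of type LoadBalancer should then get an address from the pool;
# if requests to that address time out, you are seeing this issue.
apiVersion: v1
kind: Service
metadata:
  name: lb-test
spec:
  type: LoadBalancer
  selector:
    app: lb-test
  ports:
  - port: 80
    targetPort: 80
```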