
Flannel should respond to k8s node delete events by cleaning up lease #954

Closed

davidmccormick opened this issue Feb 20, 2018 · 1 comment

@davidmccormick
Expected Behavior

When a node is purposely removed/deleted from a k8s cluster, we want its network lease to be freed immediately rather than having to wait for the lease to expire. We run a number of large clusters with a reduced pod network (/17), which means we run out of leases when we roll/upgrade them. Keeping leases around for ephemeral nodes that are never going to return is a waste when we need faster lease recycling.

Current Behavior

Tested on Kubernetes 1.8.4 with flannel 0.9.1. When I run `kubectl delete node ABC`, ABC is removed from Kubernetes, but checking the etcd contents shows the lease remains behind.

Possible Solution

Does Kubernetes pass a delete event through, or could flannel watch for node delete events the way an operator does?
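The suggestion above amounts to an event handler: when flannel (or a companion controller) observes a Kubernetes node delete event, it revokes that node's subnet lease immediately instead of letting it sit out the 24h TTL. A minimal stdlib-only Go sketch of that cleanup logic, under the assumption that leases can be looked up by node name (the real implementation would use a client-go node informer and delete the key under `/coreos.com/network/subnets` in etcd; `leaseStore` and `handleNodeDelete` are hypothetical names, not flannel APIs):

```go
package main

import "fmt"

// leaseStore models flannel's subnet leases keyed by node name.
// In reality these live under /coreos.com/network/subnets in etcd,
// with a 24h TTL on each entry.
type leaseStore map[string]string

// handleNodeDelete is a hypothetical handler that flannel could run
// on a Kubernetes node delete event: instead of waiting for the TTL
// to expire, it revokes the departed node's lease immediately.
// Returns true if a lease was found and revoked.
func handleNodeDelete(leases leaseStore, node string) bool {
	if _, ok := leases[node]; !ok {
		return false // node held no lease; nothing to clean up
	}
	delete(leases, node)
	return true
}

func main() {
	leases := leaseStore{
		"node-a": "10.244.1.0/24",
		"node-b": "10.244.2.0/24",
	}
	// Simulate `kubectl delete node node-a`: the /24 is freed at once
	// and can be handed to a replacement node during a rolling upgrade.
	fmt.Println(handleNodeDelete(leases, "node-a"))
	fmt.Println(len(leases))
}
```

The key design point is that the handler is idempotent: a repeated delete event for the same node is a no-op, which matters because Kubernetes watches can replay events.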

Steps to Reproduce (for bugs)

  1. Use `kubectl get nodes` to list the nodes.
  2. Use etcdctl (v2 API) to list the leases under `/coreos.com/network/subnets/`.
  3. Use `kubectl delete node ABC` to delete one of the nodes.
  4. Use etcdctl to list the leases again.
  5. Check that the lease for the deleted node has been removed.

Context

Rolling out updates to large clusters, or to clusters with few leases left, is problematic: nodes are terminated and replacements are spun up, but flannel runs out of leases to give out, so the new nodes are unusable while a number of leases remain blocked until their 24h expiration times out.

Your Environment

  • Flannel version: 0.9.1
  • Backend used: vxlan
  • Etcd version: v3.2.10
  • Kubernetes version (if used): 1.8.4.coreos
  • Operating System and version: CoreOS (beta branch)
  • Link to your project (optional):
mumoshu pushed a commit to kubernetes-retired/kube-aws that referenced this issue Apr 6, 2018
Deploy Calico and Flannel networking as a daemonset with Kubernetes API as the backing store.

Removes the need for nodes to connect to etcd and frees up node podCIDR leases faster, addressing the cluster roll issue: flannel-io/flannel#954.

This is an experimental feature, disabled by default.
Kubernetes controllers become responsible for allocating node CIDRs.
Switch between Calico+Flannel (Canal) or Flannel.

Fast roll out into existing clusters with minimal disruption.

Optional calico Typha service for easing load on apiservers in large clusters.

Resolves #909
@stale

stale bot commented Jan 26, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 26, 2023
@stale stale bot closed this as completed Feb 16, 2023