
Wait until the discover controller cache is synced #738

Closed
wants to merge 1 commit

Conversation

uruddarraju
Contributor

#711

This change makes sure we don't have a race condition when the ipamD agent initially starts. Without these changes, there is a possibility that ipamD would start assigning IPs to pods before its internal state is synced.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
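The intended startup ordering is: construct the controller, start pod discovery, block until the informer cache has synced, and only then continue with the rest of ipamD startup. A rough sketch of that flow, using the k8sapi names from this PR (error handling and the surrounding _main() steps are elided):

	// Sketch only, mirroring the main.go flow this PR touches.
	discoverController := k8sapi.NewController(kubeClient)
	go discoverController.DiscoverK8SPods()

	// Block until the pod cache is synced so ipamD never assigns IPs based on
	// an empty or partial view of the node's pods.
	discoverController.WaitUntilReady()

	// ...continue with ipamD initialization (RPC server, etc.)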

@@ -185,6 +185,11 @@ func (d *Controller) K8SGetLocalPodIPs() ([]*K8SPodInfo, error) {
return localPods, nil
}

func (d *Controller) WaitUntilReady() {
stopCh := make(chan struct{})
cache.WaitForCacheSync(stopCh, d.controller.informer.HasSynced)
Contributor

This sounds like something we should do, but the Go docs say WaitForNamedCacheSync should be preferred. Could you change to use that function instead? Also, I'll try to test this a bit before it gets merged.
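For reference, a minimal sketch of WaitUntilReady using the named variant; the "pod discovery" controller name is illustrative, and WaitForNamedCacheSync comes from k8s.io/client-go/tools/cache:

	// Sketch, not part of this PR. WaitForNamedCacheSync wraps WaitForCacheSync
	// and logs the waiting/synced messages tagged with the controller name.
	func (d *Controller) WaitUntilReady() {
		stopCh := make(chan struct{})
		cache.WaitForNamedCacheSync("pod discovery", stopCh, d.controller.informer.HasSynced)
	}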

main.go Outdated
@@ -53,6 +53,10 @@ func _main() int {
discoverController := k8sapi.NewController(kubeClient)
go discoverController.DiscoverK8SPods()
Contributor

If you want to wait for pods to be discovered before moving on, then why not just move the DiscoverK8SPods() call into the NewController function and perform this initialization in the controller's construction instead of in a separate goroutine here? Wouldn't that be simpler?

Contributor Author

I thought briefly about this before making the change. Today this is only used by the DiscoveryController, but it might be too strong an assumption to say that will remain the case going forward. I see this as initialization logic: why start the RPC server or any other process when the required initialization is not done?

@@ -185,6 +185,11 @@ func (d *Controller) K8SGetLocalPodIPs() ([]*K8SPodInfo, error) {
return localPods, nil
}

func (d *Controller) WaitUntilReady() {
stopCh := make(chan struct{})
cache.WaitForCacheSync(stopCh, d.controller.informer.HasSynced)
Contributor

What about the Controller.synced field? In Controller.run, the synced field is set to true:

	go d.controller.informer.Run(stopCh)
	log.Info("Waiting for controller cache sync")
	// Wait for all involved caches to be synced, before processing items from the queue is started
	if !cache.WaitForCacheSync(stopCh, d.controller.informer.HasSynced) {
		log.Error("Timed out waiting for caches to sync!")
		runtime.HandleError(fmt.Errorf("timed out waiting for caches to sync"))
		return
	}

	log.Info("Synced successfully with APIServer")
	d.synced = true

It seems to me that the simplest solution to this problem is to constrain all this setup/cache-prep logic to the construction of the Controller object itself by creating the SharedInformer structs in the NewController function and then returning the ready-to-run Controller object. You could remove the Controller.synced field entirely if you did this, and the code would become a whole lot easier to read.
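A rough skeleton of that construction-time approach (illustrative only; the field and helper names mirror the snippets quoted in this PR, not the actual change):

	// NewController builds the list-watcher, workqueue, indexer and informer up
	// front (the code that currently lives in DiscoverK8SPods) and only then
	// returns. Because the caches exist before any caller sees the Controller,
	// the synced flag becomes unnecessary.
	func NewController(kubeClient kubernetes.Interface) *Controller {
		d := &Controller{kubeClient: kubeClient}

		// ...create podListWatcher, queue, indexer and informer here...
		// d.controller = newController(queue, indexer, informer)

		return d
	}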

@uruddarraju
Contributor Author

@mogren @jaypipes Thanks for the feedback. That makes sense. I made some changes to address the feedback. Let me know if this looks better now.

@jaypipes changed the title from "Waiting until the discover controller cache is synced" to "Wait until the discover controller cache is synced" on Dec 5, 2019

log.Info("Waiting for controller cache sync")
// Wait for all involved caches to be synced, before processing items from the queue is started
if !cache.WaitForCacheSync(stopCh, d.controller.informer.HasSynced) {
Contributor

On a future PR (not this one), we should use the WaitForNamedCacheSync method that @mogren alluded to in an earlier comment:

https://github.com/kubernetes/client-go/blob/bc9b51d240b2f5000adc2fa96231ab219ab0d702/tools/cache/shared_informer.go#L196

That would basically remove the need for the log lines here.
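A rough sketch of how that block in run() could then look (same names as the code quoted above; the "pod discovery" name is illustrative):

	go d.controller.informer.Run(stopCh)

	// WaitForNamedCacheSync logs the waiting/synced messages itself, so the
	// explicit log.Info lines can go away; only the failure path needs handling.
	if !cache.WaitForNamedCacheSync("pod discovery", stopCh, d.controller.informer.HasSynced) {
		runtime.HandleError(fmt.Errorf("timed out waiting for caches to sync"))
		return
	}
	d.synced = true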

@@ -68,15 +68,31 @@ type Controller struct {
controller *controller
kubeClient kubernetes.Interface
myNodeName string
synced bool
}

// NewController creates a new DiscoveryController
Contributor

Please update the docstring comment here to indicate that the function constructs the caches (and connects to the k8s API server to do so) before returning a fully-constructed controller object.
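For example, something along these lines (wording is illustrative, not taken from the PR):

	// NewController creates a new DiscoveryController. It connects to the
	// Kubernetes API server and constructs the pod informer and its caches
	// before returning, so the returned Controller is fully initialized.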

log.Errorf("Failed to create new Discover Controller: %v", err)
return 1
}

go discoverController.DiscoverK8SPods()
Contributor

This needs to go inside the NewController() function and be synchronously executed (see below)


log.Info("Starting Pod informer")

go d.controller.informer.Run(stopCh)
Contributor

d.controller will be nil here, since d.controller is created in the DiscoverK8SPods() method:

d.controller = newController(queue, indexer, informer)

What you need to do is basically get rid of the DiscoverK8SPods() method entirely, and move the code that is in there into the NewController() function.

The only reason that the DiscoverK8SPods() method even exists as a separate struct method apparently is to "start the controller" by doing:

select {}

but that's really kinda hacky IMHO. A cleaner solution would be to simply move all this setup code into NewController():

	// create the pod watcher
	podListWatcher := cache.NewListWatchFromClient(d.kubeClient.CoreV1().RESTClient(), "pods", metav1.NamespaceAll, fields.OneTermEqualSelector("spec.nodeName", d.myNodeName))

	// create the workqueue
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())

	// Bind the workqueue to a cache with the help of an informer. This way we make sure that
	// whenever the cache is updated, the pod key is added to the workqueue.
	// Note that when we finally process the item from the workqueue, we might see a newer version
	// of the Pod than the version which was responsible for triggering the update.
	indexer, informer := cache.NewIndexerInformer(podListWatcher, &v1.Pod{}, 0, cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			key, err := cache.MetaNamespaceKeyFunc(obj)
			if err == nil {
				queue.Add(key)
			}
		},
		UpdateFunc: func(old interface{}, new interface{}) {
			key, err := cache.MetaNamespaceKeyFunc(new)
			if err == nil {
				queue.Add(key)
			}
		},
		DeleteFunc: func(obj interface{}) {
			// IndexerInformer uses a delta queue, therefore for deletes we have to use this
			// key function.
			key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
			if err == nil {
				queue.Add(key)
			}
		},
	}, cache.Indexers{})

	d.controller = newController(queue, indexer, informer)

and move this code:

	// Now let's start the controller
	stop := make(chan struct{})
	defer close(stop)
	go d.run(1, stop)

	// Wait forever
	select {}

to a separate Controller.Listen() method that you can call after returning successfully from NewController()
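A minimal sketch of what that Listen() method could look like, reusing the quoted code (illustrative, not part of this PR):

	// Listen starts the controller loop and blocks forever. It assumes the
	// informer and workqueue wiring already happened in NewController().
	func (d *Controller) Listen() {
		stop := make(chan struct{})
		defer close(stop)
		go d.run(1, stop)

		// Wait forever
		select {}
	}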

@adammw
Contributor

adammw commented May 5, 2020

Is this still required after the 1.6 rewrites?

I just noticed the original bug report was for an RC version of 1.6, so answering my own question with a yes.

@mogren
Contributor

mogren commented Jun 5, 2020

Replaced by changes in #972

@mogren closed this on Jun 5, 2020