Use machine.Spec.ProviderID to filter a machine #93

vikaschoudhary16 · 2019-05-17T15:25:55Z

https://jira.coreos.com/browse/CLOUD-311

frobware · 2019-05-17T15:40:22Z

cluster-autoscaler/cloudprovider/openshiftmachineapi/machineapi_controller.go

-
-	if machineName, found := node.Annotations[machineAnnotationKey]; found {
-		return c.findMachine(machineName)
+	for _, m := range machines {


This is O(n). It was O(1).

@frobware There was a problem with using the annotation from the Node. To delete/clean up unregistered machines, unregistered nodes are passed to this function. Here is the catch: the node object received here for such an unregistered node does not exist at the apiserver, because this particular machine never registered. The core creates this node object locally and uses it only to carry the providerID. It will never have the annotation set by nodelink-controller, so unregistered AWS instances will never be cleaned up.
So the question is not O(n) vs. O(1); this was a bug, and this PR fixes it.
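To make the failure mode concrete, here is a minimal Go sketch. `Node`, `Machine`, `findByAnnotation`, `findByProviderID` and the annotation key are hypothetical, trimmed-down stand-ins, not the real apiv1/v1beta1 types or the provider's functions. It assumes, as described above, that the core hands over a node with only the provider ID set.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for apiv1.Node with only the fields
// relevant here.
type Node struct {
	Name        string
	Annotations map[string]string
	ProviderID  string // stands in for node.Spec.ProviderID
}

// Machine is a hypothetical stand-in for v1beta1.Machine.
type Machine struct {
	Name       string
	ProviderID *string // stands in for machine.Spec.ProviderID
}

// machineAnnotationKey is a placeholder; the real key is defined in the
// provider code.
const machineAnnotationKey = "hypothetical/machine"

// findByAnnotation mimics the old lookup: it only works if the node carries
// the annotation that nodelink-controller writes on registered nodes.
func findByAnnotation(node *Node, machines map[string]*Machine) *Machine {
	name, found := node.Annotations[machineAnnotationKey]
	if !found {
		return nil
	}
	return machines[name]
}

// findByProviderID mimics this PR's approach: scan every machine (O(n)) and
// match on the provider ID, which even a synthetic node carries.
func findByProviderID(node *Node, machines map[string]*Machine) *Machine {
	for _, m := range machines {
		if m.ProviderID != nil && *m.ProviderID == node.ProviderID {
			return m
		}
	}
	return nil
}

func main() {
	id := "aws:///us-east-1a/i-0123456789abcdef0"
	machines := map[string]*Machine{"worker-0": {Name: "worker-0", ProviderID: &id}}

	// Synthetic node for an instance that never registered: ProviderID only.
	unregistered := &Node{ProviderID: id}

	fmt.Println(findByAnnotation(unregistered, machines) == nil) // true
	fmt.Println(findByProviderID(unregistered, machines).Name)  // worker-0
}
```

The linear scan is exactly the O(n) cost raised in review; the sketch only shows why the annotation path cannot work for such nodes.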

> Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered.

Disagree. We find this node by looking it up in the informer store. This is why this code is weird: we are passed a node object, yet we go on to look it up again in the store. Why? Because, as you've found out, the object passed in here is incomplete; the core of the autoscaler only sets the ProviderID field. The node we eventually look up is a fully paid-up, genuine node.
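The store lookup described here can be modelled roughly as follows. This sketch assumes a single map keyed by provider ID in place of the informer's indexer (the diff's `nodeProviderIDIndex` appears to play that role); `Node`, `nodeStore`, `newNodeStore` and `resolve` are hypothetical names. A registered instance resolves to the complete node, annotations included, while an instance with no Node object at the apiserver resolves to nothing.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for apiv1.Node.
type Node struct {
	Name        string
	ProviderID  string
	Annotations map[string]string
}

// nodeStore is a toy stand-in for the node informer's indexer, with one
// index keyed on provider ID.
type nodeStore struct {
	byProviderID map[string]*Node
}

func newNodeStore(nodes ...*Node) *nodeStore {
	s := &nodeStore{byProviderID: make(map[string]*Node)}
	for _, n := range nodes {
		s.byProviderID[n.ProviderID] = n
	}
	return s
}

// resolve swaps the incomplete node handed in by the autoscaler core (only
// ProviderID set) for the complete object held in the store, if there is one.
func (s *nodeStore) resolve(partial *Node) (*Node, bool) {
	n, ok := s.byProviderID[partial.ProviderID]
	return n, ok
}

func main() {
	full := &Node{
		Name:        "ip-10-0-1-23",
		ProviderID:  "aws:///us-east-1a/i-0aaa",
		Annotations: map[string]string{"hypothetical/machine": "worker-0"},
	}
	store := newNodeStore(full)

	// Registered instance: the store returns the fully populated node.
	n, ok := store.resolve(&Node{ProviderID: "aws:///us-east-1a/i-0aaa"})
	fmt.Println(ok, n.Annotations["hypothetical/machine"]) // true worker-0

	// Instance with no Node object at the apiserver: nothing to resolve.
	_, ok = store.resolve(&Node{ProviderID: "aws:///us-east-1a/i-0bbb"})
	fmt.Println(ok) // false
}
```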

1. Here, node objects are created for the unregistered nodes, and these objects carry only a provider ID. Unregistered nodes are provider instances that do not have a v1.Node object at the apiserver.

2. The list of unregistered nodes (provider IDs) created in step 1 is stored in the ClusterStateRegistry by the core here.

3. Then the machine-api provider's implementation of NodeGroupForNode(unregistered.Node) is invoked. The node passed here is the one created in step 1.

4. NodeGroupForNode() invokes nodeGroupForNode() with the same node object, which in turn passes it to findMachineByNodeProviderID(node).

5. Here we try to find the unregistered node using the informer. The node informer will never find this node, because the list created in step 1 contains only provider IDs that have no node object. Therefore, for unregistered machines, L175 will never be executed.
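Step 1 and its consequence for step 5 can be sketched as a set difference. The function `unregisteredNodes` and the simplified `Node` type are hypothetical; this is not the autoscaler core's actual code. It only models the property the analysis relies on: every placeholder's provider ID is, by construction, absent from any index of registered nodes, so the informer lookup cannot succeed.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for apiv1.Node.
type Node struct {
	Name       string
	ProviderID string
}

// unregisteredNodes sketches step 1: every cloud instance whose provider ID
// has no matching Node object becomes a placeholder Node that carries only
// the provider ID (no name, no annotations).
func unregisteredNodes(instanceIDs []string, registered []*Node) []*Node {
	seen := make(map[string]bool, len(registered))
	for _, n := range registered {
		seen[n.ProviderID] = true
	}
	var out []*Node
	for _, id := range instanceIDs {
		if !seen[id] {
			out = append(out, &Node{ProviderID: id})
		}
	}
	return out
}

func main() {
	instances := []string{"aws:///a/i-1", "aws:///a/i-2", "aws:///a/i-3"}
	registered := []*Node{{Name: "node-1", ProviderID: "aws:///a/i-1"}}

	// Toy provider-ID index over registered nodes, standing in for the
	// node informer's store.
	index := make(map[string]*Node)
	for _, n := range registered {
		index[n.ProviderID] = n
	}

	for _, n := range unregisteredNodes(instances, registered) {
		_, found := index[n.ProviderID]
		fmt.Println(n.ProviderID, found) // found is always false here
	}
}
```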

Thanks for the analysis. +1

openshift-ci-robot · 2019-05-17T16:40:14Z

@vikaschoudhary16: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/prow/git-history	`48eb367`	link	`/test git-history`
ci/prow/unit	`48eb367`	link	`/test unit`
ci/prow/e2e-aws-operator	`48eb367`	link	`/test e2e-aws-operator`


frobware · 2019-05-22T15:50:41Z

LGTM.

But this needs openshift/cluster-api-provider-aws#210 first, right?

frobware · 2019-05-22T15:52:05Z

cluster-autoscaler/cloudprovider/openshiftmachineapi/machineapi_controller.go

 func (c *machineController) findMachineByNodeProviderID(node *apiv1.Node) (*v1beta1.Machine, error) {
-	objs, err := c.nodeInformer.GetIndexer().ByIndex(nodeProviderIDIndex, node.Spec.ProviderID)
+	machines, err := c.machineInformer.Lister().Machines("").List(labels.Everything())


Let's add an index on Machines keyed by provider ID; that will make this lookup cheap again.
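A secondary index from provider ID to machines turns the scan back into a map lookup. In client-go this would typically be registered through `cache.Indexers` and queried with `ByIndex`, as the node lookup in the diff already does; the sketch below avoids external dependencies and models the index as a plain map. `Machine`, `machineProviderIDIndex`, `buildIndex`, `find`, and the choice to return an error when several machines share a provider ID are assumptions of this sketch, not necessarily what #97 does.

```go
package main

import "fmt"

// Machine is a hypothetical stand-in for v1beta1.Machine.
type Machine struct {
	Name       string
	ProviderID *string
}

// machineProviderIDIndex is a toy secondary index from provider ID to
// machines. It is built once from a slice here; a real informer would keep
// it current as add/update/delete events arrive.
type machineProviderIDIndex map[string][]*Machine

func buildIndex(machines []*Machine) machineProviderIDIndex {
	idx := make(machineProviderIDIndex)
	for _, m := range machines {
		// Machines whose instance does not exist yet have no provider ID
		// and are left out of the index.
		if m.ProviderID == nil || *m.ProviderID == "" {
			continue
		}
		idx[*m.ProviderID] = append(idx[*m.ProviderID], m)
	}
	return idx
}

// find is an O(1) map lookup instead of an O(n) scan over every machine.
func (idx machineProviderIDIndex) find(providerID string) (*Machine, error) {
	ms := idx[providerID]
	switch len(ms) {
	case 0:
		return nil, nil
	case 1:
		return ms[0], nil
	default:
		return nil, fmt.Errorf("%d machines share provider ID %q", len(ms), providerID)
	}
}

func main() {
	a, b := "aws:///a/i-1", "aws:///a/i-2"
	idx := buildIndex([]*Machine{
		{Name: "worker-0", ProviderID: &a},
		{Name: "worker-1", ProviderID: &b},
		{Name: "worker-2"}, // still provisioning: no provider ID yet
	})
	m, err := idx.find(b)
	fmt.Println(m.Name, err) // worker-1 <nil>
}
```

Building the index costs O(n) once, but each lookup afterwards is constant time, which restores the cost profile of the annotation-based lookup while still working for unregistered nodes.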

vikaschoudhary16 · 2019-05-27T03:23:15Z

@frobware I added the indexing improvement on top of this and opened a new PR, #97, so I'm closing this one.

Use machine.Spec.ProviderID to filter a machine using provider id key

48eb367

openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 17, 2019

openshift-ci-robot requested review from paulfantom and spangenberg May 17, 2019 15:26

frobware reviewed May 17, 2019


vikaschoudhary16 mentioned this pull request May 18, 2019

Add cluster-api based cloudprovider kubernetes/autoscaler#1866 (Merged)

frobware suggested changes May 22, 2019


vikaschoudhary16 closed this May 27, 2019
