Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use machine.Spec.ProviderID to filter a machine #93

Conversation

vikaschoudhary16
Copy link

@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 17, 2019

if machineName, found := node.Annotations[machineAnnotationKey]; found {
return c.findMachine(machineName)
for _, m := range machines {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is O(n). It was O(1).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frobware There was a problem with using annotation from Node. To delete/cleanup unregistered machines, unregistered nodes are passed to this function. Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered. This node object is created locally in the core and used just to pass providerID. This node object will never have annotation set by nodelink-controller and thus clean up of unregistered aws instances will never happen.
So the question is not O(n) vs O(1), it was a bug and now this is being fixed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered.

Disagree. We find this node by looking up in the informer store. This is why this code is weird. We are passed a node object, yet we go on again to look it up in the store. Why? Because as you've found out the object passed in here is incomplete - the core of the autoscaler only sets the ProviderID field. The node we eventually lookup is a fully-paid-up and genuine node.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. here node objects are being created for the unregistered nodes and these objects have just provider id. Unregistered Nodes are the provider instances which do not have a v1.Node object at the apiserver.
  2. List of unregistered nodes(provider IDs), which is created in step 1, gets stored/saved in the ClusterStateRegistry by the core here
  3. Then machine-api provider's implementation of NodeGroupForNode(unregistered.Node) gets invoked. Node which is being passed here is the one created in first step,
  4. NodeGroupForNode() invokes nodeGroupForNode() with the same node object, which further passes same node object to findMachineByNodeProviderId(node)
  5. here we are trying to find unregistered node using informer.
    Node informer will never be able to find this node at apiserver, because the list of nodes created in step 1 has only those providerIDs which do not have a node object. Therefore, for unregistered machines, L175 will never gets executed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis. +1

@openshift-ci-robot
Copy link

@vikaschoudhary16: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/git-history 48eb367 link /test git-history
ci/prow/unit 48eb367 link /test unit
ci/prow/e2e-aws-operator 48eb367 link /test e2e-aws-operator

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@frobware
Copy link

LGTM.

But this needs openshift/cluster-api-provider-aws#210 first - true?

func (c *machineController) findMachineByNodeProviderID(node *apiv1.Node) (*v1beta1.Machine, error) {
objs, err := c.nodeInformer.GetIndexer().ByIndex(nodeProviderIDIndex, node.Spec.ProviderID)
machines, err := c.machineInformer.Lister().Machines("").List(labels.Everything())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an index (for provider-id) on Machines - this will make this cheap again.

@vikaschoudhary16
Copy link
Author

@frobware did indexing improvement on top of this and created a new PR, so closing this one.
#97

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants