Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use machine.Spec.ProviderID to filter a machine #93

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -152,30 +152,17 @@ func (c *machineController) run(stopCh <-chan struct{}) error {

// findMachineByNodeProviderID find associated machine using
// node.Spec.ProviderID as the key. Returns nil if either the Node by
// node.Spec.ProviderID cannot be found or if the node has no machine
// annotation. A DeepCopy() of the object is returned on success.
// node.Spec.ProviderID cannot be found or if there is no machine which has matching machine.Spec.ProviderID
func (c *machineController) findMachineByNodeProviderID(node *apiv1.Node) (*v1beta1.Machine, error) {
objs, err := c.nodeInformer.GetIndexer().ByIndex(nodeProviderIDIndex, node.Spec.ProviderID)
machines, err := c.machineInformer.Lister().Machines("").List(labels.Everything())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an index (for provider-id) on Machines - this will make this cheap again.

if err != nil {
return nil, err
}

switch n := len(objs); {
case n == 0:
return nil, nil
case n > 1:
return nil, fmt.Errorf("internal error; expected len==1, got %v", n)
}

node, ok := objs[0].(*apiv1.Node)
if !ok {
return nil, fmt.Errorf("internal error; unexpected type %T", node)
}

if machineName, found := node.Annotations[machineAnnotationKey]; found {
return c.findMachine(machineName)
for _, m := range machines {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is O(n). It was O(1).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frobware There was a problem with using annotation from Node. To delete/cleanup unregistered machines, unregistered nodes are passed to this function. Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered. This node object is created locally in the core and used just to pass providerID. This node object will never have annotation set by nodelink-controller and thus clean up of unregistered aws instances will never happen.
So the question is not O(n) vs O(1), it was a bug and now this is being fixed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered.

Disagree. We find this node by looking up in the informer store. This is why this code is weird. We are passed a node object, yet we go on again to look it up in the store. Why? Because as you've found out the object passed in here is incomplete - the core of the autoscaler only sets the ProviderID field. The node we eventually lookup is a fully-paid-up and genuine node.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. here node objects are being created for the unregistered nodes and these objects have just provider id. Unregistered Nodes are the provider instances which do not have a v1.Node object at the apiserver.
  2. List of unregistered nodes(provider IDs), which is created in step 1, gets stored/saved in the ClusterStateRegistry by the core here
  3. Then machine-api provider's implementation of NodeGroupForNode(unregistered.Node) gets invoked. Node which is being passed here is the one created in first step,
  4. NodeGroupForNode() invokes nodeGroupForNode() with the same node object, which further passes same node object to findMachineByNodeProviderId(node)
  5. here we are trying to find unregistered node using informer.
    Node informer will never be able to find this node at apiserver, because the list of nodes created in step 1 has only those providerIDs which do not have a node object. Therefore, for unregistered machines, L175 will never gets executed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the analysis. +1

if m.Spec.ProviderID != nil && *m.Spec.ProviderID == node.Spec.ProviderID {
return m, nil
}
}

return nil, nil
}

Expand Down