-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use machine.Spec.ProviderID to filter a machine #93
Use machine.Spec.ProviderID to filter a machine #93
Conversation
|
||
if machineName, found := node.Annotations[machineAnnotationKey]; found { | ||
return c.findMachine(machineName) | ||
for _, m := range machines { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is O(n). It was O(1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@frobware There was a problem with using annotation from Node. To delete/cleanup unregistered machines, unregistered nodes are passed to this function. Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered. This node object is created locally in the core and used just to pass providerID. This node object will never have annotation set by nodelink-controller and thus clean up of unregistered aws instances will never happen.
So the question is not O(n) vs O(1), it was a bug and now this is being fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now here is the catch, the node object which is being received here for such unregistered nodes, has no existence at apiserver because this particular machine never got registered.
Disagree. We find this node by looking up in the informer store. This is why this code is weird. We are passed a node object, yet we go on again to look it up in the store. Why? Because as you've found out the object passed in here is incomplete - the core of the autoscaler only sets the ProviderID field. The node we eventually lookup is a fully-paid-up and genuine node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- here node objects are being created for the unregistered nodes and these objects have just provider id. Unregistered Nodes are the provider instances which do not have a v1.Node object at the apiserver.
- List of unregistered nodes(provider IDs), which is created in step 1, gets stored/saved in the
ClusterStateRegistry
by the core here - Then machine-api provider's implementation of
NodeGroupForNode(unregistered.Node)
gets invoked. Node which is being passed here is the one created in first step, - NodeGroupForNode() invokes nodeGroupForNode() with the same node object, which further passes same node object to
findMachineByNodeProviderId(node)
- here we are trying to find unregistered node using informer.
Node informer will never be able to find this node at apiserver, because the list of nodes created in step 1 has only those providerIDs which do not have a node object. Therefore, for unregistered machines, L175 will never gets executed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the analysis. +1
@vikaschoudhary16: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
LGTM. But this needs openshift/cluster-api-provider-aws#210 first - true? |
func (c *machineController) findMachineByNodeProviderID(node *apiv1.Node) (*v1beta1.Machine, error) { | ||
objs, err := c.nodeInformer.GetIndexer().ByIndex(nodeProviderIDIndex, node.Spec.ProviderID) | ||
machines, err := c.machineInformer.Lister().Machines("").List(labels.Everything()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an index (for provider-id) on Machines - this will make this cheap again.
https://jira.coreos.com/browse/CLOUD-311