Don't pile up successive full refreshes during AWS scaledowns
Force-refreshing everything on every `DeleteNodes` call causes slowdowns
and API throttling on large clusters with many ASGs (and a lot of activity).
That function may be called several times in a row during a scale-down
(once for each ASG that has a node to be removed).

Each forced refresh re-discovers all ASGs and all LaunchConfigurations,
then re-lists all instances of the discovered ASGs.

That immediate refresh isn't required anyway: the cache's concrete
DeleteInstances implementation already decrements the node group size, so we
can instead schedule a single grouped refresh for the next loop iteration.

As a later step, I'm considering splitting the asgCache.generate() function
to support per-ASG refreshes (and maybe per-ASG cache TTLs with jitter, to
spread API calls). But this should address the current issue for now.
bpineau committed Jan 6, 2021
1 parent 7761d70 commit 0f745a5
1 changed file with 3 additions and 2 deletions: cluster-autoscaler/cloudprovider/aws/aws_manager.go
@@ -294,8 +294,9 @@ func (m *AwsManager) DeleteInstances(instances []*AwsInstanceRef) error {
 	if err := m.asgCache.DeleteInstances(instances); err != nil {
 		return err
 	}
-	klog.V(2).Infof("Some ASG instances might have been deleted, forcing ASG list refresh")
-	return m.forceRefresh()
+	klog.V(2).Infof("Some ASG instances might have been deleted, scheduling an ASG list refresh")
+	m.lastRefresh = time.Now().Add(-refreshInterval)
+	return nil
 }
 
 // GetAsgNodes returns Asg nodes.
