Sync with upstream v1.20.3 (#130)
* Fix cluster-autoscaler clusterapi sample manifest

This commit fixes sample manifest of cluster-autoscaler clusterapi
provider.

(cherry picked from commit a5fee21)

* Add functionality to cordon a node before destroying it. This helps a load balancer remove the node from its healthy hosts (ALB supports this).
This won't completely fix the 502 issue, because a node has to stay alive for some time after cordoning to serve in-flight requests, but the load balancer can be configured to remove cordoned nodes from its healthy host list.
The feature is enabled by the cordon-node-before-terminating flag, which defaults to false to retain the existing behavior.

* Set maxAsgNamesPerDescribe to the new maximum value

While this was previously effectively limited to 50, `DescribeAutoScalingGroups` now supports
fetching 100 ASGs per call in all regions, matching what's documented:
https://docs.aws.amazon.com/autoscaling/ec2/APIReference/API_DescribeAutoScalingGroups.html
```
     AutoScalingGroupNames.member.N
       The names of the Auto Scaling groups.
       By default, you can only specify up to 50 names.
       You can optionally increase this limit using the MaxRecords parameter.
     MaxRecords
       The maximum number of items to return with this call.
       The default value is 50 and the maximum value is 100.
```

Doubling this halves API calls on large clusters, which should help to prevent throttling.
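
The batching effect described above can be illustrated with a minimal sketch. The helper name `chunkNames` and the constant `maxNamesPerCall` are hypothetical and only stand in for the autoscaler's own batching logic, which is not shown here; each returned batch would correspond to one describe call.

```go
package main

import "fmt"

// maxNamesPerCall mirrors the new per-request limit of 100 names.
// The constant name is illustrative, not the one used in the autoscaler.
const maxNamesPerCall = 100

// chunkNames splits names into consecutive batches of at most size elements.
func chunkNames(names []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(names); start += size {
		end := start + size
		if end > len(names) {
			end = len(names)
		}
		batches = append(batches, names[start:end])
	}
	return batches
}

func main() {
	names := make([]string, 101)
	for i := range names {
		names[i] = fmt.Sprintf("asg-%d", i)
	}
	// Each batch would be sent as one describe request.
	for i, batch := range chunkNames(names, maxNamesPerCall) {
		fmt.Printf("batch %d: %d names\n", i, len(batch))
	}
}
```

With 101 names, a batch size of 50 needs three requests while a batch size of 100 needs two, which is the same boundary case the updated unit test in this commit exercises.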

* Break out unmarshal from GenerateEC2InstanceTypes

Refactor to allow for optimisation

* Optimise GenerateEC2InstanceTypes unmarshal memory usage

The pricing JSON for us-east-1 is currently 129MB. Fetching it into
memory and parsing it results in a large memory footprint on
startup, and can lead to the autoscaler being OOMKilled.

Change the ReadAll/Unmarshal logic to a stream decoder to significantly
reduce the memory use.

* use aws sdk to find region

* update readme

* Update cluster-autoscaler/cloudprovider/aws/README.md

Co-authored-by: Guy Templeton <[email protected]>

* Merge pull request kubernetes#4274 from kinvolk/imran/cloud-provider-packet-fix

Cloud provider[Packet] fixes

* Fix bug where a node that becomes ready after 2 mins can be treated as unready. Deprecated LongNotStarted

 In cases where node n1 would:
 1) Be created at t=0min
 2) Ready condition is true at t=2.5min
 3) Not ready taint is removed at t=3min
 the ready node is counted as unready

 Tested cases after fix:
 1) Case described above
 2) Nodes not starting even after 15mins still
 treated as unready
 3) Nodes created long ago that suddenly become unready are
 counted as unready.

* Improve misleading log

Signed-off-by: Sylvain Rabot <[email protected]>

* Don't proactively decrement Azure cache for unregistered nodes

* Cluster Autoscaler: fix unit tests after kubernetes#3924 was backported to 1.20 in kubernetes#4319

The backport included unit tests using a function that changed signature
after 1.20. This was not detected before merging because CI is not
running correctly on 1.20.

* Cluster Autoscaler: backport Github Actions CI to 1.20 (kubernetes#4366)

* annotate fakeNodes so that cloudprovider implementations can identify them if needed

* move annotations to cloudprovider package

* fix 1.19 test

* remove flaky test that's removed in master

* Cluster Autoscaler 1.20.1

* Make arch-specific releases use separate images instead of tags on the same image

This seems to be the current convention in k8s.
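
To make the naming change concrete, the sketch below builds image references in both schemes, following the patterns visible in the Makefile diff of this commit. The function names and the registry value are hypothetical.

```go
package main

import "fmt"

// oldImageRef encodes the architecture in the tag of a single image:
// <registry>/cluster-autoscaler<provider>:<tag>-<arch>
func oldImageRef(registry, provider, tag, arch string) string {
	return fmt.Sprintf("%s/cluster-autoscaler%s:%s-%s", registry, provider, tag, arch)
}

// newImageRef uses a separate image per architecture:
// <registry>/cluster-autoscaler<provider>-<arch>:<tag>
func newImageRef(registry, provider, tag, arch string) string {
	return fmt.Sprintf("%s/cluster-autoscaler%s-%s:%s", registry, provider, arch, tag)
}

func main() {
	// registry is a placeholder value, not one taken from the commit.
	const registry = "registry.example.com/autoscaling"
	for _, arch := range []string{"amd64", "arm64"} {
		fmt.Println(oldImageRef(registry, "", "v1.20.3", arch))
		fmt.Println(newImageRef(registry, "", "v1.20.3", arch))
	}
}
```

The multi-arch manifest keeps the plain `<registry>/cluster-autoscaler<provider>:<tag>` name and now points at the per-architecture images.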

* Cluster Autoscaler: add arch-specific build targets to .gitignore

* CA - AWS - Instance List Update 03-10-21 - 1.20 release branch

* CA - AWS - Instance List Update 29-10-21 - 1.20 release branch

* Cluster-Autoscaler update AWS EC2 instance types with g5, m6 and r6

* CA - AWS Instance List Update - 13/12/21 - 1.20

* Merge pull request kubernetes#4497 from marwanad/add-more-azure-instance-types

add more azure instance types

* Cluster Autoscaler 1.20.2

* Add `--feature-gates` flag to support scale up on volume limits (CSI migration enabled)

Signed-off-by: ialidzhikov <[email protected]>

* CA - AWS Cloud Provider - 1.20 Static Instance List Update 02-06-2022

* Cluster Autoscaler - 1.20.3 release

* sync_file updates & other changes

* Updating vendor against git@github.com:kubernetes/kubernetes.git:e3de62298a730415c5d2ab72607ef6adadd6304d (e3de622)

* fixed some declaration errors

Co-authored-by: Kubernetes Prow Robot <[email protected]>
Co-authored-by: Hidekazu Nakamura <[email protected]>
Co-authored-by: atul <[email protected]>
Co-authored-by: Benjamin Pineau <[email protected]>
Co-authored-by: Adrian Lai <[email protected]>
Co-authored-by: darkpssngr <[email protected]>
Co-authored-by: Guy Templeton <[email protected]>
Co-authored-by: Vivek Bagade <[email protected]>
Co-authored-by: Sylvain Rabot <[email protected]>
Co-authored-by: Marwan Ahmed <[email protected]>
Co-authored-by: Jakub Tużnik <[email protected]>
Co-authored-by: GuyTempleton <[email protected]>
Co-authored-by: sturman <[email protected]>
Co-authored-by: Maciek Pytel <[email protected]>
Co-authored-by: ialidzhikov <[email protected]>
16 people authored Jun 25, 2022
1 parent 18bc9ca commit e4c8f8d
Showing 37 changed files with 2,530 additions and 468 deletions.
36 changes: 36 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,36 @@
name: Tests

on:
- push
- pull_request

env:
GOPATH: ${{ github.workspace }}/go

jobs:
test-and-verify:
runs-on: ubuntu-latest
steps:
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.15

- uses: actions/checkout@v2
with:
path: ${{ env.GOPATH }}/src/k8s.io/autoscaler

- name: Apt-get
run: sudo apt-get install libseccomp-dev -qq

- name: Prepare
working-directory: ${{ env.GOPATH }}/src/k8s.io/autoscaler
run: hack/install-verify-tools.sh

- name: Verify
working-directory: ${{ env.GOPATH }}/src/k8s.io/autoscaler
run: hack/verify-all.sh -v

- name: Test
working-directory: ${{ env.GOPATH }}/src/k8s.io/autoscaler
run: hack/for-go-proj.sh test
2 changes: 2 additions & 0 deletions cluster-autoscaler/.gitignore
@@ -1,4 +1,6 @@
cluster-autoscaler
cluster-autoscaler-amd64
cluster-autoscaler-arm64
cluster_autoscaler
main
.cover
119 changes: 66 additions & 53 deletions cluster-autoscaler/FAQ.md

Large diffs are not rendered by default.

15 changes: 9 additions & 6 deletions cluster-autoscaler/Makefile
@@ -28,6 +28,9 @@ ifdef DOCKER_RM
else
RM_FLAG=
endif
IMAGE=$(REGISTRY)/cluster-autoscaler$(PROVIDER)

export DOCKER_CLI_EXPERIMENTAL := enabled

build: build-arch-$(GOARCH)

@@ -53,24 +56,24 @@ make-image: make-image-arch-$(GOARCH)
make-image-arch-%:
ifdef BASEIMAGE
docker build --pull --build-arg BASEIMAGE=${BASEIMAGE} \
-t ${REGISTRY}/cluster-autoscaler${PROVIDER}:${TAG}-$* \
-t ${IMAGE}-$*:${TAG} \
-f Dockerfile.$* .
else
docker build --pull \
-t ${REGISTRY}/cluster-autoscaler${PROVIDER}:${TAG}-$* \
-t ${IMAGE}-$*:${TAG} \
-f Dockerfile.$* .
endif
@echo "Image ${TAG}${FOR_PROVIDER}-$* completed"

push-image: push-image-arch-$(GOARCH)

push-image-arch-%:
./push_image.sh ${REGISTRY}/cluster-autoscaler${PROVIDER}:${TAG}-$*
./push_image.sh ${IMAGE}-$*:${TAG}

push-manifest:
DOCKER_CLI_EXPERIMENTAL=enabled docker manifest create ${REGISTRY}/cluster-autoscaler${PROVIDER}:${TAG} \
$(addprefix $(REGISTRY)/cluster-autoscaler$(PROVIDER):$(TAG)-, $(ALL_ARCH))
DOCKER_CLI_EXPERIMENTAL=enabled docker manifest push --purge ${REGISTRY}/cluster-autoscaler${PROVIDER}:${TAG}
docker manifest create ${IMAGE}:${TAG} \
$(addprefix $(REGISTRY)/cluster-autoscaler$(PROVIDER)-, $(addsuffix :$(TAG), $(ALL_ARCH)))
docker manifest push --purge ${IMAGE}:${TAG}

execute-release: $(addprefix make-image-arch-,$(ALL_ARCH)) $(addprefix push-image-arch-,$(ALL_ARCH)) push-manifest
@echo "Release ${TAG}${FOR_PROVIDER} completed"
26 changes: 25 additions & 1 deletion cluster-autoscaler/SYNC-CHANGES/SYNC-CHANGES-1.20.md
@@ -7,7 +7,13 @@
- [During merging](#during-merging)
- [During vendoring k8s](#during-vendoring-k8s)
- [Others](#others)

- [v1.20.1](#v1201)
- [Synced with which upstream CA](#synced-with-which-upstream-ca-1)
- [Changes made](#changes-made-1)
- [To FAQ](#to-faq-1)
- [During merging](#during-merging-1)
- [During vendoring k8s](#during-vendoring-k8s-1)
- [Others](#others-1)


# v1.20.0
@@ -34,3 +40,21 @@ _None_
### Others
- Updated README.md for cluster-autoscaler repo to contain new [release matrix](../README.md#releases-gardenerautoscaler) of Gardener Autoscaler

# v1.20.1


## Synced with which upstream CA

[v1.20.3](https://github.com/kubernetes/autoscaler/tree/cluster-autoscaler-1.20.3/cluster-autoscaler)

## Changes made

### To FAQ
_None_
### During merging
- didn't update `cluster-autoscaler/cloudprovider/aws/ec2_instance_types.go` as its not used anymore by mcm, the one used
now (in worst case) is `cluster-autoscaler/cloudprovider/mcm/ec2_instance_types.go`
### During vendoring k8s
_None_
### Others
_None_
3 changes: 3 additions & 0 deletions cluster-autoscaler/cloudprovider/aws/README.md
@@ -354,3 +354,6 @@ To refresh static list, please run `go run ec2_instance_types/gen.go` under
`aws:///us-east-1a/i-01234abcdef`.
* If you want to use regional STS endpoints (e.g. when using VPC endpoint for
STS) the env `AWS_STS_REGIONAL_ENDPOINTS=regional` should be set.
* If you want to run it on instances with IMDSv1 disabled make sure your
EC2 launch configuration has the setting `Metadata response hop limit` set to `2`.
Otherwise, the `/latest/api/token` call will timeout and result in an error. See [AWS docs here](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html#configuring-instance-metadata-options) for further information.
14 changes: 7 additions & 7 deletions cluster-autoscaler/cloudprovider/aws/auto_scaling_test.go
@@ -27,22 +27,22 @@ import (
"github.com/stretchr/testify/require"
)

func TestMoreThen50Groups(t *testing.T) {
func TestMoreThen100Groups(t *testing.T) {
service := &AutoScalingMock{}
autoScalingWrapper := &autoScalingWrapper{
autoScaling: service,
}

// Generate 51 ASG names
names := make([]string, 51)
// Generate 101 ASG names
names := make([]string, 101)
for i := 0; i < len(names); i++ {
names[i] = fmt.Sprintf("asg-%d", i)
}

// First batch, first 50 elements
// First batch, first 100 elements
service.On("DescribeAutoScalingGroupsPages",
&autoscaling.DescribeAutoScalingGroupsInput{
AutoScalingGroupNames: aws.StringSlice(names[:50]),
AutoScalingGroupNames: aws.StringSlice(names[:100]),
MaxRecords: aws.Int64(maxRecordsReturnedByAPI),
},
mock.AnythingOfType("func(*autoscaling.DescribeAutoScalingGroupsOutput, bool) bool"),
@@ -51,10 +51,10 @@ func TestMoreThen50Groups(t *testing.T) {
fn(testNamedDescribeAutoScalingGroupsOutput("asg-1", 1, "test-instance-id"), false)
}).Return(nil)

// Second batch, element 51
// Second batch, element 101
service.On("DescribeAutoScalingGroupsPages",
&autoscaling.DescribeAutoScalingGroupsInput{
AutoScalingGroupNames: aws.StringSlice([]string{"asg-50"}),
AutoScalingGroupNames: aws.StringSlice([]string{"asg-100"}),
MaxRecords: aws.Int64(maxRecordsReturnedByAPI),
},
mock.AnythingOfType("func(*autoscaling.DescribeAutoScalingGroupsOutput, bool) bool"),
4 changes: 2 additions & 2 deletions cluster-autoscaler/cloudprovider/aws/aws_manager.go
@@ -49,7 +49,7 @@ const (
operationWaitTimeout = 5 * time.Second
operationPollInterval = 100 * time.Millisecond
maxRecordsReturnedByAPI = 100
maxAsgNamesPerDescribe = 50
maxAsgNamesPerDescribe = 100
refreshInterval = 1 * time.Minute
autoDiscovererTypeASG = "asg"
asgAutoDiscovererKeyTag = "tag"
@@ -312,7 +312,7 @@ func (m *AwsManager) getAsgTemplate(asg *asg) (*asgTemplate, error) {
region := az[0 : len(az)-1]

if len(asg.AvailabilityZones) > 1 {
klog.Warningf("Found multiple availability zones for ASG %q; using %s\n", asg.Name, az)
klog.V(4).Infof("Found multiple availability zones for ASG %q; using %s for %s label\n", asg.Name, az, apiv1.LabelFailureDomainBetaZone)
}

instanceTypeName, err := m.buildInstanceType(asg)
101 changes: 69 additions & 32 deletions cluster-autoscaler/cloudprovider/aws/aws_util.go
@@ -20,21 +20,26 @@ import (
"encoding/json"
"errors"
"fmt"
"github.com/aws/aws-sdk-go/aws/endpoints"
"io/ioutil"
klog "k8s.io/klog/v2"
"io"
"net/http"
"os"
"regexp"
"strconv"
"strings"

"github.com/aws/aws-sdk-go/aws"
"github.com/aws/aws-sdk-go/aws/ec2metadata"
"github.com/aws/aws-sdk-go/aws/endpoints"
"github.com/aws/aws-sdk-go/aws/session"

klog "k8s.io/klog/v2"
)

var (
ec2MetaDataServiceUrl = "http://169.254.169.254/latest/dynamic/instance-identity/document"
ec2MetaDataServiceUrl = "http://169.254.169.254"
ec2PricingServiceUrlTemplate = "https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/%s/index.json"
ec2PricingServiceUrlTemplateCN = "https://pricing.cn-north-1.amazonaws.com.cn/offers/v1.0/cn/AmazonEC2/current/%s/index.json"
staticListLastUpdateTime = "2019-10-14"
staticListLastUpdateTime = "2022-06-02"
)

type response struct {
@@ -82,16 +87,9 @@ func GenerateEC2InstanceTypes(region string) (map[string]*InstanceType, error) {

defer res.Body.Close()

body, err := ioutil.ReadAll(res.Body)
unmarshalled, err := unmarshalProductsResponse(res.Body)
if err != nil {
klog.Warningf("Error parsing %s skipping...\n", url)
continue
}

var unmarshalled = response{}
err = json.Unmarshal(body, &unmarshalled)
if err != nil {
klog.Warningf("Error unmarshalling %s, skip...\n", url)
klog.Warningf("Error parsing %s skipping...\n%s\n", url, err)
continue
}

@@ -127,6 +125,58 @@ func GetStaticEC2InstanceTypes() (map[string]*InstanceType, string) {
return InstanceTypes, staticListLastUpdateTime
}

func unmarshalProductsResponse(r io.Reader) (*response, error) {
dec := json.NewDecoder(r)
t, err := dec.Token()
if err != nil {
return nil, err
}
if delim, ok := t.(json.Delim); !ok || delim.String() != "{" {
return nil, errors.New("Invalid products json")
}

unmarshalled := response{map[string]product{}}

for dec.More() {
t, err = dec.Token()
if err != nil {
return nil, err
}

if t == "products" {
tt, err := dec.Token()
if err != nil {
return nil, err
}
if delim, ok := tt.(json.Delim); !ok || delim.String() != "{" {
return nil, errors.New("Invalid products json")
}
for dec.More() {
productCode, err := dec.Token()
if err != nil {
return nil, err
}

prod := product{}
if err = dec.Decode(&prod); err != nil {
return nil, err
}
unmarshalled.Products[productCode.(string)] = prod
}
}
}

t, err = dec.Token()
if err != nil {
return nil, err
}
if delim, ok := t.(json.Delim); !ok || delim.String() != "}" {
return nil, errors.New("Invalid products json")
}

return &unmarshalled, nil
}

func parseMemory(memory string) int64 {
reg, err := regexp.Compile("[^0-9\\.]+")
if err != nil {
@@ -155,26 +205,13 @@ func GetCurrentAwsRegion() (string, error) {
region, present := os.LookupEnv("AWS_REGION")

if !present {
klog.V(1).Infof("fetching %s\n", ec2MetaDataServiceUrl)
res, err := http.Get(ec2MetaDataServiceUrl)
if err != nil {
return "", fmt.Errorf("Error fetching %s", ec2MetaDataServiceUrl)
}

defer res.Body.Close()

body, err := ioutil.ReadAll(res.Body)
c := aws.NewConfig().
WithEndpoint(ec2MetaDataServiceUrl)
sess, err := session.NewSession()
if err != nil {
return "", fmt.Errorf("Error parsing %s", ec2MetaDataServiceUrl)
return "", fmt.Errorf("failed to create session")
}

var unmarshalled = map[string]string{}
err = json.Unmarshal(body, &unmarshalled)
if err != nil {
klog.Warningf("Error unmarshalling %s, skip...\n", ec2MetaDataServiceUrl)
}

region = unmarshalled["region"]
return ec2metadata.New(sess, c).Region()
}

return region, nil