OCPCLOUD-2060 Merge https://github.com/kubernetes/autoscaler:master (d3ec0c4) into master #256

Merged

Conversation

cloud-team-rebase-bot[bot]

No description provided.

k8s-ci-robot and others added 30 commits February 14, 2023 05:31
Check min size of node group and resource limits for set of nodes
* Added GetNodeGpuConfig to cloud provider which returns a GpuConfig
  struct containing the gpu label, type and resource name if the node
  has a GPU.
* Added initial implementation of the GetNodeGpuConfig to all cloud
  providers.
* Changed the `utilization.Calculate()` function to use GpuConfig
  instead of GPU label.
* Started using GpuConfig in utilization threshold calculations.
Add GpuConfig to cloud provider. Use GpuConfig in utilization calculations.
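
The GpuConfig change described in the commit message above can be pictured with a minimal sketch. This is an illustration only, not the upstream code: the `Node` type, the `getNodeGpuConfig` helper, its parameters, and the `example.com/...` label and resource names are all hypothetical, and the field types are simplified to plain strings.

```go
package main

import "fmt"

// GpuConfig is a simplified, hypothetical stand-in for the struct the commit
// message describes: it carries the GPU label, the GPU type, and the
// resource name.
type GpuConfig struct {
	Label        string // node label key that marks the node as having a GPU
	Type         string // value of that label on the node
	ResourceName string // resource name under which the GPU is exposed
}

// Node is a minimal stand-in for a Kubernetes node (name and labels only).
type Node struct {
	Name   string
	Labels map[string]string
}

// getNodeGpuConfig returns a GpuConfig when the node carries gpuLabel and nil
// otherwise. gpuLabel and resourceName are parameters here because they
// differ per cloud provider.
func getNodeGpuConfig(node Node, gpuLabel, resourceName string) *GpuConfig {
	gpuType, ok := node.Labels[gpuLabel]
	if !ok {
		return nil
	}
	return &GpuConfig{Label: gpuLabel, Type: gpuType, ResourceName: resourceName}
}

func main() {
	node := Node{
		Name:   "gpu-node-1",
		Labels: map[string]string{"example.com/gpu": "example-gpu"},
	}
	if cfg := getNodeGpuConfig(node, "example.com/gpu", "example.com/gpu-resource"); cfg != nil {
		fmt.Printf("label=%s type=%s resource=%s\n", cfg.Label, cfg.Type, cfg.ResourceName)
	}
}
```

Returning nil for nodes without the label lets callers treat "no GPU" as a single check before doing GPU-specific utilization math.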
regenerate the ec2 instance types using latest metadata to fetch m7g/r7g instances
This change removes an `if` statement that was left behind after a
refactor. The test in question has the same logic embedded into a
previous conditional and the removed statement has no effect on the
tests.
remove dead code in clusterapi provider tests
Signed-off-by: Guangwen Feng <[email protected]>
Update VPA dependency github.com/emicklei/go-restful/v3
…nodes_total metrics

* Added the new resource_name field to scaled_up/down_gpu_nodes_total,
  representing the resource name for the gpu.
* Changed metrics registrations to use GpuConfig
update FAQ.md to add a version to the pause container image, because latest is not valid
Add "resource_name" to scaled_up_gpu_nodes_total and scaled_down_gpu_nodes_total metrics
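
To show what adding a resource name dimension to the GPU node counters means, here is a small sketch under stated assumptions. It does not use the real metrics library or the real label names: `gpuMetricKey`, `gpuCounter`, and the sample values are hypothetical and only model a counter partitioned by resource name and GPU type.

```go
package main

import "fmt"

// gpuMetricKey models the label set of a GPU node counter. ResourceName is
// the dimension the commit message says was added; both names are
// illustrative, not the real metric labels.
type gpuMetricKey struct {
	ResourceName string
	GpuType      string
}

// gpuCounter is a hypothetical in-memory counter keyed by label set.
type gpuCounter struct {
	counts map[gpuMetricKey]int
}

func newGpuCounter() *gpuCounter {
	return &gpuCounter{counts: make(map[gpuMetricKey]int)}
}

// Add increases the counter for one label combination.
func (c *gpuCounter) Add(resourceName, gpuType string, nodes int) {
	c.counts[gpuMetricKey{ResourceName: resourceName, GpuType: gpuType}] += nodes
}

// Value reads the counter for one label combination (0 if never set).
func (c *gpuCounter) Value(resourceName, gpuType string) int {
	return c.counts[gpuMetricKey{ResourceName: resourceName, GpuType: gpuType}]
}

func main() {
	scaledUp := newGpuCounter()
	scaledUp.Add("example.com/gpu", "example-gpu", 2)
	scaledUp.Add("example.com/gpu", "example-gpu", 1)
	fmt.Println(scaledUp.Value("example.com/gpu", "example-gpu")) // prints 3
}
```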
Added support for the AWS Inferentia 2 instance types based on the NeuronCore v2 chip architecture
…ero-with-labels-taints

Use annotations to set labels and taints for clusterapi nodegroups
Merge taint utils into one package, make taint modifying methods public
Track PDBRemainingDisruptions in AutoscalingContext
@JoelSpeed

/hold

@elmiko Has HyperShift been notified that their tests are failing on this? Is a discussion open there to make sure we don't break them?

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2023
@openshift-ci

openshift-ci bot commented Jun 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elmiko, JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@elmiko

elmiko commented Jun 8, 2023

@JoelSpeed ack, let them know

@enxebre
Member

enxebre commented Jun 8, 2023

/test e2e-hypershift

@enxebre
Member

enxebre commented Jun 8, 2023

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_kubernetes-autoscaler/256/pull-ci-openshift-kubernetes-autoscaler-master-e2e-hypershift/1664597395830214656/artifacts/e2e-hypershift/run-e2e/artifacts/TestAutoscaling_PreTeardownClusterDump/namespaces/e2e-clusters-f6m5l-example-jz5rp/core/pods/logs/cluster-autoscaler-5bd4b658b5-nfj67-cluster-autoscaler.log

W0602 12:48:11.795940       1 reflector.go:533] k8s.io/client-go/dynamic/dynamicinformer/informer.go:108: failed to list cluster.x-k8s.io/v1beta1, Resource=machinepools: machinepools.cluster.x-k8s.io is forbidden: User "system:serviceaccount:e2e-clusters-f6m5l-example-jz5rp:cluster-autoscaler" cannot list resource "machinepools" in API group "cluster.x-k8s.io" in the namespace "e2e-clusters-f6m5l-example-jz5rp"

I'll update hypershift rbac.

@enxebre
Member

enxebre commented Jun 15, 2023

/test e2e-hypershift

@elmiko

elmiko commented Jun 15, 2023

it seems like some of our carry commits got dropped, and i'm not sure why. looking into re-adding them
/hold

thanks to @aleskandro for catching it =)

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2023
@elmiko

elmiko commented Jun 15, 2023

i think i've fixed the missing commit, see 834cebd

i'll wait for tests to start passing before removing the hold

@elmiko elmiko force-pushed the rebase-bot-master branch from 834cebd to 175bd5e Compare June 15, 2023 13:04
@muraee

muraee commented Jun 15, 2023

/test e2e-hypershift

@elmiko

elmiko commented Jun 15, 2023

gonna keep the hold here while we work out a question with the scale from zero annotations

the upstream annotations for the scale from zero capacity resources are
slightly different than the openshift implementation. the largest
difference is the addition of a gpu type annotation. openshift does not
yet utilize this annotation and thus this patch should be carried until
the machineset controllers for the various providers on openshift have
been modified to use the new annotations.

another important change is the modification of the memory annotation.
previously in openshift we expected this value to be a count of memory
in Mebibytes. the conversion function and tests have been modified to
allow continued openshift operation.

this change can be dropped when the annotations in openshift have been
updated. the progress for this effort can be followed at
https://issues.redhat.com/browse/OCPCLOUD-944
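
The memory annotation point in the commit message above comes down to a unit conversion: a value stored as a count of Mebibytes has to become bytes before it can be used as node capacity. A minimal sketch of that conversion follows. The function name `memoryMiBToBytes` is hypothetical and this is not the carried patch itself, only an illustration of the arithmetic.

```go
package main

import (
	"fmt"
	"strconv"
)

// bytesPerMebibyte is the number of bytes in one Mebibyte (2^20).
const bytesPerMebibyte = 1024 * 1024

// memoryMiBToBytes parses an annotation value holding a whole number of
// Mebibytes and returns the equivalent number of bytes. The name is
// hypothetical; the real conversion function lives in the carried patch.
func memoryMiBToBytes(value string) (int64, error) {
	mib, err := strconv.ParseInt(value, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing memory annotation %q: %w", value, err)
	}
	if mib < 0 {
		return 0, fmt.Errorf("memory annotation %q must not be negative", value)
	}
	return mib * bytesPerMebibyte, nil
}

func main() {
	bytes, err := memoryMiBToBytes("8192")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bytes) // prints 8589934592 (8192 MiB)
}
```

A value that is not a plain integer is rejected rather than guessed at, so a quantity-style string would surface as an error instead of a silently wrong capacity.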
@elmiko elmiko force-pushed the rebase-bot-master branch from 175bd5e to c74af56 Compare June 26, 2023 21:18
@elmiko

elmiko commented Jun 27, 2023

/retest

@openshift-ci

openshift-ci bot commented Jun 27, 2023

@cloud-team-rebase-bot[bot]: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/git-history | c74af56 | link | false | /test git-history |


@dtobolik

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Jun 29, 2023
@elmiko

elmiko commented Jun 29, 2023

/unhold
/lgtm

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jun 29, 2023
@openshift-merge-robot openshift-merge-robot merged commit b597b81 into openshift:master Jun 29, 2023
Labels
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
lgtm: Indicates that a PR is ready to be merged.
ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
qe-approved: Signifies that QE has signed off on this PR
rebase/manual: Indicates the PR should not be rebased by the rebasebot.