Warning FailedGetResourceMetric horizontal-pod-autoscaler missing request for cpu #79365
@DirectXMan12
@max-rocket-internet Could you please share the YAML content of the pod? It seems the pod's CPU request is not set.
Sure. Here's from the deployment:
```json
{
"limits": {
"cpu": "2",
"memory": "2Gi"
},
"requests": {
"cpu": "1",
"memory": "2Gi"
}
}
```
This shows there's only a single container in these pods. Here's a list of pods:
And resources from all pods:
```json
{
"limits": {
"cpu": "2",
"memory": "2Gi"
},
"requests": {
"cpu": "1",
"memory": "2Gi"
}
}
{
"limits": {
"cpu": "2",
"memory": "2Gi"
},
"requests": {
"cpu": "1",
"memory": "2Gi"
}
}
```
That just repeats 14 times, once for each pod. And then the 3 completed pods show as:
```json
{}
{}
{}
```
Ahhh, I deleted those.
But these come back. Here's the pod JSON from one:
```json
{
"apiVersion": "v1",
"kind": "Pod",
"metadata": {
"annotations": {
"checksum/config": "31e32a934d7d95c9399fc8ca8250ca6e6974c543e4ee16397b5dcd04b4399679"
},
"creationTimestamp": "2019-06-26T00:00:03Z",
"generateName": "myapp-issue-detection-job-1561507200-",
"labels": {
"controller-uid": "59709b7a-97a5-11e9-b7c2-06c556123efe",
"env": "prd01",
"job-name": "myapp-issue-detection-job-1561507200",
"team": "vendor"
},
"name": "myapp-issue-detection-job-1561507268cnr",
"namespace": "default",
"ownerReferences": [
{
"apiVersion": "batch/v1",
"blockOwnerDeletion": true,
"controller": true,
"kind": "Job",
"name": "myapp-issue-detection-job-1561507200",
"uid": "59709b7a-97a5-11e9-b7c2-06c556123efe"
}
],
"resourceVersion": "56293646",
"selfLink": "/api/v1/namespaces/default/pods/myapp-issue-detection-job-1561507268cnr",
"uid": "59733023-97a5-11e9-b7c2-06c556123efe"
}
}
```
I will test creating more.
It is a little weird. How could I reproduce it?
The HPA uses only the selector to filter pods, without checking ownership. I think this is a horrible mistake!
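A minimal repro sketch of that overlap, with illustrative names throughout (none of these names come from the reporter's cluster): a Deployment whose pods set a CPU request, plus a Job whose pods carry the same label but no request. An HPA targeting the Deployment fetches pod metrics by the Deployment's selector, so the Job's pods get swept in and trigger `missing request for cpu`:

```yaml
# Deployment the HPA targets; its pods do set a CPU request.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web              # the HPA queries pod metrics by this label
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: app
        image: nginx
        resources:
          requests:
            cpu: 100m
---
# Job whose pods carry the same label but set no requests. Its pods
# (running or Completed) match the HPA's label query too, so the HPA
# reports "missing request for cpu".
apiVersion: batch/v1
kind: Job
metadata:
  name: one-off
spec:
  template:
    metadata:
      labels:
        app: web            # overlaps the Deployment's selector
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "sleep 5"]
        # no resources set
```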
@zq-david-wang It might be the cause of this issue. Would you like to send a PR for it? If not, I could try to fix it. @max-rocket-internet Could you share the YAML of the Job?
@hex108 I am not working on this. :)
Here's the job:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
creationTimestamp: "2019-06-27T14:37:06Z"
labels:
app: app01
controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
env: prd01
grafana: grafana_dashboard_link
job-name: myapp-runner-job-1561646220
rps_region: eu01
team: vendor
name: myapp-runner-job-1561646220
namespace: default
ownerReferences:
- apiVersion: batch/v1beta1
blockOwnerDeletion: true
controller: true
kind: CronJob
name: myapp-runner-job
uid: d47b5438-98e6-11e9-935d-02a07544d854
resourceVersion: "56786867"
selfLink: /apis/batch/v1/namespaces/default/jobs/myapp-runner-job-1561646220
uid: 09662df3-98e9-11e9-b7c2-06c556123efe
spec:
backoffLimit: 6
completions: 1
parallelism: 1
selector:
matchLabels:
controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
template:
metadata:
annotations:
checksum/config: 2177f5ab128ca89f6256ef363e9ea5615352d57fc5f207f614f0bc401d2c2b7e
creationTimestamp: null
labels:
app: app01
controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
env: prd01
grafana: grafana_dashboard_link
job-name: myapp-runner-job-1561646220
rps_region: eu01
team: vendor
spec:
containers:
- args:
- -c
- node main-report-most-unreachable.js
command:
- /bin/sh
- -c
- sleep 5
env:
# Deleted
image: ubuntu
imagePullPolicy: IfNotPresent
name: app
ports:
- containerPort: 8060
protocol: TCP
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /rps_rm_service/app/config/parameters
name: config
dnsPolicy: ClusterFirst
restartPolicy: OnFailure
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 180
volumes:
- configMap:
defaultMode: 420
name: myapp
name: config
status:
completionTime: "2019-06-27T14:37:13Z"
conditions:
- lastProbeTime: "2019-06-27T14:37:13Z"
lastTransitionTime: "2019-06-27T14:37:13Z"
status: "True"
type: Complete
startTime: "2019-06-27T14:37:06Z"
succeeded: 1
```
And here's the deployment:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "224"
kubernetes.io/change-cause: kubectl patch deployment myapp
--kubeconfig=/home/spinnaker/.hal/default/staging/dependencies/1354313958-kubeconfig
--context=eks-cluster01 --namespace=default --record=true --type=strategic
--patch={"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"template":{"metadata":{"labels":{"app_version":"0.0.1.7293"}},"spec":{"containers":[{"image":"xxx:0.0.1.7293","name":"app"}]}}}}
moniker.spinnaker.io/application: myapp
creationTimestamp: "2019-02-14T09:36:40Z"
generation: 6399
labels:
app: app01
app_version: 0.0.1.7293
env: prd01
grafana: grafana_dashboard_link
rps_region: eu01
team: vendor
name: myapp
namespace: default
resourceVersion: "56784691"
selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/myapp
uid: 08077060-303c-11e9-9855-0a17475bde48
spec:
progressDeadlineSeconds: 600
replicas: 14
revisionHistoryLimit: 10
selector:
matchLabels:
app: app01
env: prd01
rps_region: eu01
team: vendor
strategy:
rollingUpdate:
maxSurge: 15%
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
annotations:
checksum/config: d378dcf69c87a9daa71f2bd8d23e8584f884a181368619886595e99c8b3233a8
creationTimestamp: null
labels:
app: app01
app_version: 0.0.1.7293
env: prd01
grafana: grafana_dashboard_link
rps_region: eu01
team: vendor
spec:
containers:
- args:
- -c
- node main-migrate.js && node main-start.js
command:
- sh
env:
# Deleted
image: xxxx:0.0.1.7293
imagePullPolicy: IfNotPresent
livenessProbe:
# Deleted
name: app
ports:
- containerPort: 8060
protocol: TCP
readinessProbe:
# Deleted
resources:
limits:
cpu: "2"
memory: 2Gi
requests:
cpu: "1"
memory: 2Gi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /rps_rm_service/app/config/parameters
name: config
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 180
volumes:
- configMap:
defaultMode: 420
name: myapp
name: config
status:
availableReplicas: 14
conditions:
- lastTransitionTime: "2019-06-27T13:49:12Z"
lastUpdateTime: "2019-06-27T13:49:12Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2019-06-05T14:30:55Z"
lastUpdateTime: "2019-06-27T14:28:12Z"
message: ReplicaSet "myapp-f88fc9499" has successfully
progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 6399
readyReplicas: 14
replicas: 14
updatedReplicas: 14
```
I deleted all the env vars and probe contents (the lines marked `# Deleted` above).
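Spelling out the overlap in the two manifests above: the Job's pod-template labels are a strict superset of the Deployment's selector, so a label-based metrics query for the Deployment's pods also matches the Job's pods, and those set `resources: {}`. Copied from the manifests:

```yaml
# Deployment spec.selector.matchLabels (what the HPA queries pod metrics by):
app: app01
env: prd01
rps_region: eu01
team: vendor
---
# Job spec.template.metadata.labels (a superset, so Job pods match too):
app: app01
controller-uid: 09662df3-98e9-11e9-b7c2-06c556123efe
env: prd01
grafana: grafana_dashboard_link
job-name: myapp-runner-job-1561646220
rps_region: eu01
team: vendor
```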
Maybe worth noting that even though I got these events, the HPA target is showing `<unknown>`.
Any update @zq-david-wang?
@max-rocket-internet I am not working on this issue...
Ah sorry, my mistake!
@zq-david-wang Ah, I missed that.
Experiencing this same issue 👍
I checked the code, but I'm not sure whether it is intended to list pods only by labels without checking the owner reference. @DirectXMan12 Could you please help confirm it? Thanks! If it is not intended, I could send a PR to fix it.
I get the same issue. Here is my HPA spec:
Wonder why this only happens to ONE StatefulSet but not the others?
Hi, unfortunately I am also facing the same issue. Any idea how to fix this?
@max-rocket-internet
Experiencing the same issue. Configured HPA as:
and Deployment CPU resources as:
Cluster details:
I found the same issue. In my case the reason the pods are failing to report metrics is that the pod is not 100% ready. Check the health checks, security groups, etc. More info here: https://docs.aws.amazon.com/eks/latest/userguide/horizontal-pod-autoscaler.html
@alexvaque In my case, I had to add resource requests to the deployment to fix the issue.
It worked for me, thanks!!
I've hit this issue when all my pods were under one selector, and I had to explicitly fill the resources blocks for each pod matched by that selector.
Had the same issue with a deployment that could not scale because of the `missing request for cpu` error that the HPA of the deployment was showing. Finally got it fixed now. Here are the reasons & background: My deployment consists of a "main app" container plus two sidecar containers. The "main app" container had "resources" set, so the first problem was the missing "resources" specs on both sidecar containers. This behavior with multiple containers in the pod is described in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
The second problem was that the "Job" that ran before the actual app deployment ALSO has to have "resources" defined. And THAT was really unexpected. That is something @max-rocket-internet also stumbled upon & what I've tested then. So, TIL: every container in every pod matched by the HPA's selector needs requests set, sidecars and Job pods included. See the sketch below.
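A sketch of the multi-container half of that fix, with assumed names, images, and request values: every container in the pod, sidecars included, carries a `resources.requests` block (the pre-deploy Job needs the same treatment):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-fixed               # assumed name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp-fixed
  template:
    metadata:
      labels:
        app: myapp-fixed
    spec:
      containers:
      - name: app                 # main app container
        image: myapp:latest       # assumed image
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
      - name: log-shipper         # sidecar: without its own requests
        image: fluent/fluent-bit  # block, the HPA reports
        resources:                # "missing request for cpu"
          requests:
            cpu: 50m
            memory: 64Mi
```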
Also ran into this. Had some old pods without requests which were in `Error` state.
Thank you so much <3 In the minikube case, just run `minikube start`.
Encountered the same issue when attempting HPA based on CPU utilisation. Fixed it by setting the CPU resource request at the Deployment level.
It worked for me. The cause was that I used Helm to deploy the app along with a post-install job that had no resource limits set.
Faced this same issue. In my case my pod's container had limits set, but my Dapr-injected sidecar container didn't.
I think the #88167 approach (which replaces #86044, proposed by @tedyu) of checking the GroupVersionKind of pod metrics is suboptimal. E.g. one may have two deployments that use the same selector; then the HPA would get metrics for pods of both deployments, and checking the kind would not help tell them apart. See also #78761 (comment): basically the HPA, although it targets a deployment by name, uses the deployment's selector labels to get pod metrics.
@AlexanderYastrebov I talked with @arjunrn and the underlying issue is that you can only query metrics-server by label.
My application is running in a pod with a sidecar container. Also, I have tried setting this in my HPA definition but I have faced two issues:
How can I set the metric based on my main container, ignoring the sidecar?
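For what it's worth, newer clusters can answer this directly: the `autoscaling/v2` API has a `ContainerResource` metric type (alpha in v1.20, stable in v1.27) that scopes the target to a single named container. A sketch, assuming the main container is named `app` and the workload is a Deployment called `myapp`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    containerResource:
      name: cpu
      container: app    # only this container's usage/request is considered
      target:
        type: Utilization
        averageUtilization: 70
```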
Wondering if this issue should remain open; it seems what another user pointed out above could be the cause, and it was true in my case:
I'm getting the same issue. I have enabled metrics-server in minikube; when I create an HPA it always says my deployment is able to scale, but it does not scale down even after hours.
--------Edited--------
I have tried the same deployment with a kind cluster and it's working fine; there is some issue with minikube.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Reading the discussion, it seems to me the error message is misleading.
/assign
What happened:
The HPA always has a target of `<unknown>/70%` and events that say `missing request for cpu`.
`kubectl top pod`:
`kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"`:
Here's the HPA in YAML:
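The HPA manifest itself isn't shown above; as a representative example only (name, namespace, and replica bounds are assumptions), an `autoscaling/v1` HPA with the 70% CPU target would look like:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp               # assumed name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: myapp
  minReplicas: 2            # assumed bounds
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
```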
What you expected to happen:
No `<unknown>` in the HPA target.
How to reproduce it (as minimally and precisely as possible):
I can't be sure. It's only a single HPA in our cluster; 10 other HPAs are working OK.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.12.6