Ability to ignore some containers (sidecars) when evaluating if a pod is evictable or not #3947
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
I think this feature is important. Not sure if I have permission to remove stale, but here goes:
Since istio/istio#19395 is now closed, it looks like the known issue is no longer relevant, while in fact it is still pending on kubernetes/autoscaler#3947. This is just a trivial replacement of the referenced issue. I hope it is ok. Co-authored-by: Cynthia Lopes do Sacramento <[email protected]>
This indeed is an important feature that should be added.
This is a show stopper for workloads that use a mix of local storage and HA services. You can't use the existing pod-level annotation workaround for those.
@bskiba or @jbartosik what are your thoughts on a feature to ignore some sidecars for autoscaling?
I would love to see this feature implemented.
@marwanad or @bskiba, is this on your roadmap for cluster-autoscaler, or does someone else need to drive this contribution? The ability to ignore some volumes (based on an annotation) is a very important feature, to prevent people from having to fall back to the pod-level workaround.
@ldemailly alternatively, we could implement this as an autoscaler-side option rather than a per-pod annotation.
Whichever way gets us the feature sounds good to me.
I'm trying to understand why this feature is needed. Couldn't you set the corresponding cluster-autoscaler flag?
@mbyio On GKE, for example, you cannot change autoscaler configuration, afaik: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-autoscaler#scheduling-and-disruption
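For readers on self-managed clusters, the setting being alluded to is presumably the autoscaler's --skip-nodes-with-local-storage flag (an assumption, since the question above is truncated). A minimal sketch of passing it to a self-deployed cluster-autoscaler; the image tag, cloud provider, and node group values are placeholders:

```yaml
# Sketch only: a trimmed cluster-autoscaler Deployment showing where the flag goes.
# --skip-nodes-with-local-storage=false makes scale-down ignore emptyDir/hostPath
# usage entirely, which is coarser than the per-container behaviour requested here.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - name: cluster-autoscaler
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # placeholder tag
          command:
            - ./cluster-autoscaler
            - --cloud-provider=gce                   # placeholder, environment-specific
            - --nodes=1:10:my-node-group             # placeholder node group spec
            - --skip-nodes-with-local-storage=false
```

This is a global switch, though; managed offerings such as GKE do not expose it, which is why the per-pod and per-container approaches discussed below matter.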
As mentioned by @mfilocha it seems we are stuck for now until it's customisable on the main existing autoscalers? Better to go with a manual annotation patch in the meantime, no?
According to https://cloud.google.com/kubernetes-engine/docs/release-notes#October_27_2021: "In GKE version 1.22 and later, GKE cluster autoscaler (...) will support scaling down nodes with pods requesting local storage."
Because to do so you would need to repeat/reimplement the logic of checking the other containers for volumes. The use case is sidecar injection through an admission controller, for instance.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
We should keep tracking this issue. I still believe adding a way to ignore specific volumes (possibly with an annotation like the one proposed here) makes sense. /remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
I think the issue is still relevant and must stay open. /remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
Much needed feature, as this blocks the optimal use of Istio alongside the Kubernetes platform.
I guess /remove-lifecycle rotten. I don't work on Cluster Autoscaler, so I don't plan to work on this myself. I'd suggest asking about this at the SIG meeting. I think this is much more likely to happen if someone volunteers to contribute it.
It sounds like we just want to ignore local storage during node scale-down for certain volumes. This is kind of putting the cart before the horse (I am in the middle of trying to reproduce the issue), but I just whipped up a reference implementation PR which I think should work: #5594
I can reproduce this issue. Steps to reproduce:
Sample scale-up/dummy workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: node-scale-up-with-pod-anti-affinity
namespace: default
spec:
selector:
matchLabels:
app: node-scale-up-with-pod-anti-affinity
replicas: 1
template:
metadata:
labels:
app: node-scale-up-with-pod-anti-affinity
spec:
nodeSelector:
cloud.google.com/gke-nodepool: pool-1
# tolerations:
# - effect: NoSchedule
# key: test
# operator: Equal
# value: "true"
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- node-scale-up-with-pod-anti-affinity
topologyKey: "kubernetes.io/hostname"
containers:
- name: node-scale-up-with-pod-anti-affinity
image: registry.k8s.io/pause:2.0
```

What the istio sidecar-injected test workload pod looks like:

```yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
kubectl.kubernetes.io/default-container: nginx
kubectl.kubernetes.io/default-logs-container: nginx
prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"
prometheus.io/scrape: "true"
sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
creationTimestamp: "2023-03-14T06:16:01Z"
generateName: nginx-7ddf8d98f9-
labels:
app: nginx
pod-template-hash: 7ddf8d98f9
security.istio.io/tlsMode: istio
service.istio.io/canonical-name: nginx
service.istio.io/canonical-revision: latest
name: nginx-7ddf8d98f9-ngq4k
namespace: default
ownerReferences:
- apiVersion: apps/v1
blockOwnerDeletion: true
controller: true
kind: ReplicaSet
name: nginx-7ddf8d98f9
uid: 60ba242d-8bdc-469c-91b1-1f274164795b
resourceVersion: "27326"
uid: e45e2546-8e8d-4afc-905c-e8b32db3860b
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: nginx
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-fl78v
readOnly: true
- args:
- proxy
- sidecar
- --domain
- $(POD_NAMESPACE).svc.cluster.local
- --proxyLogLevel=warning
- --proxyComponentLogLevel=misc:error
- --log_output_level=default:info
- --concurrency
- "2"
env:
- name: JWT_POLICY
value: third-party-jwt
- name: PILOT_CERT_PROVIDER
value: istiod
- name: CA_ADDR
value: istiod.istio-system.svc:15012
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: INSTANCE_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
- name: SERVICE_ACCOUNT
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.serviceAccountName
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: PROXY_CONFIG
value: |
{}
- name: ISTIO_META_POD_PORTS
value: |-
[
]
- name: ISTIO_META_APP_CONTAINERS
value: nginx
- name: ISTIO_META_CLUSTER_ID
value: Kubernetes
- name: ISTIO_META_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: ISTIO_META_INTERCEPTION_MODE
value: REDIRECT
- name: ISTIO_META_WORKLOAD_NAME
value: nginx
- name: ISTIO_META_OWNER
value: kubernetes://apis/apps/v1/namespaces/default/deployments/nginx
- name: ISTIO_META_MESH_ID
value: cluster.local
- name: TRUST_DOMAIN
value: cluster.local
image: docker.io/istio/proxyv2:1.17.1
imagePullPolicy: IfNotPresent
name: istio-proxy
ports:
- containerPort: 15090
name: http-envoy-prom
protocol: TCP
readinessProbe:
failureThreshold: 30
httpGet:
path: /healthz/ready
port: 15021
scheme: HTTP
initialDelaySeconds: 1
periodSeconds: 2
successThreshold: 1
timeoutSeconds: 3
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 10m
memory: 40Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
privileged: false
readOnlyRootFilesystem: true
runAsGroup: 1337
runAsNonRoot: true
runAsUser: 1337
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/workload-spiffe-uds
name: workload-socket
- mountPath: /var/run/secrets/credential-uds
name: credential-socket
- mountPath: /var/run/secrets/workload-spiffe-credentials
name: workload-certs
- mountPath: /var/run/secrets/istio
name: istiod-ca-cert
- mountPath: /var/lib/istio/data
name: istio-data
- mountPath: /etc/istio/proxy
name: istio-envoy
- mountPath: /var/run/secrets/tokens
name: istio-token
- mountPath: /etc/istio/pod
name: istio-podinfo
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-fl78v
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
initContainers:
- args:
- istio-iptables
- -p
- "15001"
- -z
- "15006"
- -u
- "1337"
- -m
- REDIRECT
- -i
- '*'
- -x
- ""
- -b
- '*'
- -d
- 15090,15021,15020
- --log_output_level=default:info
image: docker.io/istio/proxyv2:1.17.1
imagePullPolicy: IfNotPresent
name: istio-init
resources:
limits:
cpu: "2"
memory: 1Gi
requests:
cpu: 10m
memory: 40Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_ADMIN
- NET_RAW
drop:
- ALL
privileged: false
readOnlyRootFilesystem: false
runAsGroup: 0
runAsNonRoot: false
runAsUser: 0
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-fl78v
readOnly: true
nodeName: gke-cluster-1-pool-1-45d8490d-twxf
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: test
value: "true"
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- emptyDir: {}
name: workload-socket
- emptyDir: {}
name: credential-socket
- emptyDir: {}
name: workload-certs
- emptyDir:
medium: Memory
name: istio-envoy
- emptyDir: {}
name: istio-data
- downwardAPI:
defaultMode: 420
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.labels
path: labels
- fieldRef:
apiVersion: v1
fieldPath: metadata.annotations
path: annotations
name: istio-podinfo
- name: istio-token
projected:
defaultMode: 420
sources:
- serviceAccountToken:
audience: istio-ca
expirationSeconds: 43200
path: istio-token
- configMap:
defaultMode: 420
name: istio-ca-root-cert
name: istiod-ca-cert
- name: kube-api-access-fl78v
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
status:
conditions:
- lastProbeTime: null
lastTransitionTime: "2023-03-14T06:16:09Z"
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: "2023-03-14T06:16:11Z"
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: "2023-03-14T06:16:11Z"
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: "2023-03-14T06:16:02Z"
status: "True"
type: PodScheduled
containerStatuses:
- containerID: containerd://95a275e1ce8cbfa55df2aa872dc6313706fbb9672a5290b1342e7ac1892563e7
image: docker.io/istio/proxyv2:1.17.1
imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
lastState: {}
name: istio-proxy
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-03-14T06:16:10Z"
- containerID: containerd://f1aa9db4d8cbdadb0fd6125cd1dd94914d697a752cfbf1903ebcec1c05205bd9
image: docker.io/library/nginx:latest
imageID: docker.io/library/nginx@sha256:aa0afebbb3cfa473099a62c4b32e9b3fb73ed23f2a75a65ce1d4b4f55a5c2ef2
lastState: {}
name: nginx
ready: true
restartCount: 0
started: true
state:
running:
startedAt: "2023-03-14T06:16:10Z"
hostIP: 10.128.0.61
initContainerStatuses:
- containerID: containerd://688d48287ced31d3bff4dd8ef7d20279b7875255c3b5a7bafec93ce30f0027fc
image: docker.io/istio/proxyv2:1.17.1
imageID: docker.io/istio/proxyv2@sha256:2152aea5fbe2de20f08f3e0412ad7a4cd54a492240ff40974261ee4bdb43871d
lastState: {}
name: istio-init
ready: true
restartCount: 0
state:
terminated:
containerID: containerd://688d48287ced31d3bff4dd8ef7d20279b7875255c3b5a7bafec93ce30f0027fc
exitCode: 0
finishedAt: "2023-03-14T06:16:07Z"
reason: Completed
startedAt: "2023-03-14T06:16:07Z"
phase: Running
podIP: 10.4.3.4
podIPs:
- ip: 10.4.3.4
qosClass: Burstable
startTime: "2023-03-14T06:16:02Z"
```

Extra info on the Kubernetes and istioctl versions I used: Istio 1.17.1 is compatible with the 1.24.x line of Kubernetes, based on https://istio.io/latest/docs/releases/supported-releases/#support-status-of-istio-releases. Note that I deployed my own cluster-autoscaler in a GKE cluster to test this issue. This is a trimmed-down version of my own notes. Let me know if you want me to add more info.
For those watching, it looks like cluster-autoscaler has now merged a fix for this (see #5594). I think we can still improve the new annotation, though. I have also commented this idea here: #5594 (comment)
Quoting my comment from the PR: #5594 (comment)
Note that we are backporting this feature to 1.24, 1.25 and 1.26 as well. Expect the patch to be released this week: #5774 (comment)
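For anyone arriving here later, a minimal sketch of how the per-volume opt-out from #5594 can be applied, assuming it shipped as the cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes annotation (a comma-separated list of volume names); check the FAQ of the cluster-autoscaler release you run for the exact name and supported versions:

```yaml
# Sketch: list the sidecar-owned emptyDir volumes so they no longer block
# node scale-down; the volume names below match the istio-injected pod shown above.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-with-istio-sidecar
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict-local-volumes: "workload-socket,credential-socket,workload-certs,istio-envoy,istio-data,istio-podinfo"
spec:
  containers:
    - name: nginx
      image: nginx
```

In practice you would set this on the Deployment's pod template (or via the sidecar injector) rather than on a bare Pod; the Pod form is just the shortest way to show where the annotation lives.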
@thesuperzapper if you have any thoughts around #3947 (comment) let me know. I will create a separate ticket to enhance the annotation so that it supports wildcards/patterns as well.
Right now, if you have a sidecar like Istio's (istio/istio#19395) with an emptyDir volume while the main service container has none, the pod is excluded from node scale-down (treated as not evictable).
A workaround is to set the pod-level safe-to-evict annotation, but that applies to the whole pod, and one would need to duplicate the logic checking for volumes to decide true/false.
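To make the workaround concrete, a minimal sketch, assuming the annotation referred to is the standard cluster-autoscaler.kubernetes.io/safe-to-evict one:

```yaml
# Whole-pod escape hatch: the autoscaler treats this pod as evictable no matter
# which local volumes it mounts, so it cannot distinguish the sidecar's emptyDir
# from local storage the main container genuinely depends on.
apiVersion: v1
kind: Pod
metadata:
  name: my-app            # placeholder
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: my-app
      image: nginx        # placeholder
```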
Can we instead have an annotation that names the container(s) to ignore (or maybe a pattern, e.g. ignored-containers: pattern)? That would skip said container(s)' volumes in the determination of whether a pod is evictable or not.
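Purely to illustrate the request (the annotation key below is hypothetical and was never implemented in this form), the per-container variant could look something like:

```yaml
# Hypothetical sketch of the requested behaviour: name the container(s) whose
# volumes should be ignored when deciding whether the pod blocks scale-down.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # hypothetical key, not an existing cluster-autoscaler annotation; a pattern could also be allowed
    cluster-autoscaler.kubernetes.io/ignored-containers: "istio-proxy"
spec:
  containers:
    - name: my-app
      image: nginx                            # placeholder main container
    - name: istio-proxy
      image: docker.io/istio/proxyv2:1.17.1   # injected sidecar that mounts its own emptyDir volumes
```

What eventually landed (see the later comments and #5594) targets volumes rather than containers, but the intent is the same: stop an injected sidecar's scratch volumes from making an otherwise evictable pod un-evictable.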