
podAffinity #985

Closed
olemarkus opened this issue Dec 14, 2021 · 8 comments
Labels: feature (New feature or request), scheduling (Issues related to Kubernetes scheduling)

Comments

@olemarkus (Contributor)

Tell us about your request
What do you want us to build?

I know you'll love this: Support podAffinity

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
This is a sibling issue to #942 , but specifically on podAffinity.

Cilium's Hubble has a component called hubble-relay that needs to run on the same node as a cilium agent.
The Cilium install manifests declare this podAffinity as non-optional: https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/templates/hubble-relay/deployment.yaml#L34-L43
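For reference, the affinity in that manifest is roughly a required podAffinity of the following shape, which pins hubble-relay to nodes that already run a cilium agent pod (the selector values here are paraphrased; see the linked manifest for the exact terms):

    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              k8s-app: cilium   # illustrative; the manifest selects the cilium agent pods
          topologyKey: kubernetes.io/hostname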

This is a fairly essential component to those using Cilium.

On many clusters, one can assume a cilium agent runs on every node, but there are clusters using Cilium that also have Fargate nodes and Windows nodes, where the agent wouldn't run.

Are you currently working around this issue?

In our case, we can patch out that podAffinity.

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@olemarkus olemarkus added the feature New feature or request label Dec 14, 2021
@ellistarn ellistarn added the scheduling Issues related to Kubernetes scheduling label Dec 14, 2021
@tmokmss (Contributor) commented Jan 27, 2022

+1 to podAffinity support

We're investigating whether we can use Karpenter with Agones, and found that Agones uses podAffinity to pack pods onto as few nodes as possible (doc).
Because Karpenter currently ignores pods with podAffinity, we cannot use Karpenter to provision nodes for Agones pods.
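For context, the packing behaviour described in the Agones doc amounts to a preferred podAffinity along these lines (the label key/value and weight below are illustrative, not copied from Agones):

    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                agones.dev/role: gameserver   # illustrative label; see the Agones docs for the real selector
            topologyKey: kubernetes.io/hostname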

@ellistarn (Contributor) commented Jan 27, 2022

For hostname affinity, what if the pods aren't created at the same time? If Karpenter only sees the first pod, it will launch a node for it. If the next pod comes in a minute later, it's too late for that pod to be included in the scheduling algorithm. This problem doesn't exist for anti-affinity, because scheduling pods apart achieves the same objective even when they arrive at different times.

FWIW, I think this is a similar problem for the cluster autoscaler.

I'd love to make progress on podAffinity, but I don't have a viable path forward. How are you envisioning it? I could potentially see soft pod affinity, where we attempt to co-schedule pods if we happen to see them together.

@tmokmss (Contributor) commented Jan 27, 2022

Hi, thanks for the suggestion. As the doc says:

The default Kubernetes scheduler doesn’t do a perfect job of packing, but it’s a good enough job for what we need - at least at this stage.

Agones itself admits the packing isn't perfect, but they seem to find it works well in most cases.

BTW, I'll also try the cluster autoscaler to see how Agones works with it. Thanks!

@ellistarn (Contributor)

Reading through Agones, preferred pod affinity would be doable.

@tmokmss (Contributor) commented Jan 27, 2022

Sounds great! Dedicated game servers tend to need very rapid autoscaling, so I believe Karpenter with Agones would be fantastic 👍

@ellistarn (Contributor)

Further, as I study the podAffinity page, I think it would also be doable to implement zonal podAffinity (e.g. schedule all pods into a single zone).
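For illustration, a zonal podAffinity would look something like the sketch below (the app label is hypothetical); with a required term, every matching pod gets pulled into whichever zone the first pod lands in:

    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: my-app   # hypothetical label shared by the pods that should co-locate
          topologyKey: topology.kubernetes.io/zone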

@kotlovs commented Feb 9, 2022

Just another vote for podAffinity.
In our environment, we run a large number of Spark applications simultaneously. It would be very desirable to set a preference that any given EC2 instance only contains pods of a single Spark application (only pods with the same label) if possible.
With Cluster Autoscaler, I can almost achieve this with a preferred podAffinity:

    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: spark/app-name
              operator: In
              values:
              - {self.application_name}
          topologyKey: kubernetes.io/hostname
Without this, pods of different applications are scattered across different EC2 instances. Some applications finish much earlier than others, and we are left with a large number of partially filled instances. But if an instance only holds pods of a single application, then once that application finishes, the instance is released entirely and becomes available for deletion. At the same time, this should be a soft rule, so that if an EC2 instance has some free capacity, it can still be filled up by pods with a different label.

@tzneal tzneal self-assigned this Apr 5, 2022
tzneal added a commit that referenced this issue Apr 13, 2022
 implement pod affinity & anti-affinity

- implement pod affinity/anti-affinity
- rework topology spread support

Fixes #942 and #985 

Co-authored-by: Ellis Tarn <[email protected]>
@tzneal (Contributor) commented Apr 14, 2022

If you've got a fairly recent version of Karpenter already running, you can drop in this snapshot controller image to test out pod affinity & pod anti-affinity. This is just a snapshot and not meant for production:

kubectl set image deployment/karpenter -n karpenter controller=public.ecr.aws/z4v8y7u8/controller:08ac9ec303aabe2da56edb0ee0e235a60a287206

@tzneal tzneal closed this as completed Apr 14, 2022