
Upgrade VMs type and number of nodes for GKE cluster #4650

Merged: 2 commits merged into master on Jan 30, 2025


@ViniciustCosta (Collaborator)

Following the work to migrate chrome-tests-syncer to k8s, we noticed that the current GKE cluster spec is not enough to schedule the tests-syncer as a k8s cronjob. In our tests, the cronjob never got scheduled due to insufficient resources (memory/CPU).
Thus, this PR upgrades the machine types and increases the number of nodes available.
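For reference, the CPU and memory a cronjob needs the scheduler to find on a node are declared under the container's `resources.requests` in a `batch/v1` CronJob. The sketch below builds such a manifest as a Python dict using the `cpu: 2`, `memory: 2G` requests tried in this thread and prints it as JSON, which `kubectl apply -f` also accepts. The helper `cronjob_manifest`, the schedule, and the image are hypothetical placeholders, not the real chrome-tests-syncer configuration.

```python
import json


def cronjob_manifest(name: str, image: str, schedule: str,
                     cpu: str, memory: str) -> dict:
    """Build a batch/v1 CronJob manifest whose single container requests cpu/memory.

    Hypothetical helper: names, image, and schedule are illustrative only.
    """
    container = {
        "name": name,
        "image": image,
        # The scheduler only places the pod on a node whose unrequested
        # allocatable CPU and memory cover these values.
        "resources": {"requests": {"cpu": cpu, "memory": memory}},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [container],
                            # Job pods must use Never or OnFailure, not the default Always.
                            "restartPolicy": "OnFailure",
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = cronjob_manifest(
        "tests-syncer",                         # hypothetical name
        "example.invalid/tests-syncer:latest",  # hypothetical image
        "0 * * * *",                            # hypothetical hourly schedule
        cpu="2",
        memory="2G",
    )
    print(json.dumps(manifest, indent=2))
```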

Related bug: b/389048599
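Why larger machine types help can be sketched numerically. GKE reserves part of each node's CPU for system components, so a node's allocatable CPU is smaller than its vCPU count. The sketch below assumes GKE's documented CPU reservation tiers for Standard clusters (6% of the first core, 1% of the second, 0.5% of cores 3 and 4, 0.25% of each core above 4). `gke_allocatable_cpu` is a hypothetical helper, and its output is an estimate, not a value read from this cluster.

```python
def gke_allocatable_cpu(vcpus: int) -> float:
    """Estimate allocatable vCPU for a GKE node with `vcpus` cores.

    Assumes GKE's tiered CPU reservation for system daemons.
    """
    tiers = [
        (1, 0.06),               # first core: 6% reserved
        (1, 0.01),               # second core: 1% reserved
        (2, 0.005),              # cores 3 and 4: 0.5% reserved each
        (float("inf"), 0.0025),  # every core above 4: 0.25% reserved each
    ]
    remaining = vcpus
    reserved = 0.0
    for size, rate in tiers:
        take = min(remaining, size)
        reserved += take * rate
        remaining -= take
        if remaining <= 0:
            break
    # Round away floating-point noise from summing the tiers.
    return round(vcpus - reserved, 4)


if __name__ == "__main__":
    for cores in (2, 4, 8, 16):
        print(f"{cores:>2} vCPU -> {gke_allocatable_cpu(cores)} allocatable")
```

Under these assumptions a 2-vCPU node keeps 1.93 vCPU allocatable, which matches the figure reported in this thread and would be consistent with 2-vCPU nodes. A 2-CPU request could then never fit on such a node, while a 4-vCPU node (about 3.92 allocatable) could host it.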

@jonathanmetzman (Collaborator)

This PR seems harmless enough, so feel free to merge.
Could you clarify what you mean by "is not enough for scheduling the tests-syncer as a k8s cronjob. During the tests done, the cronjob never gets schedule due to insufficient resources (memory/cpu)." and why this means we need to increase the nodes and the specs of the nodes? Is it not being scheduled because there aren't enough nodes? Or because the specs it requires are too much for any of the nodes? Or both?

@ViniciustCosta (Collaborator, Author)


So, I noticed that tests-syncer used to run on a single GCE instance of type e2-standard-16, but used only about 5-10% of the reserved vCPU (around 2.5). I tried to request similar resources in the cronjob spec (cpu: 2, memory: 2G), but the pod remained Pending, and when I inspected it, the failure was:
"FailedScheduling: 0/6 nodes are available: 2 Insufficient memory, 6 Insufficient cpu. preemption: 0/6 nodes are available: 6 No preemption victims found for incoming pod."

I suppose that means both the number of nodes and their specs are insufficient (but I'm not sure, please tell me if this makes sense). Either way, we may only be able to check whether these resources are enough for tests-syncer once the cronjob actually gets scheduled.
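One way to read the scheduler message is to compare each reason's node count with the total. A pod's requests must fit on a single node, so `6 Insufficient cpu` out of 6 nodes means no node had 2 CPUs of unrequested allocatable capacity at that moment. The message alone cannot say whether that is because other pods' requests were occupying the nodes (more nodes would help) or because the request exceeds every node's allocatable CPU (only larger machines help). The sketch below, using a hypothetical `parse_failed_scheduling` helper, only extracts those counts; it relies on the human-readable message text, which is not a stable API.

```python
import re


def parse_failed_scheduling(message: str) -> tuple[int, dict[str, int]]:
    """Return (total_nodes, {reason: node_count}) from a FailedScheduling summary.

    Only the part before ' preemption:' is parsed.
    """
    summary = message.split(" preemption:")[0]
    match = re.search(r"0/(\d+) nodes are available", summary)
    if match is None:
        raise ValueError("not a FailedScheduling summary")
    total = int(match.group(1))
    reasons = {
        reason: int(count)
        for count, reason in re.findall(r"(\d+) (Insufficient \w+)", summary)
    }
    return total, reasons


if __name__ == "__main__":
    msg = ("FailedScheduling: 0/6 nodes are available: 2 Insufficient memory, "
           "6 Insufficient cpu. preemption: 0/6 nodes are available: "
           "6 No preemption victims found for incoming pod.")
    total, reasons = parse_failed_scheduling(msg)
    for reason, count in reasons.items():
        scope = "every node" if count == total else f"{count} of {total} nodes"
        print(f"{reason}: {scope}")
```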

@ViniciustCosta merged commit ad7e815 into master on Jan 30, 2025 (7 checks passed).
@ViniciustCosta deleted the increase_gke_nodes branch on January 30, 2025, 19:18.
@jonathanmetzman (Collaborator)


Hmm... I wonder why the other cronjobs can run. What resources do they request?

@ViniciustCosta (Collaborator, Author)


Yeah, great question. Actually, no other cronjob requests more than 1.5 CPU. Looking at our GKE nodes, I noticed that every node has 1.93 allocatable CPUs, so that's probably why the cronjob never gets scheduled: no node has 2 CPUs available! I will try reducing chrome-tests-syncer to 1.5 CPU and see how long it takes to run.
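The CPU side of this explanation can be approximated with a simple rule: a pod fits on a node only if its CPU request is no more than the node's allocatable CPU minus the requests of pods already placed there. The sketch below assumes that simplified rule and ignores memory, taints, affinity, and the scheduler's other checks; `parse_cpu` and `fits` are hypothetical helpers.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "2", "1.5", or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(request_cpu: str, allocatable_cpu: str,
         already_requested_cpu: str = "0") -> bool:
    """Simplified CPU fit check: request <= allocatable - requests already on the node."""
    free = parse_cpu(allocatable_cpu) - parse_cpu(already_requested_cpu)
    return parse_cpu(request_cpu) <= free


if __name__ == "__main__":
    print(fits("2", "1.93"))            # 2000m request vs 1930m allocatable
    print(fits("1.5", "1.93"))          # empty node
    print(fits("1.5", "1.93", "500m"))  # node where other pods already request 500m
```

With 1.93 allocatable, a 2-CPU request never fits, while a 1.5-CPU request fits only if other pods on the node request at most 430m in total. Pods that run on every node (for example, system DaemonSets) already hold some requests, so the real headroom may be lower.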
