Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Updating the vmsize for e2e cilium to avoid resource scarcity #2014

Merged
merged 1 commit into from
Jun 14, 2023

Conversation

vipul-21
Copy link
Contributor

@vipul-21 vipul-21 commented Jun 12, 2023

Reason for Change:

  • Updating the vm size used for e2e test for cilium

Issue Fixed:

  • Improves the pipeline failure due to unavailability of resources(memory ) in case of cilium e2e

Requirements:

Notes:

@vipul-21 vipul-21 force-pushed the singhvipul/ciliume2e branch 6 times, most recently from dbbab4b to 2550c63 Compare June 13, 2023 15:50
@vipul-21
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@vipul-21 vipul-21 force-pushed the singhvipul/ciliume2e branch from 2550c63 to 81108b4 Compare June 13, 2023 17:36
@vipul-21 vipul-21 added fix Fixes something. and removed do-not-merge work-in-progress labels Jun 13, 2023
@vipul-21 vipul-21 marked this pull request as ready for review June 13, 2023 17:42
@vipul-21 vipul-21 requested a review from a team as a code owner June 13, 2023 17:42
@vipul-21 vipul-21 requested a review from isaac-dasan June 13, 2023 17:42
@vipul-21 vipul-21 changed the title Testing the e2e test for cilium fix: Updating the vmsize and goldpinger resource limit for e2e cilium to avoid resource scarcity Jun 13, 2023
@@ -32,7 +32,7 @@ steps:
mkdir -p ~/.kube/
echo "Create AKS Overlay cluster"
make -C ./hack/swift azcfg AZCLI=az REGION=$(REGION_OVERLAY_CLUSTER_TEST)
make -C ./hack/swift overlay-no-kube-proxy-up AZCLI=az REGION=$(REGION_OVERLAY_CLUSTER_TEST) SUB=$(SUB_AZURE_NETWORK_AGENT_TEST) CLUSTER=${{ parameters.clusterName }}-$(make revision)
make -C ./hack/swift overlay-no-kube-proxy-up AZCLI=az REGION=$(REGION_OVERLAY_CLUSTER_TEST) SUB=$(SUB_AZURE_NETWORK_AGENT_TEST) CLUSTER=${{ parameters.clusterName }}-$(make revision) VM_SIZE=Standard_DS4_v2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard_DS4_v2 is a HUGE jump up in VM size and cost (12x!). I originally picked B2s as the cheapest viable option - did you evaluate any other SKUs? Maybe we could start at Standard_B2ms which has double the mem at only double the cost?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried/tested with Standard_D4_v3 for the perf dashboard( which use goldpinger) that why choose a similar one for our e2e tests.
Will try to test with Standard_B2ms and see if that works.
Thanks !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Standard_B2ms worked without the resource limit. @rbtr . Thanks for the help
Will try another run to be sure.

@@ -76,3 +76,10 @@ spec:
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
resources:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resource block is omitted intentionally so that the goldpinger Pods will overprovision on the nodes. When you add this, only (node mem)/100Mi Goldpinger pods can be scheduled on the Node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the node mem is 8gb( in case Standard_B2ms) the number of pods that can be schedule on that node would be 80. And we need to scale the pods till 100.
Do you recommend adding a smaller limit to accommodate approx ~110 pods OR remove the limit altogether so that we can overprovision on that node

@vipul-21 vipul-21 force-pushed the singhvipul/ciliume2e branch 2 times, most recently from 7dccfc5 to 13e9c22 Compare June 13, 2023 18:23
@vipul-21 vipul-21 changed the title fix: Updating the vmsize and goldpinger resource limit for e2e cilium to avoid resource scarcity fix: Updating the vmsize for e2e cilium to avoid resource scarcity Jun 13, 2023
@vipul-21
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@vipul-21 vipul-21 requested a review from rbtr June 13, 2023 20:15
@vipul-21 vipul-21 enabled auto-merge (squash) June 13, 2023 21:43
@vipul-21 vipul-21 force-pushed the singhvipul/ciliume2e branch from 13e9c22 to 4b495eb Compare June 13, 2023 22:59
@vipul-21 vipul-21 merged commit 6b805e5 into master Jun 14, 2023
@vipul-21 vipul-21 deleted the singhvipul/ciliume2e branch June 14, 2023 01:17
thatmattlong pushed a commit that referenced this pull request Jul 27, 2023
thatmattlong pushed a commit that referenced this pull request Jul 28, 2023
@thatmattlong thatmattlong mentioned this pull request Jul 28, 2023
4 tasks
thatmattlong added a commit that referenced this pull request Jul 31, 2023
* fix: assume invalid semver CNI has the required dump state command (#2078)

* fix: Updating the vmsize for e2e cilium to avoid resource scarcity (#2014)

CI: Testing the e2e test for cilium

---------

Co-authored-by: Vipul Singh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Infra or tooling. fix Fixes something.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants