
Adding Wave Autoscale to the Partners #321

Merged · 19 commits into aws-samples:main · Dec 12, 2024
Conversation

@Ari-suhyeon (Contributor)

Description of changes:
This pull request integrates Wave Autoscale (STCLab) into the AWS EKS testing framework. It includes the following artifacts:

  • FluxCD configuration
  • External Secret

The deployment requires two keys in Parameter Store; they are included in the self-assessment spreadsheet we sent:

  • GHRC_TOKEN
  • WA_LICENSE

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@elamaran11 (Contributor) left a comment


The Functional Test job is missing. Please check other partner submissions for more details. Also, please send us your external secrets with the full license via email.

@mikemcd3912 (Contributor)

Thanks for the PR!

I have loaded the secrets you provided into our testing account. However, when I went to test it, the solution had trouble scheduling on our testing environments due to its resource demands. Is the wave-autoscale-autopilot container meant to request 3 CPU and 3000Mi of memory?
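For context, the scheduling trouble described here comes from Kubernetes resource requests: the scheduler will only place a pod on a node with that much unreserved capacity. A hypothetical excerpt of the kind of container spec being asked about, using the values quoted above (the container name is from this thread; the surrounding manifest is an assumption):

```yaml
# Hypothetical excerpt; field names are standard Kubernetes,
# values are the ones quoted in the comment above.
containers:
  - name: wave-autoscale-autopilot
    resources:
      requests:
        cpu: "3"        # three full cores reserved per pod
        memory: 3000Mi  # ~3 GiB reserved per pod
```

A request this large will stay Pending on small test nodes even if actual usage is far lower, which is why the question distinguishes "meant to request" from what the workload really needs.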

@mikemcd3912 (Contributor) commented Dec 3, 2024

> Functional Test job is missing. Please check other partner submission for more details

Hi @Ari-suhyeon,

I was able to get the solution deployed and tested across our EKS environments. I noticed that the functional testing focuses on endpoint health checks, and the test was still completing as successful even while pods were in a Pending state (point 3 of the functional job test requirements). Can we update this to also include a test of the functionality, to make sure those pods are both up and functioning as intended?

@Ari-suhyeon (Contributor, Author)

Hi @mikemcd3912

I have identified an issue in the status-check logic within the cronjob.
The logic has been updated to validate the status by checking the response values.
Additionally, I have added StatefulSet check logic that communicates with the Kubernetes API.

Please review the changes at your convenience.
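The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the actual change in this PR: the service endpoint, port, JSON field, namespace, and resource names are all assumptions.

```yaml
# Hypothetical cronjob container command implementing the described checks.
command:
  - /bin/sh
  - -c
  - |
    # 1) Validate the response value, not just HTTP reachability.
    status=$(curl -s http://wave-autoscale.wave-autoscale.svc/api/status | jq -r '.status')
    [ "$status" = "ok" ] || exit 1
    # 2) Ask the Kubernetes API whether the StatefulSet is fully ready.
    ready=$(kubectl get statefulset wave-autoscale -n wave-autoscale -o jsonpath='{.status.readyReplicas}')
    desired=$(kubectl get statefulset wave-autoscale -n wave-autoscale -o jsonpath='{.spec.replicas}')
    [ "$ready" = "$desired" ] || exit 1
```

Checking `readyReplicas` against `spec.replicas` is what distinguishes "fully running" from the Pending-pods-still-pass situation raised earlier in the thread.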

@mikemcd3912 (Contributor) commented Dec 6, 2024

Hi @Ari-suhyeon,

For the functional testing, what we're looking for is a test of the main function of the solution, to show that it is up and running properly, so the health check against the K8s API is still not quite as thorough as we need. One example is partner Tetrate's solution, whose function is to manage networking and ingress: its test creates and then subsequently destroys ingress resources to exercise the product's functionality.

Based on my understanding of your solution, ideally we'd like to see a test that validates that the intelligent scaling is working as expected: for example, creating a temporary pod that is resource constrained from the start, then checking whether Wave Autoscale performs the necessary autoscaling operation on it, before cleaning up the testing resources.

Let me know if you have any questions or if I can provide additional clarification!
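The suggested test could be sketched as a deliberately CPU-constrained Deployment like the one below. This is an illustration, not the submitted artifact: the name `wa-test-dp` does appear later in this thread, but the image, labels, and resource values here are assumptions.

```yaml
# Hypothetical scale-out test target: a busy-loop container that pins CPU
# against a small limit, so a working Wave Autoscale should scale the
# Deployment out, which a verifier job can then assert on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wa-test-dp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: wa-test-dp
  template:
    metadata:
      labels:
        app: wa-test-dp
    spec:
      containers:
        - name: stress
          image: busybox
          # Busy-loop to saturate the small CPU limit below.
          command: ["/bin/sh", "-c", "while true; do :; done"]
          resources:
            requests:
              cpu: 50m
              memory: 32Mi
            limits:
              cpu: 100m
              memory: 64Mi
```

A verifier could then poll `kubectl get deploy wa-test-dp -o jsonpath='{.status.replicas}'` and expect the count to exceed 1 within a timeout, before deleting the test resources.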

@Ari-suhyeon (Contributor, Author) commented Dec 9, 2024

Hi @mikemcd3912 ,

Understood.
To test the auto-scaling functionality we need to collect metrics, and for this an agent must be installed as a DaemonSet.
I hope this is acceptable to you.
As my schedule does not allow for it this week, I will proceed with the work next week and push the changes accordingly.

@Ari-suhyeon (Contributor, Author)

Hi @mikemcd3912 ,
We have added a script to verify the scale-out functionality of Wave Autoscale.
This includes deploying a DaemonSet-based agent for metric collection and adding the target Deployment for scaling out.

Additionally, with the version update of Wave Autoscale, it is necessary to remove the previously installed PV and PVC. Please ensure this step is checked carefully.
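The PV/PVC cleanup mentioned above might look roughly like this; the label selector, namespace, and PV name are placeholders, not the actual resource names from this PR.

```shell
# Assumed names: adjust the namespace and selector to the actual
# wave-autoscale installation. PVCs are namespaced; PVs are cluster-scoped.
kubectl delete pvc -n wave-autoscale -l app.kubernetes.io/name=wave-autoscale
kubectl delete pv <bound-pv-name>   # only needed if the reclaim policy is Retain
```

Deleting the PVC first lets the old volume release cleanly before the new version recreates its storage.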

@elamaran11 (Contributor)

Looks like your deployment wa-test-dp is failing.
Specifically, it looks like these pods are failing:

  • Pod: wa-test-dp-549f5f7f75-2mtcl.
  • Pod: wa-test-dp-549f5f7f75-6sn6n.
  • Pod: wa-test-dp-549f5f7f75-6zcw5.
  • Pod: wa-test-dp-549f5f7f75-c4j8x.
  • Pod: wa-test-dp-549f5f7f75-cgfn8.
  • Pod: wa-test-dp-549f5f7f75-f5k85.
  • Pod: wa-test-dp-549f5f7f75-g78sx.
  • Pod: wa-test-dp-549f5f7f75-hgmrw.
  • Pod: wa-test-dp-549f5f7f75-jfrbx.
  • Pod: wa-test-dp-549f5f7f75-kktgg.
  • Pod: wa-test-dp-549f5f7f75-ljn4p.
  • Pod: wa-test-dp-549f5f7f75-mc4dt.
  • Pod: wa-test-dp-549f5f7f75-mzhf5.
  • Pod: wa-test-dp-549f5f7f75-prfw6.
  • Pod: wa-test-dp-549f5f7f75-qrxd9.
  • Pod: wa-test-dp-549f5f7f75-sswh8.
  • Pod: wa-test-dp-549f5f7f75-thb79.
  • Pod: wa-test-dp-549f5f7f75-tzvsc.
  • Pod: wa-test-dp-549f5f7f75-v4zv6.
  • Pod: wa-test-dp-549f5f7f75-v7bgn.
  • Pod: wa-test-dp-549f5f7f75-wsdh2.
  • Pod: wa-test-dp-549f5f7f75-ww4ld.

@Ari-suhyeon (Contributor, Author)

Hi @elamaran11
Please let us know what specifically you mean by it failing.
Logs from the cronjob would be helpful too.
Or are you saying that the wa-test-dp Deployment failed, not the cronjob?
If so, please provide logs from the pods as well.
Also, did you remove the PV and PVC of the existing wave-autoscale installation and redeploy it?

@elamaran11 (Contributor)

@mikemcd3912 Can you report more details on this error?

@mikemcd3912 (Contributor)

Hi @Ari-suhyeon,

Thanks for the updates! The message about all those failed pods was part of an automation, and the failures appear to have been remedied by the redeployment of the solution/volumes after those updates, so I'll keep you updated as I complete the rest of the testing. So far our VMware environment test was successful, so I believe the current iteration is looking promising.

@elamaran11 (Contributor)

Looks like your deployment wa-test-dp is failing.
Specifically, it looks like these pods are failing:

  • Pod: wa-test-dp-8ccbc64df-5rxrf.

@mikemcd3912 (Contributor)

> Looks like your deployment wa-test-dp is failing. Specifically, it looks like these pods are failing:
>
> * Pod: `wa-test-dp-8ccbc64df-5rxrf`.

Feel free to ignore this; it appears that we're reaching CPU capacity on one of our test clusters, and that's causing some automated messages to go out when the scheduler doesn't have space for new pods.

@mikemcd3912 (Contributor) left a comment


VMware - Success
Hybrid - Success
Baremetal - Success
Cloud Bottlerocket x86 - Success
Cloud Bottlerocket ARM - Success
Auto Mode - Success

Pods Deploy and testers complete successfully in all environments - LGTM

@mikemcd3912 mikemcd3912 dismissed elamaran11’s stale review December 12, 2024 22:33

Functional job has been added and completes successfully in all environments

@mikemcd3912 mikemcd3912 merged commit e82551a into aws-samples:main Dec 12, 2024
1 check passed