Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump Kubecost to v2.0.2 #208

Merged
merged 4 commits into from
Feb 8, 2024
Merged

Conversation

chipzoller
Copy link
Contributor

Signed-off-by: chipzoller [email protected]

Issue #, if available:

Description of changes:

Bumps the Kubecost add-on to version 2.0.0.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Signed-off-by: chipzoller <[email protected]>
@elamaran11
Copy link
Contributor

Please validate this too!

Signed-off-by: chipzoller <[email protected]>
@chipzoller
Copy link
Contributor Author

Fixed and verified with the test job locally.

@mikemcd3912
Copy link
Contributor

Hi @chipzoller,

Thank you for submitting the update PR. When testing in our vmware environment I am getting an error from the healthtest job with the following output:

k logs -n kubecost kubecost-healthtest-28455820-w2mch
Getting current Kubecost state.
Failed to fetch Kubecost configuration. Response was
sh: out of range

Do you happen to know why this might be unable to fetch the config?

@chipzoller
Copy link
Contributor Author

Are you deploying with a name other than "kubecost"? I tested this on my end with the changes reflected in this PR and the health test succeeds.

@elamaran11
Copy link
Contributor

Hi @chipzoller

  • We changed the requirements last year for the functional job. Health checks are no longer considered a functional test job, please check our requirements. Since you are already changing the test job, please update it to align to requirements pls.
  • Also our environment is configured via flux to exactly deploy the helm charts and test jobs in incremental fashion so its the same as your run if you are using FLUX.
  • I also see you have removed ClusterRole and stuff, not sure if something to do with that. Please double check on this and let us know.

@mikemcd3912
Copy link
Contributor

mikemcd3912 commented Feb 8, 2024

Thanks for your patience while I researched that issue - The naming does seem to be the issue, since we're using flux as our CD mechanism the helm release was deployed with resources that were prefixed with the namespace to avoid collisions, so our service looked like this:

NAME                                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/kubecost-kubecost-cost-analyzer              ClusterIP   10.109.124.79    <none>        9003/TCP,9090/TCP   19m

I think we could restore functionality for our side while maintaining yours with that svc lookup that was in the original test file - is there a reason we moved away from that method with this update?

Also as Ela mentioned we are looking for a more in-depth evaluation of the health of the deployment through functional testing, so we will need the health check itself updated to align with the functional testing requirements before we can approve the PR

@chipzoller
Copy link
Contributor Author

Can I just ask before addressing comments, is it not acceptable to use Helm tests here? If you look at the chart, there is a test case built in allowing you or any user to simply run helm test <release> after installation which will essentially perform the same validation.

@elamaran11
Copy link
Contributor

Can I just ask before addressing comments, is it not acceptable to use Helm tests here? If you look at the chart, there is a test case built in allowing you or any user to simply run helm test <release> after installation which will essentially perform the same validation.

@chipzoller Everthing we drive here including functional is driven through GitOps via Flux including functional testing. So the expectation is to submit a Kubernetes CronJob which runs on daily schedule to validate the functionality of the product. Please check the functional design requirements.

Signed-off-by: chipzoller <[email protected]>
@chipzoller
Copy link
Contributor Author

Fixed tests. They check out on my end.

Signed-off-by: chipzoller <[email protected]>
@chipzoller
Copy link
Contributor Author

Also as Ela mentioned we are looking for a more in-depth evaluation of the health of the deployment through functional testing, so we will need the health check itself updated to align with the functional testing requirements before we can approve the PR

As far as this goes, it is a functional test and not a health check. I believe we even discussed this in my previous PR. This check is basically an end-to-end smoke test that Kubecost is not only health but that all its dependencies are health. Even though it looks simple, the curl to /model/getConfigs internally traverses all the code paths in order to get a response. The response here is the full operating configuration of Kubecost at that time. If any one image is either down or not returning basic functional info, there will not be a 200 response returned.

If you're saying this still isn't good enough then the third requirement, quoted below, needs significant expansion and clarification.

  1. Healthchecks, service endpoints checks or any other technical checks do not represent sufficient coverage required for the functional test

@elamaran11
Copy link
Contributor

/model/getConfigs

@chipzoller Now i notice /model/getConfigs which basically qualifies for a functional test. I also remember our conversation on this in previous PR. Im good on this. Lets get the PR tested and move along. On Review looks good.

@mikemcd3912 Lets get it running in our labs and shared feedback to Chris.

Copy link
Contributor

@mikemcd3912 mikemcd3912 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good with that new revision for v2.0.2, deployment and test job are running in our vmware environment successfully! Thanks @chipzoller for working with us on those updates, testing proceeding on other environments now

@mikemcd3912
Copy link
Contributor

Working in our environments, No issues noted

@mikemcd3912 mikemcd3912 merged commit 71cdf4e into aws-samples:main Feb 8, 2024
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants