Telemetry webhook validation #482
Conversation
Force-pushed from a7a62a5 to f698284
I'm trying to figure out what is causing the tests to fail on this one. It isn't really straightforward, especially considering some tests are apparently still flaky. I'm not sure I can progress this without #472 being merged. One source of new flakes appears to be the envtest
This definitely should not be happening: it's a transient failure due to optimistic locking and should never be fatal. It doesn't happen locally but occurs on GHA because the Patch call has no timing tolerance. Wrapping it in an `Eventually` now to see if that improves things.
I've wrapped the patch operation in an `Eventually`. The StopDatacenter test still fails on GHA, and now fails locally as well (BONUS!). This is going well.
You shouldn't need to wrap writes in Eventually calls unless you are doing so to add retry logic. It's not needed for consistency, though.
Yes, but I need to add retry logic because of the optimistic locking error above.
Force-pushed from 1f58be8 to 2611db0
Writing the old object 100 times will not make a difference; it's still the old object. You need to refresh it (Get).
Oh wow, how did I miss that? Thanks @burmanm, you just saved me a fair bit of time, honestly.
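For anyone who hits the same conflict: the point above is that the object has to be re-read on every retry attempt so the patch is built against the current resourceVersion. A minimal sketch of that pattern, assuming a Gomega/envtest suite, a controller-runtime client, and a patch built with optimistic locking; the function, key, and interval values are illustrative, not the PR's actual code:

```go
import (
	"context"
	"time"

	. "github.com/onsi/gomega"

	cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// stopDatacenter retries the patch, re-reading the CassandraDatacenter on every
// attempt so the resourceVersion is fresh and optimistic locking can succeed.
func stopDatacenter(ctx context.Context, k8sClient client.Client, dcKey types.NamespacedName) {
	Eventually(func() error {
		dc := &cassdcapi.CassandraDatacenter{}
		if err := k8sClient.Get(ctx, dcKey, dc); err != nil {
			return err
		}
		// Build the patch from the freshly read object, carrying its resourceVersion.
		patch := client.MergeFromWithOptions(dc.DeepCopy(), client.MergeFromWithOptimisticLock{})
		dc.Spec.Stopped = true
		return k8sClient.Patch(ctx, dc, patch)
	}, 1*time.Minute, 500*time.Millisecond).Should(Succeed())
}
```

The Get inside the polled function is what matters here; retrying the same stale object just replays the same conflict, which is exactly the point made above.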
Force-pushed from 055e6ff to 7a526ba
…y. It is breaking the unit test workflow on GHA.
Force-pushed from 7a526ba to c93a7cf
… utils from `pkg/test`. Add test for telemetry validation in webhook.
…is possible when commonlabels defined.
…lly defined in KCluster.
Force-pushed from ae3557f to 706a70b
Force-pushed from 706a70b to 347fffc
I think this PR is ready for preliminary review. I've fixed 5 instances where tests were fragile because they were timing sensitive. There are 8 tests still failing, all due to timing-related issues; 6 of the 8 are in the multi-cluster suite. I'm going to investigate the two that are failing in the single-cluster suite. I'm hoping the multi-cluster failures will be dealt with by #472, but we can look at making the calls to the remote clusters concurrent if that doesn't resolve things. (Edited, as a different set of tests is now failing...)
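If the multi-cluster failures turn out to be latency-bound, one way to make the remote-cluster calls concurrent would be to fan them out with an errgroup. A rough sketch under that assumption; `remoteClients` and `checkDc` are hypothetical names, not code from this PR:

```go
import (
	"context"

	"golang.org/x/sync/errgroup"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// checkAllClusters runs checkDc against every remote cluster in parallel rather
// than sequentially, so one slow remote API server doesn't stall the whole test.
func checkAllClusters(ctx context.Context, remoteClients map[string]client.Client,
	checkDc func(ctx context.Context, remote client.Client) error) error {

	g, ctx := errgroup.WithContext(ctx)
	for _, remote := range remoteClients {
		remote := remote // capture the loop variable for the goroutine
		g.Go(func() error {
			return checkDc(ctx, remote)
		})
	}
	// Wait blocks until every check finishes and returns the first error, if any.
	return g.Wait()
}
```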
Make patches in stargate tests async-safe. Make e2e cleanup retry on failure.
This is largely outdated and we never managed to reach a consensus, so I'll go ahead and close it.
What this PR does:
Validates the telemetry specification within the K8ssandraCluster, taking into account whether Prometheus is installed in each Kubernetes cluster hosting a DC and what telemetry is requested for that DC.
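As a rough illustration of the kind of check involved (the types and field names below are simplified, hypothetical stand-ins, not the PR's actual webhook code):

```go
package webhook

import "fmt"

// dcTelemetry is a simplified, hypothetical shape standing in for the per-DC
// telemetry settings carried in the K8ssandraCluster spec.
type dcTelemetry struct {
	DcName            string // datacenter name
	K8sContext        string // Kubernetes cluster the DC is deployed to
	PrometheusEnabled bool   // whether Prometheus telemetry is requested for this DC
}

// validateTelemetry rejects the spec at admission time when a datacenter asks
// for Prometheus telemetry in a context where Prometheus is not installed.
func validateTelemetry(dcs []dcTelemetry, promInstalled map[string]bool) error {
	for _, dc := range dcs {
		if dc.PrometheusEnabled && !promInstalled[dc.K8sContext] {
			return fmt.Errorf("datacenter %q requests Prometheus telemetry but Prometheus is not installed in context %q",
				dc.DcName, dc.K8sContext)
		}
	}
	return nil
}
```

Rejecting an invalid spec at admission time hands the error straight back to the client, rather than surfacing it later through the reconciler's status or logs.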
This is preferred to the current situation where validation occurs during reconciliation for two reasons:
Which issue(s) this PR fixes:
Fixes #235
Checklist