Load balancers getting deleted randomly on deletion of ingress records with a whole diff groupname #3304
From what I found after going through the code:
This code piece can mark the load balancer for replacement/deletion when the spec does not match. My hunch is that whatever caused the below log line made the spec go out of sync, resulting in this function returning true.
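For what it's worth, here is a minimal sketch of the kind of drift check being described. The names (`LoadBalancerSpec`, `needsReplacement`) are illustrative only, not the controller's actual API:

```go
// Hypothetical sketch of a spec-drift check: if the observed load balancer
// spec no longer matches the desired spec, the resource is marked for
// replacement. Names here are illustrative, not the controller's real code.
package main

import (
	"fmt"
	"reflect"
)

// LoadBalancerSpec is a stand-in for the controller's desired-state model.
type LoadBalancerSpec struct {
	Scheme  string
	Subnets []string
	Type    string
}

// needsReplacement returns true when the spec observed on AWS has drifted
// from the desired spec; the reconciler would then delete and recreate.
func needsReplacement(desired, observed LoadBalancerSpec) bool {
	return !reflect.DeepEqual(desired, observed)
}

func main() {
	desired := LoadBalancerSpec{Scheme: "internet-facing", Subnets: []string{"subnet-a"}, Type: "application"}
	observed := LoadBalancerSpec{Scheme: "internal", Subnets: []string{"subnet-a"}, Type: "application"}
	// A single out-of-sync field is enough to trigger replacement.
	fmt.Println(needsReplacement(desired, observed)) // true
}
```

If the AWS-side state drifts (or is mis-read), a check like this would flip to true even though nothing in the cluster changed.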
@someshkoli, Hi, do you have multiple controllers in different namespaces?
@oliviassss Negative, only one controller.
BTW, in general each Ingress group is reconciled independently; changing one Ingress group shouldn't impact another.
@M00nF1sh Hey,
Yes, exactly my concern: how did this happen in the first place? I'm assuming this is what caused the LB to get marked as deleted -> the controller sending the finalizer-null signal to the Ingress record -> the Ingress getting queued for deletion.
That is how it's supposed to behave once a deletion is triggered; I've no idea why it happened here. (twice)
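For concreteness, here is a hedged sketch of what that finalizer-null step looks like as a client-go patch. The namespace and Ingress name are made up, and the controller does this internally during reconcile rather than as a standalone program:

```go
// Sketch: clearing an Ingress's finalizers with a JSON merge patch, which
// is what allows the API server to complete a pending deletion.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	// Setting finalizers to null lifts the deletion block; if the object
	// already has a deletionTimestamp, it is garbage-collected right after.
	patch := []byte(`{"metadata":{"finalizers":null}}`)
	_, err = cs.NetworkingV1().Ingresses("namespace2").Patch(
		context.TODO(), "example-ingress", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("finalizers cleared")
}
```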
I'm having trouble figuring out where that line could be logged from.
@johngmyers Which one?
This? So I found that this error pops up when you have
Another interesting thing I found while trying to replicate this (PS: this is a whole new thing, I might raise a separate issue for it): when a faulty ingress (i1) is applied with group g1 and host entry h1 -> reconciliation fails -> no ALB is allocated -> apply another faulty ingress (i2) with group g1 and host entry h2. You will notice that ingress record i2 now has host entry h1. I thought this was a reconciliation issue that might get fixed once the fault in the ingress was fixed, but on fixing the fault it kept host h1 in ingress i2 💀. PS: by "faulty" above I mean, set
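For reference, a sketch of that repro setup using the standard Kubernetes Go types. The names i1/i2, g1, and the hosts are taken from the comment above; `alb.ingress.kubernetes.io/group.name` is the controller's group annotation:

```go
// Sketch of the two-Ingress repro: both join group "g1" via the group.name
// annotation, with different host rules.
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// makeIngress builds an Ingress in group `group` with a single host rule.
func makeIngress(name, group, host string) *networkingv1.Ingress {
	return &networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{
			Name: name,
			Annotations: map[string]string{
				"alb.ingress.kubernetes.io/group.name": group,
			},
		},
		Spec: networkingv1.IngressSpec{
			Rules: []networkingv1.IngressRule{{Host: host}},
		},
	}
}

func main() {
	i1 := makeIngress("i1", "g1", "h1.example.com")
	i2 := makeIngress("i2", "g1", "h2.example.com")
	// Per the comment above, after i1 fails to reconcile, i2's effective
	// host was observed as h1 rather than h2.
	fmt.Println(i1.Spec.Rules[0].Host, i2.Spec.Rules[0].Host)
}
```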
We also encountered this issue. The annotations in use for the 3 Ingresses that had their load balancers incorrectly deleted:
This occurred on v2.4.5 (image version), Helm chart v1.4.6. It is extremely alarming that this can happen.
Finally, someone who can relate. We've had such an outage twice, and since there was no update on this conversation I had started thinking that I might've deleted it by mistake (somehow, randomly). I tried reproducing it but couldn't.
Controller logs for the sync that did the inappropriate deletion would be helpful.
Unfortunately, logs for this controller weren't being shipped at the time, and the pods were restarted during troubleshooting, so we lost them. I do have the CloudTrail events showing that the IRSA role the controller was using is what did the deletion, but not much beyond that.
I have container logs, lmk if you want me to send them over?
Oh also, I should note that deleting and recreating the
I waited > 10 hours for the default resync period.
@blakebarnett, this is a separate issue, see: #3383 (comment)
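As an aside on the resync timing mentioned above: controller-runtime's default sync period is 10 hours. A hedged sketch of shortening it, assuming a controller-runtime version where `manager.Options` still exposes `SyncPeriod` directly (newer versions moved it under the cache options):

```go
// Sketch: shortening the controller-runtime resync period so full
// re-reconciles happen more often than the 10h default.
package main

import (
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	syncPeriod := 1 * time.Hour // default is 10h
	_, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		// Assumes an older controller-runtime where Options has SyncPeriod.
		SyncPeriod: &syncPeriod,
	})
	if err != nil {
		panic(err)
	}
}
```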
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its standard lifecycle rules. Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Describe the bug
This has happened twice, and everything that happened the first time happened this time as well.

We had a few Helm releases in `namespace=namespace1` in which we were creating Ingress records. The group name attached to those Ingress records was `group1`.

There were a few other Helm releases in `namespace=namespace2` in which we were creating Ingress records. The group name attached to these Ingress records was `group2`.

Now, there was some error on the Ingress records in `namespace2`, which were not able to reconcile due to the following error. We never paid attention to this until today, when we saw the logs.

Since the Helm releases in `namespace1` were stale, we went ahead and deleted all of them, which deleted all of their Ingress records (assuming this also triggers reconciliation of Ingress records in the controller). This resulted in the Ingress records in `namespace2` getting deleted (I don't know how or why). While debugging I found an audit log where the ALB controller sets the finalizers on these Ingresses to null (not pasting it here right now, let me know if it's needed).

From the ALB controller logs I found the following log lines
Steps to reproduce
Mentioned above ^
Expected outcome
Ingresses/load balancers of `group2` should not get deleted when deletion is triggered for `group1` (see the sketch below).
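A minimal sketch of the reported setup, assuming the standard group annotation; the names and namespaces mirror the report:

```go
// Sketch of the reported setup: two Ingress groups in two namespaces.
// Deleting everything in group1 should never touch group2's load balancer.
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func groupedIngress(name, namespace, group string) *networkingv1.Ingress {
	return &networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{
			Name:      name,
			Namespace: namespace,
			Annotations: map[string]string{
				// Group membership decides which ALB an Ingress maps to.
				"alb.ingress.kubernetes.io/group.name": group,
			},
		},
	}
}

func main() {
	a := groupedIngress("app-a", "namespace1", "group1")
	b := groupedIngress("app-b", "namespace2", "group2")
	// Distinct groups -> distinct ALBs; a delete in group1 must not cascade.
	fmt.Println(a.Namespace, a.Annotations["alb.ingress.kubernetes.io/group.name"])
	fmt.Println(b.Namespace, b.Annotations["alb.ingress.kubernetes.io/group.name"])
}
```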
Environment
production
Additional Context: