Fluxcd suddenly deletes all resources though git is unchanged #3148
Comments
The issue just became a bit clearer (though no less scary). The very same thing happened on a different cluster at almost the same point in time. The one thing the two occurrences had in common was that they synced with the same Git repository. The question is: how can we safeguard against issues like this in the future? Any hints are welcome.
OK... we tracked it down, and since it was such a pain for us, I'd like to share our findings here. The root cause was in the generator command in
The problem with pipelines is that the overall exit code is, by default, the exit code of the last command. Even though kustomize may fail due to invalid input or a failed/interrupted git checkout, envsubst will still succeed, even if the pipeline produces empty output. Flux then sees a successful run with no manifests, which can lead to the deletion of resources if Flux garbage collection is on. To fix it, we changed our generator command as follows:
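A minimal sketch of the exit-code behavior described above, not the poster's actual generator command: `false` is a hypothetical stand-in for a failing producer such as kustomize, and `cat` stands in for a filter that always succeeds, such as envsubst. It assumes bash is available, since `set -o pipefail` is a bash option and is not guaranteed in every `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins: `false` plays the failing producer,
# `cat` plays the downstream filter that always succeeds.

# Default behaviour: the pipeline reports the exit status of its last command,
# so the failure of the first stage is hidden.
status_default=0
bash -c 'false | cat' || status_default=$?

# With pipefail: the pipeline reports a non-zero status if any stage fails.
status_pipefail=0
bash -c 'set -o pipefail; false | cat' || status_pipefail=$?

echo "default=$status_default pipefail=$status_pipefail"
```

With the default setting the failure is masked; with `pipefail` the non-zero status reaches the caller, which is what lets a caller treat the run as failed instead of as a valid, empty result.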
Ouch. I'm going to keep this open for a while, since I don't want to lose track of that resolution note, @acm-073 (thanks for boiling it down!). Maybe there is a doc mention that we can include prominently in the release notes, so new adopters of
We hit this issue over the weekend, but we do not use pipes or envsubst. https://github.com/hmcts/cnp-flux-config/blob/master/k8s/preview/cluster-00-overlay/.flux.yaml#L4 Thankfully, it occurred on our development clusters, where all our admin apps, such as the ingress controller, also got deleted. Any thoughts on what could have caused this and what can be done to avoid it?
We had the same issue.
What version of Flux are you both running?
@kingdonb We are using Flux 1.20.2
For me it was 1.21.0, and it happened with
I don't have any concrete reason to suggest this has been fixed in the latest version of Flux v1 (at this time, 1.22.2). However, I would still suggest upgrading, as we cannot easily support versions of Flux older than the latest maintained release. I can't imagine a higher-priority issue, and I'm not certain that all affected individuals have been able to isolate and resolve it on their own clusters. I would love to hear a suggestion that pinpoints the source, since differing reports are coming in about which features are enabled on affected deployments. I have not yet seen this issue in action on any of my own repos, nor any clear suggestion of how to reproduce it reliably. Are all affected individuals using kustomize with generators? If not envsubst, are we using
We have now upgraded to the latest version. Will report if we still hit any such issues.
At this time, we hope you have already begun migration away from Flux v1 / legacy over to the current version of Flux. We hope you've been able to upgrade to Flux v2, and note that Flux v1 remains in maintenance mode. Bugs with Flux v1 can be addressed as long as it remains in maintenance, according to the plan laid out in the Flux Migration Timetable. 👍 Thanks for using Flux.
Describe the bug
Flux suddenly deleted all resources it managed, even though no change was pushed to git.
This is a severe issue. Has anybody else observed this?
To Reproduce
Since this seems to be a sporadic issue that we have not seen before, I can't describe how to reproduce it.
Expected behavior
If git remains unchanged, Flux should not delete any resources.
Logs
The log below shows the last successful sync and then the start of the delete action. Note that the git commit ID is unchanged between the last apply and the delete action.
Additional context