-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix: helm operator uninstall is not properly checking for existing release #3431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the past (with Helm 2 at least) different drivers returned different errors (or not) so the len(h)
check was necessary to handle all the nuances of the different drivers.
I haven't checked recently, but it could be that all drivers now actually do use the same semantics when a release is not found.
Either way, I'm having trouble seeing how this would solve the problem. If there is an error decoding the release, won't History()
return that error, thus meaning we'll bubble that error up before getting to the if len(h) == 0
check (which only happens when err
is nil)?
/hold
pkg/helm/release/manager.go
Outdated
if errors.Is(err, driver.ErrReleaseNotFound) { | ||
return nil, err | ||
} | ||
return nil, fmt.Errorf("failed to get release history: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this should be necessary since we're wrapping the error and using errors.Is()
on the calling side to see if it was a driver.ErrReleaseNotFound
error.
@joelanford thanks for your quick reply.
https://github.com/helm/helm/blob/v3.2.4/pkg/storage/driver/secrets.go#L133-L138 Other drivers like memory https://github.com/helm/helm/blob/v3.2.4/pkg/storage/driver/memory.go#L145-L146 |
Interesting. Of course the one I looked at (helm/[email protected]/pkg/storage/driver/cfgmaps.go#L79) returns an error on decode rather than just logging and continuing. |
Opened the Helm issue helm/helm#8458
I think that's the So it seems like Get and Query behaves differently as well. I am going to mention that in the Helm. Thanks. |
@joelanford Hi Joe, could you please take a second look at this PR. I've checked all the drivers (cfgmaps,sql,secret,memory) under https://github.com/helm/helm/tree/0ad800ef43d3b826f31a5ad8dfbb4fe05d143688/pkg/storage/driver |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
/remove-lifecycle stale |
Hi @mikeshng, I understand that it still a valid pr. Am I right? If yes, could you please add a change log entry for this one and rebase it with the master and push again? @joelanford since you was doing this review wdyt? |
As far as I can tell, the current state of things is that:
We're handling case 1 correctly right now. If case 2 happens we fall into our One thing I'm still confused on though. Are there any outstanding PRs that will have case 2 return the decode error rather than logging and ignoring? I got the impression that was happening in some of the comments. So having said all that, I'm in agreement that this should fix it. @mikeshng Before we merge, could you post the logs of runs before and after this fix. If I'm understanding this correctly, before should show successful finalizer removal and CR deletion (without an actual uninstall), and after should show an Uninstall attempt (but it isn't clear to me if uninstall will succeed or fail if the release can't be decoded). |
@camilamacedo86 I rebased, added a change log and made an additional fix commit. @joelanford nice call on insisting that I double check the log output, I found a problem with the fix and added another commit. This was hard to reproduce so I had to hack around the code a bit to ensure everything was fine until the point where it performs the History() call. What I did was added a sleep which gave me enough time to modify the secret storage release field with some bad data before the uninstall call take place. I hope you are ok with that and don't consider it invalid testing. Here are my logs: before the fix:
after the fix:
endless spam of the above so I am not sure if this is the desire behaviour. I was running off |
I created a new PR for the possible nil pointer in uninstall: #4288 |
rebased now that #4288 is merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one more suggestion, and I think we can get this merged. Thanks for the patience on this!
internal/helm/release/manager.go
Outdated
@@ -352,17 +352,10 @@ func createJSONMergePatch(existingJSON, expectedJSON []byte) ([]byte, error) { | |||
// UninstallRelease performs a Helm release uninstall. | |||
func (m manager) UninstallRelease(ctx context.Context, opts ...UninstallOption) (*rpb.Release, error) { | |||
// Get history of this release | |||
h, err := m.storageBackend.History(m.releaseName) | |||
if err != nil { | |||
if _, err := m.storageBackend.History(m.releaseName); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this check as well. When we call uninstall.Run()
, Helm will check history. If no release exists, it will return a wrapped driver.ErrReleaseNotFound
, which the callee will be able to detect with errors.Is()
.
TL;DR: This check is duplicative since uninstall.Run()
calls this already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased. Done as suggested. Tested locally with a normal delete and it seems fine. Tested with a delete with a 0 release and it seems fine as well:
{"level":"info","ts":1611587596.7093048,"logger":"helm.controller","msg":"Release not found, removing finalizer","namespace":"default","name":"nginx-sample","apiVersion":"example.com/v1alpha1","kind":"Nginx","release":"nginx-sample"}
Signed-off-by: Mike Ng <[email protected]>
Signed-off-by: Mike Ng <[email protected]>
Signed-off-by: Mike Ng <[email protected]>
…dy taken care of by helm uninstall library call Signed-off-by: Mike Ng <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/cherry-pick v1.3.x |
@estroz: new pull request created: #4457 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…g release (operator-framework#3431) Signed-off-by: Rein van 't Veer <[email protected]>
…g release (operator-framework#3431) Signed-off-by: Rein van 't Veer <[email protected]>
…g release (operator-framework#3431) Signed-off-by: Rein van 't Veer <[email protected]>
…g release (operator-framework#3431) Signed-off-by: reinvantveer <[email protected]>
…g release (operator-framework#3431) Signed-off-by: Rein van 't Veer <[email protected]> Signed-off-by: reinvantveer <[email protected]>
…g release (operator-framework#3431) Signed-off-by: rearl <[email protected]>
Signed-off-by: Mike Ng [email protected]
Description of the change:
The Helm
History()
function already returns a ErrReleaseNotFound error if no such release name exists.see https://github.com/helm/helm/blob/v3.2.4/pkg/storage/storage.go#L148-L154
The current additional check of
if len(h) == 0
might cause an issue where the release might actually exist but during decoding of the release, it ran into an error and not append it to the list of return result.see https://github.com/helm/helm/blob/v3.2.4/pkg/storage/driver/secrets.go#L125-L138
Which leads to the Helm operator incorrectly determines that the release doesn't exist and skips the actual helm uninstall call.
Motivation for the change:
In some cases, the Helm-operator doesn't call the equivalent of the
helm uninstall
command before deleting the CR.Some resources are removed due to Kubernetes garbage collection. But it leaves behind (at least) cluster scope resources that doesn't have the CR as the owner and can't be garbage collected.
Checklist
If the pull request includes user-facing changes, extra documentation is required:
changelog/fragments
(seechangelog/fragments/00-template.yaml
)website/content/en/docs