## Problem

There are many reasons why `qhub destroy` fails, including timeouts in Terraform, which are hard to set globally. Sometimes the Kubernetes cluster starts to be destroyed before Terraform has formally had a chance to destroy some of the software deployed within Kubernetes, resulting in an error that the cluster is not accessible to destroy that software.

Since the Keycloak provider needs to make API calls to Keycloak within the cluster, the `terraform` command can fail terminally if the provider is configured with a Keycloak URL that no longer exists.
## Other Solutions

A more robust definition system, as proposed in "Split infrastructure into components", might help, since it would allow the destroy to be done in stages, so we could be sure one stage is complete before advancing to the next. A move to Terraform CDK might also change this. But these are long-term ideas.
## Current Solution

In this PR I have changed `qhub destroy` to simply run the reverse of the targeted `qhub deploy` stages that we already have. One difficulty is in listing out all the items that need to be destroyed at each stage - since deploy's final stage is just 'everything else', we need to maintain a long list of all items. I don't believe my list is complete, but it doesn't really matter - the destruction of Kubernetes should destroy everything anyway, and removing most items from the cluster in a separate stage gives breathing room so there is less to remove during the cluster destroy.

The main thing is that we need to avoid telling Terraform to destroy e.g. "Kubernetes" and "keycloak-configuration" in the same stage, which of course happens when we just run a straight `terraform destroy` with no targeting.

An alternative approach to all this might have been to strengthen the `depends_on` tree, but this is split across multiple files and is still likely to run into the timeout problem.
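The staged approach can be sketched as iterating the deploy stages in reverse and passing explicit `-target` flags to each `terraform destroy`, so that e.g. "keycloak-configuration" is removed in an earlier stage than the cluster itself. This is only an illustrative sketch: the stage names, module addresses, and `destroy_commands` helper below are hypothetical, not the actual QHub lists or code.

```python
# Hypothetical deploy stages, listed in deploy order. Module addresses
# here are placeholders, not the real QHub Terraform modules.
DEPLOY_STAGES = [
    ("infrastructure", ["module.kubernetes"]),
    ("keycloak", ["module.keycloak"]),
    ("keycloak-configuration", ["module.keycloak-configuration"]),
    ("services", ["module.jupyterhub", "module.dask-gateway"]),
]

def destroy_commands(stages=DEPLOY_STAGES):
    """Yield one targeted `terraform destroy` command per stage, in the
    reverse of deploy order, so later stages are torn down first."""
    for name, targets in reversed(stages):
        cmd = ["terraform", "destroy", "-auto-approve"]
        cmd += [f"-target={t}" for t in targets]
        yield name, cmd
```

The key property is ordering: "keycloak-configuration" is destroyed while Keycloak and the cluster still exist, and the cluster is only destroyed in the final stage.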
Note that it is not possible to 'refresh state' once Keycloak is inaccessible. We should consider removing the `terraform refresh` command from the destroy procedure anyway.

It is possible to run the old style of `qhub destroy`, without targets, by using the flag `--full-only`.
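The `--full-only` flag name comes from this PR, but the code below is only a minimal sketch of how such a flag might switch between the new staged destroy and the old single-shot behaviour; the parser and helper are hypothetical, not the actual qhub CLI code.

```python
import argparse

def build_parser():
    # Illustrative parser; the real qhub CLI wires this up differently.
    parser = argparse.ArgumentParser(prog="qhub destroy")
    parser.add_argument(
        "--full-only",
        action="store_true",
        help="skip the staged, targeted destroy and run a single "
             "untargeted `terraform destroy` (old behaviour)",
    )
    return parser

def use_targeted_destroy(argv):
    """Return True if the staged (targeted) destroy should run."""
    args = build_parser().parse_args(argv)
    # Staged destroy is the new default; --full-only opts back out.
    return not args.full_only
```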
## Types of changes

What types of changes does your code introduce?

Put an `x` in the boxes that apply:

## Testing

Requires testing

In case you checked yes, did you write tests?