qhub destroy using targets #948

Merged: 1 commit, Nov 30, 2021

danlester
Contributor

Problem

qhub destroy can fail for many reasons, including Terraform timeouts, which are hard to set globally.

Sometimes destruction of the Kubernetes cluster begins before Terraform has finished destroying the software deployed within it, resulting in an error that the cluster is no longer accessible to destroy that software.

Since the Keycloak provider needs to make API calls to Keycloak within the cluster, the terraform command can fail terminally if the provider is configured with a Keycloak URL that no longer exists.

Other Solutions

A more robust definition system as proposed in Split infrastructure into components might help since it would allow destroy to be done in stages, so we can be sure one stage is complete before we advance to the next.

Maybe a move to Terraform CDK would also change this.

But these are long-term ideas.

Current Solution

In this PR I have changed qhub destroy to run the reverse of the targeted qhub deploy stages that we already have. One difficulty is listing all the items that need to be destroyed at each stage: since deploy's final stage is just 'everything else', we need to maintain a long list of all items. I don't believe my list is complete, but it doesn't really matter. The destruction of Kubernetes should destroy everything anyway, and removing most items from the cluster in a separate stage gives breathing room, so there is less to remove during the cluster destroy.

The main thing is that we need to avoid telling Terraform to destroy e.g. "Kubernetes" and "keycloak-configuration" in the same stage, which of course happens when we just run a straight terraform destroy with no targeting.

An alternative approach to all this may have been to strengthen the 'depends_on' tree, but this is split across multiple files and would still be likely to run into the timeout problem.
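To illustrate the idea of reversing the deploy stages, here is a minimal sketch, not the code in this PR. The stage contents and resource addresses in `DEPLOY_STAGES` are hypothetical placeholders, and the helper name `destroy_commands` is made up for this example. The only Terraform features assumed are the `-target` and `-auto-approve` options of `terraform destroy`.

```python
from typing import List, Sequence

# Hypothetical stage definitions, listed in deploy order. The resource
# addresses are placeholders for illustration, not the list used in this PR.
DEPLOY_STAGES: List[List[str]] = [
    ["module.kubernetes"],
    ["module.kubernetes-ingress"],
    ["module.qhub", "module.keycloak-configuration"],
]


def destroy_commands(stages: Sequence[Sequence[str]]) -> List[List[str]]:
    """Build one `terraform destroy` command per stage, last stage first."""
    commands = []
    for targets in reversed(stages):
        cmd = ["terraform", "destroy", "-auto-approve"]
        # Each stage only names its own targets, so the cluster and the
        # software running inside it are never destroyed in the same call.
        cmd += [f"-target={target}" for target in targets]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in destroy_commands(DEPLOY_STAGES):
        print(" ".join(cmd))
```

Each command would be run to completion (for example with `subprocess.run(cmd, check=True)`) before the next one starts, which is what keeps the Keycloak configuration and the cluster itself in separate stages.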

Note it is not possible to 'refresh state' once Keycloak is inaccessible. We should consider removing the terraform refresh command from the destroy procedure anyway.
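As a rough sketch of what skipping the refresh could look like, the helper below builds a destroy command that passes Terraform's `-refresh=false` option instead of running a separate refresh first. The function name `destroy_command` and its parameters are assumptions made for this example; this PR does not necessarily do it this way.

```python
from typing import List, Optional, Sequence


def destroy_command(
    targets: Optional[Sequence[str]] = None, refresh: bool = False
) -> List[str]:
    """Build a `terraform destroy` command, skipping the refresh by default."""
    cmd = ["terraform", "destroy", "-auto-approve"]
    if not refresh:
        # Avoid provider API calls (e.g. to Keycloak) that would fail once
        # the service is no longer reachable.
        cmd.append("-refresh=false")
    for target in targets or []:
        cmd.append(f"-target={target}")
    return cmd
```

Skipping the refresh means Terraform works from the last recorded state, which is an acceptable trade-off here only because everything is being torn down anyway.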

It is possible to run the old style of qhub destroy without targets by using the flag --full-only.
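The choice between the two modes can be pictured with a small sketch. The parser below is hypothetical and is not qhub's actual CLI wiring; only the flag name `--full-only` comes from this PR.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical parser for illustration; not qhub's actual CLI code.
    parser = argparse.ArgumentParser(prog="destroy-sketch")
    parser.add_argument(
        "--full-only",
        action="store_true",
        help="run a single untargeted terraform destroy",
    )
    return parser.parse_args(argv)


def destroy_mode(args: argparse.Namespace) -> str:
    """Return 'full' for the old untargeted destroy, else 'staged'."""
    return "full" if args.full_only else "staged"
```

With no flag the staged, targeted destroy is the default, so the old behaviour has to be requested explicitly.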

Types of changes

What types of changes does your code introduce?

Put an x in the boxes that apply

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds a feature)
  • Breaking change (fix or feature that would cause existing features to not work as expected)
  • Documentation Update
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no API changes)
  • Build related changes
  • Other (please describe):

Testing

Requires testing

  • Yes
  • No

In case you checked yes, did you write tests?

  • Yes
  • No

Member

@iameskild left a comment

LGTM

@iameskild merged commit a5d0190 into main on Nov 30, 2021