Add support for force deleting VM when its provisioning state is FAILED #115

Closed
unmarshall opened this issue Nov 23, 2023 · 1 comment · Fixed by #117
Labels
kind/enhancement: Enhancement, improvement, extension
status/closed: Issue is closed (either delivered or triaged)

Comments

@unmarshall
Contributor

unmarshall commented Nov 23, 2023

What would you like to be added:

In Azure, if a Virtual Machine has its ProvisioningState set to Failed, it can neither be updated nor deleted, and the VM remains stuck in this state. Any attempt to update the associated resources (NIC, OSDisk and DataDisk) to enable cascade delete fails, because VM updates are not allowed in this state. Azure returns the following:

E1121 11:07:51.116477   26301 machine_util.go:1242] Error while deleting machine --REDACTED--: machine codes error: code = [Internal] message = [Failed to update cascade delete of associated resources for VM: [ResourceGroup: --REDACTED--, Name: --REDACTED--], Err: PATCH https://management.azure.com/subscriptions/--REDACTED--/resourceGroups/--REDACTED--/providers/Microsoft.Compute/virtualMachines/--REDACTED--
--------------------------------------------------------------------------------
RESPONSE 409: 409 Conflict
ERROR CODE: OperationNotAllowed
--------------------------------------------------------------------------------
{
  "error": {
    "code": "OperationNotAllowed",
    "message": "Operation 'Update VM' is not allowed on VM '--REDACTED--' since the VM is marked for deletion. You can only retry the Delete operation (or wait for an ongoing one to complete)."
  }
}
--------------------------------------------------------------------------------
]
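
The response above carries a machine-readable error code, so a caller can tell this case apart from other failures. A minimal sketch, assuming the raw status code and JSON body are available to the caller; `azureErrorBody` and `isOperationNotAllowed` are hypothetical names for illustration and are not part of the provider or the Azure SDK:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// azureErrorBody mirrors only the fields visible in the response shown above.
// Hypothetical type, introduced for this example.
type azureErrorBody struct {
	Error struct {
		Code    string `json:"code"`
		Message string `json:"message"`
	} `json:"error"`
}

// isOperationNotAllowed reports whether a response is a 409 Conflict whose
// body carries the error code "OperationNotAllowed".
func isOperationNotAllowed(statusCode int, body []byte) bool {
	if statusCode != http.StatusConflict {
		return false
	}
	var parsed azureErrorBody
	if err := json.Unmarshal(body, &parsed); err != nil {
		// An unparseable body is treated as "not this error".
		return false
	}
	return parsed.Error.Code == "OperationNotAllowed"
}

func main() {
	body := []byte(`{"error":{"code":"OperationNotAllowed","message":"Operation 'Update VM' is not allowed"}}`)
	fmt.Println(isOperationNotAllowed(http.StatusConflict, body))
}
```

A check like this could be used to decide when to skip the cascade-delete update and proceed to deletion directly.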

In these situations, the VM should be deleted, followed by explicit deletion of all associated resources (NIC, OSDisk and DataDisk(s)).

Why is this needed:
This ensures that the VM and its associated resources are cleaned up properly.
We have seen multiple issues in Canary [Issue #4358, #4389, #4390, #4377] where VMs were stuck with ProvisioningState = Failed for days and could not be cleaned up automatically; operators had to issue the delete for these VMs manually. With this change we attempt to clean up all resources automatically.

@himanshu-kun
Contributor

/close as fixed
A patch PR has been raised as well: #120

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Dec 6, 2023