[Compute] Dedicated Host DELETE API is broken #8137

Closed
magodo opened this issue Jan 14, 2020 · 8 comments
Labels
ARM - Core bug This issue requires a change to an existing behavior in the product in order to be resolved. Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@magodo
Contributor

magodo commented Jan 14, 2020

When attempting to delete a dedicated host group and the dedicated host it contains via the Azure API, the normal control flow fails, as illustrated below.

First, delete the dedicated host:

DELETE https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/magodo-rg/providers/Microsoft.Compute/hostGroups/magodo_host_group/hosts/magodo_host?api-version=2019-03-01

This returns a long-running operation, indicated by the Azure-AsyncOperation response header:

...
Azure-AsyncOperation: https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Compute/locations/eastus2/operations/a786a0e0-7581-4d69-9007-855970f3a2fe?api-version=2019-03-01
...

Then send one or more GET requests against this URL:

GET https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Compute/locations/eastus2/operations/a786a0e0-7581-4d69-9007-855970f3a2fe?api-version=2019-03-01

which eventually returns Succeeded (which should mean that the Dedicated Host no longer exists):

{
    "startTime": "2020-01-14T01:25:27.7609504+00:00",
    "endTime": "2020-01-14T01:25:27.8390712+00:00",
    "status": "Succeeded",
    "name": "a786a0e0-7581-4d69-9007-855970f3a2fe"
}
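
For reference, the polling step above can be sketched roughly as follows. This is only an illustration, not Azure SDK code: `poll_async_operation` and the `fetch` callable are hypothetical names, and `fetch` is assumed to issue the GET against the Azure-AsyncOperation URL and return the decoded JSON body. The terminal status values are the ones ARM documents for asynchronous operations.

```python
import time
from typing import Callable, Dict

# Terminal states for an ARM asynchronous operation.
TERMINAL_STATES = {"Succeeded", "Failed", "Canceled"}


def poll_async_operation(
    fetch: Callable[[], Dict],
    max_attempts: int = 60,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until the operation reports a terminal status.

    fetch is a hypothetical callable that GETs the Azure-AsyncOperation
    URL and returns the parsed JSON body, e.g. {"status": "Succeeded"}.
    """
    for _ in range(max_attempts):
        status = fetch().get("status")
        if status in TERMINAL_STATES:
            return status
        sleep(delay)  # operation still in progress; wait before retrying
    raise TimeoutError("operation did not reach a terminal state in time")
```

As the rest of this report shows, a "Succeeded" result here was not enough on its own to guarantee that the host group could be deleted.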

At this point, it should be possible to delete the Dedicated Host Group (since there are no items left within it). Attempting to do so:

DELETE https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/magodo-rg/providers/Microsoft.Compute/hostGroups/magodo_host_group?api-version=2019-03-01

However, the request fails:

{
    "error": {
        "code": "CannotDeleteResource",
        "message": "Can not delete resource before nested resources are deleted."
    }
}

Would it be possible for someone to look into these bugs?

Thanks!

@akning-ms akning-ms added the Service Attention Workflow: This issue is responsible by Azure service team. label Jan 17, 2020
@Drewm3
Member

Drewm3 commented Feb 9, 2020

@zivraf, could you take a look at this issue?

@lilyjma lilyjma added Compute and removed Compute labels Feb 19, 2020
@zivraf

zivraf commented Mar 6, 2020

Thank you for this feedback. We've looked into the issue as reported and acknowledge the behavior as described. While the fix is not simple, we would like to propose a workaround. Assuming you're using the REST calls to script the deletion of the host and the host group, we recommend making a GET call on the host after its deletion, which should buy you sufficient time. Basically, you can check that the GET returns 404 before attempting to delete the host group.
Regards
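
A rough sketch of how the workaround above might be scripted is shown below. It is an assumption-laden illustration rather than an official recipe: `wait_until_gone` and `get_status` are hypothetical names, and `get_status` is assumed to issue a GET against the dedicated host's resource URL and return the HTTP status code. The retry count and delay are arbitrary.

```python
import time
from typing import Callable


def wait_until_gone(
    get_status: Callable[[], int],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once get_status() reports 404, False if it never does.

    get_status is a hypothetical callable that GETs the dedicated host
    (.../hostGroups/<group>/hosts/<host>) and returns the status code.
    """
    for _ in range(attempts):
        if get_status() == 404:
            return True  # host is no longer visible; safe to proceed
        sleep(delay)  # host still visible; wait and check again
    return False
```

A script would call this after the host DELETE operation reports Succeeded, and only send the DELETE for the host group once it returns True.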

@Drewm3
Member

Drewm3 commented Jun 25, 2020

@magodo, are you still seeing this issue? From discussions with the team, it looks like this issue should either be fixed, or the problem is something that will need to be debugged by the Azure Resource Manager team as the failure may be occurring due to interactions between the compute resource provider and the resource manager.

@magodo
Contributor Author

magodo commented Jun 28, 2020

@Drewm3 I'm actually doing what @zivraf suggested. Still, let's keep this issue open to track the official solution.

@Drewm3 Drewm3 added the feature-request This issue requires a new behavior in the product in order be resolved. label Jul 20, 2020
@Drewm3 Drewm3 added bug This issue requires a change to an existing behavior in the product in order to be resolved. and removed feature-request This issue requires a new behavior in the product in order be resolved. labels Sep 9, 2020
@Drewm3 Drewm3 added ARM - Core and removed Compute labels Sep 9, 2020
@ghost

ghost commented Sep 9, 2020

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @armleads-azure.

@Drewm3
Member

Drewm3 commented Sep 9, 2020

Moving this issue over to ARM because it sounds like the compute resource provider is doing the right thing here and the issue lies in ARM.

@snaheth
Contributor

snaheth commented Sep 24, 2020

@magodo This could be happening due to a few reasons. For example, if your DELETE calls hit different ARM regional endpoints, this issue could be due to a discrepancy in the resource state (deleted vs existing) that hasn't yet propagated across regions. It can also be the result of a similar issue with resource state with the Resource Provider, which is Microsoft.Compute in this case.

It's difficult for us to thoroughly debug something that occurred so long ago (because of the limits on what we can easily query). If you run into this again, please re-open the case and share your correlationId (feel free to email me this info). It'd be great if we could get to the bottom of this edge case.

@snaheth snaheth closed this as completed Sep 24, 2020
@magodo
Contributor Author

magodo commented Sep 25, 2020

@snaheth

The issue still exists and is 100% reproducible. The correlationIds you asked for are listed below:

  1. delete dedicated host: 48c49482-b84f-4dc1-b450-d26cf274e7fa
  2. poll dedicated host deletion: bae672ea-833f-4ee0-b5d6-a1f2e13064c5
  3. deleting dedicated host group: 12a9c809-950c-4903-84bd-6fe0938b2e5e

Please keep this issue open until it is resolved.


6 participants