Infrastructure Service (IS) throttles updates to ensure safety, which prevents job approvals
Impact: MR jobs are not approved if the number of updates already executing exceeds the predefined safe threshold.
Indication today: None.
Banner message:
"The active update count has exceeded the maximum allowed for a safe rollout of updates. Once the existing updates complete, the pending updates will start automatically. Track status of Active: List<repairTask/MRJob>. Pending: List. To force an MR job through, connect to the SF cluster and run this command: Invoke-ServiceFabricInfrastructureCommand -ServiceName <InfrastructureService name, like fabric:/System/InfrastructureService/nt1> -Command AllowAction:<MR_job-id_guid>:*:Prepare"

Long-running MR jobs
Impact: Because of the long-running jobs, other jobs are throttled and do not get approved.
Indication today: A repair task has been executing for a very long time (more than 2 hours).
Banner message:
"Repair task X has been executing for Y amount of time, which is longer than normal. This update can prevent other updates from going through. Please reach out to the Azure Compute teams ("Compute Manager/Blackbird") to find out why the platform updates are not completing."

Safety checks blocking approvals (due to service health/min replica config issues)
Impact: Repairs are stuck in preparing and MR jobs are not approved.
Indication today:
a. Repair tasks get stuck in the preparing state while disabling the nodes.
b. The node lists the reason for the failing safety check, which prevents disabling.
Banner message:
"Repair task X has been stuck in the preparing state for Y amount of time. This usually happens for one of the following reasons:
a. Service health issues. Please check the health of the services on the node: List, and fix the service so that the updates are unblocked.
b. Service replica configuration for max/min replica count. Updates will not go through if the min replica configuration can't be ensured."
{link to the public doc that covers all of this}

Safety checks blocking approvals (seed node removal)
Impact: Repairs are stuck in preparing and MR jobs are not approved.
Indication today: The seed node is in the disabling state and the repair task is in the preparing state.
Banner message:
"Repair task X has been stuck in the preparing state while disabling seed node Y for removal. This is blocked by design to prevent any risk to cluster availability. There are multiple options for getting out of this state. Please follow the doc for details: {link to the public doc}"

Health checks blocking updates
Impact: Repairs are stuck in the preparing/restoring health checks and MR jobs are not approved.
Indication today: Repair tasks are stuck in the preparing/restoring health check state.
Banner message:
"Repair task X has been stuck in the preparing/restoring health check state because of cluster health issues. This is expected when the preparing/restoring health checks are enabled in this cluster and any entity is unhealthy. Please ensure that all entities in the cluster, such as nodes and services, are healthy so that this check passes and the updates can proceed."

Customers deploying fewer than 5 VMs with MR
Impact: MR jobs can't execute reliably.
Indication today: None.
Banner message:
"Nodetype name:{X} is deployed with fewer than 5 VMs. At least 5 VMs must be present in the VMSS for platform updates to work reliably. Please fix this misconfiguration, as updates to such VMSS will be blocked soon and deployments will start failing. For details:
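
Three of the conditions above are concrete enough to sketch in code: the throttling threshold, the 2-hour long-running limit, and the 5-VM minimum. The following is a minimal Python sketch under stated assumptions, not how the Infrastructure Service or the portal implements these banners. The `RepairTask` type, its `state` values, the `max_active` parameter, and the `banners` and `allow_action_command` helpers are all hypothetical names introduced for illustration; only the 2-hour and 5-VM numbers and the `AllowAction` command format come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

LONG_RUNNING_THRESHOLD = timedelta(hours=2)  # "more than 2 hours" in the text
MIN_VMS_PER_NODE_TYPE = 5                    # "at least 5 VMs" in the text


@dataclass
class RepairTask:
    """Hypothetical, simplified view of a repair task / MR job."""
    task_id: str
    state: str  # illustrative values: "Executing" or "Pending"
    executing_since: Optional[datetime] = None  # set only while executing


def allow_action_command(service_name: str, job_id: str) -> str:
    """Format the override command quoted in the throttling banner."""
    return (
        "Invoke-ServiceFabricInfrastructureCommand "
        f"-ServiceName {service_name} -Command AllowAction:{job_id}:*:Prepare"
    )


def banners(tasks: List[RepairTask], vm_counts: Dict[str, int],
            max_active: int, now: datetime) -> List[str]:
    """Return banner texts for the conditions that currently hold.

    max_active stands in for the "predefined safe threshold"; the source
    does not state its value, so the caller must supply it.
    """
    messages = []
    active = [t for t in tasks if t.state == "Executing"]
    pending = [t for t in tasks if t.state == "Pending"]

    # Throttling: more updates are executing than the safe threshold allows.
    if len(active) > max_active:
        messages.append(
            "The active update count has exceeded the maximum allowed. "
            f"Active: {', '.join(t.task_id for t in active)}. "
            f"Pending: {', '.join(t.task_id for t in pending)}."
        )

    # Long-running jobs: executing for more than 2 hours.
    for t in active:
        if t.executing_since is None:
            continue
        elapsed = now - t.executing_since
        if elapsed > LONG_RUNNING_THRESHOLD:
            hours = elapsed.total_seconds() / 3600
            messages.append(
                f"Repair task {t.task_id} has been executing for "
                f"{hours:.1f} hours, which is longer than normal."
            )

    # Undersized node types: fewer than 5 VMs in the VMSS.
    for name, count in sorted(vm_counts.items()):
        if count < MIN_VMS_PER_NODE_TYPE:
            messages.append(
                f"Nodetype name:{name} is deployed with fewer than "
                f"{MIN_VMS_PER_NODE_TYPE} VMs."
            )
    return messages
```

A caller would pass the current repair tasks, a mapping of node type name to VM count, and a UTC timestamp such as `datetime.now(timezone.utc)`. The `allow_action_command` helper only formats the string; running the resulting command still requires a connection to the SF cluster, as the banner text says.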