From 602833f1bf37579eafa1ce4605b3337004820724 Mon Sep 17 00:00:00 2001
From: prnaraya
Date: Wed, 15 Nov 2023 21:47:44 -0600
Subject: [PATCH] Update VirtHandlerRESTErrorsHigh Diagnosis and Mitigation sections

Make Diagnosis clearer: list the pods to identify the failing one (in
case there are many), check the logs of that failing pod, and provide an
example error message.

Update the namespace reference to $NAMESPACE in Mitigation.

Signed-off-by: prnaraya
---
 docs/runbooks/VirtHandlerRESTErrorsHigh.md | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/docs/runbooks/VirtHandlerRESTErrorsHigh.md b/docs/runbooks/VirtHandlerRESTErrorsHigh.md
index b9d16e97..9c6937fd 100644
--- a/docs/runbooks/VirtHandlerRESTErrorsHigh.md
+++ b/docs/runbooks/VirtHandlerRESTErrorsHigh.md
@@ -23,24 +23,30 @@ Node-related actions, such as starting and migrating workloads, are delayed on t
    $ export NAMESPACE="$(kubectl get kubevirt -A -o custom-columns="":.metadata.namespace)"
    ```

-2. Check the status of the `virt-handler` pod:
+2. List the available `virt-handler` pods to identify the failing `virt-handler` pod:

    ```bash
    $ kubectl get pods -n $NAMESPACE -l=kubevirt.io=virt-handler
    ```

-3. Check the `virt-handler` logs for error messages when connecting to the API server:
+3. Check the failing `virt-handler` pod log for error messages when connecting to the API server:

    ```bash
    $ kubectl logs -n $NAMESPACE
    ```

+   Example error message:
+
+   ```
+   {"component":"virt-handler","level":"error","msg":"Can't patch node my-node","pos":"heartbeat.go:96","reason":"the server has received too many API requests and has asked us to try again later","timestamp":"2023-11-06T11:11:41.099883Z","uid":"132c50c2-8d82-4e49-8857-dc737adcd6cc"}
+   ```
+
 ## Mitigation

 If the `virt-handler` cannot connect to the API server, delete the pod to force a restart:

 ```bash
-$ kubectl delete -n
+$ kubectl delete -n $NAMESPACE
 ```
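
Reviewer note: the log check in Diagnosis step 3 could be narrowed to error-level entries with a small filter. This is only a sketch. It assumes the `virt-handler` log lines are compact JSON like the example error message in this patch and that `grep` is available. The function name `filter_handler_errors` and the `POD` variable are hypothetical and not part of the runbook.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the runbook): read log lines on stdin
# and print only those carrying the literal field "level":"error".
# Assumes compact JSON with no spaces around the colon, as in the example
# error message in the patch.
filter_handler_errors() {
  # -F matches the pattern as a fixed string; "|| true" keeps the exit
  # status at 0 when no error lines are present.
  grep -F '"level":"error"' || true
}

# Usage sketch, assuming NAMESPACE is exported as in step 1 and POD holds
# the name of the failing pod identified in step 2:
#   kubectl logs -n "$NAMESPACE" "$POD" | filter_handler_errors
```

A fixed-string match keeps the sketch free of extra dependencies. A JSON-aware tool would be more robust if the log format ever includes whitespace between keys and values.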