Velero 1.15.0 - DataUpload CR sometimes not updated / finalized or Bytes missing? #8492
Hi everyone,

After upgrading to Velero 1.15.0, we have noticed that DataUploads sometimes “stop” at a certain number of bytes and then still complete successfully. The typical “Data-Path-Completed / Stopped” events are also missing from these CRs. In the logs of the DataUpload pod, the only visible difference from the logs of a “correct” DataUpload pod is a “Timeout waiting for assured events processed” error. I am attaching some excerpts from the same backup run as an example.

Can anyone say whether bytes were really not uploaded here, or whether the CR in this example is simply not being updated / finalized correctly?

Another difference is that an error was ignored due to “unknown or unsupported entry” (see screenshot). However, I could also see this on some DataUpload CRs that appear correct, i.e. bytes_done = total_bytes.

Correctly appearing DataUpload CR:
Correctly appearing DataUpload - Logs from DataUpload Pod:
DataUpload CR not appearing correctly:
DataUpload CR not appearing correctly - Logs from DataUpload Pod:

Addition: I have checked this for all DataUploads, and the corresponding timeout always seems to be visible. The DataUpload pods remain visible as Completed for 1-2 minutes, and then the status of the CR also changes from InProgress to Completed, but again with bytes_done < total_bytes.
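To make the symptom easier to report, here is a minimal sketch that lists DataUploads which are marked Completed although fewer bytes were recorded as done than the total. It assumes the CR exposes `status.phase`, `status.progress.bytesDone` and `status.progress.totalBytes` (the camelCase counterparts of bytes_done / total_bytes above), that the CRs live in the `velero` namespace, and that `kubectl` is on the PATH. The function names `find_short_uploads` and `load_datauploads` are made up for illustration; please verify the field names against your own CR output.

```python
import json
import subprocess


def find_short_uploads(datauploads: dict) -> list[dict]:
    """Return Completed DataUploads whose bytesDone is below totalBytes.

    `datauploads` is the parsed JSON of a Kubernetes list object.
    Field names are assumptions; check them against a real CR.
    """
    suspicious = []
    for item in datauploads.get("items", []):
        status = item.get("status", {})
        progress = status.get("progress", {})
        done = progress.get("bytesDone", 0)
        total = progress.get("totalBytes", 0)
        # Only flag uploads that claim success but report fewer bytes.
        if status.get("phase") == "Completed" and done < total:
            suspicious.append({
                "name": item["metadata"]["name"],
                "bytesDone": done,
                "totalBytes": total,
                "missing": total - done,
            })
    return suspicious


def load_datauploads(namespace: str = "velero") -> dict:
    """Fetch all DataUpload CRs of a namespace as parsed JSON via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "datauploads.velero.io", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `find_short_uploads(load_datauploads())` against the cluster returns one entry per affected DataUpload, which also gives the number of affected DUs in a backup run. Note that this only compares what the CR reports; it cannot tell whether the data actually reached the repository.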
Replies: 1 comment 1 reply
@realjump Please open an issue for this. Please also clarify how many DUs (DataUploads) existed when the problem happened, and describe your cluster nodes' capacity, i.e., the CPU usage on the node where the problem happened.