
Velero backups and maintenance jobs are failing due to active indexes "logger name="[index-blob-manager]" sublevel=error" #8469

Open
kkavin opened this issue Nov 30, 2024 · 8 comments

Comments

kkavin commented Nov 30, 2024

What steps did you take and what happened:
We have installed Velero version 1.14.0, and while a few backups were successful initially, we started facing issues after a few days. The Velero pod keeps restarting frequently with the error below.

We tried reinstalling Velero, but the issue persists.

"time="2024-11-30T07:44:23Z" level=warning msg="active indexes [xr0_7_6b451ba1676853b054d654945c4dc313-sa2a25f6e3408dec5-c1 xr8_15_c98e297efe6656d29b1445b9b2c50c77-s35ffedab2d2740df-c1 xr16_23_5593a3523b7ad2f088bd2d63898871b0-s25f2e987014a3db8-c1 xr24_31_b91b484ebb9347da800bdcd999c4a164-s913ff0b4246609a8-c1 xr32_39_d26b900b73417c156a07383d92b1703d-s5d983913d7607ac3-c1 xr40_47_df3ac0f426b61822cf418a1cb4630bc1-s1616139ab5f8080a-c1 xr48_55_30c8353b7fadeb9e186317d253004c69-s648992548c90b115-c1 xr56_63_653de373a82c1b9f62c99c0551ac1b2d-s69142ccec46a6aae-c1 xr64_71_2a1cbeaa64b7c246489f337dd1093fa3-sb3a9c84a1aca4b4d-c1 xr72_79_4f4d20f63413d4c6e7795d115806f5d6-se0daa36cd2506608-c1 xr80_87_fbbe510445b61a131a974009599ce44b-sd768227d3173f00d-c1 xs88_c96080dbfa121038a2a00cdc4ba09b9f-s44dd9fd141cd25ed-c1 xs89_f6d1ce4ec56b77462613739ac17a950a-s81fd250e0c1db6a9-c1 xs90_0864a650dc2c67f4269d3209869637ae-s89980c56a835fe28-c1 xs91 .....
deletion watermark 2024-08-08 03:35:02 +0000 UTC" logModule=kopia/kopia/format logSource="pkg/kopia/kopia_log.go:101" logger name="[index-blob-manager]" sublevel=error"

Attachments: backup, velero, velero.log

Environment:

  • Velero version (use velero version): v1.14.0
  • Velero features (use velero client config get features): v1.12.3
  • Kubernetes version (use kubectl version): v1.30.1
  • Kubernetes installer & version: v1.30.5-gke.1014003
  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release): Ubuntu 22.04

Vote on this issue!

This is an invitation to the Velero community to vote on issues. You can see the project's top-voted issues listed here.
Use the "reaction smiley face" at the top right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
Lyndon-Li (Contributor) commented Dec 2, 2024

level=warning msg="active indexes

These warning logs are expected and are not the cause of the restart.

The errors in the first screenshot are backup errors; they won't cause the Velero server pod to restart.

There are no errors in the attached velero.log either.

Lyndon-Li added the "Needs info (Waiting for information)" label on Dec 2, 2024
Lyndon-Li (Contributor) commented Dec 2, 2024

We need the following extra information to troubleshoot further (see the command sketch after this list):

  1. Run velero debug to collect the full Velero bundle.
  2. When the restart happens, run kubectl logs -n velero <velero server pod name> --previous and collect the output.
  3. Before and after the restart happens, run kubectl describe pod -n velero <velero server pod name> and collect the output.
  4. Find the failed maintenance job pods under the velero namespace and run kubectl describe pod -n velero <maintenance job pod name> and kubectl logs -n velero <maintenance job pod name>.
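
To make the collection easier, here is a minimal shell sketch of the steps above. It assumes the default velero namespace; the pod-name placeholders and output file names are illustrative:

velero debug                                                # collects the full Velero support bundle
kubectl logs -n velero <velero server pod name> --previous > velero-server-previous.log
kubectl describe pod -n velero <velero server pod name> > velero-server-describe.txt
kubectl get pods -n velero                                  # list pods to locate the failed maintenance job pods
kubectl describe pod -n velero <maintenance job pod name> > maintenance-job-describe.txt
kubectl logs -n velero <maintenance job pod name> > maintenance-job.log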

Lyndon-Li self-assigned this on Dec 2, 2024
kkavin (Author) commented Dec 2, 2024

kkavin (Author) commented Dec 6, 2024

@Lyndon-Li
Any update on this?

Lyndon-Li (Contributor) commented:

From the log, the Velero server restarted due to an out-of-memory condition during repo snapshot deletion.
For now, you can work around the problem by increasing the memory request/limit of the Velero server.
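
One possible way to do that is a strategic merge patch on the Velero deployment. A rough sketch; the 1Gi request and 4Gi limit are purely illustrative values, and it assumes the deployment is named velero in the velero namespace:

kubectl patch deployment velero -n velero --type strategic -p '
{"spec": {"template": {"spec": {"containers": [{
  "name": "velero",
  "resources": {"requests": {"memory": "1Gi"}, "limits": {"memory": "4Gi"}}
}]}}}}'

A strategic merge patch merges the container entry by name, so the rest of the container spec is left untouched and the deployment rolls out a new Velero server pod with the new values.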

Let me check how to improve this in future releases.

Lyndon-Li added the repository scalability label and removed the Needs info (Waiting for information) label on Dec 6, 2024
Lyndon-Li (Contributor) commented:

To help me understand the scale of your repo data, please run the Kopia commands below and share the output:

kopia repo status
kopia maintenance info --json
kopia snapshot list --all
kopia content stats
kopia blob stats
kopia index list --json
kopia content list --deleted-only
kopia content list --prefix x
kopia index epoch list

Before running these commands, you need to connect to the Kopia repo first by running:
kopia repository connect gcs --readonly --bucket=<bucket name> --credentials-file=<credentials file> --override-username=default --override-hostname=default --disable-tls --prefix=kopia/<namespace being backed up>

At the end, disconnect the repo with kopia repository disconnect.
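
Putting it together, a rough sketch of the full read-only session; the bucket, credentials file, and namespace are placeholders, and the output redirects are just a suggestion so the results can be attached here:

kopia repository connect gcs --readonly --bucket=<bucket name> --credentials-file=<credentials file> --override-username=default --override-hostname=default --disable-tls --prefix=kopia/<namespace being backed up>
kopia repo status > repo-status.txt
kopia maintenance info --json > maintenance-info.json
kopia snapshot list --all > snapshot-list.txt
kopia content stats > content-stats.txt
kopia blob stats > blob-stats.txt
kopia index list --json > index-list.json
kopia content list --deleted-only > content-deleted.txt
kopia content list --prefix x > content-prefix-x.txt
kopia index epoch list > index-epoch-list.txt
kopia repository disconnect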

Lyndon-Li (Contributor) commented:

A few more quick questions:

  • What is the size of your volume being backed up?
  • How many files in the volume?
  • How many backups have you run when the problem happened?

kkavin (Author) commented Dec 11, 2024

A few more quick questions:

  • What is the size of your volume being backed up? More than 100 GB
  • How many files in the volume? More than 500 files
  • How many backups have you run when the problem happened? Every 4 hours for a few clusters and every 24 hours for a few others
