
[stable/influxdb] Liveness probe fails while in WAL recovery #75

Open
aelbarkani opened this issue Jan 11, 2019 · 1 comment

Comments

@aelbarkani

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Version of Helm and Kubernetes: 1.10

Which chart: stable/influxdb

What happened: When WAL recovery takes too long, the liveness probe fails, causing a CrashLoopBackOff.

What you expected to happen: The liveness probe shouldn't fail while the DB is recovering; only the readiness probe should (see the probe sketch below the report). Otherwise the DB will never be able to recover.

How to reproduce it (as minimally and precisely as possible):

Install stable/influxdb
Feed the DB a large amount of data, and terminate the pod abruptly while the data is still being written

Anything else we need to know: duplicate of helm/charts#10405
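
A minimal sketch of the expected split between the two probes (not the chart's actual template; /ping on port 8086 is the default InfluxDB health endpoint, and the thresholds are illustrative only):

```yaml
# Sketch only: a strict readiness probe plus a lenient liveness probe,
# so a pod replaying its WAL is marked unready instead of being killed.
readinessProbe:
  httpGet:
    path: /ping        # default InfluxDB health endpoint
    port: 8086
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /ping
    port: 8086
  initialDelaySeconds: 300   # leave room for WAL recovery before the first check
  periodSeconds: 30
  failureThreshold: 10       # several more minutes of failures tolerated before a restart
```

On clusters newer than the 1.10 mentioned above, a startupProbe would be the cleaner way to cover a long recovery without weakening the steady-state liveness check.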

@sergioisidoro

I've bumped into a similar problem.

Back-off restarting keeps happening because the liveness probe returns connection refused. This causes the container to restart and the WAL recovery to start over from scratch.

Exited containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s, …) capped at five minutes, and the delay is reset after ten minutes of successful execution.

If there isn't too much data (recovery under 5 minutes), the WAL replay will complete, but the container is still restarted after the back-off wait period.

At least increasing the default initialDelaySeconds seems to be necessary for even basic use cases...
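
Assuming the chart exposes the probe timings through its values (the key names below are illustrative and may not match this chart version), the override would look roughly like:

```yaml
# Hypothetical values.yaml override; check the chart's values.yaml for the real keys.
livenessProbe:
  initialDelaySeconds: 300   # allow long WAL replays before the first liveness check
  timeoutSeconds: 5
  failureThreshold: 10
readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
```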
