
[stable/influxdb] Liveness probe fails while in WAL recovery #75

Open
aelbarkani opened this issue Jan 11, 2019 · 1 comment

Comments

@aelbarkani

Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT

Version of Helm and Kubernetes: 1.10

Which chart: stable/influxdb

What happened: When WAL recovery takes too long, the liveness probe fails, causing a CrashLoopBackOff.

What you expected to happen: The liveness probe shouldn't fail while the DB is recovering; only the readiness probe should (see the probe sketch below the report). Otherwise the DB will never be able to recover.

How to reproduce it (as minimally and precisely as possible):

Install stable/influxdb
Feed the DB a large amount of data, and terminate the pod abruptly while the data is still being written

Anything else we need to know: duplicate of helm/charts#10405
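
A minimal sketch of the expected split between the two probes (not the chart's actual template; /ping on port 8086 is the default InfluxDB health endpoint, and the thresholds are illustrative only):

```yaml
# Sketch only: a strict readiness probe plus a lenient liveness probe,
# so a pod replaying its WAL is marked unready instead of being killed.
readinessProbe:
  httpGet:
    path: /ping        # default InfluxDB health endpoint
    port: 8086
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /ping
    port: 8086
  initialDelaySeconds: 300   # leave room for WAL recovery before the first check
  periodSeconds: 30
  failureThreshold: 10       # several more minutes of failures tolerated before a restart
```

On clusters newer than the 1.10 mentioned above, a startupProbe would be the cleaner way to cover a long recovery without weakening the steady-state liveness check.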

@sergioisidoro

I've bumped into a similar problem.

Back-off restarting keeps happening because the liveness probe returns connection refused. This causes the container to restart and the WAL recovery to start over from scratch.

Exited containers that are restarted by the kubelet are restarted with an exponential back-off delay (10s, 20s, 40s, …) capped at five minutes, and the delay is reset after ten minutes of successful execution.

If there isn't too much data (recovery under 5 minutes), the WAL replay will complete, but the container is still restarted after the back-off wait period.

At least increasing the default initialDelaySeconds seems to be necessary for even basic use cases...
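
Assuming the chart exposes the probe timings through its values (the key names below are illustrative and may not match this chart version), the override would look roughly like:

```yaml
# Hypothetical values.yaml override; check the chart's values.yaml for the real keys.
livenessProbe:
  initialDelaySeconds: 300   # allow long WAL replays before the first liveness check
  timeoutSeconds: 5
  failureThreshold: 10
readinessProbe:
  initialDelaySeconds: 5
  timeoutSeconds: 1
```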
