Deadlock situation when Cassandra disk is full #49

Open
etiennedi opened this issue Feb 23, 2019 · 5 comments

etiennedi (Member) commented Feb 23, 2019

Last night, the attached PVs on 3 of my 10 Cassandra pods ran completely full. Unfortunately, that left us in a bit of a deadlock situation:

  • Cassandra starts crashing when the disk is entirely full and doesn't come back up
  • We cannot add new Cassandra pods (to increase the overall disk space) because the StatefulSet is configured with the OrderedReady pod management policy. This means new pods won't be scheduled while existing pods are crashing (see the snippet below for how to confirm this).
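
For reference, this is roughly how the constraint can be confirmed. It is just a sketch; the StatefulSet name `cassandra` and the `app=cassandra` label are assumptions about our manifests:

```sh
# Show the pod management policy; "OrderedReady" means a new ordinal is not started
# while any existing pod is unready.
kubectl get statefulset cassandra -o jsonpath='{.spec.podManagementPolicy}{"\n"}'

# Confirm the stuck state: crashing pods and no new ordinals being scheduled.
kubectl get pods -l app=cassandra -o wide
```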

I think I solved this situation manually like this (the commands are sketched after the list):

  • Use the short window between container start and container crash to kubectl exec into the pod
  • Then manually delete the entire Cassandra commit log in /var/lib/cassandra/commitlog
  • Wait for the pod to come up healthy again
  • Repeat for the other affected pods until all are up again
  • Now new pods can be scheduled again
  • Hope that Cassandra redistributes the data so that usage is even across all pods again
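
Roughly what that boils down to in commands (a sketch of what I did; the pod name `cassandra-3`, the `app=cassandra` label, and the data path are specific to our setup):

```sh
# During the short window between container start and crash, exec into the affected pod.
kubectl exec -it cassandra-3 -- bash

# Inside the container: free up disk space by deleting the commit log.
# WARNING: this discards unflushed writes. Only acceptable here because the data was disposable.
rm -rf /var/lib/cassandra/commitlog/*

# Back outside the container: watch until the pod reports Ready, then repeat for the next pod.
kubectl get pods -l app=cassandra -w
```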

However, this is not something we can do in a production setting, because I literally just deleted 8GB of data per crashing pod. In real life we'd have no way of knowing which data was deleted and no way to import it again.

So I think for production we need either:

  • Very strict monitoring of the free space on the PVs (a quick manual check is sketched below), or
  • A mechanism to auto-scale the StatefulSet based on the available disk space on the attached PVs. Is there such a thing, @idcrosby?
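
For the monitoring option, even a crude per-pod check of PV usage would have caught this early. A minimal sketch, assuming the `app=cassandra` label and the /var/lib/cassandra mount path from our setup:

```sh
# Print data-volume usage for every Cassandra pod.
for pod in $(kubectl get pods -l app=cassandra -o jsonpath='{.items[*].metadata.name}'); do
  echo -n "$pod: "
  kubectl exec "$pod" -- df -h /var/lib/cassandra | tail -n 1
done
```
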
etiennedi changed the title from "Deadline situation when Cassandra disk is full" to "Deadlock situation when Cassandra disk is full" on Feb 23, 2019
etiennedi (Member, Author) commented Feb 23, 2019

A guideline is to keep the disk size per node at around 500GB (of which we can effectively use only about 250GB, since compaction needs free headroom roughly the size of the data it rewrites): https://wikitech.wikimedia.org/wiki/Cassandra/Hardware

bobvanluijt (Member) commented Feb 23, 2019 via email

etiennedi (Member, Author) commented:

New learning about scaling Cassandra clusters: when a node loses ownership of partitions (because a new node has joined), the space those partitions occupied is not freed up automatically. One has to run nodetool cleanup manually on all nodes. This process is considered so expensive that it is manual by design. In turn this means that scaling up a Cassandra cluster will always be a semi-manual process. (Which is OK, because so much can go wrong; trying to automate for every edge case would be an insane task.) A sketch of what that cleanup step could look like for us follows.
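
This is only a sketch under the assumption of the `app=cassandra` label from the earlier snippets; cleanup is I/O-heavy, so it is run one node at a time:

```sh
# After new nodes have joined and taken over token ranges, reclaim space on the existing nodes.
for pod in $(kubectl get pods -l app=cassandra -o jsonpath='{.items[*].metadata.name}'); do
  echo "Running nodetool cleanup on $pod"
  kubectl exec "$pod" -- nodetool cleanup
done
```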

idcrosby (Contributor) commented:

Interesting findings, @etiennedi. I agree with your recommendation: scaling Cassandra (or pretty much any database) should be a manual task. Beyond the complexity of auto-scaling a database, the point at which a database would trigger an autoscale is usually when it is busiest, and therefore the worst time to scale.

As you mention above, the crucial piece is having proper monitoring in place. If we know which metric to trigger on (disk space), it is straightforward to set up an alert at a certain value (e.g. 50%) so that someone can scale the cluster in time. A minimal version of such a check is sketched below.
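
This is just a sketch, assuming GNU df inside the container and the same label and mount path as in the earlier snippets:

```sh
# Print an alert whenever any Cassandra data volume crosses the 50% usage threshold.
THRESHOLD=50
for pod in $(kubectl get pods -l app=cassandra -o jsonpath='{.items[*].metadata.name}'); do
  used=$(kubectl exec "$pod" -- df --output=pcent /var/lib/cassandra | tail -n 1 | tr -dc '0-9')
  if [ "${used:-0}" -ge "$THRESHOLD" ]; then
    echo "ALERT: $pod data volume at ${used}% - time to scale the cluster"
  fi
done
```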
