New or rescheduled weaviate pods can never start up #37

Closed
etiennedi opened this issue Jan 9, 2019 · 4 comments

@etiennedi
Member

etiennedi commented Jan 9, 2019

Summary

This is not really an issue with the Helm setup, but rather an issue of Weaviate lacking support for horizontal scaling: when the pod that weaviate runs in is deleted (or otherwise rescheduled), the new pod ends up in a CrashLoop, stating something like "Cannot apply initial schema to Janus".

Background

Weaviate is currently not capable of horizontal scaling for two reasons:

  1. The app saves state to the local file system. This state is required to connect to a Janus db that was initialized by another instance of weaviate.
  2. Some actions require a database lock that is currently implemented in the weaviate process itself. This lock would have to be turned into some sort of distributed lock.

This issue does not touch upon bullet 2, but is rather a symptom of bullet 1: the first pod initializes the Janus schema and then saves the state required to use that schema to a file. Restarts of the same pod are fine (as the file system will still be present). A new pod, however, will not have the same file system.

Why this is an issue

Even though we explicitly don't support horizontal scaling at the moment because of the tech debt preventing it, I believe we need to address this: pods get rescheduled all the time, whether because of underlying node maintenance or because someone manually deletes (and recreates) a pod while debugging.

Long-Term solution

We need to make weaviate capable of horizontal scaling. This involves solving the distributed lock problem, but also removing the reliance on the local file system.

Short-Term solution

Suggestion: Mount the two files that weaviate writes to as a ConfigMap (rw). (cc @idcrosby: this should circumvent the problem for now, correct?)
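
To illustrate, a very rough sketch of what that could look like in the deployment spec; the ConfigMap name, file name, and mount path below are made-up placeholders, not the chart's actual values:

```yaml
# Rough sketch only: the ConfigMap name, file name, and mount path are
# made-up placeholders, not the chart's actual values.
apiVersion: v1
kind: ConfigMap
metadata:
  name: weaviate-state              # hypothetical name
data:
  janus-state.json: ""              # placeholder for the state file weaviate writes
---
# Relevant excerpt of the weaviate Deployment's pod template:
spec:
  containers:
    - name: weaviate
      volumeMounts:
        - name: state
          mountPath: /var/weaviate  # hypothetical path where weaviate keeps its state
  volumes:
    - name: state
      configMap:
        name: weaviate-state
```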

How to reproduce

  1. Install the Helm chart as normal
  2. Delete the pod that runs weaviate (kubectl delete pod weaviate-<hash>)
  3. Wait for the new weaviate pod to be scheduled

What should happen

The new pod should behave the same as the old pod

What actually happens

The new pod crashes, logging that it cannot initialize the Janus schema because it has already been initialized.

@idcrosby
Contributor

idcrosby commented Jan 9, 2019

@etiennedi ConfigMaps are intentionally read-only. This can be overridden, but it is not recommended. For sharing data between pods, the recommended approach would be to use a persistent volume and mount it (ReadWriteMany) into all weaviate pods.
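
Roughly along these lines (names, size, and storage class below are placeholders, and ReadWriteMany needs a storage backend that actually supports it, e.g. NFS):

```yaml
# Sketch only: names, size, and storageClassName are placeholders;
# ReadWriteMany requires a storage backend that supports it (e.g. NFS).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: weaviate-state
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs             # placeholder; must support RWX
  resources:
    requests:
      storage: 1Gi
---
# Relevant excerpt of the weaviate Deployment's pod template:
spec:
  containers:
    - name: weaviate
      volumeMounts:
        - name: state
          mountPath: /var/weaviate  # hypothetical path where weaviate keeps its state
  volumes:
    - name: state
      persistentVolumeClaim:
        claimName: weaviate-state
```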

@etiennedi
Member Author

Ok, thanks for the feedback. I was hoping we could avoid a full-blown PV, but if that's better practice than writing into the ConfigMap, then I'm all for it. Since we're already deploying Elasticsearch and Cassandra with PVs, this won't add a new requirement to the clusters. (My initial thought was that not every cluster will be able to provide PVs ... but then the datastores wouldn't work anyway 🙂)

We could of course also use custom resources, but then we'd have something very Kubernetes-specific, and I assume we want something more generic.

Long term, we'll probably need some persistent key-value store (like etcd), but that's for another day.

@idcrosby
Contributor

Yeah, to be clear, it's possible to do this with ConfigMaps, just not recommended. So depending on when the long-term solution (etcd, Redis, Consul, etc.) is planned, you could get away with it.

@etiennedi
Member Author

Closing. This is no longer an issue since we now manage this state in etcd rather than in files, which means the problem is no longer relevant.
