New or rescheduled weaviate pods can never start up #37
Comments
@etiennedi ConfigMaps are intentionally read-only. This can be overridden, but it is not recommended. For sharing data between pods the recommended approach would be to use a persistent volume and mount it (ReadWriteMany) to all weaviate pods.
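A minimal sketch of that suggestion, assuming the cluster offers a storage class that supports ReadWriteMany (e.g. NFS or CephFS). All names, sizes, paths, and the image reference below are illustrative, not taken from the actual chart:

```yaml
# Hypothetical PVC shared by all weaviate pods.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: weaviate-state
spec:
  accessModes:
    - ReadWriteMany          # every replica can read and write the same volume
  resources:
    requests:
      storage: 1Gi
---
# Relevant fragment of the weaviate Deployment's pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: weaviate
spec:
  selector:
    matchLabels:
      app: weaviate
  template:
    metadata:
      labels:
        app: weaviate
    spec:
      containers:
        - name: weaviate
          image: weaviate          # placeholder image reference
          volumeMounts:
            - name: state
              mountPath: /var/weaviate   # hypothetical location of the state files
      volumes:
        - name: state
          persistentVolumeClaim:
            claimName: weaviate-state
```

With this in place, the state files survive pod rescheduling because they live on the shared volume rather than on the container's local file system.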
Ok, thanks for the feedback. I was hoping we could avoid the full-blown PV, but if that's better practice than writing into the config map, then I'm all for it. Since we're already deploying Elasticsearch and Cassandra with PVs we won't add a new requirement to the clusters. (My initial thought was that not every cluster will be able to provide PVs ... but then the datastores won't work anyway 🙂) We could of course also use custom resources, but then we'd have something very kubernetes-specific, and I assume we want something more generic. Long term we'll probably need some persistent key-value store (like etcd), but that's for another day.
Yeah, to be clear, it's possible to do this with ConfigMaps, just not recommended. So depending on when the long-term solution (etcd, redis, consul, etc.) is planned for, you could get away with it.
Closing. No longer an issue since we now manage this state in
Summary
This is not really an issue with the Helm setup, but rather an issue of Weaviate lacking support for horizontal scaling: when the pod that weaviate itself runs in is deleted (or otherwise rescheduled), the new pod ends up in a CrashLoop, logging something like "Cannot apply initial schema to Janus".
Background
Weaviate is currently not capable of horizontal scaling for several reasons, including:
1. It stores required state (such as the Janus schema state) on the local file system.
2. It has no solution for the distributed lock problem yet.
This issue does not touch upon bullet 2, but is rather a symptom of bullet 1: the first pod initializes the Janus schema and then saves the state required to use that schema into a file. Restarts of the same pod are fine (as the file system will still be present). A new pod, however, will not have the same file system.
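To make the failure mode concrete, here is a minimal sketch of initialization guarded by a local state file. This is not weaviate's actual code; the function name, signature, and messages are invented for illustration:

```python
import os


def apply_initial_schema(state_file: str, backend_initialized: bool) -> str:
    """Initialize the schema exactly once, tracking that fact in a local file."""
    # Restart of the same pod: the guard file survived, so reuse the schema.
    if os.path.exists(state_file):
        return "reuse existing schema"
    # A rescheduled pod starts with a fresh filesystem: the guard file is gone
    # even though the backend was already initialized, so startup fails.
    if backend_initialized:
        raise RuntimeError("Cannot apply initial schema to Janus: already initialized")
    # First-ever start: initialize the schema and record that fact locally.
    open(state_file, "w").close()
    return "schema initialized"
```

The sketch shows why restarts of the same pod succeed while a rescheduled pod crashes: the decision hinges on a file that only exists on the original pod's file system.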
Why this is an issue
Even though we explicitly don't support horizontal scaling at the moment because of the tech debt preventing us from doing so, I believe we need to address this: it is very common for a pod to be rescheduled, e.g. due to underlying node maintenance, or because someone manually deletes (and recreates) it in a debugging situation.
Long-Term solution
We need to make weaviate capable of horizontal scaling. This involves solving the distributed lock problem, but also no longer relying on the local file system.
Short-Term solution
Suggestion: mount the two files that weaviate will write into as a configMap (rw). (cc @idcrosby this should circumvent the problem for now, correct?)

How to reproduce
Delete a running weaviate pod (e.g. kubectl delete pod weaviate-<hash>) so that a new one is scheduled.

What should happen
The new pod should behave the same as the old pod.
What actually happens
The new pod crashes, logging that it cannot initialize the Janus schema, because it is already initialized.