[Feature Request] Run neurons using Kubernetes Jobs #347
Comments
This would be awesome!
PR #349 seems to do steps 1-3 of my outline. I took notes on how I set up for building and running (step 4a), but they are not published anywhere yet.
Notes published. Sorry it has a certificate error! 💔
Great stuff, thanks for this! Can we revisit the persistent disks? Since k8s Jobs are pretty much ephemeral, and carrying state around is generally challenging in larger deployments, why not prefer an ephemeral volume? In my use case(s) and experience so far, deploying NFS or another storage engine likely carries more debt than running Cortex in production on a VM. For what it's worth, if you can access the Docker socket on the host machine, the current runner works just fine. I'm running jobs at home in a lab environment; here are some k8s manifests, and the container (ymmv 😄).
@jonpulsifer, thanks for your suggestion! I'd love to see this capability without a ReadWriteMany volume. I'll tell you why I haven't found it a problem so far, and why I thought it necessary. This is my first code contribution to Cortex itself; any code improvements you see to be possible could be yours!

My initial tests for this functionality were carried out on a single-machine cluster, where, as I noted above, a path on the host filesystem can become a ReadWriteMany persistent volume. On my multinode test cluster, I'm upgrading to Longhorn 1.1, where ReadWriteMany persistent volumes will reputedly Just Work. Fingers crossed. Neither of these is in a public cloud, but I was able to rustle up blog articles with some kind of directions for the Amazon, DigitalOcean, and Google clouds.

In most Kubernetes Job examples, including the one you cited, the results of the Job are retrieved by inspecting the log output from a successfully completed container. But Cortex has moved away from the use of stdin/stdout, and toward input and output files. As far as I understand, any files written during a container's execution go away when the container finishes executing, unless they were written to a persistent volume. Only the log output remains, but it conflates stdout and stderr, so it's impossible to parse reliably. I hope that's a compelling case for the need for some kind of persistence.

But why not use a ReadWriteOnce volume, which has many more implementations to choose from? As far as I know, the set of volumes available to a Kubernetes Pod is fixed when the Pod is created. But the Pod running Cortex is long-lived. So any job output files Cortex will read have to be stored on a volume that exists when the Cortex Pod is started, yet is writable by the job Pod.

But why use files at all? One Kubernetes Job guide I found (and have since lost) suggested that production Jobs needing to persist their output might store it in a database or message queue. But either of those would be an even bigger dependency than a shared filesystem, and I didn't want to presume to redesign the form of communication between Cortex and the jobs a third time.
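A rough sketch of the single-machine arrangement mentioned above: a hostPath-backed PersistentVolume that advertises ReadWriteMany. The path and names here are only examples, not anything published with Cortex.

```yaml
# Example only: a hostPath PersistentVolume that advertises ReadWriteMany.
# This works on a single-node cluster because every Pod lands on the same host.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cortex-jobs-hostpath        # placeholder name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/lib/cortex/jobs      # example path on the node
```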
Hi @jaredjennings |
@TheMatrix97 thanks for filling in just what the consequences are on AWS. You can certainly run jobs as subprocesses - that was the original way to do it! You can run them all inside the Cortex container, but:
If it costs a lot less to run Cortex in a container like this, rather than in a virtual machine, it might be worth it. But it's (tongue in cheek) all the simplicity of containers, with all the scalability of VMs. Definitely you don't need my code to do it, and that is a plus: this pull request is a year and a half old.
I see... As you pointed out, it's not that easy to make it work in a single container: it requires some ad-hoc patches, and a container with high and variable resource consumption could bring scalability problems... I'd rather use EFS.
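For reference, a ReadWriteMany claim backed by EFS via the AWS EFS CSI driver might look roughly like this; the filesystem ID and names are placeholders, and the exact parameters depend on how the driver is installed.

```yaml
# Rough sketch: dynamic provisioning with the AWS EFS CSI driver.
# fs-0123456789abcdef0 is a placeholder filesystem ID.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0123456789abcdef0
  directoryPerms: "700"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cortex-jobs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-shared
  resources:
    requests:
      storage: 5Gi    # EFS is elastic; the size is nominal but required
```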
That is a promising route, indeed. I think TheHive added support for storing attachments using MinIO, and I think that code is not in TheHive but in underlying libraries. So perhaps it could be easily co-opted. As for "reopening," I'm quite glad for your interest! I've just taken this back up because of an issue someone filed on my related Helm charts. What I meant by pointing out the age of this pull request is that it does not appear to be a priority for the current maintainers of Cortex to integrate this code. So you take on some risk, and some work, if you use it.
Request Type
Feature Request
Problem Description
Cortex can run analyzers and responders (collectively, neurons, if I'm using the term properly) as subprocesses (`ProcessJobRunnerSrv`) or using Docker (`DockerJobRunnerSrv`). When processes are used, all the neuron code ends up in the same filesystem, process, and network namespace as Cortex itself. When Docker is used, both Cortex itself and each neuron's code can run in their own containers. This is a maintainability triumph.

But in order to do this, Cortex's container has to have access to the Docker socket, and it has to share a directory with the neuron containers it runs. (With the `DockerJobRunnerSrv`, this filesystem sharing happens via access to a directory in the host OS's filesystem.) Access to the Docker socket is not a security best practice, because it's equivalent to local root; it also hinders scalability and flexibility in running neurons, and it means depending specifically on Docker rather than on any software that can run a container.

Kubernetes
Kubernetes offers APIs for running containers which are scalable to multi-node clusters, securable, and supported by multiple implementations. Some live natively in public clouds (EKS, AKS, GKE, etc.), some are trivial single-node clusters for development (minikube, KIND), and some are lightweight but production-grade (k3s, MicroK8s). Kubernetes has extension points for storage, networking, and container runtimes, so these functions can be provided by plugins.
The net effect is that while there is a dizzying array of choices about how to set up a Kubernetes cluster, applications consuming the Kubernetes APIs don't need to make those choices; they only need to signify what they need. The cluster will make the right thing appear, subject to the choices of its operators. And the people using the cluster need not be the same people as the operators: public clouds reportedly support quick deployments with only a few questions asked.
Jobs
One of the patterns Kubernetes supports for using containers to get work done is the Job. (I'll capitalize it here to avoid confusion with Cortex jobs.) You create a Job with a Pod spec (which in turn specifies some volumes and some containers), and the Job will create and run the Pod, retrying until it succeeds, subject to limits on time, number of tries, rate limits, and the like. Upon succeeding, it remains until deleted, until a configured timeout expires, etc.
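For illustration, a minimal Job spec along those lines might look like the following; the name and image are placeholders, not anything Cortex-specific.

```yaml
# Hypothetical minimal Job: runs one container to completion,
# retries up to 3 times, gives up after 10 minutes, and is
# cleaned up automatically 1 hour after it finishes.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-neuron-job          # placeholder name
spec:
  backoffLimit: 3                   # number of retries
  activeDeadlineSeconds: 600        # overall time limit
  ttlSecondsAfterFinished: 3600     # auto-delete after completion
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: neuron
          image: example/analyzer:latest   # placeholder image
```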
Running a Job with the Kubernetes API would be not unlike running a container with the Docker API, except that it would be done using a different client, and filesystem sharing would be accomplished in a different way.
Sharing files with a job
With the Docker job runner, a directory on the host is specified; it's mounted into the Cortex container, and when Cortex creates a neuron container, it's mounted into that neuron container too. This implicitly assumes that the Cortex container and the neuron container run on the same host and that both can access the same filesystem at the same time, and it relies on the host directory for persistent storage. None of these assumptions necessarily holds under Kubernetes.
Under Kubernetes, a `PersistentVolumeClaim` can be created, which backs a volume that can be mounted into a container (as signified in the spec for the container). That claim can have `ReadWriteMany` as its `accessModes` setting, which signifies to the cluster a requirement that multiple nodes should be able to read and write files in the `PersistentVolume` which the cluster provides to satisfy the claim. On a trivial, single-node cluster, the persistent volume can be a `hostPath`: the same way everything happens now with Docker, but more complicated. But different clusters can provide other kinds of volumes to satisfy such a claim: self-hosted clusters may use Longhorn or Rook to provide redundant, fault-tolerant storage; or public clouds may provide volumes of other types they have devised themselves (Elastic Filesystem, Azure Shared Disks, etc.). The `PersistentVolumeClaim` doesn't care.
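A sketch of what such a claim could look like, assuming the cluster offers a storage class capable of ReadWriteMany; the names here are hypothetical.

```yaml
# Hypothetical ReadWriteMany claim for the shared Cortex job directory.
# The cluster decides what actually backs it (hostPath, Longhorn, EFS, ...).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cortex-jobs
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: shared-storage   # placeholder storage class
  resources:
    requests:
      storage: 10Gi
```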
So the Cortex container is created with a `ReadWriteMany` `PersistentVolumeClaim`-backed volume mounted as its job base directory. When running a neuron, Cortex creates a directory for the job and creates a `Job`, with volumes which are job-specific `subPath`s of the same `PersistentVolumeClaim` mounted as the `/job/input` (`readOnly: true`) and `/job/output` directories. How the files get shared is up to the cluster. The `Job` can only see and use the input and output directories for the Cortex job it's about. When the `Job` is finished, the output files will be visible in the Cortex container under the job base directory, as with other job running methods.
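A sketch of the kind of Job such a runner might create for one Cortex job, with the job-specific subPaths mounted read-only for input and writable for output; the names, image, and job identifier are illustrative only.

```yaml
# Hypothetical Job for one Cortex job, sharing the cortex-jobs claim
# via job-specific subPaths for /job/input (read-only) and /job/output.
apiVersion: batch/v1
kind: Job
metadata:
  name: cortex-job-abc123            # one Job per Cortex job (placeholder id)
spec:
  backoffLimit: 3
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: neuron
          image: example/analyzer:latest   # placeholder neuron image
          volumeMounts:
            - name: jobs
              mountPath: /job/input
              subPath: abc123/input        # job-specific subPath
              readOnly: true
            - name: jobs
              mountPath: /job/output
              subPath: abc123/output
      volumes:
        - name: jobs
          persistentVolumeClaim:
            claimName: cortex-jobs         # the shared ReadWriteMany claim
```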
How to implement
Implement a `KubernetesJobRunnerSrv`, which takes information about a persistent volume claim passed in and uses it to create a Kubernetes Job for each Cortex job, hewing closely to the `DockerJobRunnerSrv`.