Note
This is a sample reference implementation of the architecture we used to finetune and evaluate NeurIPS Large Language Model Efficiency Challenge submissions. It doesn't include Dockerfile image paths and is missing some Kubernetes deployment-level values. If you'd like to deploy it in your own environment, you'll need to add those values.
In order to evaluate NeurIPS 2023 submissions, we're running a Kubernetes deployment with two pods:
- a model pod (`Dockerfile.model`), and
- a HELM server pod (`Dockerfile.helm`).
For evaluation, we run HELM from the server pod to hit the model endpoint on the model pod.
We build both images from their Dockerfiles locally and push them to the Coreweave Docker Registry. Our Kubernetes cluster then starts a deployment with the two containers. For this, we're creating a custom Helm chart.
- Build a custom Docker image, either locally if you have enough memory or on a Docker server (we use a Coreweave Virtual Server for this). Build the Docker image for the HELM server based on the instructions at HELM, using this config file.
  If you're building locally on macOS with an ARM chip, make sure to specify the platform so the image can run on Coreweave:

  ```
  docker build --platform linux/amd64 -t {docker-registry}/helm-server:1 .
  ```
- Push to the registry:

  ```
  docker push {docker-registry}/helm-server:1
  ```
- Build the model image following the instructions specified by the contestant and push it to the registry as well.
- Make sure the Dockerfile for your model container exposes port `8080` to serve the model. You can start the server automatically from within the pod by making `CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]` the last line of the Dockerfile; alternatively, start it manually for troubleshooting:

  ```
  uvicorn main:app --host 0.0.0.0 --port 8080
  ```
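  For reference, here is a minimal sketch of what such a `main.py` could look like. It is an illustration only, not the challenge's required API: the `/process` route, the request fields, and the model path are assumptions based on the curl example later in this guide, and a real submission will ship its own server implementation.

  ```python
  # main.py -- hypothetical minimal model server sketch (not the official
  # challenge API). The /process route and request/response shapes are
  # assumptions based on the curl example in this guide.
  from fastapi import FastAPI
  from pydantic import BaseModel
  from transformers import AutoModelForCausalLM, AutoTokenizer

  app = FastAPI()

  MODEL_PATH = "/workspace/model"  # assumed path to the trained model artifacts

  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)


  class ProcessRequest(BaseModel):
      prompt: str
      max_new_tokens: int = 64


  @app.post("/process")
  def process(request: ProcessRequest):
      # Tokenize the prompt, generate a completion, and return the decoded text.
      inputs = tokenizer(request.prompt, return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=request.max_new_tokens)
      text = tokenizer.decode(outputs[0], skip_special_tokens=True)
      return {"text": text}
  ```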
- Train the model using the steps provided in the contestant's repo to reproduce their submission.
- The model artifact should generally be in HuggingFace style: a directory of artifacts (config, tokenizer, weights) with one or more checkpoints. Save all checkpoints and artifacts to shared storage.
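  As a quick sanity check that the saved artifacts really are loadable in HuggingFace style, something like the following can be run from the model pod. The checkpoint path is an assumption; point it at wherever your shared volume is mounted.

  ```python
  # Sanity check that a saved checkpoint loads in HuggingFace style.
  # The checkpoint path is an assumption -- replace it with your shared-storage mount.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint_dir = "/mnt/shared/checkpoints/final"  # assumed shared-storage path

  tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
  model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
  print(f"Loaded {model.config.model_type} with {model.num_parameters():,} parameters")
  ```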
- Once the two Docker images are in the registry, install via the Helm chart:

  ```
  helm install {deployment name} .
  ```
- Test that the model container works by hitting the endpoints locally:

  ```
  curl -X POST -H "Content-Type: application/json" -d '{"prompt": "The capital of france is "}' http://{model_container_ip}:8080/process
  ```
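  If you'd rather script this smoke test (for example, from the HELM server pod), the same request can be issued from Python. The IP value below is a placeholder for `{model_container_ip}`, and the response is printed raw because the exact response schema depends on the contestant's server.

  ```python
  # Hypothetical scripted version of the curl smoke test above.
  # Replace MODEL_IP with the model container's address; the response body
  # depends on the contestant's implementation, so we just print it.
  import requests

  MODEL_IP = "10.0.0.2"  # placeholder for {model_container_ip}

  response = requests.post(
      f"http://{MODEL_IP}:8080/process",
      json={"prompt": "The capital of france is "},
      timeout=60,
  )
  response.raise_for_status()
  print(response.json())
  ```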
- The two containers should be able to reach each other over the internal Pod network via port `8080`.
- To run the HELM evaluation, run the sample command below from the eval server. Use `nohup`, `screen`, or `tmux` so the long-running job survives network interruptions:

  ```
  helm-run --conf-paths run_specs_full_coarse_600_budget.conf --suite v1 --max-eval-instances 1
  ```
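  After `helm-run` finishes, you can list what it produced with a small sketch like the one below. The `benchmark_output` directory is HELM's usual default, but treat the path as an assumption and adjust it if your run writes elsewhere (for example, via `--output-path`).

  ```python
  # List the artifacts produced by the helm-run invocation above.
  # "benchmark_output/runs/v1" is an assumed default layout (suite name from --suite v1);
  # adjust the path if your HELM run writes somewhere else.
  from pathlib import Path

  output_root = Path("benchmark_output") / "runs" / "v1"
  for path in sorted(output_root.rglob("*.json")):
      print(path.relative_to(output_root))
  ```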