Note
This is a sample reference implementation of the architecture we used to finetune and evaluate NeurIPS Large Language Model Efficiency Challenge submissions. It doesn't include Dockerfile image paths and is missing some Kubernetes deployment-level values. If you'd like to deploy it in your own environment, you'll need to add those values.
In order to evaluate NeurIPS 2023 submissions, we're running a Kubernetes deployment with two pods:
- a model pod (`Dockerfile.model`), and
- a HELM server pod (`Dockerfile.helm`).
For evaluation, we run HELM from the server pod to hit the model endpoint on the model pod.
We build both images from their Dockerfiles locally and push them to the Coreweave Docker Registry. Our Kubernetes cluster then starts a deployment with the two containers. For this, we're creating a custom Helm chart.
- Build a custom Docker image, either locally if you have enough memory or on a Docker server (we use a Coreweave Virtual Server for this). Build the Docker image for the HELM server based on the instructions at HELM, using this config file.
  If you're building locally on macOS with an ARM chip, make sure to specify the platform so the image can run on Coreweave:

  ```
  docker build --platform linux/amd64 -t {docker-registry}/helm-server:1 .
  ```
- Push to the registry:

  ```
  docker push {docker-registry}/helm-server:1
  ```
- Build the model image following the instructions specified by the contestant and push it to the registry as well.
- Make sure the Dockerfile for your model container exposes port `8080` to serve the model. You can start the server automatically from within the pod by making `CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]` the last line of the Dockerfile; alternatively, start it manually for troubleshooting:

  ```
  uvicorn main:app --host 0.0.0.0 --port 8080
  ```
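  For reference, here is a minimal sketch of what such a `main.py` could look like. It is an illustration only, not the challenge's required API: the `/process` route, the request fields, and the model path are assumptions based on the curl example later in this guide, and a real submission will ship its own server implementation.

  ```python
  # main.py -- hypothetical minimal model server sketch (not the official
  # challenge API). The /process route and request/response shapes are
  # assumptions based on the curl example in this guide.
  from fastapi import FastAPI
  from pydantic import BaseModel
  from transformers import AutoModelForCausalLM, AutoTokenizer

  app = FastAPI()

  MODEL_PATH = "/workspace/model"  # assumed path to the trained model artifacts

  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)


  class ProcessRequest(BaseModel):
      prompt: str
      max_new_tokens: int = 64


  @app.post("/process")
  def process(request: ProcessRequest):
      # Tokenize the prompt, generate a completion, and return the decoded text.
      inputs = tokenizer(request.prompt, return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=request.max_new_tokens)
      text = tokenizer.decode(outputs[0], skip_special_tokens=True)
      return {"text": text}
  ```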
- Train the model using the steps provided in the contestant's repo to reproduce their submission.
- The model artifact should generally be in HuggingFace style: a directory of artifacts (config, tokenizer, weights) with one or more checkpoints. Save all checkpoints and artifacts to shared storage.
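  As a quick sanity check that the saved artifacts really are loadable in HuggingFace style, something like the following can be run from the model pod. The checkpoint path is an assumption; point it at wherever your shared volume is mounted.

  ```python
  # Sanity check that a saved checkpoint loads in HuggingFace style.
  # The checkpoint path is an assumption -- replace it with your shared-storage mount.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  checkpoint_dir = "/mnt/shared/checkpoints/final"  # assumed shared-storage path

  tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
  model = AutoModelForCausalLM.from_pretrained(checkpoint_dir)
  print(f"Loaded {model.config.model_type} with {model.num_parameters():,} parameters")
  ```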
- Once the two Docker images are in the registry, install via the Helm chart:

  ```
  helm install {deployment name} .
  ```
- Test that the model container works by hitting the endpoints locally:

  ```
  curl -X POST -H "Content-Type: application/json" -d '{"prompt": "The capital of france is "}' http://{model_container_ip}:8080/process
  ```
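  If you'd rather script this smoke test (for example, from the HELM server pod), the same request can be issued from Python. The IP value below is a placeholder for `{model_container_ip}`, and the response is printed raw because the exact response schema depends on the contestant's server.

  ```python
  # Hypothetical scripted version of the curl smoke test above.
  # Replace MODEL_IP with the model container's address; the response body
  # depends on the contestant's implementation, so we just print it.
  import requests

  MODEL_IP = "10.0.0.2"  # placeholder for {model_container_ip}

  response = requests.post(
      f"http://{MODEL_IP}:8080/process",
      json={"prompt": "The capital of france is "},
      timeout=60,
  )
  response.raise_for_status()
  print(response.json())
  ```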
- The two containers should be able to reach each other over the internal Pod network via port `8080`.
- To run the HELM evaluation, run the sample command below from the eval server. Use `nohup`, `screen`, or `tmux` so the long-running job survives network interruptions:

  ```
  helm-run --conf-paths run_specs_full_coarse_600_budget.conf --suite v1 --max-eval-instances 1
  ```
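  After `helm-run` finishes, you can list what it produced with a small sketch like the one below. The `benchmark_output` directory is HELM's usual default, but treat the path as an assumption and adjust it if your run writes elsewhere (for example, via `--output-path`).

  ```python
  # List the artifacts produced by the helm-run invocation above.
  # "benchmark_output/runs/v1" is an assumed default layout (suite name from --suite v1);
  # adjust the path if your HELM run writes somewhere else.
  from pathlib import Path

  output_root = Path("benchmark_output") / "runs" / "v1"
  for path in sorted(output_root.rglob("*.json")):
      print(path.relative_to(output_root))
  ```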