docs(ai): trim external container docs #688

79 changes: 45 additions & 34 deletions ai/orchestrators/models-config.mdx
Optional flags to enhance performance (details below).
</ParamField>
<ParamField path="url" type="string" optional="true">
Optional URL and port where the model container or custom container manager
software is running. [See External Containers](#external-containers)
</ParamField>
<ParamField path="token" type="string">
Optional token required to interact with the model container or custom
container manager software. [See External Containers](#external-containers)
</ParamField>
<ParamField path="capacity" type="integer">
Optional capacity of the model. This is the number of inference tasks the
model can handle at the same time. This defaults to 1. [See External
Containers](#external-containers)
</ParamField>
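As a rough sketch, a model entry combining these fields might look like the following (the structure follows the parameters documented on this page; the pipeline, model ID, URL, and token values are hypothetical placeholders):

```json
{
  "pipeline": "text-to-image",
  "model_id": "example-org/example-model",
  "url": "https://runner.example.com:8000",
  "token": "replace-with-a-secret-token",
  "capacity": 2
}
```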

### Optimization Flags
### External Containers

<Warning>
This feature is intended for **advanced** users. Misconfiguration can reduce
orchestrator scores and earnings. Orchestrators are responsible for ensuring
the specified `url` points to a properly configured and operational container
with the correct endpoints.
</Warning>

The
[AI Worker](/ai/orchestrators/start-orchestrator#orchestrator-node-architecture)
typically manages model containers automatically using a
[Docker client](https://pkg.go.dev/github.com/docker/docker/client) to start and
stop containers at startup. However, orchestrators with unique infrastructure
needs can use external containers to extend or replace managed containers. These
setups can range from individual models to more complex configurations, such as
an auto-scaling GPU cluster behind a load balancer.

To configure external containers, include the `url`, `capacity`, and optionally
the `token` fields in the model configuration.

- The `url` is used to confirm that the model container is running during AI
Worker startup via the `/health` endpoint. After validation, inference
requests are forwarded to the `url` for processing, just like with managed
containers.
- The `capacity` determines the maximum number of concurrent requests the
  container can handle, with a default value of 1. For auto-scaling setups that
  set `warm: true`, ensure containers start quickly, because slow startups can
  negatively impact Gateway selection for future requests.
- The `token` is an optional field used to secure the `url`. It is strongly
recommended for protecting endpoints exposed to external networks from
unauthorized access.
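To illustrate the startup check described above, the snippet below sketches how an orchestrator might verify that an external container answers on `/health` with the configured token. This is a minimal sketch, not the AI Worker's actual implementation; the bearer-token header scheme and helper names are assumptions.

```python
import urllib.request
import urllib.error


def build_health_request(url, token=None):
    """Build a GET request for the container's /health endpoint."""
    health_url = url.rstrip("/") + "/health"
    req = urllib.request.Request(health_url, method="GET")
    if token:
        # Assumed bearer-token scheme; check your container manager's docs.
        req.add_header("Authorization", "Bearer " + token)
    return req


def container_is_healthy(url, token=None, timeout=5.0):
    """Return True if the /health endpoint responds with HTTP 200."""
    try:
        with urllib.request.urlopen(build_health_request(url, token),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A check like this can also be run manually before registering the model, to confirm the `url` and `token` are correct from the orchestrator's network.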

As long as the custom container management logic acts as a pass-through to the
model container, orchestrators can use container management software like
[Kubernetes](https://kubernetes.io/), [Podman](https://podman.io/),
[Docker Swarm](https://docs.docker.com/engine/swarm/),
[Nomad](https://www.nomadproject.io/), or custom scripts designed to manage
container lifecycles based on request volume.
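For example, a custom management script that balances requests across several runner replicas might pick the least-loaded replica while respecting each one's `capacity`. The sketch below uses hypothetical data structures and is not part of the AI Worker; it only illustrates one way such pass-through logic could decide where to forward a request.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    url: str
    capacity: int = 1   # max concurrent requests, as in the model config
    in_flight: int = 0  # requests currently being processed


def pick_runner(runners):
    """Return the least-loaded runner with spare capacity, or None if all are saturated."""
    available = [r for r in runners if r.in_flight < r.capacity]
    if not available:
        return None  # caller could scale up here, or return HTTP 503
    return min(available, key=lambda r: r.in_flight / r.capacity)
```

When `pick_runner` returns `None`, an auto-scaling setup would start another container; a fixed pool would reject the request, matching the behavior Gateways expect when capacity is exhausted.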

We welcome feedback to improve this feature, so please reach out to us if you
have suggestions for a better experience running external containers.