Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(worker): auto pull images if not found #200

Merged
merged 15 commits into from
Nov 28, 2024
Merged

Conversation

rickstaa
Copy link
Member

This pull request ensures that the worker tries to pull the docker images if they are not found locally.

@rickstaa rickstaa marked this pull request as draft September 12, 2024 21:06
@rickstaa
Copy link
Member Author

@ad-astra-video, @victorges only blocker for this is that we should return no capacity error if people have a container as a cold container and it starts to download. I will look at that before merging.

This commit ensures that the worker tries to pull the docker images if
they are not found locally.
@rickstaa rickstaa force-pushed the auto_pull_containers branch from 6138eec to 9d724df Compare November 22, 2024 11:26
This commit restructures the container pulling logic to enable asynchronous
image fetching during startup. The orchestrator will report a capacity of 0
until all required containers are successfully pulled. Tests are not yet
working due to mocking issues.
@rickstaa rickstaa force-pushed the auto_pull_containers branch from 9d724df to 62a5d05 Compare November 22, 2024 11:27
This commit replaces the container pulling logic so that the containers
are only pulled on warmup or when first requested.
This commit removes the broken Docker tests as the current mocking approach
is not working correctly. Additionally, it includes the following improvements:
@rickstaa rickstaa requested a review from leszko November 22, 2024 15:53
@rickstaa rickstaa marked this pull request as ready for review November 22, 2024 15:53
@rickstaa
Copy link
Member Author

rickstaa commented Nov 22, 2024

@leszko, this pull request is ready for review. Unfortunately, I wasn't able to add clean tests for the warm and hasCapacity methods. The reason is that the docker.client(https://github.com/moby/moby/blob/1832afcf69f66b1ec55f7c7c127465e4eb7d962e/client/client.go#L104) is implemented as a struct rather than an interface.

To enable mocking the client, we would need to modify the docker.go code. However, I couldn't identify a way to achieve this without altering the main codebase. If adding tests for these functions is a priority, we could discuss potential approaches to implement this effectively. Let me know how you'd like to proceed.

@victorges
Copy link
Member

victorges commented Nov 22, 2024

@rickstaa we can still create an interface ourselves for the struct client (maybe just for the methods we use) and use it as an interface in our code! If you want to implement some mocked tests I can help you out with that.

e.g.
lib provides

type Client struct {}

func (c *Client) DoSomething() {}

func NewClient() *Client {
    ...
}

we can create the interface on our side

type DockerIface interface {
  DoSomething()
}

and use in our code like an interface

func NewDockerManager(client DockerIface) {
    ...
}

func main() {
    var client DockerIface = docker.NewClient() // or mock.NewDockerClient()
    docker := NewDockerManager(client)
}

@rickstaa
Copy link
Member Author

@rickstaa we can still create an interface ourselves for the struct client (maybe just for the methods we use) and use it as an interface in our code! If you want to implement some mocked tests I can help you out with that.

e.g. lib provides

type Client struct {}

func (c *Client) DoSomething() {}

func NewClient() *Client {
    ...
}

we can create the interface on our side

type DockerIface interface {
  DoSomething()
}

and use in our code like an interface

func NewDockerManager(client DockerIface) {
    ...
}

func main() {
    var client DockerIface = docker.NewClient() // or mock.NewDockerClient()
    docker := NewDockerManager(client)
}

Thank you, @victorges, for your response. Your suggestion is indeed a solid option. Initially, I aimed to avoid adding extra code to the docker.go file, but it turned out to be more challenging than expected. Specifically, I experimented with two approaches:

  1. Mocking the client in docker_test.go using a mocking package, a stub server, or a combination of both.
  2. Using a global variable to replace the client.

Unfortunately, both approaches ran into significant hurdles, primarily type errors at various points.

That said, if you're comfortable with adding some extra code to the docker.go side, we could leverage it to create tests for the HasCapacity and warm methods. The main behavior I wanted to validate was ensuring that if m.dockerClient.ImageInspectWithRaw returns an error, the code properly calls m.dockerClient.ImagePull.

However, I’m beginning to wonder if I might be over-designing these tests. Could you advise on which tests you'd prefer to see? Alternatively, do you think we should proceed without these tests and merge as is?

Mock Client Example
// MockDockerClient is a mock implementation of the Docker client.
type MockDockerClient struct {
    docker.Client
    mock.Mock
}

func (m *MockDockerClient) ImagePull(ctx context.Context, ref string, options image.PullOptions) (io.ReadCloser, error) {
    args := m.Called(ctx, ref, options)
    return args.Get(0).(io.ReadCloser), args.Error(1)
}

// Other mocked methods omitted for brevity

func TestPullImageAsync(t *testing.T) {
    mockDockerClient := new(MockDockerClient)
    dockerManager := &DockerManager{
        dockerClient:    mockDockerClient,
        imagePullStatus: &sync.Map{},
    }

    ctx := context.Background()
    imageName := "test-image"

    // Mock ImageInspectWithRaw to simulate the image being available locally
    mockDockerClient.On("ImageInspectWithRaw", ctx, imageName).Return(types.ImageInspect{}, nil, nil)
    dockerManager.pullImageAsync(ctx, imageName)
    mockDockerClient.AssertNotCalled(t, "ImagePull", ctx, imageName, mock.Anything)

    // Simulate image not being available locally
    mockDockerClient.On("ImageInspectWithRaw", ctx, imageName).Return(types.ImageInspect{}, nil, errors.New("not found"))
    mockDockerClient.On("ImagePull", ctx, imageName, mock.Anything).Return(io.NopCloser(strings.NewReader("")), nil)

    dockerManager.pullImageAsync(ctx, imageName)
    mockDockerClient.AssertCalled(t, "ImagePull", ctx, imageName, mock.Anything)
}
Stubbed Docker Manager and mocked https client
func StubDockerManager(serverURL string) (*DockerManager, error) {
    manager, err := NewDockerManager("default-image", []string{"gpu0"}, "/models")
    if err != nil {
        return nil, err
    }

    httpClient := &http.Client{}
    dockerClient, err := client.NewClientWithOpts(
        client.WithHost(serverURL),
        client.WithHTTPClient(httpClient),
        client.WithAPIVersionNegotiation(),
    )
    if err != nil {
        return nil, err
    }

    manager.dockerClient = dockerClient
    return manager, nil
}

func newMockServer() *httptest.Server {
    return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        switch r.URL.Path {
        case "/v1.41/images/test-image/json":
            w.WriteHeader(http.StatusNotFound)
        case "/v1.41/images/create":
            w.WriteHeader(http.StatusOK)
            io.WriteString(w, `{"status":"Pulling from library/test-image","progressDetail":{},"id":"test-image"}`)
        case "/v1.41/containers/create":
            resp := container.CreateResponse{ID: "container_id"}
            body, _ := json.Marshal(resp)
            w.WriteHeader(http.StatusCreated)
            w.Write(body)
        case "/v1.41/containers/container_id/start":
            w.WriteHeader(http.StatusNoContent)
        case "/v1.41/containers/container_id/json":
            resp := types.ContainerJSON{
                ContainerJSONBase: &types.ContainerJSONBase{
                    State: &types.ContainerState{Running: true},
                },
            }
            body, _ := json.Marshal(resp)
            w.WriteHeader(http.StatusOK)
            w.Write(body)
        default:
            http.Error(w, "unexpected request", http.StatusBadRequest)
        }
    }))
}

This commit adds the ability to mock the docker.client so that we can
perform tests on the methods in the `docker.go` file.
@rickstaa
Copy link
Member Author

rickstaa commented Nov 22, 2024

Hi @victorges,

I believe this aligns with what you had in mind: #287. While the code isn't fully cleaned up yet, and the mocks aren't entirely accurate, I wanted to check if we agree on adding these kinds of tests.

Initially, I attempted to avoid using an interface in the docker.go file, but it seems either very difficult or not possible.

If you're confident in my approach and logic, we can proceed with merging and address the tests later. Let me know what you think! 👍🏻
cc. @leszko

@pwilczynskiclearcode
Copy link
Contributor

pwilczynskiclearcode commented Nov 25, 2024

@rickstaa @leszko @victorges
Could we discuss dl_checkpoints.sh that downloads models using huggingface-cli download. Should we tie it in the ai-worker and ai-models.json so only actually needed models are downloaded and it's done automatically (like with the docker images in this PR).
huggingface-cli download by default downloads from the latest commit on the main branch. To download from a specific revision (commit hash, branch name or tag) we should use the --revision option if we want to manage the rollout precisely.

This commit renames the docker client interface so that it is more clear
what it does.
This commit ensures the docker.warm and docker.hasCapacity methods are
tested correctly. It also cleans up the earlier added tests.
This commit adds the ability to mock the docker.client so that we can
perform tests on the methods in the `docker.go` file. It also ensures the docker.warm and docker.hasCapacity methods are tested correctly.
Simplify image pull logic with EnsureImageAvailable method so that
images can be pulled in advance by external scripts.
@rickstaa
Copy link
Member Author

@rickstaa @leszko @victorges Could we discuss dl_checkpoints.sh that downloads models using huggingface-cli download. Should we tie it in the ai-worker and ai-models.json so only actually needed models are downloaded and it's done automatically (like with the docker images in this PR). huggingface-cli download by default downloads from the latest commit on the main branch. To download from a specific revision (commit hash, branch name or tag) we should use the --revision option if we want to manage the rollout precisely.

@pwilczynskiclearcode we decided to tackle the model downloading in a seperate pull request later (See https://discord.com/channels/423160867534929930/1309252301935870036/1310633568833376388).

Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments, but looks good in general. I wonder if there is any related go-livepeer PR (where this functions are actually used?) ?

worker/worker.go Show resolved Hide resolved
worker/docker.go Outdated Show resolved Hide resolved
worker/docker.go Show resolved Hide resolved
rickstaa added a commit to livepeer/go-livepeer that referenced this pull request Nov 26, 2024
This commit ensures that the docker containers the worker uses are
pulled during startup. They use the new changes implemented in
livepeer/ai-runner#200.
rickstaa added a commit to livepeer/go-livepeer that referenced this pull request Nov 26, 2024
This commit ensures that the docker containers the worker uses are
pulled during startup. They use the new changes implemented in
livepeer/ai-runner#200.
This improves the image status pull loging.
Copy link
Contributor

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

worker/docker.go Show resolved Hide resolved
@rickstaa rickstaa merged commit b872da9 into main Nov 28, 2024
1 check passed
@rickstaa rickstaa deleted the auto_pull_containers branch November 28, 2024 09:13
rickstaa added a commit to livepeer/go-livepeer that referenced this pull request Nov 28, 2024
This commit ensures that the docker containers the worker uses are
pulled during startup. They use the new changes implemented in
livepeer/ai-runner#200.
rickstaa added a commit to livepeer/go-livepeer that referenced this pull request Dec 4, 2024
This commit ensures that the docker containers the worker uses are
pulled during startup. They use the new changes implemented in
livepeer/ai-runner#200.
rickstaa added a commit to livepeer/go-livepeer that referenced this pull request Dec 4, 2024
This commit ensures that the docker containers the worker uses are
pulled during startup. They use the new changes implemented in
livepeer/ai-runner#200.
@rickstaa rickstaa mentioned this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants