driver/docker: protect against nil container
Protect against a panic when we attempt to start a container with a name
that conflicts with an existing one.  If the existing container is being
deleted while Nomad first attempts to create the new one,
`createContainer` fails with `container already exists`, but the
`containerByName` lookup returns a nil container reference, and
dereferencing it causes a crash.

I'm not certain how we get into this state, except by being very
unlucky.  I suspect that this case may be the result of a concurrent
restart or of the Docker engine API not being fully consistent (e.g. an
earlier call purged the container, but Docker had not yet freed the
resources needed to immediately create a new container with the same name).

If that's the case, then re-attempting creation will hopefully succeed,
or we'd at least fail enough times for the alloc to be rescheduled to
another node.
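
The guard-and-retry idea described above can be sketched in isolation. This is a minimal illustration, not the driver's actual code: `fakeClient`, `Container`, `errExists`, and `createWithRetry` are hypothetical names invented for the example, and the attempt limit passed by the caller is only meant to echo the `attempted < 5` check visible in the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// Container is a hypothetical stand-in for the Docker client's container type.
type Container struct{ ID string }

// errExists mimics the "container already exists" failure (assumed shape).
var errExists = errors.New("container already exists")

// fakeClient is a hypothetical in-memory client used only for illustration.
type fakeClient struct {
	conflicts int        // number of Create calls that still report a name conflict
	existing  *Container // what ByName returns; nil models the race in the commit message
	removed   []string   // IDs passed to Remove
}

func (c *fakeClient) Create(name string) (*Container, error) {
	if c.conflicts > 0 {
		c.conflicts--
		return nil, errExists
	}
	return &Container{ID: "new-" + name}, nil
}

// ByName may return (nil, nil): the conflicting container vanished before the lookup.
func (c *fakeClient) ByName(name string) (*Container, error) { return c.existing, nil }

func (c *fakeClient) Remove(id string) error {
	c.removed = append(c.removed, id)
	c.existing = nil
	return nil
}

// createWithRetry retries creation on a name conflict. It purges the
// conflicting container only when the lookup actually found one, so a nil
// lookup result leads to another attempt instead of a nil dereference.
func createWithRetry(c *fakeClient, name string, maxAttempts int) (*Container, error) {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		created, err := c.Create(name)
		if err == nil {
			return created, nil
		}
		if !errors.Is(err, errExists) {
			return nil, err
		}
		existing, err := c.ByName(name)
		if err != nil {
			return nil, err
		}
		if existing != nil { // the nil guard this commit adds
			if err := c.Remove(existing.ID); err != nil {
				return nil, fmt.Errorf("failed to purge container %s: %w", existing.ID, err)
			}
		}
	}
	return nil, fmt.Errorf("could not create container %q after %d attempts", name, maxAttempts)
}
```

Calling `createWithRetry(&fakeClient{conflicts: 2}, "web", 5)` models the unlucky case: the first two creates report a conflict, the lookup finds nothing, and the third create succeeds without any purge. If the conflict never clears, the function returns an error after the last attempt, which corresponds to the "fail enough times for the alloc to be rescheduled" outcome.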
Mahmood Ali committed Apr 19, 2020
1 parent 71744bc commit 9db46fd
Showing 1 changed file with 15 additions and 10 deletions.
25 changes: 15 additions & 10 deletions drivers/docker/driver.go
@@ -439,16 +439,21 @@ CREATE:
 			return container, nil
 		}

-		// Delete matching containers
-		err = client.RemoveContainer(docker.RemoveContainerOptions{
-			ID:    container.ID,
-			Force: true,
-		})
-		if err != nil {
-			d.logger.Error("failed to purge container", "container_id", container.ID)
-			return nil, recoverableErrTimeouts(fmt.Errorf("Failed to purge container %s: %s", container.ID, err))
-		} else {
-			d.logger.Info("purged container", "container_id", container.ID)
+		// Purge conflicting container if found.
+		// If container is nil here, the conflicting container was
+		// deleted in our check here, so retry again.
+		if container != nil {
+			// Delete matching containers
+			err = client.RemoveContainer(docker.RemoveContainerOptions{
+				ID:    container.ID,
+				Force: true,
+			})
+			if err != nil {
+				d.logger.Error("failed to purge container", "container_id", container.ID)
+				return nil, recoverableErrTimeouts(fmt.Errorf("Failed to purge container %s: %s", container.ID, err))
+			} else {
+				d.logger.Info("purged container", "container_id", container.ID)
+			}
 		}

 		if attempted < 5 {