Support for multiple podman sockets #371

Merged
merged 28 commits into hashicorp:main on Oct 3, 2024

Conversation

@adriaandens (Contributor) commented Jul 24, 2024

Problem description: I'm using Nomad with this driver to run workloads on several VPSs, and I want to be able to run containers in rootless mode under different low-privilege users. The current driver only allows you to specify one rootless socket in the driver configuration. Having different host users run containers ensures that the container UIDs don't map to the same user on the host.

Fix: I have added support for multiple "socket" blocks in the driver configuration. Tasks can reference a socket by its name. As far as I have manually tested, this is backwards compatible with the existing socket_path (the two are mutually exclusive). I've also added some code so there's always a default podman client even if none is specified by the user (preferably this would be handled in the HCL parsing, but I didn't find a way to raise it as an error early in the spec parsing).

Would resolve #284 (you can have a root socket and a low-privilege user socket) and resolve #84 (it implements exactly what the title asks for).

In the driver config you can define multiple socket blocks, each giving a name and the socket path (which belongs to a particular host user):

plugin "nomad-driver-podman" {
  config {
    socket {
      name = "default"
      socket_path = "unix://run/user/1000/podman/podman.sock"
    }
    socket {
      name = "anotherName"
      socket_path = "unix://run/user/1001/podman/podman.sock"
    }
    # Errors with a log entry in journald if we set both the socket and the socket_path
    #socket_path = "unix://run/user/1000/podman/podman.sock"
    #disable_log_collection = true
  }
}

In the task HCL we can specify the socket name from the driver:

job "deploy_registry" {
  datacenters = ["GRA"]

  group "reggie" {
    network {
      port "registry_port" {
        to = 5000
      }
      port "another_port" {
        to = 5001
      }
    }   

    task "deploy_registry_container" {
      driver = "podman"
      config {
        image = "localhost/mijnreggie:latest"
        ports = ["registry_port"]
        privileged = false
        socket = "default" # <-------------------------------------- HERE
        logging = {
          driver = "journald"
          options = [
            {
              "tag" = "reggieeeee"
            }
          ]
        }
      }
    }   

    task "deploy_registry_container_as_ansible" {
      driver = "podman"
      config {
        image = "localhost/mijnreggie:latest"
        ports = ["another_port"]
        privileged = false
        socket = "anotherName" # <-------------------------------------- AND HERE
        logging = {
          driver = "journald"
          options = [
            {
              "tag" = "reggieeeee"
            }
          ]
        }
      }
    }   
  }
}

The image localhost/mijnreggie being referenced above was built with podman from the Dockerfile below:

FROM registry:latest
RUN /usr/sbin/addgroup owla
RUN /usr/sbin/adduser -D -u 1337 -G owla owla
USER owla

Since podman resolves images from each user's local storage, the image needs to be available for every user that will run the container.
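
For example (a minimal illustration, assuming the Dockerfile above is present in each user's working directory), the image can be built once under every account that will run it:

# su - debian
$ podman build -t localhost/mijnreggie:latest .
$ exit
# su - ansible
$ podman build -t localhost/mijnreggie:latest .

Alternatively, an image that is already built under one user can be transferred to another with podman save and podman load.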

Output of nomad running the job:

$ nomad job run registry.hcl
==> 2024-07-24T19:36:52Z: Monitoring evaluation "69ce44dd"
    2024-07-24T19:36:52Z: Evaluation triggered by job "deploy_registry"
    2024-07-24T19:36:53Z: Evaluation within deployment: "6c924bcd"
    2024-07-24T19:36:53Z: Allocation "622cb564" created: node "c4616d09", group "reggie"
    2024-07-24T19:36:53Z: Evaluation status changed: "pending" -> "complete"
==> 2024-07-24T19:36:53Z: Evaluation "69ce44dd" finished with status "complete"
==> 2024-07-24T19:36:53Z: Monitoring deployment "6c924bcd"
  ✓ Deployment "6c924bcd" successful
    
    2024-07-24T19:37:06Z
    ID          = 6c924bcd
    Job ID      = deploy_registry
    Job Version = 46
    Status      = successful
    Description = Deployment completed successfully
    
    Deployed
    Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
    reggie      1        1       1        0          2024-07-24T19:47:04Z

The processes as seen with ps:

# ps -ef | grep "registry serve"
101336    271845  271843  0 19:36 ?        00:00:00 registry serve /etc/docker/registry/config.yml
166872    271868  271866  0 19:36 ?        00:00:00 registry serve /etc/docker/registry/config.yml

/etc/subuid maps the two container users to subordinate UID ranges on the host, so the UIDs seen above are the begin of each range + container UID 1337 - 1 (100000 + 1336 = 101336 for debian, 165536 + 1336 = 166872 for ansible):

# cat /etc/subuid
debian:100000:65536
ansible:165536:65536
...

You can also inspect the containers with podman ps under the user running the container:

# su - debian
$ podman ps
CONTAINER ID  IMAGE                        COMMAND               CREATED        STATUS            PORTS                                                       NAMES
336f1a262f77  localhost/mijnreggie:latest  /etc/docker/regis...  6 minutes ago  Up 6 minutes ago  x.x.x.x:29351->5000/tcp, x.x.x.x:29351->5000/udp  deploy_registry_container-622cb564-624c-ec67-4b19-e2452459406c

To do:

  • Gather feedback on whether this request is mergeable, or whether it conflicts with the roadmap of this plugin
  • Verify backward compatibility
  • Implement further tests
  • Squash more bugs
  • Update documentation
  • Path to deprecate previous single socket_path implementation
  • Other feedback

Feedback needed around the "user" field: The user field in the task HCL is currently used to decide which user the process runs as inside the container, but it would be better used to specify the host user that runs the container task. An additional parameter container_user could then be added and passed to podman to specify which user to run the CMD as inside the container. I understand the current behaviour is for compatibility with the docker driver, but it's confusing. If wanted, I can add this implementation to the PR too, but it would break backwards compatibility.
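
To make the idea concrete, a task under that proposal might look like the sketch below (purely hypothetical: container_user does not exist in this PR, and the field meanings only illustrate the suggestion):

task "deploy_registry_container" {
  driver = "podman"
  user   = "debian" # proposed meaning: host user whose podman socket runs the container
  config {
    image          = "localhost/mijnreggie:latest"
    container_user = "owla" # hypothetical parameter: user to run the CMD as inside the container
  }
}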

Not in scope of this PR: fixing the open issue with FIFO logging (#189). That looks to me like a problem in Nomad itself, which creates a FIFO without looking at the task "user" field to set the owner correctly (newLogRotatorMapper calls the mkfifo here). When journald is specified, the driver does not crash because it avoids the FIFO issue. If we remap the user key to mean the host user, I can make a PR to fix the FIFO problem (since we would then know the podman user and could change the permissions on the pipe).

Any feedback is welcome.


hashicorp-cla-app bot commented Jul 24, 2024

CLA assistant check
All committers have signed the CLA.


@adriaandens marked this pull request as ready for review July 28, 2024 20:30
@adriaandens requested a review from a team as a code owner July 28, 2024 20:30
@adriaandens changed the title from "[DRAFT] Support for multiple podman sockets" to "Support for multiple podman sockets" on Jul 31, 2024
@adriaandens (Contributor, Author) commented:

@tgross Do you have any feedback on this PR, or time in the coming weeks to review it? It's not urgent, but I don't want it to die a silent death either.

@tgross (Member) commented Sep 3, 2024

Hi @adriaandens, I've been on PTO the last couple weeks. I've added this to our triage board though.

@skoppe commented Sep 26, 2024

Awesome stuff; this would remove an ugly workaround we have for running both rootful and rootless podman containers on the same node.

@tgross self-requested a review September 27, 2024 14:55
@tgross (Member) left a comment

Hi @adriaandens! I've left some comments, especially around configuration, which I think could use a little redesign, but overall this is looking really good!

Also, can you make sure all your commits are using the email address you signed the CLA with? That'll make the compliance people happy. (Feel free to squash)

Review comments on api/api.go, api/container_delete.go, and driver_test.go (resolved).

driver.go (outdated):

  taskConfig: taskState.TaskConfig,
  procState:  drivers.TaskStateUnknown,
  startedAt:  taskState.StartedAt,
  exitResult: &drivers.ExitResult{},
- logger:     d.logger.Named("podmanHandle"),
+ logger:     d.logger.Named("podmanHandle"), // TODO: does this need to be podmanClient aware?

@tgross (Member) replied:
It'd be nice!

@adriaandens (Contributor, Author) replied Sep 29, 2024:

I've changed it to podman.<socket name> but I don't see it reflected in journald/journalctl output?

Further review comments on driver_test.go and driver.go (resolved).
@tgross self-assigned this Sep 27, 2024
@adriaandens (Contributor, Author) commented:

@tgross I made several new commits to this PR to take your comments into account. I'll test-drive the podman driver in my dev setup to see if I hit any new bugs. The current tests in driver_test.go don't really exercise the multi-podman setup yet: they create a driver that uses the default podman (which connects to a rootless podman if run as non-root, and to the root podman socket if the tests are run as root), but there's no test that launches containers under different podman sockets.

If you're OK with the new HCL config implementation (a socket path plus a name that resolves to "default" when omitted), I can update the README and docs and write some examples so it's easy for people to switch over their configs (the goal is still to be backwards compatible with the old single socket_path, so there's no impact in upgrading the podman driver).
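
For illustration, a config that relies on that default name could look like the following sketch (based on the behaviour described above, not on the final docs):

plugin "nomad-driver-podman" {
  config {
    socket {
      # name is omitted, so this socket would be registered as "default"
      socket_path = "unix://run/user/1000/podman/podman.sock"
    }
  }
}

Tasks that don't set socket would then keep using this default client, which is what preserves backwards compatibility.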

@tgross (Member) left a comment

@adriaandens there are a few items the linter just picked up, and a couple of as-yet-unresolved comments from before. Another minor item I didn't notice earlier is that you made all the source files executable (i.e. changed from 0644 to 0755). Can you fix that too?

> If you're OK with the new HCL config implementation (a socket path and name (if omitted, resolves to "default")), I can update the README, docs, and also write some examples so it's easy for people to switch over their configs

Sounds good! We'll want a changelog entry as well.

@tgross (Member) commented Oct 1, 2024

Sorry @adriaandens, the new HCL files need hclfmt'ing so the tests run, too.

Review comments on README.md and driver.go (resolved).
@adriaandens (Contributor, Author) commented:

@tgross I ran into the /dev/init vs. /run/podman-init issue because I'm running Podman 4.3.1 (the Debian stable podman package on bookworm) whilst the GitHub Action uses 3.4.4. From going through the versions in the Podman docs, /run/podman-init is mentioned from v4.2 onwards.

In the code, I've added an API version to the api.API struct, which I use in the TestPodmanDriver_Init test. But I imagine that the apiVersion can be more widely used to support more versions of Podman, and take different code paths based on the version.

For the GitHub Actions it would make sense (although outside the scope of this PR) to run all the tests with multiple Podman versions: the minimum supported by this driver, latest, stable, the latest from each major version (3.4.4, 4.9.3, 5.2.3), ... so that there's coverage of the most-used versions.

@tgross (Member) commented Oct 2, 2024

> I ran into the issue of the /dev/init vs. /run/podman-init because I'm running Podman 4.3.1 (Debian stable podman package on bookworm) whilst the GitHub Action uses 3.4.4. From going through the versions on the Podman docs, the /run/podman-init is mentioned from v4.2 onwards.

Yeah this really isn't part of the API per se but our test is making assertions on internal implementation details that we probably shouldn't be. Out of scope for this PR to worry about.

> For the GitHub Actions, it would (although outside of scope for this PR) make sense to run all the tests with multiple Podman versions: the minimal supported by this driver, latest, stable, latest from each major version (3.4.4, 4.9.3, 5.2.3), ... so that there's coverage of the most used versions.

Probably a good idea, but yeah outside of the scope of this PR for sure.

I think we're getting close here, but TestPodmanDriver_Pull_Timeout is currently failing.

@adriaandens (Contributor, Author) commented:

> I think we're getting close here, but TestPodmanDriver_Pull_Timeout is currently failing.

My PR removes the slowPodman driver (a driver with no HTTP timeout), since having multiple podman sockets would mean an additional slowPodman driver for each. I've now fixed the issue by using the streaming HTTP client in the existing driver implementation whenever a context is passed with a deadline longer than the timeout of the HTTP client in the api.

@tgross (Member) left a comment

LGTM 👍

Spending a bit of time looking at StartTask makes me think there are some refactoring investments we could make here. And as noted in the discussion, we should follow up by improving the test matrix to cover the versions of podman we intend to support (it's not clear to me what we've decided there). But this is good to merge as-is.

@tgross merged commit e4a6006 into hashicorp:main on Oct 3, 2024
2 checks passed

Successfully merging this pull request may close these issues.

  • Rootless in combination with rootful
  • Run rootless containers while running nomad as root
3 participants