Support network bridge mode #36

Closed
Tracked by #220
drewbailey opened this issue Jun 29, 2020 · 12 comments

Comments

@drewbailey
Contributor

Driver currently supports bridge network mode via task config but not from a driver and task group perspective.
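To make the distinction concrete, here is a minimal sketch of the task-level form, assuming the driver's `network_mode` task config option as described in the README. The job, group, and task names are placeholders, and the image is simply borrowed from the demo.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "app" {
    # No group-level network block here: bridge mode is requested
    # per task through the podman driver's own config instead.
    task "web" {
      driver = "podman"

      config {
        image        = "hashicorpnomad/counter-api:v1"
        network_mode = "bridge"
      }
    }
  }
}
```

Because the network is configured per task in this form, there is no shared group network namespace for a Connect sidecar to join. That is why the group-level `network { mode = "bridge" }` block is the form that matters for this issue.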

Support the connect demo

job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}
      }
    }

    task "web" {
      driver = "podman"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }

  group "dashboard" {
    network {
      mode = "bridge"

      port "http" {
        static = 9002
        to     = 9002
      }
    }

    service {
      name = "count-dashboard"
      port = "9002"

      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8081
            }
          }
        }
      }
    }

    task "dashboard" {
      driver = "podman"

      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
      }

      config {
        image = "hashicorpnomad/counter-dashboard:v1"
      }
    }
  }
}
@maartenbeeckmans

due to this change it does not seem possible to use consul connect with the podman driver

@deepbluemussel

due to this change it does not seem possible to use consul connect with the podman driver

@maartenbeeckmans Have you been able to use podman and connect? I'm only facing missing drivers issue whenever I use podman with connect.

@maartenbeeckmans

I was not able to use it, tried to modify the example by setting the task driver to podman and the sidecar_task.driver to podman but had several issues.

Biggest issue was the support for bridge mode on group level instead of task level, which is a requirement for consul connect iirc.

@deepbluemussel

I struggled too with making this example work. Everything is correctly set up because if I use the docker driver it works but not when I use Podman (which is correctly working for regular jobs).
Well, let's hope for our maintainers to have time to work on this big improvement.

@primeos-work

Driver currently supports bridge network mode via task config but not from a driver and task group perspective.

If I didn't miss anything this should be / is supported (or did you get any error messages related to this or is it documented somewhere?).
From the README:

By default the task uses the network stack defined in the task group, see network Stanza. If the groups network behavior is also undefined, it will fallback to bridge in rootful mode or slirp4netns for rootless containers.

  • bridge: create a network stack on the default podman bridge.

And the "Features" section even claims that Consul Connect is supported (which conflicts with this issue and my experience - at least in terms of a practical setup):

Support for nomad shared network namespaces and consul connect

However, it currently seems to be broken because the loopback interface ("lo") doesn't get initialized properly. This bug is tracked via hashicorp/nomad#10014 and affects at least the "exec" and "podman" drivers. There are two hacky workarounds though:

  1. Executing ip -n "$NS" link set lo up from the host (from the root network namespace / outside the Podman container). This can be automated via scripts (e.g., using inotify (example)). I successfully tested this approach with the "exec" driver and a Python script. (Note: The best way to get the Envoy binary is apparently to copy it from the container image.)
  2. Using an additional task with the "raw_exec" driver (less isolated but apparently it also runs in the same network namespace). An example can be seen here: https://discuss.hashicorp.com/t/consul-connect-envoy-without-docker/4824/7 (but one needs to enable the "raw_exec" driver first and I didn't test it).
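One possible reading of the second workaround, as an untested sketch: a short-lived "raw_exec" prestart task that brings the loopback interface up before the main task starts. It assumes that "raw_exec" is enabled on the client, that the task really does join the group's network namespace, and that the ip binary lives at /usr/sbin/ip (the path varies by distribution). The task name "lo-up" is made up for illustration.

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    # Hypothetical helper task: runs once before the main task and
    # brings up "lo" from inside the group's network namespace.
    task "lo-up" {
      driver = "raw_exec"

      lifecycle {
        hook    = "prestart"
        sidecar = false
      }

      config {
        # Assumed location of the ip binary; adjust for your distribution.
        command = "/usr/sbin/ip"
        args    = ["link", "set", "lo", "up"]
      }
    }

    task "web" {
      driver = "podman"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }
}
```

Compared with the host-side script in the first workaround, this keeps the fix inside the job file, at the cost of running an unisolated "raw_exec" task.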

I'm only facing missing drivers issue whenever I use podman with connect.

That should be because of the following:

      connect {
        sidecar_service {}
      }

This uses the Docker driver by default:

The default Envoy task is equivalent to the configuration shown here: https://www.nomadproject.io/docs/job-specification/sidecar_task#default-envoy-configuration

The solution is to use the following:

      connect {
        sidecar_service {}
        sidecar_task {
          driver = "podman"
          config {
            image = "docker.io/envoyproxy/envoy:v1.21.1"
            # image = "localhost/envoy-podman:v1.21.1"
            command = "/docker-entrypoint.sh"
            args = [
              "-c",
              "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
              "-l",
              "${meta.connect.log_level}",
              "--concurrency",
              "${meta.connect.proxy_concurrency}",
              "--disable-hot-restart"
            ]
          }
        }
      }

However, in addition to the aforementioned issue with the loopback interface, I hit two more issues when using the Envoy image with Podman through Nomad:

  1. I got chown: changing ownership of '/dev/std{out,err}': Permission denied errors (there are multiple upstream issues about this, e.g., envoyproxy/envoy#14787, "In Podman or Docker: Permission denied on SELinux machines"). I worked around this by simply removing the two commands from docker-entrypoint.sh (the permissions are already fine).
  2. On "Rocky Linux 8.5 (Green Obsidian)" with SELinux enabled the container user couldn't access /secrets/envoy_bootstrap.json.

Anyway, the tl;dr is that I cannot recommend trying to get Consul Connect working with the "podman" (or "exec") driver at this point. It should be possible (I at least managed to get it working with the "exec" driver) but it isn't pretty/practical at all.

The most important blocker is hashicorp/nomad#10014. After that, the Envoy container image needs to be improved to work with SELinux+Nomad+Podman (but IIRC it was working fine without Nomad (i.e., SELinux+Podman), so this might need fixes in nomad-driver-podman rather than a docker-entrypoint.sh workaround).

@l-monninger

@primeos-work Tagged you in a comment on a discuss.hashicorp thread as well. Sorry if that's annoying.

Anyway, the tl;dr is that I cannot recommend trying to get Consul Connect working with the "podman" (or "exec") driver at this point. It should be possible (I at least managed to get it working with the "exec" driver) but it isn't pretty/practical at all.

Just to be sure, are you saying that it's not worthwhile to try and use the exec driver when in bridge mode?

@primeos-work

No, I was just saying that IMO the Nomad + Podman + Consul Connect seemed too much trouble at the time to use it (especially in production). Now, with hashicorp/nomad#10014 resolved, it should be fine(-ish). I guess we can close this issue too now? I did at least manage to get the Consul Connect demo working with the podman driver (not that pretty yet but doable). And the network bridge mode should work with all drivers now (since/with hashicorp/nomad#13428).

@l-monninger

Huh, weird. In the situation referenced in that discuss.hashicorp post, I could not do anything network-based in the exec task. Do you have an example set of configs you could send my way?

@ZackaryWelch

Also running into the same driver issue when trying to use consul as the service provider.

@p1u3o

p1u3o commented Mar 30, 2023

Also running into the same driver issue when trying to use consul as the service provider.

You need to grab the sidecar_task block here and change the driver to Podman.
https://developer.hashicorp.com/nomad/docs/job-specification/sidecar_task

Consul Connect and bridge on a group work for me in the latest Nomad versions on Rocky 9.
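Applied to the "api" group from the original post, that might look like the sketch below. It reuses the sidecar_task block posted earlier in this thread. The Envoy image tag is the one from that comment and is only an assumption here, so pick a version that matches your Consul release.

```hcl
job "countdash" {
  datacenters = ["dc1"]

  group "api" {
    network {
      mode = "bridge"
    }

    service {
      name = "count-api"
      port = "9001"

      connect {
        sidecar_service {}

        # Override the default Envoy sidecar task, which uses the Docker driver.
        sidecar_task {
          driver = "podman"

          config {
            # Image tag taken from the earlier comment; an assumption, not a recommendation.
            image   = "docker.io/envoyproxy/envoy:v1.21.1"
            command = "/docker-entrypoint.sh"
            args = [
              "-c", "${NOMAD_SECRETS_DIR}/envoy_bootstrap.json",
              "-l", "${meta.connect.log_level}",
              "--concurrency", "${meta.connect.proxy_concurrency}",
              "--disable-hot-restart",
            ]
          }
        }
      }
    }

    task "web" {
      driver = "podman"

      config {
        image = "hashicorpnomad/counter-api:v1"
      }
    }
  }
}
```

The "dashboard" group would need the same sidecar_task override inside its own connect block.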

@shoenig shoenig self-assigned this May 3, 2023
@shoenig
Member

shoenig commented May 3, 2023

Hi folks, starting with Nomad v1.6 (and a nomad-driver-podman release shortly thereafter) (ETA mid-June-ish), the Connect with Podman story should be significantly improved. The goal of hashicorp/nomad#17042 is to make sure we can use podman and Connect jobs without extra configuration (other than specifying driver = "podman" and installing the podman task driver).

So far things look good with ubuntu 22.04 and podman v3.4.4. I still need to verify the RHEL and podman 4 side of things. If there are additional issues we can track them in the ticket above. I believe hashicorp/nomad#13428 / hashicorp/nomad#10014 resolved the bridge mode issues, so I'll go ahead and close out this ticket.

PS - Thanks to everyone who has helped by investigating or fixing issues - this driver has been a monumental community effort and wouldn't be possible without y'all!

@shoenig shoenig closed this as completed May 3, 2023
@Allan-Nava

Does it only work with Nomad v1.6?
