Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: hostname not populated for /etc/hosts in bridge networks. #105

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shishir-a412ed
Copy link
Contributor

Signed-off-by: Shishir Mahajan [email protected]

@shishir-a412ed
Copy link
Contributor Author

@notnoop @tgross

I am trying to address Issue #103 in containerd-driver. This is the approach I am taking based on hashicorp/nomad#10766 (Let me know if I missed anything)

  • Currently, there is already a check in containerd-driver which doesn't allow the user to run the job in both host network mode (host_network=true) and bridge network mode (network { mode = "bridge" }) [Already exists in current codebase]
  1. If user sets host_network=true in containerd-driver config: I use the host's /etc/hosts and add any extra_hosts to it, and mount that into the container. [This exists in the codebase today]

  2. If the user sets network mode bridge (cfg.NetworkIsolation != nil), I am using this function hostnames.GenerateEtcHostsMount to generate the /etc/hosts mount. However, this is not working as expected for some reason. I did some debugging and looks like cfg.NetworkIsolation.HostConfig is coming as nil. I am not sure why, but this might be causing the /etc/hosts to not get populated correctly. [New Addition]

  3. Default case: User doesn't set anything. containerd-driver builds a default version of /etc/hosts file, add any extra_hosts that user is asking for, and mounts that into the container [This exists in the codebase today]

Any pointers in how to integrate (2) [https://github.com/hashicorp/nomad/pull/10766] in containerd-driver?

@shishir-a412ed
Copy link
Contributor Author

shishir-a412ed commented Jul 29, 2021

@tgross @notnoop
So I did a little more debugging. Looks like with docker driver (where everything is working correctly) with Nomad 1.1.3 which includes this patch (Where you moved the hosts file from task dir to alloc dir so that it can be shared by all the tasks).

The hosts file is created correctly under the alloc directory.

root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# ls -lt
total 12
-rw-r--r-- 1 root   root     346 Jul 29 23:35 hosts
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 redis-task
drwxrwxrwx 5 nobody nogroup 4096 Jul 29 23:35 alloc
root@vagrant:/tmp/NomadClient203041423/aa5e1bc3-4b5b-0c53-9f7e-6cb8b3237408# cat hosts
# this file was generated by Nomad
127.0.0.1 localhost
::1 localhost
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

# this entry is the IP address and hostname of the allocation
# shared with tasks in the task group's network
<ip_address> 6f3c7ffa7b44

However, if I switch to containerd-driver, the hosts file is not even created.

root@vagrant:/tmp/NomadClient203041423/2b33c69f-c13a-2f0b-4f75-71941425065b# ls
alloc  redis-task

This is the job I am using:

job "redis" {
  datacenters = ["dc1"]

  group "redis-group" {
    network {
      mode = "bridge"
    }
    task "redis-task" {
      driver = "containerd-driver"

      config {
        image    = "redis:alpine"
      }

      resources {
        cpu        = 500
        memory     = 256
        memory_max = 512
      }
    }
  }
}

@notnoop
Copy link
Contributor

notnoop commented Aug 2, 2021

Thanks @shishir-a412ed for the ping. I'm investigating the expected network invariants here and will follow up with concrete suggestions (or a proposed changes).

So /etc/hosts should be populated with the hostname for each container. My understanding is that Docker required some custom handling that necessities having Docker create the network using a pause container and joining the task containers with the pause container network namespace. In that setup, all containers will have the same hostname, so we chose to create a single /etc/hosts file and share it with all containers.

Examining containerd behavior, it seems we don't have the same limitation, and each container has its own hostname despite being in the same namespace, and without requiring a pause container. If that works, then we probably should have a new unique /etc/hosts for each task containerd container, that includes the container hostname, the container ID, and bind mount that.

So the question I have is: Does having having containers with different hostnames yet in the same namespace and with same ip address has adverse effects workloads? Let me know if you have any opinions, I'll double check and follow up.

@tgross
Copy link

tgross commented Aug 2, 2021

My understanding from when I did the Docker implementation and looked at exec was that the network_manager_linux code will call into the Docker driver to create the network namespace whenever we're in network.mode = "bridge" on Linux. Because that network namespace can be shared across multiple tasks, each of which can use a different task driver, the task drivers can't assume that they "own" the network namespace. A task driver can ignore that network namespace entirely, but then it won't be in the same netns as other tasks.

As for the impact of multiple hostnames within a single network namespace, I suspect where things will get weird is in application metrics, because tasks that ship with metrics libraries will report different container hostnames, none of which will match even though they're all in the same alloc.

(Also, just a heads up @shishir-a412ed that I'm not at HashiCorp anymore, although I'm always happy to take a look at what y'all are up to!)

@shishir-a412ed
Copy link
Contributor Author

@notnoop Apologies for the delayed response. After thinking a bit more about this, I am still convoluted on what would be the best way to address this?

In the docker driver, a pause container is setup (per allocation/task group) whose /etc/hosts can then be shared by all the tasks in the allocation. Each task in the allocation get the same hostname which is the container ID of the pause container.

As you mentioned this is not a constraint in the containerd task driver, since there is no pause container and each task can have a unique hostname. We can assign the container ID (which will be unique for the container) as the hostname for each task.
(Not sure if this will conflict with the IP, since all containers have a common IP)

I think my question was specific to, when I call hostnames.GenerateEtcHostsMount with the TaskDir() it should return me an /etc/hosts mount which I can then bind-mount into the containerd container. However, this was not working so I was wondering if my approach is right.

As in, If the task driver is responsible for creating the /etc/hosts even in the case of bridge network or Nomad will be creating one and supplying it to the task drivers which can then be bind-mounted into the container.

@tgross Sad to see you go. Thank you for all the contributions and quick responses/reviews. Being a maintainer is hard! and you did a phenomenal job. Looking forward to what you create next!

@shishir-a412ed shishir-a412ed force-pushed the fix_103 branch 2 times, most recently from 8eb8bf9 to 298308d Compare September 13, 2021 21:32
@Oloremo
Copy link

Oloremo commented Apr 13, 2022

@shishir-a412ed any updates on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants