client network_interface config doesn't parse sockaddr templates #3675

Closed
sepulworld opened this issue Dec 20, 2017 · 23 comments · Fixed by #10404
Labels
stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/client theme/networking type/enhancement
Milestone

Comments

@sepulworld
Contributor

sepulworld commented Dec 20, 2017

Nomad version

0.7.1

Operating system and Environment details

Ubuntu 16.04

Issue

Unable to specify network_interface option for an alias interface, eth0:1

Reproduction steps

Assign an interface eth0:1 (Linode uses these for private address space)

eth0:1    Link encap:Ethernet  HWaddr f2:3c:23:b1:45:ff
          inet addr:192.168.122.21  Bcast:0.0.0.0  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
/usr/local/bin/nomad agent -config=/etc/nomad -log-level=DEBUG -network-interface=eth0:1
    Loaded configuration from /etc/nomad/config.json
==> Starting Nomad agent...
==> Error starting agent: client setup failed: fingerprinting failed: Error while detecting network interface during fingerprinting: route ip+net: no such network interface
    2017/12/19 23:25:33.387588 [INFO] client: using state directory /opt/nomad/client
    2017/12/19 23:25:33.387751 [INFO] client: using alloc directory /opt/nomad/alloc
    2017/12/19 23:25:33.390788 [DEBUG] client: built-in fingerprints: [arch cgroup consul cpu host memory network nomad signal storage vault env_gce env_aws]
    2017/12/19 23:25:33.390971 [INFO] fingerprint.cgroups: cgroups are available
    2017/12/19 23:25:33.391097 [DEBUG] client: fingerprinting cgroup every 15s
    2017/12/19 23:25:33.392795 [INFO] fingerprint.consul: consul agent is available
    2017/12/19 23:25:33.392955 [DEBUG] client: fingerprinting consul every 15s
    2017/12/19 23:25:33.392966 [DEBUG] fingerprint.cpu: frequency: 2799 MHz
    2017/12/19 23:25:33.392970 [DEBUG] fingerprint.cpu: core count: 2
@sepulworld
Contributor Author

Same behavior on 0.6.0 and 0.7.1

@HanSooloo

Having a similar problem on DigitalOcean. They add a 10.x.x.x control IP address to eth0, which gets picked up by Nomad causing all sorts of problems.

Would be nice to be able to blacklist an IP or have more granular controls over interfaces/IPs.

@sepulworld
Contributor Author

Yes, this bug makes it hard to roll out Nomad on tier 2 cloud providers like DigitalOcean and Linode. These providers use network interface aliases on their VMs.

@sepulworld
Contributor Author

The error comes from the net library here: https://golang.org/src/net/interface.go?s=4532:4585#L153 where the lookup falls through to line 169.

@sepulworld
Contributor Author

sepulworld commented Jan 11, 2018

Looking through the net library, I don't see a way to reference a network alias as its own interface. As far as the net library is concerned, eth1:0 is just the eth1 interface.

// Add the network resources to the node
node.Resources.Networks = nwResources
for _, nwResource := range nwResources {
    f.logger.Printf("[DEBUG] fingerprint.network: Detected interface %v with IP: %v", intf.Name, nwResource.IP)
}
// Deprecated, setting the first IP as unique IP for the node
if len(nwResources) > 0 {
    node.Attributes["unique.network.ip-address"] = nwResources[0].IP
}

I think we might need to add an additional option:

network_interface = eth1
network_interface_alias_number = 0

Knowing the alias number, perhaps we can ask for it in nwResources.
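
To illustrate the point about the net library, here is a minimal Go sketch (not Nomad code). It assumes a hypothetical helper, splitAlias, that separates an alias label such as eth0:1 into the real interface name and the alias suffix, then looks up the real interface with the stdlib. The addresses returned by Interface.Addrs carry no label, so the alias suffix cannot be matched against them from the stdlib alone.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// splitAlias is a hypothetical helper for this sketch: it splits a legacy
// alias label such as "eth0:1" into the real interface name ("eth0") and the
// alias suffix ("1"). A name without a colon yields an empty suffix.
func splitAlias(label string) (base, alias string) {
	base, alias, _ = strings.Cut(label, ":")
	return base, alias
}

func main() {
	label := "eth0:1"
	base, alias := splitAlias(label)
	fmt.Printf("label=%q base=%q alias=%q\n", label, base, alias)

	// The stdlib only knows real interfaces, so look up the base name.
	intf, err := net.InterfaceByName(base)
	if err != nil {
		fmt.Printf("lookup of %q failed: %v\n", base, err)
		return
	}

	// Addrs returns every address on the device; the alias label that
	// `ip addr` shows next to an address is not part of the result.
	addrs, err := intf.Addrs()
	if err != nil {
		fmt.Printf("listing addresses failed: %v\n", err)
		return
	}
	for _, a := range addrs {
		fmt.Printf("%s: %s\n", intf.Name, a)
	}
}
```

On a host without an eth0 device the lookup simply reports an error, which is the same failure mode the agent hits when given eth0:1 directly.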

@dmitrif

dmitrif commented Jan 22, 2018

Having the same issue with Linode. Does anyone have a workaround, by chance?

@sean-
Contributor

sean- commented Jan 22, 2018

@dmitrif You can run your config through sockaddr for the time being: https://github.com/hashicorp/go-sockaddr/

Consul already has native support for this functionality.

@c4milo
Contributor

c4milo commented Mar 14, 2018

I've been bitten by this as well. Is there any update?

@vasekboch

I've been researching a similar issue related to this one. I have the same setup: on adapter eth0 I have a public IP and an aliased internal one on eth0:100. The network fingerprinter picks these up as 2 separate network resources.

It would be cool to have some control over this, because almost all the time I want to bind everything to the internal network, since all the services are made public by a reverse proxy. But currently everything binds to the public IP because it's first. So as a workaround I need to listen on 0.0.0.0 so the service is available over the internal network, and block everything public on the firewall.

But this also causes an issue: logically I can have two allocations with the same port on the same machine, one with the public IP and one with the private IP. The second one fails to start because the first is already listening on 0.0.0.0.

The simplest possible solution is to add some part of the network resources to a blacklist, similar to https://www.nomadproject.io/docs/configuration/client.html#quot-fingerprint-network-disallow_link_local-quot-
Or you could specify IP addresses in https://www.nomadproject.io/docs/configuration/client.html#reserved-resources
The best solution would be to categorize the network resources so that each job could specify which network it wants.
I'm a newbie to Go, but I could try to send a PR (for the blacklist IP option). But there are multiple different approaches. What do you recommend? Do you have any plans for supporting this?
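
A rough Go sketch of what the blacklist idea could look like. This is only an illustration: filterAddrs is a hypothetical helper, not an existing Nomad option or API, and the public address is taken from the documentation range 203.0.113.0/24.

```go
package main

import (
	"fmt"
	"net"
)

// filterAddrs is a hypothetical helper for this sketch. It drops every
// address that falls inside one of the denied CIDR ranges and returns the rest
// in their original order.
func filterAddrs(addrs []string, denied []string) ([]string, error) {
	var nets []*net.IPNet
	for _, c := range denied {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", c, err)
		}
		nets = append(nets, n)
	}

	var kept []string
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP %q", a)
		}
		blocked := false
		for _, n := range nets {
			if n.Contains(ip) {
				blocked = true
				break
			}
		}
		if !blocked {
			kept = append(kept, a)
		}
	}
	return kept, nil
}

func main() {
	// One public and one private address on the same device.
	addrs := []string{"203.0.113.10", "192.168.167.74"}
	kept, err := filterAddrs(addrs, []string{"203.0.113.0/24"})
	if err != nil {
		panic(err)
	}
	fmt.Println("addresses left for fingerprinting:", kept)
}
```

Filtering by CIDR rather than by exact IP would let one config file be shared across nodes whose public addresses differ.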

@dradtke

dradtke commented Jun 20, 2019

I've run into this when on Linode, too. My node has eth0 set up as something like this:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether f2:3c:91:7e:55:1a brd ff:ff:ff:ff:ff:ff
    inet 72.14.190.210/24 brd 72.14.190.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.167.74/17 brd 192.168.255.255 scope global eth0:1
       valid_lft forever preferred_lft forever

My service then binds to NOMAD_ADDR_http, which uses the public IP 72.14.190.210. However, Linode NodeBalancers need the service to be listening on the private IP 192.168.167.74, but setting client.network_interface to eth0 defaults to the public IP, and eth0:1 doesn't work.

@Gurpartap

Gurpartap commented Jun 22, 2019

Since Linode does not offer a separate network interface device with their private networking setup, the private IP, by default, is added as an alias to public network device, eth0.

> cat /etc/network/interfaces
# Generated by Linode Network Helper
# Fri Jun 21 12:44:06 2019 UTC
#
# This file is automatically generated on each boot with your Linode's
# current network configuration. If you need to modify this file, please
# first disable the 'Auto-configure Networking' setting within your Linode's
# configuration profile:
#  - https://manager.linode.com/linodes/config/workerpool1-node2?id=15820662
#
# For more information on Network Helper:
#  - https://www.linode.com/docs/platform/network-helper
#
# A backup of the previous config is at /etc/network/.interfaces.linode-last
# A backup of the original config is at /etc/network/.interfaces.linode-orig
#
# /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0

iface eth0 inet6 auto

iface eth0 inet static
    address 72.14.190.210/24
    gateway 72.14.190.1
    up   ip addr add 192.168.167.74/17 dev eth0 label eth0:1
    down ip addr del 192.168.167.74/17 dev eth0 label eth0:1

This means that Nomad (+ private ip for services) is not usable on Linode as is (cc @angrycub), unless we're able to tell nomad to use aliased ip based on either their label, like network_interface = "eth0:1", or by using various matchers as available with go-sockaddr library.

Until such a facility is built into nomad, as a workaround, the whole cluster would require a new dummy interface for nomad to pick up the private address from, which is undesirable. Is there another workaround?
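
For illustration, selecting the aliased address by a matcher instead of by label could look roughly like the Go sketch below. firstPrivateIPv4 is a hypothetical helper using only the stdlib (net.IP.IsPrivate needs Go 1.17 or newer); it is not how Nomad or go-sockaddr implement this.

```go
package main

import (
	"fmt"
	"net"
)

// firstPrivateIPv4 is a hypothetical helper for this sketch: given addresses
// in CIDR form, it returns the first IPv4 address in a private (RFC 1918)
// range, and false if there is none.
func firstPrivateIPv4(cidrs []string) (string, bool) {
	for _, c := range cidrs {
		ip, _, err := net.ParseCIDR(c)
		if err != nil {
			continue // skip anything that is not a valid CIDR
		}
		if ip.To4() != nil && ip.IsPrivate() {
			return ip.String(), true
		}
	}
	return "", false
}

func main() {
	// The two eth0 addresses from the interfaces file above.
	addrs := []string{"72.14.190.210/24", "192.168.167.74/17"}
	if ip, ok := firstPrivateIPv4(addrs); ok {
		fmt.Println("private address:", ip)
	} else {
		fmt.Println("no private IPv4 address found")
	}
}
```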

@Gurpartap

Here's what I ended up doing to get this working. Requires that you know the IP you want to assign for your nomad scheduled tasks.

Add a dummy interface with private ip cidr

> ip link add dummy10 type dummy
> ip addr add 192.168.x.x/17 dev dummy10 # Linode uses /17 for private network

Edit nomad config to have the scheduler use ip from dummy10 for allocating tasks

> vim /opt/nomad/config/default.hcl

# ...
log_level = "DEBUG"

client {
  enabled = true
  network_interface = "dummy10"
}
# ...

Read the debug logs to ensure expected behaviour

2019-06-22T11:16:11.553Z [DEBUG] client.fingerprint_mgr.network:
detected interface IP: interface=dummy10 IP=192.168.167.74
...
2019-06-22T11:16:49.197Z [DEBUG] client.driver_mgr.docker:
allocated static port: driver=docker task_name=haproxy ip=192.168.167.74 port=443

Looks like we got it right. Our nomad scheduled job (haproxy) is serving on private ip set in dummy10 interface.

This was easier than I thought it would be. Lovely.

@Gurpartap

In order to persist the above mentioned dummy interface across restarts, etc., I used ansible to create a systemd managed network configuration across all of the nomad client nodes.

The result was something equivalent of this on each client node:

> cat /etc/systemd/network/10-dummy10.netdev 
[NetDev]
Name=dummy10
Kind=dummy
> cat /etc/systemd/network/20-dummy10.network 
[Match]
Name=dummy10

[Network]
Address=192.168.x.x/17
> systemctl daemon-reload
> systemctl restart systemd-networkd

@dmitrif

dmitrif commented Jun 23, 2019

@Gurpartap Seeing as there is only one private IP, and dummy interfaces drop all packets sent to them, this doesn't work as networkd then disables the eth0:1 alias.

@dmitrif

dmitrif commented Jun 23, 2019

@sean- Not sure I follow.. Which param would we be using in this case?

@Gurpartap

Gurpartap commented Jun 23, 2019

> @Gurpartap Seeing as there is only one private IP, and dummy interfaces drop all packets sent to them, this doesn't work as networkd then disables the eth0:1 alias.

Works as intended on my consul nomad cluster on Linode.

Nomad does not send any packets on the network_interface. Afaict, this config is only used for determining the nomad client's IP addresses (which are also assigned to tasks).

// Interface to use for network fingerprinting
NetworkInterface string `hcl:"network_interface"`

func (f *NetworkFingerprint) Fingerprint(req *FingerprintRequest, resp *FingerprintResponse) error {
    cfg := req.Config
    // Find the named interface
    intf, err := f.findInterface(cfg.NetworkInterface)

@dmitrif

dmitrif commented Jun 23, 2019 via email

@stale

stale bot commented Sep 21, 2019

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@Gurpartap

I have to systemctl restart systemd-networkd on a reboot to bring dummy10 interface up.

If someone knows a way to ensure dummy10 comes up automatically on boot, kindly let us know.

@stale

stale bot commented Dec 21, 2019

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@stale

stale bot commented Jan 20, 2020

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍

@stale stale bot closed this as completed Jan 20, 2020
@nickethier nickethier reopened this Jan 20, 2020
@tgross tgross added the stage/accepted Confirmed, and intend to work on. No timeline committment though. label Aug 24, 2020
@tgross tgross self-assigned this Sep 14, 2020
@tgross
Member

tgross commented Sep 15, 2020

I wanted to do some follow-up on this to clarify the issue. As others have noted, Nomad's network fingerprinting relies on the golang stdlib to parse the network interfaces. At network.go#L52 we call into net.InterfaceByName.

We can see the results of this if we spin up a DO droplet with private networking and IPv6 enabled. Networking configuration on the host:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 5a:a2:71:26:57:e3 brd ff:ff:ff:ff:ff:ff
    inet 157.230.14.68/20 brd 157.230.15.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.10.0.5/16 brd 10.10.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::58a2:71ff:fe26:57e3/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether e2:c0:c0:8d:d5:39 brd ff:ff:ff:ff:ff:ff
    inet 10.116.0.2/20 brd 10.116.15.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::e0c0:c0ff:fe8d:d539/64 scope link
       valid_lft forever preferred_lft forever

A simple golang program to read out the interfaces:

package main

import (
        "fmt"
        "net"
)

func main() {
        ifaces, err := net.Interfaces()
        if err != nil {
                panic(err)
        }
        for _, iface := range ifaces {
                fmt.Printf("%#v\n", iface)
        }
}

And the results:

root@ubuntu-s-1vcpu-1gb-nyc1-01:~# go run ./main.go
net.Interface{Index:1, MTU:65536, Name:"lo", HardwareAddr:net.HardwareAddr(nil), Flags:0x5}
net.Interface{Index:2, MTU:1500, Name:"eth0", HardwareAddr:net.HardwareAddr{0x5a, 0xa2, 0x71, 0x26, 0x57, 0xe3}, Flags:0x13}
net.Interface{Index:3, MTU:1500, Name:"eth1", HardwareAddr:net.HardwareAddr{0xe2, 0xc0, 0xc0, 0x8d, 0xd5, 0x39}, Flags:0x13}
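
A small extension of the same idea, as a sketch: listing the addresses attached to each interface with Interface.Addrs. The helper name addrsByInterface is made up for this example. On the droplet above, eth0 should come back with both of its IPv4 addresses under a single interface entry, which is why an interface name alone cannot pick between them.

```go
package main

import (
	"fmt"
	"net"
)

// addrsByInterface is a hypothetical helper for this sketch. It maps each
// interface name to the addresses (in CIDR form) that the stdlib reports.
func addrsByInterface() (map[string][]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	out := make(map[string][]string, len(ifaces))
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			return nil, err
		}
		list := make([]string, 0, len(addrs))
		for _, a := range addrs {
			list = append(list, a.String())
		}
		out[iface.Name] = list
	}
	return out, nil
}

func main() {
	byName, err := addrsByInterface()
	if err != nil {
		panic(err)
	}
	// Map iteration order is random, so the interfaces print in no fixed order.
	for name, addrs := range byName {
		fmt.Printf("%s: %v\n", name, addrs)
	}
}
```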

Using a sockaddr template is the way to get the configuration we want when we have this sort of situation where a single interface has multiple IPs:

log_level = "DEBUG"

data_dir = "/var/nomad"

# this will bind correctly
bind_addr = "{{ GetAllInterfaces | include \"name\" \"eth0\" | exclude \"type\" \"IPv6\" | sort \"-private\" | limit 1 | attr \"address\" }}"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  # this will not... see below
  network_interface = "{{ GetAllInterfaces | limit 1 }}"
  enabled = true
}

Unfortunately, although sockaddr templates work just fine with bind_addr, it looks like we aren't parsing them at all when it comes to the client.network_interface configuration. In the config file above, if we omit network_interface, the agent starts and binds to the public IP address on eth0. If we include it, we get the following error:

==> Error starting agent: client setup failed: fingerprinting failed: Error while detecting network interface {{ GetAllInterfaces | limit 1 }} during fingerprinting: route ip+net: no such network interface
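
The error shows the raw template string being passed straight to the interface lookup. A sketch of the missing step, rendering the value first when it looks like a template, might be something like the following. This is a stand-in under stated assumptions: it uses the stdlib text/template rather than go-sockaddr, and both resolveInterfaceName and the FirstInterfaceName template function are made up for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// resolveInterfaceName is a hypothetical helper for this sketch. A plain value
// such as "eth0" is returned unchanged; a value containing "{{" is rendered as
// a template first, and the rendered text is used as the interface name.
func resolveInterfaceName(raw string, funcs map[string]any) (string, error) {
	if !strings.Contains(raw, "{{") {
		return raw, nil
	}
	tmpl, err := template.New("network_interface").Funcs(funcs).Parse(raw)
	if err != nil {
		return "", fmt.Errorf("parsing template %q: %w", raw, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nil); err != nil {
		return "", fmt.Errorf("rendering template %q: %w", raw, err)
	}
	return strings.TrimSpace(buf.String()), nil
}

func main() {
	// FirstInterfaceName is a made-up template function that stands in for
	// the real sockaddr template functions.
	funcs := map[string]any{
		"FirstInterfaceName": func() string { return "eth1" },
	}
	for _, raw := range []string{"eth0", "{{ FirstInterfaceName }}"} {
		name, err := resolveInterfaceName(raw, funcs)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q resolves to %q\n", raw, name)
	}
}
```

Only the rendered name would then be handed to the interface lookup, so existing configs with a plain interface name keep working unchanged.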

For clarity I'm going to rename this issue title so it can be properly triaged for future work. cc @galeep

@tgross tgross changed the title from "Unable to specify network_interface option for an alias interface, eth0:1" to "client network_interface config doesn't parse sockaddr templates" Sep 15, 2020
@tgross tgross removed their assignment Sep 15, 2020
@tgross tgross added this to the 1.1.0 milestone Apr 20, 2021
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022