DHCP not working - Transmit (Tx) checksum offload problem #1

gregoryolsen · 2016-06-07T06:04:52Z

Synopsis: Container can't get a DHCP-assigned IP address

The failed sequence looks like this:

The client sends DHCPDISCOVER(lxcbr0), and the server answers with DHCPOFFER(lxcbr0).
The client never follows up with the DHCPREQUEST that the protocol requires.
Consequently the server never sends a DHCPACK to complete the lease.
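
One way to spot this pattern is to scan the DHCP server log for offers that never get a request back. This is only a minimal sketch and not part of the setup described here: dhcp_stalled is a hypothetical helper name, and it assumes the server is dnsmasq, writing message names such as DHCPOFFER(lxcbr0) to a text log.

```shell
# Hypothetical helper: reads DHCP server log lines on stdin.
# Succeeds (exit 0) when a DHCPOFFER was logged but no DHCPREQUEST
# appears anywhere in the same input.
dhcp_stalled() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'DHCPOFFER' &&
        ! printf '%s\n' "$log" | grep -q 'DHCPREQUEST'
}

# Usage (the log path is an assumption; adjust for your system):
#   grep dnsmasq-dhcp /var/log/syslog | dhcp_stalled && echo "client never sent DHCPREQUEST"
```

The check is deliberately crude: it does not pair offers with individual clients, so feed it log lines for one container at a time.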

Don't blame Devuan for this.

It's an old issue related to Linux transmit (Tx) checksum offload
handling for virtual devices such as:

bridges, veth devices, etc.

For a more in-depth explanation see Root Cause below.

Solution

Here's the solution I've used reliably for many years with my Xen VMs,
and more recently with LXC:

Install ethtool:
```
$ sudo apt-get install ethtool
```

Turn off Tx checksum offloading on the LXC bridge:

There's more than one way to do this, but the cleanest way
I've found is to simply add an "up" command to the bridge
interface definition.

Example 'up' command ($IFACE = bridge interface):

up /sbin/ethtool -K $IFACE tx off  # <== TURN OFF TX CHECKSUM OFFLOAD

I manually define all my bridges.

Example bridge definition in /etc/network/interfaces (comments sit on
their own lines because interfaces(5) does not support end-of-line comments):

auto lxcbr0
iface lxcbr0 inet static
        pre-up    brctl addbr $IFACE
        address   10.0.0.1
        netmask   255.255.0.0
        network   10.0.0.0
        broadcast 10.0.255.255
        # disable Spanning Tree Protocol
        bridge_stp off
        # no delay before a port becomes available
        bridge_waitport 0
        # no forwarding delay
        bridge_fd 0
        up        ip link set $IFACE up
        # <== TURN OFF TX CHECKSUM OFFLOAD
        up        /sbin/ethtool -K $IFACE tx off
        down      ip link set $IFACE down
        post-down brctl delbr $IFACE
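
After the bridge comes up, it's worth checking that the setting took effect. The following is a minimal sketch, assuming the usual `ethtool -k` (lowercase k) feature listing with a top-level "tx-checksumming:" line. tx_offload_state is a hypothetical helper name, not part of ethtool.

```shell
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# value of the top-level "tx-checksumming" feature (for example "on" or "off").
# Indented sub-features such as tx-checksum-ipv4 are not matched.
tx_offload_state() {
    awk -F': *' '$1 == "tx-checksumming" { print $2; exit }'
}

# Usage (lxcbr0 is the bridge from the example above):
#   /sbin/ethtool -k lxcbr0 | tx_offload_state
```

If this still prints "on" after ifup, the "up" line was probably not executed, so check that ethtool is installed at /sbin/ethtool.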

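SUSE example (untested):

Add to interface config /etc/sysconfig/network/ifcfg-lxcbr0
(the literal word "iface" is replaced with the interface name):

ETHTOOL_OPTIONS='-K iface tx off'

CentOS/RHEL example (untested):

Add to interface config /etc/sysconfig/network-scripts/ifcfg-lxcbr0:

ETHTOOL_OPTS='-K lxcbr0 tx off'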

Other solutions include setting an iptables rule on either the
POSTROUTING or OUTPUT chain of the mangle table. I don't currently
do this, so I don't have an example to provide.

IMHO, the iptables rule solution might be prone to breakage, as
subsequent rule changes can lose the checksum rule.

However, if maximum performance on a prod server is important,
I suggest using the OUTPUT chain of the mangle table to
calculate checksums for DHCP only, which leaves offload enabled
for all other traffic.
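
For readers who want to try the iptables route, here is a sketch rather than a tested recipe. It assumes the kernel and iptables build provide the CHECKSUM target with its --checksum-fill option, and that DHCP replies leave the host toward UDP port 68 on the bridge. dhcp_checksum_rule is a hypothetical helper that only prints the command, so it can be reviewed before being run as root.

```shell
# Hypothetical helper: prints (does not execute) an iptables command that
# fills in checksums for DHCP replies sent to clients on UDP port 68.
#   $1 = mangle-table chain (OUTPUT or POSTROUTING)
#   $2 = outgoing interface, e.g. the LXC bridge
dhcp_checksum_rule() {
    printf 'iptables -t mangle -A %s -o %s -p udp --dport 68 -j CHECKSUM --checksum-fill\n' "$1" "$2"
}

# Print the rule for the OUTPUT chain on the bridge used in this issue.
dhcp_checksum_rule OUTPUT lxcbr0
```

Because the rule lives outside the interface definition, it has to be restored by whatever manages the rest of the firewall rules, which is the breakage risk mentioned above.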

Root Cause

The root cause of the issue is a missing or incorrect checksum on packets
transmitted from virtual devices, combined with a DHCP client (dhclient)
that rejects packets with bad checksums.

There's a lot of confusion about the cause.

First and foremost, know this is not a problem with Devuan,
nor is it a problem with LXC.

It's an old issue related to Linux transmit (Tx) checksum offload handling
for virtual devices:

bridges, veth devices, etc.

The Linux kernel defers calculating checksums until packets egress on
a physical device.

Therefore this is not a kernel bug, but a design decision.

As I understand it, it's because there's no need to calculate checksums
when packets are transmitted via in-memory copy, as is the case for virtual devices.
Makes sense. Why consume CPU performing the calculation when there's no
physical network?

There's also the partial checksum offload feature, which was introduced
as a performance optimization. I don't fully understand it. Apparently it
has resulted in incorrect checksums that can cause DHCP clients to reject packets.

Disabling Tx checksum offload circumvents the issue.
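
To check whether bad checksums are really what the client sees, tcpdump in verbose mode flags them in its output. This is a minimal sketch, assuming tcpdump is installed in the container and marks such packets with the text "bad udp cksum". count_bad_cksum is a hypothetical helper name.

```shell
# Hypothetical helper: reads verbose tcpdump output on stdin and prints
# how many lines were flagged with a bad UDP checksum.
# "|| true" keeps the exit status at 0 even when the count is zero.
count_bad_cksum() {
    grep -c 'bad udp cksum' || true
}

# Usage inside the container, as root (eth0 is an assumed interface name):
#   tcpdump -n -vv -c 10 -i eth0 'udp port 67 or udp port 68' | count_bad_cksum
```

A non-zero count before the ethtool change and zero afterwards is a good sign that checksum offload was the culprit.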

On the surface it may seem like there's a problem with dhclient, especially
if other VMs/containers work fine on the same bridge. However, all this
implies is that their DHCP clients have been patched to accept packets with
bad checksums. Depending on one's perspective this is either good or bad.
IMHO, both perspectives have some merit.
