test: Run tests with UDP proxy redirections last
For the past three months [1], we've had a flake in the connectivity checks where DNS resolutions sometimes fail. Two weeks ago, Louis noticed that the last trace we saw before the DNS packet was lost was a redirect to the proxy, even though no L7/FQDN rules were defined.

In [2], we confirmed that the incorrect proxy redirect was caused by a stale conntrack entry. An earlier test runs with a DNS redirect in its policies. During that test, a conntrack entry is created for a DNS connection ipA:portA -> ipDNS:53, with a bit set to require reply traffic to be redirected to the DNS proxy. Then, during a subsequent test without DNS redirects in its policies, a new DNS connection is made with the same source port portA as before. It therefore matches the conntrack entry from the previous test, and its traffic is redirected to the proxy. At that point, the DNS proxy appears to drop the packets, likely because it isn't aware of any DNS redirection policy.

For this to occur, the two connections must happen less than 6m apart: UDP connections have a timeout of 1m and the conntrack garbage collector for UDP runs every 5m, so a stale entry can linger for up to 6m. In practice, this happens very often: while running the connectivity tests in a loop 100 times, we hit the flake 49 times.

This commit implements a workaround for the flake by reordering the tests in the CLI, such that tests with DNS redirects to the proxy are executed last. That way, no test without DNS redirects runs after a test that created such a conntrack entry. This is of course not a long-term fix, since the issue can still happen in production when users frequently change their DNS policies. To fix it properly, we would probably need to implement a grace period in the DNS proxy during which incorrectly redirected packets are simply forwarded.

1 - #367
2 - #367 (comment)

Signed-off-by: Paul Chaignon <[email protected]>