Skip to content

Commit

Permalink
test: Run tests with UDP proxy redirections last
Browse files Browse the repository at this point in the history
For the past three months [1], we've had a flake in the connectivity
checks where DNS resolutions sometimes fail. Two weeks ago, Louis noticed
that the last trace we saw before the DNS packet was lost was a redirect
to the proxy, even though no L7/FQDN rules were defined.

In [2], we confirmed that the incorrect proxy redirect was caused by a
stale conntrack entry. A previous test runs with a DNS redirect in
policies. During that test, a conntrack entry is created for a DNS
connection ipA:portA -> ipDNS:53, with a bit set to require reply traffic
to be redirected to the DNS proxy.

Then, during a subsequent test without DNS redirectis in policies, a new
DNS connection is made with the same portA as previously. It therefore
matches the conntrack entry from the previous test and traffic is
redirected to the proxy. At this point, the DNS proxy seems to drop the
packets likely because it's not aware of any DNS redirection policy.

For this to occur, the two connections must happen in less than 6m from
each other: the conntrack garbage collector for UDP runs every 5m and UDP
connections have a timeout of 1m. In practice, this happens very often.
As an example, while running connectivity tests in a loop a hundred times,
the flake was measured to happen 49 times.

This commit implements a workaround for the flake, by reordering tests
in the CLI, such that tests with DNS redirects to the proxy are executed
last.

This is of course not a long term fix since the issue can still happen
in production when users often change their DNS policies. To fix it
properly, we would probably need to implement a grace period in the DNS
proxy during which incorrectly redirected packets are simply forwarded.

1 - #367
2 - #367 (comment)
Signed-off-by: Paul Chaignon <[email protected]>
  • Loading branch information
pchaigno authored and tklauser committed Sep 23, 2021
1 parent 669214a commit 7f84818
Showing 1 changed file with 48 additions and 47 deletions.
95 changes: 48 additions & 47 deletions connectivity/suite.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,17 +70,6 @@ func Run(ctx context.Context, ct *check.ConnectivityTest) error {
tests.PodToExternalWorkload(""),
)

// Only allow UDP:53 to kube-dns, no DNS proxy enabled.
ct.NewTest("dns-only").WithPolicy(clientEgressOnlyDNSPolicyYAML).
WithScenarios(
tests.PodToPod(""), // connects to other Pods directly, no DNS
tests.PodToWorld(""), // resolves one.one.one.one
).
WithExpectations(
func(a *check.Action) (egress check.Result, ingress check.Result) {
return check.ResultDropCurlTimeout, check.ResultNone
})

// This policy only allows ingress into client from client2.
ct.NewTest("client-ingress").WithPolicy(clientIngressFromClient2PolicyYAML).
WithScenarios(
Expand Down Expand Up @@ -112,28 +101,6 @@ func Run(ctx context.Context, ct *check.ConnectivityTest) error {
tests.PodToPod(""),
)

// This policy only allows port 80 to "one.one.one.one". DNS proxy enabled.
ct.NewTest("to-fqdns").WithPolicy(clientEgressToFQDNsCiliumIOPolicyYAML).
WithScenarios(
tests.PodToWorld(""),
).
WithExpectations(func(a *check.Action) (egress, ingress check.Result) {
if a.Destination().Port() == 80 && a.Destination().Address() == "one.one.one.one" {
if a.Destination().Path() == "/" || a.Destination().Path() == "" {
egress = check.ResultDNSOK
egress.HTTP = check.HTTP{
Method: "GET",
URL: "http://one.one.one.one/",
}
return egress, check.ResultNone
}
// Else expect HTTP drop by proxy
return check.ResultDNSOKDropCurlHTTPError, check.ResultNone
}
// No HTTP proxy on other ports
return check.ResultDNSOKDropCurlTimeout, check.ResultNone
})

// This policy allows UDP to kube-dns and port 80 TCP to all 'world' endpoints.
ct.NewTest("to-entities-world").
WithPolicy(clientEgressToEntitiesWorldPolicyYAML).
Expand Down Expand Up @@ -163,6 +130,26 @@ func Run(ctx context.Context, ct *check.ConnectivityTest) error {
return check.ResultOK, check.ResultNone
})

// Test L7 HTTP introspection using an ingress policy on echo pods.
ct.NewTest("echo-ingress-l7").
WithPolicy(echoIngressL7HTTPPolicyYAML). // L7 allow policy with HTTP introspection
WithScenarios(
tests.PodToPod(""),
).
WithExpectations(func(a *check.Action) (egress, ingress check.Result) {
if a.Source().HasLabel("other", "client") { // Only client2 is allowed to make HTTP calls.
egress = check.ResultOK
// Expect all curls from client2 to be proxied and to be GET calls.
egress.HTTP = check.HTTP{
Method: "GET",
}
return egress, check.ResultNone
}
return check.ResultDrop, check.ResultNone
})

// The following tests have DNS redirect policies. They should be executed last.

// Test L7 HTTP introspection using an egress policy on the clients.
ct.NewTest("client-egress-l7").
WithPolicy(clientEgressOnlyDNSPolicyYAML). // DNS resolution only
Expand Down Expand Up @@ -190,27 +177,41 @@ func Run(ctx context.Context, ct *check.ConnectivityTest) error {
return check.ResultDrop, check.ResultNone
})

// Test L7 HTTP introspection using an ingress policy on echo pods.
ct.NewTest("echo-ingress-l7").
WithPolicy(echoIngressL7HTTPPolicyYAML). // L7 allow policy with HTTP introspection
// Only allow UDP:53 to kube-dns, no DNS proxy enabled.
ct.NewTest("dns-only").WithPolicy(clientEgressOnlyDNSPolicyYAML).
WithScenarios(
tests.PodToPod(""),
tests.PodToPod(""), // connects to other Pods directly, no DNS
tests.PodToWorld(""), // resolves one.one.one.one
).
WithExpectations(
func(a *check.Action) (egress check.Result, ingress check.Result) {
return check.ResultDropCurlTimeout, check.ResultNone
})

// This policy only allows port 80 to "one.one.one.one". DNS proxy enabled.
ct.NewTest("to-fqdns").WithPolicy(clientEgressToFQDNsCiliumIOPolicyYAML).
WithScenarios(
tests.PodToWorld(""),
).
WithExpectations(func(a *check.Action) (egress, ingress check.Result) {
if a.Source().HasLabel("other", "client") { // Only client2 is allowed to make HTTP calls.
egress = check.ResultOK
// Expect all curls from client2 to be proxied and to be GET calls.
egress.HTTP = check.HTTP{
Method: "GET",
if a.Destination().Port() == 80 && a.Destination().Address() == "one.one.one.one" {
if a.Destination().Path() == "/" || a.Destination().Path() == "" {
egress = check.ResultDNSOK
egress.HTTP = check.HTTP{
Method: "GET",
URL: "http://one.one.one.one/",
}
return egress, check.ResultNone
}
return egress, check.ResultNone
// Else expect HTTP drop by proxy
return check.ResultDNSOKDropCurlHTTPError, check.ResultNone
}
return check.ResultDrop, check.ResultNone
// No HTTP proxy on other ports
return check.ResultDNSOKDropCurlTimeout, check.ResultNone
})

// Dummy tests for debugging the testing harness.
// ct.NewTest("dummy-1").WithScenarios(tests.Dummy("dummy-scenario-1"))
// ct.NewTest("dummy-2").WithScenarios(tests.Dummy("dummy-scenario-2"))
// Tests with DNS redirects to the proxy (e.g., client-egress-l7, dns-only,
// and to-fqdns) should always be executed last. See #367 for details.

return ct.Run(ctx)
}

0 comments on commit 7f84818

Please sign in to comment.