-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support DNS names for egress network policy #13002
Support DNS names for egress network policy #13002
Conversation
@openshift/networking PTAL |
pkg/sdn/api/v1/types.go
Outdated
@@ -96,8 +96,12 @@ const ( | |||
|
|||
// EgressNetworkPolicyPeer specifies a target to apply egress network policy to | |||
type EgressNetworkPolicyPeer struct { | |||
// Either cidrSelector or dnsName should be specified but not both | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't get picked up in the docs. You should probably update the docs on each field instead. (eg, "cidrSelector is the CIDR range to allow/deny traffic to. If this is set, dnsName must be unset.") Or else move this comment to be part of the EgressNetworkPolicyPeer doc comment.
(Also, this commit has "Author: Ravi Sankar Penta <[email protected]>
")
pkg/sdn/api/validation/validation.go
Outdated
|
||
if len(rule.To.CIDRSelector) > 0 { | ||
if _, _, err := net.ParseCIDR(rule.To.CIDRSelector); err != nil { | ||
allErrs = append(allErrs, field.Invalid(field.NewPath("spec").Child("egress").Index(i).Child("to"), rule.To.CIDRSelector, err.Error())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pre-existing bug in the code, but that field descriptor probably should have .Child("cidrSelector")
(and likewise below for the dnsName error)
pkg/sdn/plugin/egress_dns.go
Outdated
} | ||
|
||
for name, ips := range dnsInfo.Get() { | ||
if name == dnsName { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dnsInfo.Get() should take a name argument and only return that one record. (Or else call it Lookup() if you still need the get-everything version somewhere else.)
pkg/sdn/plugin/egress_dns.go
Outdated
|
||
for { | ||
<-t.C | ||
glog.V(8).Infof("Periodic egress dns sync: update policy rules") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally expect that telling people to use --loglevel=5 will get everything interesting... And this is way less verbose than most of what's at V(5) anyway, so I don't think there's any reason to make it V(8).
pkg/sdn/plugin/egress_dns.go
Outdated
} | ||
|
||
func (proxy *OsdnProxy) SyncEgressDNSProxyFirewall() { | ||
t := time.NewTicker(60 * time.Minute) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry that the two loops (OVS and proxy) are going to end up getting out of sync...
If this was C#, I'd say that EgressDNS should have a single periodic update loop, and emit an event when DNS results changed, and then OsdnNode and OsdnProxy should each have handlers for that event and then update their own state. I'm not sure what the correspondingly idiomatic way of doing that in golang would be... something with a channel I suppose, but it's tricky because we need a single writer and multiple readers.
(Oh, and the plugin and the proxy each have their own EgressDNS anyway so that still wouldn't help.)
pkg/sdn/plugin/egress_dns.go
Outdated
glog.Error(err) | ||
continue | ||
} | ||
if equal(olddnsMap, newdnsMap) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
equal() seems like it should be some operation on dnsInfo. Actually, since you don't even care about the contents of the maps here, you could just make dnsInfo.Update() return a bool (indicating whether anything changed) rather than a map (and not even call dnsInfo.Get())
pkg/sdn/plugin/egress_dns.go
Outdated
glog.Warningf("Could not find netid for namespace %q: %v", policy.Namespace, err) | ||
break | ||
} | ||
plugin.updateEgressNetworkPolicyRules(vnid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... plugin.updateEgressNetworkPolicyRules() is broken; it needs locking (eg, to protect against concurrent reads from/writes to plugin.egressPolicies[]). And then this will need to use the right locking as well.
pkg/sdn/plugin/egress_dns.go
Outdated
return cidrs | ||
} | ||
|
||
func (plugin *OsdnNode) SyncEgressDNSPolicyRules() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we're already terrible about this, but I think it's confusing to keep spreading OsdnNode methods through more and more files. In this case, I think this method should go into egress_network_policy.go.
pkg/sdn/plugin/egress_dns.go
Outdated
} | ||
} | ||
|
||
func (proxy *OsdnProxy) SyncEgressDNSProxyFirewall() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this should move to proxy.go...
pkg/sdn/plugin/egress_dns.go
Outdated
// found dns updates | ||
for _, policy := range policies.Items { | ||
if policy.UID == policyUID { | ||
proxy.updateNetworkPolicy(policy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with the OVS case, this needs locking, though fortunately the remaining proxy code is already correct. So, you need to grab proxy.lock here.
And you have to call proxy.updateEndpoints() afterward too at the end of the loop if you changed anything.
76e9235
to
999de59
Compare
@danwinship updated, ptal |
0db83df
to
c9de330
Compare
825a9cb
to
841733f
Compare
841733f
to
9bbfe1e
Compare
Reiterating the concern that I raised during the standup: |
I was going to say "yeah, I think most sites use a TTL of at least an hour normally anyway", but poking around, it seems that that's not true, especially for CDN-ed sites. I don't think syncing more aggressively is a good solution though (either by default or as a config option). But also there's no easy way to get TTL... So I'm not sure what to do. |
I think we need to get the TTL... why is that hard? Is it just that the library we are using can't provide it? |
pkg/sdn/plugin/egress_dns.go
Outdated
cidrs = append(cidrs, net.IPNet{IP: ip, Mask: net.CIDRMask(32, 32)}) | ||
} else if ip.To16() != nil { // IPv6 addr | ||
// Currently we only handle IPv4 addrs | ||
glog.Warningf("ignoring IPv6 addr: %s (domain: %s) for policy: %s", ip.String(), dnsName, policy.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This (and the "else" below it) is dead code now; dnsInfo.Get() can only return valid IPv4 addresses
@@ -25,12 +26,15 @@ func (plugin *OsdnNode) SetupEgressNetworkPolicy() error { | |||
continue | |||
} | |||
plugin.egressPolicies[vnid] = append(plugin.egressPolicies[vnid], policy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to grab the lock here (SetupEgressNetworkPolicy()) too. multiTenantPlugin.Start() calls vnids.Start() before node.SetupEgressNetworkPolicy(), and vnids.Start() spawns a goroutine which could theoretically end up calling UpdateEgressNetworkPolicyVNID().
} | ||
|
||
for _, policy := range policies.Items { | ||
dnsInfo, ok := plugin.egressDNS.pdMap[policy.UID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't thread-safe is it?
@@ -60,7 +61,11 @@ type OsdnNode struct { | |||
kubeletInitReady chan struct{} | |||
iptablesSyncPeriod time.Duration | |||
mtu uint32 | |||
|
|||
// Synchronizes operations on egressPolicies | |||
egressPoliciesLock sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it would be nice to see this as a separate commit since it's fixing a separate pre-existing bug)
pkg/sdn/plugin/proxy.go
Outdated
|
||
proxy.updateNetworkPolicy(policy) | ||
if proxy.allEndpoints != nil { | ||
proxy.updateEndpoints() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you want to do the updateEndpoints outside the range policies.Items
loop
libc doesn't provide it, so in order to get it you need to write your own resolver basically |
I know calling the normal library functions doesn't return it... but is there no dns resolver library for golang already? |
There isn't even a good way to do it in C. Among other things, if you want to do it right, you need to parse /etc/nsswitch.conf and make sure you let /etc/hosts override DNS where it's supposed to, etc. |
c734029
to
c291dc7
Compare
DNS updates are now based on domain TTL |
c291dc7
to
a1e9bd0
Compare
Just so that all the discussion is here in the PR: we decided that this doesn't really matter for our use case; this is primarily designed for dealing with site-external hostnames, so stuff like /etc/hosts and zeroconf names don't matter anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Nice work Ravi.
|
||
func CheckDNSResolver() error { | ||
if _, err := exec.LookPath(DIG); err != nil { | ||
return fmt.Errorf("%s is not installed", DIG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you investigate if there are go resolver libraries we could use instead? Given that dig is the sort of thing that isn't just installed-by-default, I wonder if that would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some go resolver libraries for DNS but I didn't find a go library that actually returns TTL value for a domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we just log an error instead of returning the error here then this feature will not work and I worry that the cluster admin may not notice this issue until someone tries this feature. I thought, throwing user error here will alert the admin to install the missing library.
pkg/sdn/plugin/dns.go
Outdated
// Cached/Non-authoritative DNS servers only holds the remaining TTL for it to refresh but not the actual absolute TTL for the domain. | ||
// Authoritative DNS servers holds actual TTL for the domain. So we need to: | ||
// (a) Get all authorirative name servers for the given domain | ||
// (b) Query TTL for the domain against one of the name servers from (a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a not-really-necessary complication; yes, the first time you query a non-authoritative DNS server it might give you a lower-than-maximum TTL if it already has the name cached, but once that expires, you'll get the max TTL next time and every time after that. So I think you can safely get rid of the get-authoritative-ns step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may not know exact TTL from non-auth server and may query few more times than required with this approach but yes, it definitely reduces the complexity.
pkg/sdn/plugin/dns.go
Outdated
} | ||
nameServer := strings.Trim(string(out[:]), "\n") | ||
|
||
out, err = exec.Command("sh", []string{"-c", fmt.Sprintf("echo $(%s | awk '{print $2}')", digShellCmdGetTTL(dns, nameServer))}...).Output() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should get both the TTL and the IPs from dig; right now there's no guarantee that the TTL you get here from the authoritative server actually goes with the IP you get in updateIPs() (which comes from the local server).
Also, the echo $()
there is doing nothing isn't it? You should be able to say just fmt.Sprintf("%s | awk '{print $2}'")
. (Though if we're going to get both IP and TTL it might end up making more sense to do the parsing in go after the fact rather than using awk here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Right now, current approach is fetch TTL once from auth server, Update IPs for every TTL from non-auth server and another routine that updates TTL periodically from auth servers.
We could simply this to fetch and update TTL and IP from non-auth server and perform the above step again based on last obtained TTL. In this case, we don't need to contact auth server and no need of another routine to update TTL.
pkg/sdn/plugin/dns.go
Outdated
if err != nil { | ||
glog.Errorf("Failed to get TTL value from authoritative name server for domain: %q, err: %v, defaulting ttl=%s", dns, err, defaultTTL.String()) | ||
} else { | ||
glog.Warningf("TTL value not set by authoritative name server for domain: %q, defaulting ttl=%s", dns, defaultTTL.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not possible is it? There's always a TTL field in the response...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this shouldn't happen. I will change this to:
if err != nil || len(out) == 0 {
glog.Errorf("Failed to get TTL value from authoritative name server for domain: %q, err: %v, defaulting ttl=%s", dns, err, defaultTTL.String())
}
@@ -13,6 +14,10 @@ import ( | |||
) | |||
|
|||
func (plugin *OsdnNode) SetupEgressNetworkPolicy() error { | |||
if err := CheckDNSResolver(); err != nil { | |||
return err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want openshift to fail entirely if dig isn't present? (Rather than just making EgressNetworkPolicy not work?)
(Likewise in proxy.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just remove this from this commit rather than adding it here and then removing it in a later commit. (Likewise in proxy.go)
@@ -79,3 +91,57 @@ func (plugin *OsdnNode) UpdateEgressNetworkPolicyVNID(namespace string, oldVnid, | |||
plugin.updateEgressNetworkPolicyRules(newVnid) | |||
} | |||
} | |||
|
|||
func (plugin *OsdnNode) syncEgressDNSPolicyRules() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really complicated and confusing. You should make the EgressDNS code smarter and this code dumber. egress_network_policy.go shouldn't have to deal with the timers itself... EgressDNS should do that internally, and then only wake up egress_network_policy.go when something has actually changed. So this code would just be something like:
func (plugin *OsdnNode) syncEgressDNSPolicyRules() {
for {
policyNamespace <- plugin.egressDNS.changed
if vnid, err := plugin.policy.GetVNID(policyNamespace); err == nil {
plugin.updateEgressNetworkPolicyRules(vnid)
} else {
glog.Warningf("Could not find netid for namespace %q: %v", policyNamespace, err)
}
}
}
(Likewise in proxy.go)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will make proxy and egress network policy register for dns changes and the go routine in egress dns that updates TTL and IP will inform all the registered callers when there is an IP update.
pkg/sdn/plugin/egress_dns.go
Outdated
) | ||
|
||
const ( | ||
defaultEgressDNSTTLSyncDuration = 8 * time.Hour |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gets defined here, but used from a different file in a different commit. I think you could just have the "syncEgressDNSTTL" goroutine here rather than in egress_network_policy.go though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be removed with the new approach.
images/node/Dockerfile
Outdated
@@ -22,7 +22,7 @@ RUN curl -L -o /etc/yum.repos.d/origin-next-epel-7.repo https://copr.fedoraproje | |||
INSTALL_PKGS="libmnl libnetfilter_conntrack conntrack-tools openvswitch \ | |||
libnfnetlink iptables iproute bridge-utils procps-ng ethtool socat openssl \ | |||
binutils xz kmod-libs kmod sysvinit-tools device-mapper-libs dbus \ | |||
iscsi-initiator-utils" && \ | |||
iscsi-initiator-utils bind-utils" && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need similar changes anywhere else? dind images? RPM spec files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dind already has bind-utils package and may be I should also add to origin.spec? I will check
a1e9bd0
to
bd3b21e
Compare
@danwinship Updated, PTAL |
tm, ok := dnsInfo.GetMinQueryTime() | ||
if !ok { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long can e.lock be held here? Can this loop indefinite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this loop is not indefinite. e.lock is held till we scan the egress policy dns map once.
LGTM |
flaked again reopened #13318 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
[merge] |
caa8dc9
to
20b119a
Compare
Resolved merge conflicts |
failure not related, looks like a flake |
FYI (as the bot has already informed you), you can't just ask it to retest without a reason. If it actually is an infrastructure problem and there's no flake bug to link to, the easiest way around that is to rebase and repush. But anyway, this wasn't a flake, it's just that the bot doesn't do a good job of highlighting the right things when the tests fail to compile:
|
20b119a
to
1c12bcc
Compare
[test] |
Is this config supported on Atomic Host?
…On Apr 14, 2017 03:45, "OpenShift Bot" ***@***.***> wrote:
continuous-integration/openshift-jenkins/merge FAILURE (
https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/346/)
(Base Commit: 7044e57
<7044e57>
)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#13002 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AGpw3CYrjMfMiOnDKJkCVGTWOc2Ql8svks5rvyQSgaJpZM4MEldx>
.
|
dnsName or cidrSelector could be specified to EgressNetworkPolicy.
…tion. - Stores all IPv4 addrs, TTL and next dns update time for the domain - Fetches dns TTL and IP addrs using dig and provides accessor method to periodically check for TTL/IP updates.
- Maintains map of egress policy and its associated dns info. - Provides Add/Delete/Update/Sync operations on egress DNS.
- DNS entries in the policies are verfied for any IP updates based on TTL for the domain and egress policy rules are updated if needed.
- DNS entries in the policies are verfied for any IP updates based on TTL for the domain and proxy firewall rules are updated if needed.
…resolving a domain in egress network policy
We use 'dig' to fetch TTL and IP updates for the egress domain and 'dig' is part of bind-utils package.
1c12bcc
to
fca780d
Compare
Evaluated for origin merge up to 7370c28 |
flake #13068 |
Evaluated for origin test up to 7370c28 |
continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/779/) (Base Commit: 16132e2) |
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/359/) (Base Commit: 16132e2) (Image: devenv-rhel7_6151) |
Trello card: https://trello.com/c/yH9l0ouV/289-8-use-dns-names-with-egress-firewall-demo-human-names