-
Notifications
You must be signed in to change notification settings - Fork 387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix tunnel interface name issue #2486
Conversation
Codecov Report
@@ Coverage Diff @@
## main #2486 +/- ##
===========================================
- Coverage 61.70% 47.04% -14.67%
===========================================
Files 276 278 +2
Lines 21317 21364 +47
===========================================
- Hits 13154 10051 -3103
- Misses 6782 9985 +3203
+ Partials 1381 1328 -53
Flags with carried forward coverage won't be shown. Click here to find out more.
|
3a68e51
to
e35417e
Compare
if ok && (interfaceConfig.InterfaceName != portName || | ||
interfaceConfig.PSK != c.networkConfig.IPSecPSK || | ||
!interfaceConfig.RemoteIP.Equal(nodeIP) || | ||
interfaceConfig.TunnelInterfaceConfig.Type != c.networkConfig.TunnelType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be duplicated from removeStaleTunnelPorts
, can we define a separate function for tunnel config comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, will do.
} else { | ||
stalePortRemoved = true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious as to why we don't just return an error here and let the Node be requeued by the controller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will leave it to reconcile
17c50ac
to
0e978e4
Compare
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits
if !c.validateInterfaceConfig(interfaceConfig, nodeIP, portName) { | ||
return 0, fmt.Errorf("IPSec tunnel interface config mismatched with cached one,stale IPSec tunnel port %s", interfaceConfig.InterfaceName) | ||
} else { | ||
klog.V(2).Infof("find cached IPSec tunnel interface %s with port %d for Node %s", interfaceConfig.InterfaceName, interfaceConfig.OFPort, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
klog.V(2).Infof("find cached IPSec tunnel interface %s with port %d for Node %s", interfaceConfig.InterfaceName, interfaceConfig.OFPort, nodeName) | |
klog.V(2).Infof("Found cached IPSec tunnel interface %s with port %d for Node %s", interfaceConfig.InterfaceName, interfaceConfig.OFPort, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
return interfaceConfig.OFPort, nil | ||
if !c.validateInterfaceConfig(interfaceConfig, nodeIP, portName) { | ||
return 0, fmt.Errorf("IPSec tunnel interface config mismatched with cached one,stale IPSec tunnel port %s", interfaceConfig.InterfaceName) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else block is not needed since you return in the if statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
0e978e4
to
328af21
Compare
bb1581d
to
bfe6f25
Compare
@antoninbas @abhiraut Could you help to take a look again? thanks! |
@@ -258,6 +256,14 @@ func (c *Controller) removeStaleTunnelPorts() error { | |||
return nil | |||
} | |||
|
|||
func (c *Controller) validateInterfaceConfig(interfaceConfig *interfacestore.InterfaceConfig, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/validateInterfaceConfig/compareInterfaceConfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
interfaceConfig, ok := c.interfaceStore.GetNodeTunnelInterface(nodeName) | ||
// check if Node IP, PSK, or tunnel type changes. This can | ||
// happen if removeStaleTunnelPorts fails to remove a "stale" | ||
// tunnel port for which the configuration has changed, return error to requeue the node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/node/Node
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// happen if removeStaleTunnelPorts fails to remove a "stale" | ||
// tunnel port for which the configuration has changed. | ||
if !c.validateInterfaceConfig(interfaceConfig, nodeIP, portName) { | ||
return 0, fmt.Errorf("IPSec tunnel interface config mismatched with cached one, stale IPSec tunnel port %s", interfaceConfig.InterfaceName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/mismatched with cached one/doesn't match cached one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
if !c.validateInterfaceConfig(interfaceConfig, nodeIP, portName) { | ||
return 0, fmt.Errorf("IPSec tunnel interface config mismatched with cached one, stale IPSec tunnel port %s", interfaceConfig.InterfaceName) | ||
} | ||
klog.V(2).Infof("Found cached IPSec tunnel interface %s with port %d for Node %s", interfaceConfig.InterfaceName, interfaceConfig.OFPort, nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use structured logging: klog.V(2).InfoS
@@ -23,6 +23,7 @@ import ( | |||
"github.com/containernetworking/plugins/pkg/ip" | |||
"github.com/golang/mock/gomock" | |||
"github.com/stretchr/testify/assert" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't typically split these into 2 separate import groups IIRC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/agent/util/net.go
Outdated
@@ -38,10 +39,14 @@ func generateInterfaceName(key string, name string, useHead bool) string { | |||
interfaceKey := hex.EncodeToString(hash.Sum(nil)) | |||
prefix := name | |||
if len(name) > interfacePrefixLength { | |||
// We use Node/Pod name to generate the interface name, | |||
// valid chars for Node/Pod name are ASCII letters from a to z, | |||
// the digits from 0 to 9, and the hyphen (-), hyphen (-) is the only char |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be a new sentence:
Hyphen (-) is the only char which will impact command-line interpretation if the interface name starts with one, so we remove it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated it.
c, closeFn := newController(t, &config.NetworkConfig{ | ||
TrafficEncapMode: 0, | ||
TunnelType: ovsconfig.TunnelType("vxlan"), | ||
EnableIPSecTunnel: true, | ||
IPSecPSK: "changeme", | ||
}) | ||
|
||
c.interfaceStore.AddInterface(&interfacestore.InterfaceConfig{ | ||
Type: interfacestore.TunnelInterface, | ||
InterfaceName: util.GenerateNodeTunnelInterfaceName("xyz-k8s-0-1"), | ||
TunnelInterfaceConfig: &interfacestore.TunnelInterfaceConfig{ | ||
NodeName: "xyz-k8s-0-1", | ||
Type: ovsconfig.TunnelType("vxlan"), | ||
PSK: "mismatchpsk", | ||
RemoteIP: nodeIP1, | ||
}, | ||
OVSPortConfig: &interfacestore.OVSPortConfig{ | ||
PortUUID: "123", | ||
}, | ||
}) | ||
|
||
defer closeFn() | ||
defer c.queue.ShutDown() | ||
stopCh := make(chan struct{}) | ||
defer close(stopCh) | ||
c.informerFactory.Start(stopCh) | ||
c.informerFactory.WaitForCacheSync(stopCh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe unify this set up code in a separate function:
func setup(ifaces ...*interfacestore.InterfaceConfig) cleanupFn {
...
}
then in the test:
cleanup := setup(ifaceConfig)
defer cleanup()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests := []struct { | ||
name string | ||
wantErr bool | ||
}{ | ||
{ | ||
name: "good path with IPSec enabled", | ||
wantErr: false, | ||
}, | ||
} | ||
for _, tt := range tests { | ||
t.Run(tt.name, func(t *testing.T) { | ||
if err := c.removeStaleTunnelPorts(); (err != nil) != tt.wantErr { | ||
t.Errorf("Controller.removeStaleTunnelPorts() error = %v, wantErr %v", err, tt.wantErr) | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think there will be more tests added in the future? maybe since you only have one test right now there is no need to use a table pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed table pattern
16a97ce
to
39e711d
Compare
if (err != nil) != tt.wantErr { | ||
t.Errorf("Controller.createIPSecTunnelPort() error = %v, wantErr %v", err, tt.wantErr) | ||
return | ||
} | ||
if got != tt.want { | ||
t.Errorf("Controller.createIPSecTunnelPort() got %v,want = %v", got, tt.want) | ||
return | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use testify assertions for these tests like we do in other Antrea unit tests? assert
assertions will continue the test on error, while require
assertions will abort the test on error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
want int32 | ||
}{ | ||
{ | ||
name: "good path", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"good path" is not a great name, maybe "create new port"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
the purpose of this commit is to fix a tunnel interface issue founded during some IPSec PoC verification: 1. when the node name is like 'lan-k8s-0-0', the IPsec tunnel interface name will be like '-k8s-0-2-e8dbe6', then it will failed to run command like `ipsec up '-k8s-0-2-e8dbe6'` with error `/usr/lib/ipsec/stroke: invalid option -- 'k'` due to the first char is '-', ipsec command interrupted it as an option. so changed the `generateInterfaceName` method to use `strings.TrimLeft()` to remove '-' in left. 2. `createIPSecTunnelPort` method can't handle the case when tunnel interface name changed when there is cache matched for the node. it will reuse the existing tunnel name without creating a new one with new name which means any change in `generateInterfaceName` won't take affect. and also add some unit test cases. Signed-off-by: Lan Luo <[email protected]>
39e711d
to
74e2a6c
Compare
/test-all |
/test-ipv6-only-e2e |
@luolanzone please cherry-pick to release-1.2 |
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late review, I notice this change via another PR and have a question.
interfaceConfig, ok := c.interfaceStore.GetNodeTunnelInterface(nodeName) | ||
// check if Node IP, PSK, or tunnel type changes. This can | ||
// happen if removeStaleTunnelPorts fails to remove a "stale" | ||
// tunnel port for which the configuration has changed, return error to requeue the Node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you removed the TODO, and according to the commit message, I suppose this is fixed. However, how requeuing the Node helps? I think the next attemp will still get here and requeue again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tnqn, I think the TODO is highlighting it doesn't check the configuration change here but return directly. it's an assumption that it's a temporary issue if last remove action failed in removeStaleTunnelPorts
, so re-queue will help because removeStaleTunnelPorts
also checks configuration difference. but it indeed will go back here if failed again, is there any case in your mind port can't be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding was that we would call c.ovsBridgeClient.DeletePort
and c.interfaceStore.DeleteInterface
here, and in case of error during port removal, return an error and let the Node be requeued. removeStaleTunnelPorts
is only called on controller start up and transient errors are always possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the issue is not really fixed. @luolanzone would you have a follow-up PR to fix it? and maybe have a test covering this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I will fix this.
The purpose of this commit is to fix a tunnel interface issue founded during some IPSec PoC verification: 1. when the node name is like 'lan-k8s-0-0', the IPsec tunnel interface name will be like '-k8s-0-2-e8dbe6', then it will failed to run command like `ipsec up '-k8s-0-2-e8dbe6'` with error `/usr/lib/ipsec/stroke: invalid option -- 'k'` due to the first char is '-', ipsec command interrupted it as an option. so changed the `generateInterfaceName` method to use `strings.TrimLeft()` to remove '-' in left. 2. `createIPSecTunnelPort` method can't handle the case when tunnel interface name changed when there is cache matched for the node. it will reuse the existing tunnel name without creating a new one with new name which means any change in `generateInterfaceName` won't take affect. and also add some unit test cases. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on antrea-io#2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
this PR is based on #2486 and I verified all tunnel modes with IPSec in K8s Cluster, it all works fine now, so I remove the limitation on our docs and the check in the code. Signed-off-by: Lan Luo <[email protected]>
the purpose of this commit is to fix a tunnel interface issue founded during some IPSec PoC verification:
when the node name is like 'lan-k8s-0-0', the IPsec tunnel interface name will be
like '-k8s-0-2-e8dbe6', then it will failed to run command like
ipsec up '-k8s-0-2-e8dbe6'
inside a Antrea agent with error/usr/lib/ipsec/stroke: invalid option -- 'k'
due to the first char is '-', ipsec command interrupted it as an
option. so changed the
generateInterfaceName
method to usestrings.TrimLeft()
to remove '-' in left.createIPSecTunnelPort
method can't handle the case when tunnelinterface name changed when there is cache matched for the node. it
will reuse the existing tunnel name without creating a new one with new
name which means any change in
generateInterfaceName
won't takeaffect. so refine it with more checks.
and also add some unit test cases.
Signed-off-by: Lan Luo [email protected]