Handle restartability for Node Port Local #1758
Conversation
@hemantavi, you must sign our contributor license agreement before your changes are merged. Click here to sign the agreement. If you are a VMware employee, read this for further instruction.
This pull request introduces 1 alert when merging 8b43640 into 741ad6d - view on LGTM.com new alerts:
Codecov Report

@@            Coverage Diff            @@
##             main    #1758    +/-   ##
=========================================
  Coverage        ?   33.68%
=========================================
  Files           ?      182
  Lines           ?    15612
  Branches        ?        0
=========================================
  Hits            ?     5259
  Misses          ?     9690
  Partials        ?      663

Flags with carried forward coverage won't be shown.
I didn't review this in detail yet, but I have a question about the approach: isn't it possible to use iptables-restore to sync all the rules in the ANTREA-NODE-PORT-LOCAL chain at once? I know we use this approach in https://github.com/vmware-tanzu/antrea/blob/dc0b4d5e9dcc7bee0a13263e82f8ee01d9f102be/pkg/agent/route/route_linux.go#L186. Adding @tnqn to see if he has an opinion.
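For reference, a minimal sketch of the iptables-restore pattern being suggested: build the chain's full desired state in a buffer and apply it in one shot. The helper name, rule type, and rule format below are assumptions for illustration, not code from this PR or from route_linux.go.

```go
// Sketch only: sync the whole ANTREA-NODE-PORT-LOCAL chain atomically instead
// of adding/deleting individual rules one at a time.
package rules

import (
    "bytes"
    "fmt"
    "os/exec"
)

// nplRule is an illustrative stand-in for one NodePortLocal mapping.
type nplRule struct {
    NodePort int
    PodIP    string
    PodPort  int
}

// restoreChain flushes ANTREA-NODE-PORT-LOCAL and re-adds every desired rule
// in a single `iptables-restore --noflush` invocation; other chains are untouched.
func restoreChain(rules []nplRule) error {
    var buf bytes.Buffer
    buf.WriteString("*nat\n")
    buf.WriteString(":ANTREA-NODE-PORT-LOCAL - [0:0]\n")
    buf.WriteString("-F ANTREA-NODE-PORT-LOCAL\n")
    for _, r := range rules {
        buf.WriteString(fmt.Sprintf(
            "-A ANTREA-NODE-PORT-LOCAL -p tcp --dport %d -j DNAT --to-destination %s:%d\n",
            r.NodePort, r.PodIP, r.PodPort))
    }
    buf.WriteString("COMMIT\n")

    cmd := exec.Command("iptables-restore", "--noflush")
    cmd.Stdin = &buf
    return cmd.Run()
}
```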
Hi @antoninbas, I went through the approach in https://github.com/vmware-tanzu/antrea/blob/dc0b4d5e9dcc7bee0a13263e82f8ee01d9f102be/pkg/agent/route/route_linux.go#L186. Yes, it makes sense to update the iptables rules in one shot. However, we still need to fetch the older rules and remove those rules for which the Pods don't exist. This is what the updated approach would look like then:
@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
Nits:
PodIP -> Pod IP
iptable -> iptables
ok.
@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
// for NPL
type DestinationPodIPPort struct {
Does it need to be public? If not - destinationPodIPPort, or just podIPPort.
We are referring to the members of this struct outside of this package. Do we still need to make the struct private?
continue
}
}
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
We should avoid a List call for every port.
Yeah, thought about this. We can create a map of these pods and then use it subsequently. Nice catch!
// Pod exists, add it to podPortTable
// error is already checked
klog.V(2).Infof("adding an entry for Node port %d, Pod %s and Pod port: %d", k, v.PodIP, v.PodPort)
portTable.AddUpdateEntry(k, v.PodPort, v.PodIP)
Do we need to check if the Pod has NPL enabled, or will you handle this in the controller later?
This is pending on an NPL annotation check PR here. Once that's in, we can add a call to make sure NPL is needed for this Pod before allocating a port for it.
func rulesToPodPortMap(portTable *portcache.PortTable, kubeClient clientset.Interface) (map[string][]k8s.PodNodePort, error) {
podPortRules, err := portTable.PodPortRules.GetAllRules()
if err != nil {
klog.Errorf("error in fetching the Pod port rules: %s", err.Error())
We capitalize the first letter for log messages.
Will change it here and at other places. Thanks.
Why do we need to fetch old rules from iptables? Is it for restoring the port allocation of existing Pods? Could we restore the information from the Pod annotation instead, assuming the K8s API is our datastore?
@jianjuns good point. However, since the Pod annotations are editable by users, do you think it is fine to depend on the annotations as the source of truth about the port allocations?
continue
}
}
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
This won't scale well and may take some time to finish the initialization, because:
- kube-apiserver doesn't have an index for podIP, so it will do a linear search to get the target Pod.
- each Pod will trigger the same work on the apiserver.
I think you can list the local Pods once, build a podIP-to-Pod map, and then use it in this loop, as sketched below.
Or you can leverage the PodLister added in #1743.
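A minimal sketch of that suggestion: one List call scoped to the Node, indexed by Pod IP. The helper name and the field-selector usage are illustrative, not this PR's code.

```go
// Sketch only: list the local Pods once and index them by Pod IP, instead of
// issuing one List call per iptables rule.
package npl

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// buildPodIPMap lists the Pods scheduled to nodeName once and returns a map
// keyed by Pod IP, so that rule reconciliation can do O(1) lookups.
func buildPodIPMap(kubeClient kubernetes.Interface, nodeName string) (map[string]*corev1.Pod, error) {
    podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
        FieldSelector: "spec.nodeName=" + nodeName,
    })
    if err != nil {
        return nil, err
    }
    podsByIP := make(map[string]*corev1.Pod, len(podList.Items))
    for i := range podList.Items {
        pod := &podList.Items[i] // index into the slice to avoid aliasing the loop variable
        if pod.Status.PodIP != "" {
            podsByIP[pod.Status.PodIP] = pod
        }
    }
    return podsByIP, nil
}
```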
Noted. Yeah, will change it.
// for each node. If yes, it assigns a port for this Pod and adds that as a rule.
func podsToPodPortMap(podCache map[string][]k8s.PodNodePort, kubeClient clientset.Interface, portTable *portcache.PortTable,
nodeName string) error {
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
Ditto, we should avoid querying the same data multiple times.
@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
// for NPL
type DestinationPodIPPort struct {
There are some structs that are similar but differ slightly; for instance, DestinationPodIPPort has no NodePort, and PodNodePort has no PodIP. It seems that is just because they use different fields as keys in different places. Maybe we could have just one struct and keep the key redundant in the value, to avoid declaring multiple structs and keep the code cleaner.
Makes sense. Will change these to have only one struct.
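For illustration, a single consolidated type might look like the sketch below; the field set is inferred from the review discussion (NodePort, PodPort, PodIP), not taken from the final code.

```go
// Sketch only: one type carrying all three fields, so the same struct can be
// used whether a map is keyed by Node port or by Pod IP.
package k8s

// PodNodePort describes a single NodePortLocal mapping.
type PodNodePort struct {
    NodePort int    // port allocated on the Node
    PodPort  int    // container port on the Pod
    PodIP    string // IP of the Pod backing the mapping
}
```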
@hemantavi: I believe we should use the Pod annotation as the source of truth, instead of iptables state. But for your point that a user might modify the Pod annotation wrongly, maybe we can do some validation and remove invalid annotations?
+1 to that. Plus, in case of a Node restart, you would only be able to rely on Pod annotations anyway.
Ok. Sure, we can remove the annotations that are not required or are invalid.
I see. Makes sense. Unless there's any other way to persist the Node port data reliably, we can go with the Pod annotations. Will make the changes.
This pull request introduces 1 alert when merging 8b43640 into e61cfbc - view on LGTM.com new alerts:
couple more comments
if addr := net.ParseIP(podIPPort[0]); addr == nil {
klog.Warningf("failed to parse IP address from %s", podIPPort[1])
continue
}
p, err := strconv.Atoi(podIPPort[1])
if err != nil || p < 0 {
klog.Warningf("failed to parse port from %s: %s", podIPPort[2], err.Error())
continue
}
there is an alert from our static analysis tool because of incorrect indexes when accessing podIPPort in your log messages - it should be podIPPort[0] for the address and podIPPort[1] for the port. Please address.
Yeah, the reason it was left out was because of a format change. Anyway, it's not required anymore. Have removed it. Thanks.
klog.Errorf("error in fetching the Pods for Node %s: %s", nodeName, err.Error()) | ||
return errors.New("error in fetching Pods for the Node" + nodeName) |
IMO, logging the error message and returning the same error is bad, as it means the same error will get logged twice. Why not just fmt.Errorf("error when fetching the Pods for Node %s: %v", nodeName, err)? I expect that returning an error in this function will eventually lead to an error log message somewhere up in the call stack.
You should also use %v as the verb for errors, instead of calling the Error() method on them. This method will be called automatically.
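A small sketch of the suggested idiom: wrap the error once with %v and let the caller decide where to log. fetchPods and podsForNode are hypothetical stand-ins, not functions from this PR.

```go
// Sketch only: wrap-and-return instead of log-and-return.
package npl

import "fmt"

// fetchPods is a hypothetical stand-in for the List call whose error we wrap.
func fetchPods(nodeName string) ([]string, error) {
    return nil, fmt.Errorf("connection refused")
}

// podsForNode wraps the error once with %v and returns it; it does not log,
// leaving logging to a single place up the call stack.
func podsForNode(nodeName string) ([]string, error) {
    pods, err := fetchPods(nodeName)
    if err != nil {
        return nil, fmt.Errorf("error when fetching the Pods for Node %s: %v", nodeName, err)
    }
    return pods, nil
}
```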
Makes sense. Fixed it.
klog.Warningf("unable to update annotation for Pod %s/%s, error: %s", newPod.Namespace, newPod.Name, err.Error()) | ||
return err |
I am confused by the fact that you are logging a warning and returning an error. If I understand the code correctly, this error will eventually cause InitializeNPLAgent to return an error, which will be logged, and the agent to exit with an error code. So is the warning useful here? See above comment on logging + returning the same error in a function.
Sure, fixed this too.
}

annotations = append(annotations, NPLAnnotation{
PodPort: containerPort,
I know it is not added by this PR, but why do we name the field in the annotation "PodPort" rather than "ContainerPort", to be consistent with the K8s convention?
Do you mean, change the annotation field to have "ContainerPort" instead of "PodPort"? If yes, can we take care of that once the other PR (which handles the Service annotations) gets merged, so that we avoid any conflicts?
Yes, that is what I mean. Another PR works.
err := json.Unmarshal([]byte(nplAnnotation), &nplData) | ||
if err != nil { | ||
// if there's an error in this NPL Annotation, clean it up | ||
err := cleanupNPLAnnotationForPod(kubeClient, podCopy) |
Could we do patch here, instead of copying the Pod?
About this, do you have any pointers regarding the type of patching we can use here? Since we are removing an annotation here, StrategicMergePatchType didn't work and ApplyPatchType also failed, citing that it was an unsupported type. JSONPatchType was also not helpful as it didn't replace the annotations list.
From the K8s docs (https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/): if the fields don't have a patch strategy, the default is to replace; however, I don't see that with Pod annotations. @jianjuns
I did not find a way to delete a single annotation by patching. @tnqn: any idea?
But at least we should copy and patch only the annotations? Also, if we go this way, you need to handle conflicts. See here: https://github.com/vmware-tanzu/antrea/blob/e61cfbc6f42bd159faa5dd6a0ae04980382caa87/pkg/agent/controller/traceflow/packetin.go#L47
But at least we should copy and patch only annotations?
The problems that I mentioned before regarding specifying the patch strategy were for annotations (and for the whole Pod as well). I tried patching only the annotations with StrategicMergePatchType and ApplyPatchType. They didn't seem to work.
Regarding https://github.com/vmware-tanzu/antrea/blob/e61cfbc6f42bd159faa5dd6a0ae04980382caa87/pkg/agent/controller/traceflow/packetin.go#L47, it looks like conflicts can happen if multiple agents are updating the Traceflow resource. However, do you think we would hit this scenario if there's only one Antrea agent updating the Pod, since each agent works on the Pods that are on the corresponding Node?
Actually, I should revert what I said. I did not try it yet, but we used to use a JSON merge patch to delete a single annotation. Have you tried it and found it does not work?
So I tried the annotation-only patching this time with JSONPatchType and it worked. Have updated the PR. Thanks!
err := json.Unmarshal([]byte(nplAnnotation), &nplData) | ||
if err != nil { | ||
// if there's an error in this NPL Annotation, clean it up | ||
err := cleanupNPLAnnotationForPod(kubeClient, podCopy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not find way to delete a single annotation by patching. @tnqn : any idea?
But at least we should copy and patch only annotations? Also if we go this way, you need to handle conflicts. See here: https://github.com/vmware-tanzu/antrea/blob/e61cfbc6f42bd159faa5dd6a0ae04980382caa87/pkg/agent/controller/traceflow/packetin.go#L47
// check if a valid NPL Annotation exists for this Pod:
// if yes, verifiy validity of the Node port, update the port table and add a rule to the
// rules buffer.
podCopy := pod.DeepCopy()
Move it to line 95? And if we update only annotations, probably do not copy the whole Pod.
Moved the copying line. Thanks!
Regarding the annotations, I have commented below.
Rebased to the new main on #1743.
}
annotations := pod.Annotations
delete(annotations, nplk8s.NPLAnnotationKey)
payload := []patchStringValue{
I think we can delete a single annotation by setting the annotation to nil.
Could you change the annotation update/setting code to use patching too?
From https://tools.ietf.org/html/rfc7386:

  Changing the value of "a" and removing "f" can be achieved by sending:

  PATCH /target HTTP/1.1
  Host: example.org
  Content-Type: application/merge-patch+json

  {
    "a": "z",
    "c": {
      "f": null
    }
  }

  When applied to the target resource, the value of the "a" member is
  replaced with "z" and "f" is removed, leaving the remaining content
  untouched.
k8sClient.CoreV1().Pods("default").Patch(context.TODO(), "pod1", types.MergePatchType, []byte(`{"metadata":{"annotations":{"abc":null}}}`), metav1.PatchOptions{})
This can remove annotation "abc".
This can remove annotation "abc".
Yes, this works. Thanks! Have updated the PR.
Could you change the annotation update/setting code with patching too?
Done.
LGTM.
Let us see if @antoninbas or @tnqn has more comments.
klog.Warningf("Failed to get Node's name, NodePortLocal annotation cannot be removed for Pods scheduled to this Node") | ||
return | ||
func patchPod(value []NPLAnnotation, pod corev1.Pod, kubeClient clientset.Interface) error { | ||
type Metadata struct { |
metaData
Removed this, using a map instead.
if err != nil {
klog.Warningf("Unable to list Pods, err: %v", err)
return
type PatchData struct {
patchData
Removed this, using a map instead.
} else {
payloadValue[NPLAnnotationKey] = nil
}
payload := PatchData{
You can use anonymous structs here too.
Since we are not using struct members anywhere, just changed the structs to maps instead.
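For illustration, a map-based JSON merge patch (RFC 7386) that sets the NodePortLocal annotation when a value is provided and removes it when the value is nil could look like the sketch below. The annotation key and function name are assumptions, not the PR's exact code.

```go
// Sketch only: update or delete a single Pod annotation with a merge patch,
// touching nothing else on the Pod.
package npl

import (
    "context"
    "encoding/json"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "k8s.io/client-go/kubernetes"
)

const nplAnnotationKey = "npl.antrea.io/example" // hypothetical key

// patchPodAnnotation sets the annotation when value != nil and removes it when
// value == nil: a nil *string marshals to JSON null, which deletes the key per RFC 7386.
func patchPodAnnotation(kubeClient kubernetes.Interface, namespace, name string, value *string) error {
    payload := map[string]interface{}{
        "metadata": map[string]interface{}{
            "annotations": map[string]*string{
                nplAnnotationKey: value,
            },
        },
    }
    patchBytes, err := json.Marshal(payload)
    if err != nil {
        return err
    }
    _, err = kubeClient.CoreV1().Pods(namespace).Patch(context.TODO(), name,
        types.MergePatchType, patchBytes, metav1.PatchOptions{})
    return err
}
```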
We should merge #1826 first, and the unit tests in this PR should be updated accordingly.
payloadBytes, _ := json.Marshal(newPayload)
if _, err := kubeClient.CoreV1().Pods(pod.Namespace).Patch(context.TODO(), pod.Name, types.MergePatchType,
payloadBytes, metav1.PatchOptions{}); err != nil {
return fmt.Errorf("unable to update Annotation for Pod %s/%s, %s", pod.Namespace,
return fmt.Errorf("unable to update Annotation for Pod %s/%s, %s", pod.Namespace, | |
return fmt.Errorf("unable to update Annotation for Pod %s/%s: %s", pod.Namespace, |
Done.
klog.Warningf("Unable to update annotation for Pod %s/%s, error: %v", pod.Namespace, pod.Name, err) | ||
return err | ||
// CleanupNPLAnnotationForPod removes the NPL Annotation from the Pod's Annotations map entirely. | ||
func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod corev1.Pod) error { |
- func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod corev1.Pod) error {
+ func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod *corev1.Pod) error {
to avoid a copy?
Sure, the idea was to not use a pointer because we weren't changing anything for that pod. But yeah, we should avoid a copy too. Changed it.
newPod := pod.DeepCopy()
removePodAnnotation(newPod)
return c.updatePodAnnotation(newPod)
return CleanupNPLAnnotationForPod(c.kubeClient, *pod)
CleanupNPLAnnotationForPod has the same check as L348
Yeah, removed it from here, not required.
}

if err := portTable.PodPortRules.AddAllRules(allNPLPorts); err != nil {
return nil
If err != nil, it ignores the error? Why does addRulesForNPLPorts return an error then?
I missed returning the error here. Fixed it. Thanks!
LGTM
err := json.Unmarshal([]byte(nplAnnotation), &nplData) | ||
if err != nil { | ||
// if there's an error in this NPL Annotation, clean it up | ||
err := nplk8s.CleanupNPLAnnotationForPod(kubeClient, &pod) |
may need to use &podList.Items[i] to avoid the gosec failure.
Done.
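For context, the gosec finding here is the "implicit memory aliasing in a range loop" rule (G601); indexing into the slice avoids taking the address of the reused loop variable. The surrounding function below is illustrative only, not this PR's code.

```go
// Sketch only: address the slice element by index rather than the range variable.
package npl

import corev1 "k8s.io/api/core/v1"

func handlePods(podList *corev1.PodList, handle func(*corev1.Pod)) {
    for i := range podList.Items {
        // &podList.Items[i] points at the element itself; &pod from
        // `for _, pod := range ...` would alias a single reused variable.
        handle(&podList.Items[i])
    }
}
```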
could you rebase main and update the unit tests to use the same "framework"?
klog.Info("Will fetch Pods and generate NPL rules for these Pods") | ||
if err := getPodsAndGenRules(kubeClient, portTable, nodeName); err != nil { | ||
return nil, err | ||
} |
I feel like this should be done in the Run method of the controller, after the caches are synced and before starting the workers.
Yeah, agreed. Moved it. Thanks.
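For illustration, the usual controller Run pattern being referred to (wait for the informer cache to sync, do the one-time restore, then start workers) could look like the sketch below. The field names (queue, podListerSynced), worker count, and worker function are assumptions; only GetPodsAndGenRules comes from this PR.

```go
// Sketch only: minimal stand-in for the controller and its Run method.
package npl

import (
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/util/workqueue"
    "k8s.io/klog"
)

type NPLController struct {
    queue           workqueue.RateLimitingInterface
    podListerSynced cache.InformerSynced
}

func (c *NPLController) GetPodsAndGenRules() error { return nil } // from this PR
func (c *NPLController) worker()                   {}

func (c *NPLController) Run(stopCh <-chan struct{}) {
    defer c.queue.ShutDown()

    // Wait for the Pod informer cache before touching any state.
    if !cache.WaitForCacheSync(stopCh, c.podListerSynced) {
        klog.Error("Failed to wait for Pod informer cache to sync")
        return
    }

    // One-time restore of rules/annotations for Pods that already exist,
    // done before the workers start processing events.
    if err := c.GetPodsAndGenRules(); err != nil {
        klog.Errorf("Error when getting Pods and generating rules: %v", err)
        return
    }

    const numWorkers = 4
    for i := 0; i < numWorkers; i++ {
        go wait.Until(c.worker, time.Second, stopCh)
    }
    <-stopCh
}
```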
klog.Warningf("Unable to patch NPL annotations for Pod %s/%s: %s", pod.Namespace, pod.Name, err.Error()) | ||
} | ||
klog.V(2).Infof("Successfully updated annotation for Pod %s/%s", pod.Namespace, pod.Name) |
logs & error messages should be consistent:
- "NPL annotations" vs "annotation"
- sometimes annotation is capitalized, sometimes it's not
- sometimes you spell out NodePortLocal, sometimes you use NPL
Fixed it and made it consistent for the respective files. Thanks.
Currently we delete all the iptables rules when the antrea-agent is restarted. This commit does the following:
- Evaluates the iptables rules in the Node Port Local chain and checks whether a rule is required. If a Pod got deleted while the antrea-agent was restarting, that rule is not needed and has to be deleted.
- The above evaluation logic also creates a cache of all the Pods retrieved from the iptables rules.
- Evaluates the Kubernetes Pods and, for each Pod, checks whether it is part of the above cache. If yes, only the Pod's annotations need to be verified. If not, a new Node port has to be allocated and a rule has to be added.
One of the TODOs as mentioned [here](antrea-io#1459).
* Simplified the flow to:
  - Get a list of all the Pods on this Node.
  - Construct a list of NPL chain rules from the Pods' NPL annotations.
  - Restore the rules on the NPL chain.
* Took care of nits and capitalization, and removed unnecessary blocks.
* Fix a few nits.
* Make IPTablesRules private.
* Moved the copying of the pod.
This is to update the NPL Annotations.
* Update the Pod NPL Annotations at all places via Patch.
* Remove cleanup of annotations from InitController().
* Break if even one of the annotations is invalid for a Pod.
Also fixes the logs and comments.
err := json.Unmarshal([]byte(nplAnnotation), &nplData) | ||
if err != nil { | ||
// if there's an error in this NodePortLocal annotation, clean it up | ||
err := CleanupNPLAnnotationForPod(c.kubeClient, &podList.Items[i]) |
with the code re-org, CleanupNPLAnnotationForPod can become a method of NPLController: there is no longer a need to pass the K8s client as a parameter directly
Yeah, done.
klog.Warningf("Unable to unmarshal NodePortLocal annotation for Pod %s/%s", | ||
pod.Namespace, pod.Name) |
for consistency, you can use key directly in log messages to print <podNamespace>/<podName>
Changed. Thanks.
// is invalid or the NodePortLocal annotation is invalid, the NodePortLocal annotation is removed. The Pod
// event handlers take care of allocating a new Node port if required.
func (c *NPLController) GetPodsAndGenRules() error {
podList, err := c.kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
As pointed out earlier, you should use the podLister member of NPLController to list the Pods, as it will use information which has already been fetched (and cached) from the K8s apiserver. There should be no need to do a new API request.
Yeah, wasn't possible earlier because this func was getting called from InitializeNPLAgent(). Since we have now moved it to the controller's Run() function, I have changed it to use the podLister. Thanks.
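A minimal sketch of listing Pods from the informer cache via the PodLister instead of the apiserver; it assumes the informer is either scoped to this Node (as in the agent) or that filtering on Spec.NodeName afterwards is acceptable. Function and parameter names are illustrative.

```go
// Sketch only: read from the shared informer cache, no new API request.
package npl

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/labels"
    corev1listers "k8s.io/client-go/listers/core/v1"
)

func listLocalPods(podLister corev1listers.PodLister, nodeName string) ([]*corev1.Pod, error) {
    pods, err := podLister.List(labels.Everything())
    if err != nil {
        return nil, err
    }
    var local []*corev1.Pod
    for _, pod := range pods {
        // Redundant if the informer already uses a spec.nodeName field selector.
        if pod.Spec.NodeName == nodeName {
            local = append(local, pod)
        }
    }
    return local, nil
}
```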
}
var ok bool
portTable, ok := portcache.NewPortTable(start, end)
if !ok {
return nil, errors.New("NPL port table could not be initialized")
return nil, errors.New("error in initializing NodePortLocal port table")
s/error in/error when/
Fixed.
err = portTable.PodPortRules.Init()
if err != nil {
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %v", err)
why are we no longer wrapping the error?
My bad. Fixed it. Thanks.
for k := range pt.Table {
delete(pt.Table, k)
}
while this is valid Go, why not just build a new map: ptable.Table = make(map[int]NodePortData)?
Done.
// TestInitInvalidPod simulates an agent reboot case. A Pod with an invalid NPL annotation is
// added, this invalid annotation should get cleaned up. And a proper NPL annotation should get
// added.
func TestInitInvalidPod(t *testing.T) {
how can the test pass without an EXPECT statement for AddAllRules on the mock object?
This seems to be testing an edge case, but I imagine that the actual functionality will be tested by e2e test cases.
how can the test pass without an EXPECT statement for AddAllRules on the mock object?
Yeah, weird, I think this test was passing because the AddAllRules function isn't getting called for the invalid annotation. Have added this now.
This seems to be testing an edge case, but I imagine that the actual functionality will be tested by e2e test cases.
Yes, that's right.
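The point being made is that the test must assert the rule-sync call actually happens. The PR does this with a gomock EXPECT() on the generated mock; the self-contained sketch below shows the same assertion with a hand-rolled fake and an invented minimal interface, purely for illustration.

```go
// Sketch only: verify that AddAllRules is called exactly once during init.
package npl

import "testing"

// podPortRules is a stand-in for the interface the controller calls into.
type podPortRules interface {
    AddAllRules(rules []string) error
}

type fakeRules struct {
    addAllCalls int
}

func (f *fakeRules) AddAllRules(rules []string) error {
    f.addAllCalls++
    return nil
}

// initRules is a stand-in for the init path under test.
func initRules(r podPortRules, rules []string) error {
    return r.AddAllRules(rules)
}

func TestInitSyncsRules(t *testing.T) {
    fake := &fakeRules{}
    if err := initRules(fake, []string{"rule-1"}); err != nil {
        t.Fatalf("init failed: %v", err)
    }
    // Equivalent of mockRules.EXPECT().AddAllRules(gomock.Any()).Times(1).
    if fake.addAllCalls != 1 {
        t.Errorf("expected AddAllRules to be called once, got %d calls", fake.addAllCalls)
    }
}
```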
- Using Pod lister to get all the Pods for the current Node.
- Add an EXPECT statement for AddAllRules().
- Fixed comments and other re-arrangements.
please address CI failures
only a couple nits remaining on my side, otherwise I think all my comments have been addressed. @jianjuns do you want to take another look?
klog.Info("Will fetch Pods and generate NodePortLocal rules for these Pods") | ||
|
||
if err := c.GetPodsAndGenRules(); err != nil { | ||
klog.Errorf("Error in getting Pods and generating rules: %s", err.Error()) |
klog.Errorf("Error in getting Pods and generating rules: %s", err.Error()) | |
klog.Errorf("Error in getting Pods and generating rules: %v", err) |
would be good to fix any other occurrence of this in the rest of the NPL code as well if possible
Done, replaced at other places too.
err = portTable.PodPortRules.Init()
if err != nil {
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %v", err)
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %s", err.Error())
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %s", err.Error()) | |
return nil, fmt.Errorf("NodePortLocal rules for Pod ports could not be initialized: %v", err) |
@@ -151,3 +148,16 @@ func (ipt *IPTableRules) DeleteAllRules() error {
}
return nil
}

// Join all words with spaces, terminate with newline and write to buf.
func writeLine(buf *bytes.Buffer, words ...string) {
if this function is the same as the one in pkg/agent/route/route_linux.go, it should go into a common utility package, but that can be done in a future PR
Yeah, noted. Will take care of it in a subsequent PR.
Also, changed the printing of err.Error() to err.
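For reference, a writeLine helper of this shape (matching the comment quoted in the diff) is the usual pattern for building iptables-restore input; this is a sketch, not necessarily the exact code in route_linux.go.

```go
// Sketch only: append one space-joined, newline-terminated line to the buffer
// that will later be fed to iptables-restore.
package rules

import (
    "bytes"
    "strings"
)

// writeLine joins all words with spaces, terminates with a newline and writes to buf.
func writeLine(buf *bytes.Buffer, words ...string) {
    buf.WriteString(strings.Join(words, " ") + "\n")
}
```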
LGTM
/test-all
/test-conformance
Currently we delete all the iptables rules when the antrea-agent is restarted. This commit does the following:
- Evaluates the iptables rules in the Node Port Local chain and checks whether a rule is required. If a Pod got deleted while the antrea-agent was restarting, that rule is not needed and has to be deleted.
- The above evaluation logic also creates a cache of all the Pods retrieved from the iptables rules.
- Evaluates the Kubernetes Pods and, for each Pod, checks whether it is part of the above cache. If yes, only the Pod's annotations need to be verified. If not, a new Node port has to be allocated and a rule has to be added.
One of the TODOs as mentioned here.