
Handle restartability for Node Port Local #1758

Merged: 13 commits merged into antrea-io:main on Feb 11, 2021

Conversation

@hemantavi (Contributor)

Currently we delete all the iptables rules when the antrea-agent
is restarted. This commit does the following:

  • Evaluates the iptables rules in the Node Port Local chain and
    checks whether each rule is still required. If a Pod was deleted
    while the antrea-agent was restarting, its rule is no longer
    needed and is deleted.
  • The above evaluation logic also creates a cache of all the
    Pods retrieved from the iptables rules.
  • Evaluates the Kubernetes Pods and, for each Pod, checks whether
    it is part of the above cache. If yes, only the Pod's annotations
    need to be verified. If not, a new Node port is allocated
    and a rule is added.

One of the TODOs mentioned
here.

@vmwclabot

@hemantavi, you must sign our contributor license agreement before your changes are merged. Click here to sign the agreement. If you are a VMware employee, read this for further instruction.

@lgtm-com (bot) commented Jan 18, 2021

This pull request introduces 1 alert when merging 8b43640 into 741ad6d - view on LGTM.com

new alerts:

  • 1 for Off-by-one comparison against length

@codecov-io commented Jan 18, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@470e8af).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1758   +/-   ##
=======================================
  Coverage        ?   33.68%           
=======================================
  Files           ?      182           
  Lines           ?    15612           
  Branches        ?        0           
=======================================
  Hits            ?     5259           
  Misses          ?     9690           
  Partials        ?      663           
Flag        Coverage Δ
e2e-tests   33.68% <0.00%> (?)

Flags with carried forward coverage won't be shown.

@antoninbas (Contributor) left a comment

I didn't review this in detail yet, but I have a question about the approach: isn't it possible to use iptables-restore to sync all the rules in the ANTREA-NODE-PORT-LOCAL chain at once? I know we use this approach in https://github.com/vmware-tanzu/antrea/blob/dc0b4d5e9dcc7bee0a13263e82f8ee01d9f102be/pkg/agent/route/route_linux.go#L186. Adding @tnqn to see if he has an opinion.

@hemantavi (Contributor, Author)

Hi @antoninbas, I went through the approach in https://github.com/vmware-tanzu/antrea/blob/dc0b4d5e9dcc7bee0a13263e82f8ee01d9f102be/pkg/agent/route/route_linux.go#L186. Yes, it makes sense to update the iptables rules in one shot. However, we still need to fetch the older rules and remove those for which the Pods no longer exist. The updated approach would then look like this (a sketch of the iptables-restore step follows the list):

  • read rules from iptables
  • remove rules from buffer (for pods that got deleted)
  • add new rules in the buffer (for new pods)
  • update using iptables-restore
  • verify/update annotations of the pods
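
For illustration only, a minimal sketch of the iptables-restore step; the helper and type names here are assumptions for this sketch, not code from this PR:

package nplsketch

import (
	"bytes"
	"fmt"
	"strings"
)

// writeLine joins words with spaces and terminates the line with a newline,
// mirroring the helper used in pkg/agent/route/route_linux.go.
func writeLine(buf *bytes.Buffer, words ...string) {
	buf.WriteString(strings.Join(words, " ") + "\n")
}

// nplRule is a hypothetical representation of one NodePortLocal mapping.
type nplRule struct {
	NodePort int
	PodIP    string
	PodPort  int
}

// buildRestoreInput renders the desired state of the ANTREA-NODE-PORT-LOCAL
// chain as input for "iptables-restore --noflush". Declaring the chain in the
// restore input causes that chain to be flushed and rebuilt (the behavior the
// route client relies on), which drops stale rules for Pods deleted while the
// agent was down, while leaving other chains untouched.
func buildRestoreInput(rules []nplRule) []byte {
	buf := bytes.NewBuffer(nil)
	writeLine(buf, "*nat")
	writeLine(buf, ":ANTREA-NODE-PORT-LOCAL", "-", "[0:0]")
	for _, r := range rules {
		writeLine(buf,
			"-A", "ANTREA-NODE-PORT-LOCAL",
			"-p", "tcp", "-m", "tcp",
			"--dport", fmt.Sprint(r.NodePort),
			"-j", "DNAT",
			"--to-destination", fmt.Sprintf("%s:%d", r.PodIP, r.PodPort))
	}
	writeLine(buf, "COMMIT")
	return buf.Bytes()
}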

@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
Contributor:

Nits:
PodIP -> Pod IP
iptable -> iptables

Contributor Author:

ok.

@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
// for NPL
type DestinationPodIPPort struct {
Contributor:

Does it need to be public? If not - destinationPodIPPort, or just podIPPort.

Contributor Author:

We are referring to the members of this struct outside of this package. Do we still need to make the struct private?

pkg/agent/nodeportlocal/rules/iptable_rule.go (outdated, resolved)
pkg/agent/nodeportlocal/k8s/annotations.go (outdated, resolved)
pkg/agent/nodeportlocal/k8s/annotations.go (outdated, resolved)
continue
}
}
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
Contributor:

We should avoid a List call for every port.

Contributor Author:

Yeah, thought about this. We can create a map of these pods and then use it subsequently. Nice catch!

pkg/agent/nodeportlocal/npl_agent_init.go (outdated, resolved)
pkg/agent/nodeportlocal/npl_agent_init.go (outdated, resolved)
// Pod exists, add it to podPortTable
// error is already checked
klog.V(2).Infof("adding an entry for Node port %d, Pod %s and Pod port: %d", k, v.PodIP, v.PodPort)
portTable.AddUpdateEntry(k, v.PodPort, v.PodIP)
Contributor:

Do we need to check if the Pod has NPL enabled, or will you handle that in the controller later?

Contributor Author:

This is pending on an NPL annotation check PR here. Once that's in, we can add a call to make sure NPL is needed for this Pod before allocating a port for it.

func rulesToPodPortMap(portTable *portcache.PortTable, kubeClient clientset.Interface) (map[string][]k8s.PodNodePort, error) {
podPortRules, err := portTable.PodPortRules.GetAllRules()
if err != nil {
klog.Errorf("error in fetching the Pod port rules: %s", err.Error())
Contributor:

We capitalize the first letter for log messages.

Contributor Author:

Will change it here and at other places. Thanks.

@jianjuns (Contributor)

Yes, it makes sense to update the iptables rules in one shot. However, we still need to fetch the older rules and remove those for which the Pods no longer exist. The updated approach would then look like this:

Why do we need to fetch old rules from iptables? Is it for restoring the port allocation of existing Pods? Could we restore the information from the Pod annotations instead, assuming the K8s API is our datastore?
iptables-restore will remove non-existing rules for you.
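
For illustration, a rough sketch of restoring allocations from the Pod annotation; the annotation key and struct fields below are assumptions, not the PR's actual definitions:

package nplsketch

import (
	"encoding/json"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/klog"
)

// nplAnnotationKey and nplAnnotation are stand-ins for the PR's own annotation
// key and struct; the field set here is assumed for the sketch.
const nplAnnotationKey = "npl.antrea.io"

type nplAnnotation struct {
	PodPort  int `json:"podPort"`
	NodePort int `json:"nodePort"`
}

// allocationsFromAnnotation restores the Node port allocations recorded in a
// Pod's NodePortLocal annotation. An unparsable annotation yields nil, in
// which case the agent would clean the annotation up and re-allocate ports.
func allocationsFromAnnotation(pod *corev1.Pod) []nplAnnotation {
	value, ok := pod.Annotations[nplAnnotationKey]
	if !ok {
		return nil
	}
	var allocations []nplAnnotation
	if err := json.Unmarshal([]byte(value), &allocations); err != nil {
		klog.Warningf("Unable to unmarshal NodePortLocal annotation for Pod %s/%s: %v",
			pod.Namespace, pod.Name, err)
		return nil
	}
	return allocations
}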

@hemantavi (Contributor, Author)

Why do we need to fetch old rules from iptables? Is it for restoring the port allocation of existing Pods? Could we restore the information from the Pod annotations instead, assuming the K8s API is our datastore?

@jianjuns good point. However, since the Pod annotations are editable by users, do you think it is fine to depend on the annotations as the source of truth for the port allocations?

continue
}
}
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
Member:

This won't scale well and may take some time to finish the initialization, because:

  1. kube-apiserver doesn't have an index for podIP, so it will do a linear search to find the target Pod.
  2. each Pod will trigger the same work on the apiserver.

I think you can list the local Pods once, build a podIP-to-Pod map, and then use it in this loop.
Or you can leverage the PodLister added in #1743.
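
A minimal sketch of the suggested approach, listing the local Pods once and indexing them by Pod IP (the function name and field-selector usage are assumptions for illustration):

package nplsketch

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// buildPodIPMap lists the Pods scheduled to this Node once and indexes them by
// Pod IP, so that rules parsed from iptables can be matched against Pods
// without issuing one List call per rule.
func buildPodIPMap(kubeClient kubernetes.Interface, nodeName string) (map[string]corev1.Pod, error) {
	podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
		FieldSelector: fmt.Sprintf("spec.nodeName=%s", nodeName),
	})
	if err != nil {
		return nil, fmt.Errorf("error when listing Pods for Node %s: %v", nodeName, err)
	}
	podsByIP := make(map[string]corev1.Pod, len(podList.Items))
	for i := range podList.Items {
		pod := podList.Items[i]
		if pod.Status.PodIP != "" {
			podsByIP[pod.Status.PodIP] = pod
		}
	}
	return podsByIP, nil
}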

Contributor Author:

Noted. Yeah, will change it.

// for each node. If yes, it assigns a port for this Pod and adds that as a rule.
func podsToPodPortMap(podCache map[string][]k8s.PodNodePort, kubeClient clientset.Interface, portTable *portcache.PortTable,
nodeName string) error {
podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
Member:

ditto, we should avoid querying the same data multiple times.

@@ -29,6 +30,13 @@ import (
// NodePortLocalChain is the name of the chain in IPTABLES for Node Port Local
const NodePortLocalChain = "ANTREA-NODE-PORT-LOCAL"

// DestinationPodIPPort stores the PodIP and Pod port parsed from the iptable rules
// for NPL
type DestinationPodIPPort struct {
Member:

There are some structs that are similar but differ slightly; for instance, DestinationPodIPPort has no NodePort, and PodNodePort has no PodIP. It seems this is just because they use different fields as keys in different places. Maybe we could have a single struct and let the key be redundant in the value, to avoid declaring multiple structs and keep the code cleaner.

Contributor Author:

Makes sense. Will change these to use only one struct.

@jianjuns (Contributor)

@hemantavi: I believe we should use the Pod annotation as the source of truth, instead of the iptables state. But for your point that a user might modify the Pod annotation wrongly, maybe we can do some validation and remove invalid annotations?

@antoninbas (Contributor)

@hemantavi: I believe we should use the Pod annotation as the source of truth, instead of the iptables state. But for your point that a user might modify the Pod annotation wrongly, maybe we can do some validation and remove invalid annotations?

+1 to that. Plus in case of Node restart, you would only be able to rely on Pod annotations anyway.

@hemantavi (Contributor, Author)

@hemantavi: I believe we should use the Pod annotation as the source of truth, instead of the iptables state. But for your point that a user might modify the Pod annotation wrongly, maybe we can do some validation and remove invalid annotations?

Ok. Sure, we can remove the annotations that are not required or are invalid.

+1 to that. Plus in case of Node restart, you would only be able to rely on Pod annotations anyway.

I see. Makes sense. Unless there's any other way to persist the node port data reliably, we can go with the pod annotations.

Will make the changes.

Base automatically changed from master to main January 26, 2021 00:00
@lgtm-com (bot) commented Jan 26, 2021

This pull request introduces 1 alert when merging 8b43640 into e61cfbc - view on LGTM.com

new alerts:

  • 1 for Off-by-one comparison against length

@antoninbas (Contributor) left a comment

couple more comments

Comment on lines 130 to 138
if addr := net.ParseIP(podIPPort[0]); addr == nil {
klog.Warningf("failed to parse IP address from %s", podIPPort[1])
continue
}
p, err := strconv.Atoi(podIPPort[1])
if err != nil || p < 0 {
klog.Warningf("failed to parse port from %s: %s", podIPPort[2], err.Error())
continue
}
Contributor:

there is an alert from our static analysis tool because of incorrect indexes when accessing podIPPort in your log messages - it should be podIPPort[0] for the address and podIPPort[1] for the port. Please address.

Contributor Author:

Yeah, the reason it was left like that was a format change. Anyway, it's not required anymore. I have removed it. Thanks.

Comment on lines 117 to 118
klog.Errorf("error in fetching the Pods for Node %s: %s", nodeName, err.Error())
return errors.New("error in fetching Pods for the Node" + nodeName)
Contributor:

IMO, logging the error message and returning the same error is bad, as it means the same error will get logged twice.

Why not just fmt.Errorf("error when fetching the Pods for Node %s: %v", nodeName, err)? I expect that returning an error from this function will eventually lead to an error log message somewhere up the call stack.
You should also use %v as the verb for errors, instead of calling the Error() method on them. This method will be called automatically.

Contributor Author:

Makes sense. Fixed it.

Comment on lines 193 to 194
klog.Warningf("unable to update annotation for Pod %s/%s, error: %s", newPod.Namespace, newPod.Name, err.Error())
return err
Contributor:

I am confused by the fact that you are logging a warning and returning an error. If I understand the code correctly, this error will eventually cause InitializeNPLAgent to return an error, which will be logged, and the agent to exit with an error code. So is the warning useful here? See above comment on logging + returning the same error in a function.

Contributor Author:

Sure, fixed this too.

pkg/agent/nodeportlocal/k8s/annotations.go (outdated, resolved)
pkg/agent/nodeportlocal/k8s/annotations.go (outdated, resolved)
}

annotations = append(annotations, NPLAnnotation{
PodPort: containerPort,
Contributor:

I know it is not added by this PR, but why do we name the field in the annotation "PodPort" rather than "ContainerPort", which would be consistent with the K8s convention?

Contributor Author:

Do you mean changing the annotation field to "ContainerPort" instead of "PodPort"? If yes, can we take care of that once the other PR (which handles the Service annotations) gets merged, so that we avoid any conflicts?

Contributor:

Yes, that is what I mean. Another PR works.

pkg/agent/nodeportlocal/rules/iptable_rule.go (outdated, resolved)
pkg/agent/nodeportlocal/npl_agent_init.go (outdated, resolved)
pkg/agent/nodeportlocal/npl_agent_init.go (outdated, resolved)
err := json.Unmarshal([]byte(nplAnnotation), &nplData)
if err != nil {
// if there's an error in this NPL Annotation, clean it up
err := cleanupNPLAnnotationForPod(kubeClient, podCopy)
Contributor:

Could we do a patch here, instead of copying the Pod?

@hemantavi (Contributor, Author) commented Feb 2, 2021

About this, do you have any pointers regarding the type of patching we can use here? Since we are removing an annotation, StrategicMergePatchType didn't work, and ApplyPatchType also failed, citing that it was an unsupported type. JSONPatchType was also not helpful, as it didn't replace the annotations list.

From the K8s docs, https://kubernetes.io/docs/tasks/manage-kubernetes-objects/update-api-object-kubectl-patch/: if a field doesn't have a patch strategy, the default is to replace; however, I don't see that behavior with Pod annotations. @jianjuns

Contributor:

I did not find a way to delete a single annotation by patching. @tnqn: any idea?
But at least we should copy and patch only the annotations? Also, if we go this way, you need to handle conflicts. See here: https://github.com/vmware-tanzu/antrea/blob/e61cfbc6f42bd159faa5dd6a0ae04980382caa87/pkg/agent/controller/traceflow/packetin.go#L47
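
For reference, conflict handling with client-go typically looks like the sketch below: re-read the object and retry the Update when the apiserver reports a conflict (the helper name here is illustrative):

package nplsketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/util/retry"
)

// updatePodAnnotations re-reads the Pod on every attempt and retries the
// Update on resource-version conflicts.
func updatePodAnnotations(kubeClient kubernetes.Interface, namespace, name string, mutate func(*corev1.Pod)) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		pod, err := kubeClient.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return err
		}
		mutate(pod)
		_, err = kubeClient.CoreV1().Pods(namespace).Update(context.TODO(), pod, metav1.UpdateOptions{})
		return err
	})
}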

Contributor Author:

But at least we should copy and patch only the annotations?

The problems that I mentioned before regarding specifying the patch strategy were for annotations (and for the whole pod as well). I tried patching only the annotations with StrategicMergePatchType and ApplyPatchType. They didn't seem to work.

Regarding https://github.com/vmware-tanzu/antrea/blob/e61cfbc6f42bd159faa5dd6a0ae04980382caa87/pkg/agent/controller/traceflow/packetin.go#L47, it looks like conflicts can happen if multiple agents are updating the Traceflow resource. However, do you think we would hit this scenario if there's only one antrea-agent updating the Pod, since each agent works on the Pods that are on its own Node?

Contributor:

Actually, I should revert what I said. I have not tried it yet, but we used to use a JSON merge patch to delete a single annotation. Have you tried it and found that it does not work?

Contributor Author:

So I tried the annotation-only patching this time with JSONPatchType and it worked. Have updated the PR. Thanks!


// check if a valid NPL Annotation exists for this Pod:
// if yes, verifiy validity of the Node port, update the port table and add a rule to the
// rules buffer.
podCopy := pod.DeepCopy()
Contributor:

Move it to line 95? And if we update only annotations, probably do not copy the whole Pod.

Contributor Author:

Moved the copying line. Thanks!
Regarding the annotations, I have commented below.

@hemantavi (Contributor, Author)

Rebased to the new main on #1743.

}
annotations := pod.Annotations
delete(annotations, nplk8s.NPLAnnotationKey)
payload := []patchStringValue{
Contributor:

I think we can delete a single annotation by setting the annotation to nil.

Could you change the annotation update/setting code to use patching too?

Contributor:

From: https://tools.ietf.org/html/rfc7386

Changing the value of "a" and removing "f" can be achieved by sending:

    PATCH /target HTTP/1.1
    Host: example.org
    Content-Type: application/merge-patch+json

    {
      "a": "z",
      "c": {
        "f": null
      }
    }

When applied to the target resource, the value of the "a" member is replaced with "z" and "f" is removed, leaving the remaining content untouched.

Member:

k8sClient.CoreV1().Pods("default").Patch(context.TODO(), "pod1", types.MergePatchType, []byte(`{"metadata":{"annotations":{"abc":null}}}`), metav1.PatchOptions{})

This can remove annotation "abc".
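
Expanded into a small self-contained helper, the merge-patch approach might look like the sketch below; the function name is an assumption, the patch payload is the point:

package nplsketch

import (
	"context"
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// removePodAnnotation deletes a single annotation from a Pod with a JSON merge
// patch (RFC 7386): setting the key to null removes it and leaves everything
// else untouched.
func removePodAnnotation(kubeClient kubernetes.Interface, pod *corev1.Pod, key string) error {
	payload := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]interface{}{
				key: nil,
			},
		},
	}
	payloadBytes, err := json.Marshal(payload)
	if err != nil {
		return err
	}
	if _, err := kubeClient.CoreV1().Pods(pod.Namespace).Patch(context.TODO(), pod.Name,
		types.MergePatchType, payloadBytes, metav1.PatchOptions{}); err != nil {
		return fmt.Errorf("unable to remove annotation %s from Pod %s/%s: %v", key, pod.Namespace, pod.Name, err)
	}
	return nil
}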

Contributor Author:

This can remove annotation "abc".

Yes, this works. Thanks! Have updated the PR.

Could you change the annotation update/setting code with patching too?

Done.

@jianjuns previously approved these changes Feb 5, 2021

@jianjuns (Contributor) left a comment

LGTM.

Let us see if @antoninbas or @tnqn has more comments.

klog.Warningf("Failed to get Node's name, NodePortLocal annotation cannot be removed for Pods scheduled to this Node")
return
func patchPod(value []NPLAnnotation, pod corev1.Pod, kubeClient clientset.Interface) error {
type Metadata struct {
Contributor:

metaData

Contributor Author:

Removed this, using a map instead.

if err != nil {
klog.Warningf("Unable to list Pods, err: %v", err)
return
type PatchData struct {
Contributor:

patchData

Contributor Author:

Removed this, using a map instead.

} else {
payloadValue[NPLAnnotationKey] = nil
}
payload := PatchData{
Contributor:

You can use anonymous structs here too.

Contributor Author:

Since we are not using struct members anywhere, just changed the structs to maps instead.

@antoninbas (Contributor)

We should merge #1826 first, and the unit tests in this PR should be updated accordingly.

payloadBytes, _ := json.Marshal(newPayload)
if _, err := kubeClient.CoreV1().Pods(pod.Namespace).Patch(context.TODO(), pod.Name, types.MergePatchType,
payloadBytes, metav1.PatchOptions{}); err != nil {
return fmt.Errorf("unable to update Annotation for Pod %s/%s, %s", pod.Namespace,
Member:

Suggested change
return fmt.Errorf("unable to update Annotation for Pod %s/%s, %s", pod.Namespace,
return fmt.Errorf("unable to update Annotation for Pod %s/%s: %s", pod.Namespace,

Contributor Author:

Done.

klog.Warningf("Unable to update annotation for Pod %s/%s, error: %v", pod.Namespace, pod.Name, err)
return err
// CleanupNPLAnnotationForPod removes the NPL Annotation from the Pod's Annotations map entirely.
func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod corev1.Pod) error {
Member:

Suggested change
func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod corev1.Pod) error {
func CleanupNPLAnnotationForPod(kubeClient clientset.Interface, pod *corev1.Pod) error {

to avoid a copy?

Contributor Author:

Sure, the idea was to not use a pointer because we weren't changing anything for that pod. But yeah, we should avoid a copy too. Changed it.

newPod := pod.DeepCopy()
removePodAnnotation(newPod)
return c.updatePodAnnotation(newPod)
return CleanupNPLAnnotationForPod(c.kubeClient, *pod)
Member:

CleanupNPLAnnotationForPod has the same check as L348

Contributor Author:

Yeah, removed it from here, not required.

}

if err := portTable.PodPortRules.AddAllRules(allNPLPorts); err != nil {
return nil
Member:

if err != nil, it ignores the error? Why does addRulesForNPLPorts return an error then?

Contributor Author:

I missed returning the error here. Fixed it. Thanks!

@tnqn previously approved these changes Feb 9, 2021

@tnqn (Member) left a comment

LGTM

err := json.Unmarshal([]byte(nplAnnotation), &nplData)
if err != nil {
// if there's an error in this NPL Annotation, clean it up
err := nplk8s.CleanupNPLAnnotationForPod(kubeClient, &pod)
Member:

may need to use &podList.Items[i] to avoid the gosec failure.
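
A short sketch of what the gosec finding (G601, implicit memory aliasing) is about, with an assumed cleanup helper standing in for the real call:

package nplsketch

import (
	corev1 "k8s.io/api/core/v1"
)

// cleanup stands in for CleanupNPLAnnotationForPod in this sketch.
func cleanup(pod *corev1.Pod) error { return nil }

// cleanupAll indexes into the slice instead of taking the address of the range
// variable: with "for _, pod := range pods", &pod points at a single loop
// variable that is reused on every iteration, which gosec flags as G601.
func cleanupAll(pods []corev1.Pod) {
	for i := range pods {
		_ = cleanup(&pods[i]) // &pods[i] points at the slice element itself
	}
}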

Contributor Author:

Done.

@antoninbas (Contributor) left a comment

Could you rebase onto main and update the unit tests to use the same "framework"?

Comment on lines 136 to 57
klog.Info("Will fetch Pods and generate NPL rules for these Pods")
if err := getPodsAndGenRules(kubeClient, portTable, nodeName); err != nil {
return nil, err
}
Contributor:

I feel like this should be done in the Run method of the controller, after the caches are synced and before starting the workers.
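
For reference, a minimal sketch of that ordering in a controller Run method; the field names, placeholder methods, and worker count are assumptions, not the PR's actual NPLController:

package nplsketch

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/util/workqueue"
	"k8s.io/klog"
)

type NPLController struct {
	queue           workqueue.RateLimitingInterface
	podListerSynced cache.InformerSynced
}

func (c *NPLController) GetPodsAndGenRules() error { return nil } // placeholder
func (c *NPLController) worker()                   {}             // placeholder

func (c *NPLController) Run(stopCh <-chan struct{}) {
	defer c.queue.ShutDown()
	if !cache.WaitForCacheSync(stopCh, c.podListerSynced) {
		klog.Error("Timed out waiting for Pod cache to sync")
		return
	}
	// Reconcile existing rules and annotations once, after the caches have
	// synced and before the workers start handling Pod events.
	if err := c.GetPodsAndGenRules(); err != nil {
		klog.Errorf("Error when getting Pods and generating NodePortLocal rules: %v", err)
		return
	}
	for i := 0; i < 4; i++ {
		go wait.Until(c.worker, time.Second, stopCh)
	}
	<-stopCh
}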

Contributor Author:

Yeah, agreed. Moved it. Thanks.

Comment on lines 87 to 89
klog.Warningf("Unable to patch NPL annotations for Pod %s/%s: %s", pod.Namespace, pod.Name, err.Error())
}
klog.V(2).Infof("Successfully updated annotation for Pod %s/%s", pod.Namespace, pod.Name)
Contributor:

logs & error messages should be consistent:

  • "NPL annotations" vs "annotation"
  • sometimes annotation is capitalized, sometimes it's not
  • sometimes you spell out NodePortLocal, sometimes you use NPL

Contributor Author:

Fixed it and made it consistent for the respective files. Thanks.

Currently we delete all the iptables rules when the antrea-agent
is restarted. This commit does the following:
- Evaluates the iptables rules in the Node Port Local chain and
  checks whether each rule is still required. If a Pod was deleted
  while the antrea-agent was restarting, its rule is no longer
  needed and is deleted.
- The above evaluation logic also creates a cache of all the
  Pods retrieved from the iptables rules.
- Evaluates the Kubernetes Pods and, for each Pod, checks whether
  it is part of the above cache. If yes, only the Pod's annotations
  need to be verified. If not, a new Node port is allocated
  and a rule is added.

One of the TODOs mentioned
[here](antrea-io#1459).
* Simplified the flow to:
  - Get a list of all the Pods on this Node
  - Construct a list of NPL chain rules from the Pods' NPL Annotations
  - Restore the rules on the NPL chain

* Took care of nits, capitalization and removed unnecessary blocks
* Fix a few nits.
* Make IPTablesRules private.
* Moved the copying of the pod.
This is to update the NPL Annotations.
* Update the Pod NPL Annotations at all places via Patch.
* Remove cleanup of annotations from InitController().
* Break if even one of the annotations is invalid for a Pod.
Also fixes the logs and comments.
err := json.Unmarshal([]byte(nplAnnotation), &nplData)
if err != nil {
// if there's an error in this NodePortLocal annotation, clean it up
err := CleanupNPLAnnotationForPod(c.kubeClient, &podList.Items[i])
Contributor:

with the code re-org, CleanupNPLAnnotationForPod can become a method of NPLController: there is no longer a need to pass the K8s client as a parameter directly

Contributor Author:

Yeah, done.

Comment on lines 410 to 411
klog.Warningf("Unable to unmarshal NodePortLocal annotation for Pod %s/%s",
pod.Namespace, pod.Name)
Contributor:

for consistency, you can use key directly in log messages to print <podNamespace>/<podName>

Contributor Author:

Changed. Thanks.

// is invalid or the NodePortLocal annotation is invalid, the NodePortLocal annotation is removed. The Pod
// event handlers take care of allocating a new Node port if required.
func (c *NPLController) GetPodsAndGenRules() error {
podList, err := c.kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{
Contributor:

as pointed out earlier, you should use the podLister member of NPLController to list the Pods, as it will use information which has already been fetched (and cached) from the K8s apiserver. There should be no need to do a new API request.

Contributor Author:

Yeah, wasn't possible earlier because this func was getting called from InitializeNPLAgent(). Since we have now moved it to the controller's Run() function, I have changed it to use the podLister. Thanks.

}
var ok bool
portTable, ok := portcache.NewPortTable(start, end)
if !ok {
return nil, errors.New("NPL port table could not be initialized")
return nil, errors.New("error in initializing NodePortLocal port table")
Contributor:

s/error in/error when/

Contributor Author:

Fixed.

err = portTable.PodPortRules.Init()
if err != nil {
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %v", err)
Contributor:

why are we no longer wrapping the error?

Contributor Author:

My bad. Fixed it. Thanks.

Comment on lines 56 to 58
for k := range pt.Table {
delete(pt.Table, k)
}
Contributor:

while this is valid Go, why not just build a new map: ptable.Table = make(map[int]NodePortData)?

Contributor Author:

Done.

// TestInitInvalidPod simulates an agent reboot case. A Pod with an invalid NPL annotation is
// added, this invalid annotation should get cleaned up. And a proper NPL annotation should get
// added.
func TestInitInvalidPod(t *testing.T) {
Contributor:

how can the test pass without an EXPECT statement for AddAllRules on the mock object?

This seems to be testing an edge case, but I imagine that the actual functionality will be tested by e2e test cases.

Contributor Author:

how can the test pass without an EXPECT statement for AddAllRules on the mock object?

Yeah, weird, I think this test was passing because AddAllRules function isn't getting called for the invalid annotation. Have added this now.

This seems to be testing an edge case, but I imagine that the actual functionality will be tested by e2e test cases.

Yes, that's right.

- Using Pod lister to get all the Pods for the current Node.
- Add an EXPECT statement for AddAllRules().
- Fixed comments and other re-arrangements.
@antoninbas (Contributor) left a comment

Please address the CI failures.

Only a couple of nits remaining on my side; otherwise I think all my comments have been addressed. @jianjuns do you want to take another look?

klog.Info("Will fetch Pods and generate NodePortLocal rules for these Pods")

if err := c.GetPodsAndGenRules(); err != nil {
klog.Errorf("Error in getting Pods and generating rules: %s", err.Error())
Contributor:

Suggested change
klog.Errorf("Error in getting Pods and generating rules: %s", err.Error())
klog.Errorf("Error in getting Pods and generating rules: %v", err)

would be good to fix any other occurrence of this in the rest of the NPL code as well if possible

Contributor Author:

Done, replaced at other places too.

err = portTable.PodPortRules.Init()
if err != nil {
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %v", err)
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %s", err.Error())
Contributor:

Suggested change
return nil, fmt.Errorf("NPL rules for pod ports could not be initialized, error: %s", err.Error())
return nil, fmt.Errorf("NodePortLocal rules for Pod ports could not be initialized: %v", err)

@@ -151,3 +148,16 @@ func (ipt *IPTableRules) DeleteAllRules() error {
}
return nil
}

// Join all words with spaces, terminate with newline and write to buf.
func writeLine(buf *bytes.Buffer, words ...string) {
Contributor:

if this function is the same as the one in pkg/agent/route/route_linux.go, it should go into a common utility package, but that can be done in a future PR

Contributor Author:

Yeah, noted. Will take care of it in a subsequent PR.

Also, changed the printing of err.Error() to err.
@antoninbas (Contributor) left a comment

LGTM

@antoninbas (Contributor)

/test-all

@antoninbas (Contributor)

/test-conformance

@antoninbas antoninbas merged commit 2b0221a into antrea-io:main Feb 11, 2021