Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic IPTables sync #1751

Merged
merged 1 commit into from
Feb 3, 2021
Merged

Conversation

siddhant94
Copy link
Contributor

@siddhant94 siddhant94 commented Jan 15, 2021

Fixes #628 .
Add a long running goroutine which periodically syncs antrea required iptables rules on linux.
If the sync call/operation fails (for example - xtables lock contention), the next attempt would occur at the next sync interval.

@vmwclabot
Copy link

@siddhant94, you must sign our contributor license agreement before your changes are merged. Click here to sign the agreement. If you are a VMware employee, read this for further instruction.

@codecov-io
Copy link

codecov-io commented Jan 15, 2021

Codecov Report

❗ No coverage uploaded for pull request base (main@4d6bdaf). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1751   +/-   ##
=======================================
  Coverage        ?   63.19%           
=======================================
  Files           ?      192           
  Lines           ?    16436           
  Branches        ?        0           
=======================================
  Hits            ?    10386           
  Misses          ?     4986           
  Partials        ?     1064           
Flag Coverage Δ
e2e-tests 43.99% <0.00%> (?)
kind-e2e-tests 50.17% <0.00%> (?)
unit-tests 42.73% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this PR, I left some comments. Please make sure you sign the CLA as well.

test/integration/agent/route_test.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
deleteRuleCmd := deleteOption + strings.Join([]string{"-m", "comment", "--comment", "\"Antrea: jump to Antrea prerouting rules\"", "-j", "ANTREA-PREROUTING"}, " ")

// #nosec G204: ignore in test code
actualData, err := exec.Command("bash", "-c", deleteRuleCmd).Output()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice other tests doing the same thing (calling iptables through bash), but I wonder why the iptables executable cannot be invoked directly instead?

@siddhant94
Copy link
Contributor Author

siddhant94 commented Jan 17, 2021

Hi @antoninbas , just to collate feedback points,

  • Use wait.Until
  • Start long running from agent.go
  • Pass stopCh, called only once
  • Make sync interval top level constant
  • Separate integration test

I have addressed these, let me know if implementation is still inconsistent.
Also needed feedback on couple of things -

  1. sync interval is set at 60sec, and,
  2. Currently, routeClient initializes iptable with a retry logic(because of xtables lock). So, should the sync operation also have similar behaviour? In this PR, the sync operation on encountering error returns, which implies that next attempt would only be at next sync interval.

@siddhant94 siddhant94 force-pushed the syncloop-ip-tables branch 2 times, most recently from d7ec98b to 179930e Compare January 18, 2021 19:08
@siddhant94
Copy link
Contributor Author

Looking into ci failures

@vmwclabot
Copy link

@siddhant94, VMware has approved your signed contributor license agreement.

@siddhant94
Copy link
Contributor Author

Looking into ci failures

Fixed, updated the PR.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhant94 thanks for the PR, I left some comments.

pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
@siddhant94
Copy link
Contributor Author

@siddhant94 thanks for the PR, I left some comments.

Updated the PR @tnqn

pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
test/integration/agent/route_test.go Outdated Show resolved Hide resolved
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @siddhant94. One comment about the interface argument, otherwise LGTM.

@@ -253,6 +255,8 @@ func run(o *Options) error {

log.StartLogFileNumberMonitor(stopCh)

go routeClient.Run(stopCh, ipTablesSyncInterval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a comment about the consistant in #1751 (comment). Could we move ipTablesSyncInterval to route_linux.go? For two reasons:

  1. routeClient is an interface for both windows and linux platform, the argument doesn't make sense to windows.
  2. the argument is not configurable and specific to routeClient internal logic so no need to be the main package. Almost all such constants/variables are declared in their own package. informerDefaultResync is the only one declared here because it's an argument of the global component informerFactory which is used by several components.

For testing, you may declare it as a variable instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed @tnqn , this should also stay internal. I missed this in previous iteration. I was thinking of exposing a function SetSyncInterval which would override the variable, it would be meant to be called from route package's Initialization() method and from the integration test.
Let me know if this is inconsistent. Also, if there's someplace in antrea code where we have a similar issue, I can look at that and change this part.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that the integration test is in a different package. I'm ok with just declaring the variable as public, but we should comment why it's made so. We had a similar case before that an integration test needs to use package internal consistant/variable and it just made them public to solve it. However, that code has been deleted when implmenetation evolves.

@antoninbas what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically my favorite approach is to have a way to pass it to the constructor:

const ipTablesSyncInterval = 60 * time.Second

type ClientConfig struct {
    IPTablesSyncInterval time.Duration
}

func NewClient(serviceCIDR *net.IPNet, networkConfig *config.NetworkConfig, noSNAT bool, configFns ...func(*ReporterConfig)) *Client {
    config := ClientConfig{
        IPTablesSyncInterval: defaultIPTablesSyncInterval,
    }
    for _, fn := range configFns {
        fn(&config)
    }
    return &Client{
        ClientConfig: config,
        // other members
    }
}

This way the call to NewClient does not need to change in cmd/antrea-agent/agent.go, but for the integration tests (or unit tests for that matter), we can easily provide a different value for the sync interval.

Of course that may be a bit overkill here with a single configuration parameter. So if you prefer to make the variable public and add a comment, that's fine by me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antoninbas I actually missed this last comment, I just noticed it after push.

Base automatically changed from master to main January 26, 2021 00:00
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhant94 The commit message seems having a very long title or missing empty line between title and body. Could you update it following https://github.com/golang/go/wiki/CommitMessage?

pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_linux.go Outdated Show resolved Hide resolved
pkg/agent/route/route_windows.go Outdated Show resolved Hide resolved
pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
@siddhant94 siddhant94 force-pushed the syncloop-ip-tables branch 2 times, most recently from 49ef3ae to e312148 Compare January 27, 2021 00:59
@siddhant94
Copy link
Contributor Author

@siddhant94 The commit message seems having a very long title or missing empty line between title and body. Could you update it following https://github.com/golang/go/wiki/CommitMessage?

Updated it as per the wiki.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@siddhant94 The body of commit message should be wrapped to ~76 characters, otherwise it wouldn't look good from git log. And I think you don't need to add "pkg/agent/route:" as prefix as the other words can already scope the change well.

pkg/agent/route/interfaces.go Outdated Show resolved Hide resolved
@siddhant94
Copy link
Contributor Author

@siddhant94 The body of commit message should be wrapped to ~76 characters, otherwise it wouldn't look good from git log. And I think you don't need to add "pkg/agent/route:" as prefix as the other words can already scope the change well.

@tnqn I updated the Run fn comment and removed the said prefix from commit, but, for commit body, I could only reduce it to ~100 chars. Should I just omit the body? Just keep the "Fixes" line and title/subject line.

@tnqn
Copy link
Member

tnqn commented Jan 28, 2021

@siddhant94 By wrapping the body of commit message, I meant to limit each line ~76 chars.
For example, this is how the current git log shows:

commit 5cbe43181f6143f13b683f4a589dde30dfb492c9
Author: Siddhant Sinha <[email protected]>
Date:   Thu Jan 14 15:12:55 2021 +0530

    Periodically sync antrea required iptables rules

    Add a long-running goroutine, which waits for iptables to be initialised, before periodically syncing it.

    Fixes #628

By wrapping it, it appears as:

commit 5cbe43181f6143f13b683f4a589dde30dfb492c9
Author: Siddhant Sinha <[email protected]>
Date:   Thu Jan 14 15:12:55 2021 +0530

    Periodically sync antrea required iptables rules

    Add a long-running goroutine, which waits for iptables to be
    initialised, before periodically syncing it.

    Fixes #628

Add a long-running goroutine which periodically syncs iptables. To be able
to configure the sync interval for integration tests, IPTablesSyncInterval
is exported.

Fixes antrea-io#628
@siddhant94
Copy link
Contributor Author

@tnqn I missed the "wrapped" word, thus misunderstood it completely. I have updated the commit msg, with each line within ~76 chars.

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @siddhant94

@siddhant94
Copy link
Contributor Author

LGTM, thanks @siddhant94

Thanks @tnqn @antoninbas for the patient iterations. Learnt new stuff and got to know a lot of antrea context also.

@tnqn
Copy link
Member

tnqn commented Jan 29, 2021

/test-all

@tnqn
Copy link
Member

tnqn commented Jan 29, 2021

/test-windows-conformance
/test-windows-networkpolicy

1 similar comment
@tnqn
Copy link
Member

tnqn commented Jan 29, 2021

/test-windows-conformance
/test-windows-networkpolicy

@tnqn
Copy link
Member

tnqn commented Jan 29, 2021

@lzhecheng do you know why the commands cannot trigger windows e2e tests?

@lzhecheng
Copy link
Contributor

@lzhecheng do you know why the commands cannot trigger windows e2e tests?

There was something wrong with Jenkins smee service. Now it should work.
/test-windows-conformance
/test-windows-networkpolicy

@tnqn
Copy link
Member

tnqn commented Feb 1, 2021

/test-windows-conformance
/test-windows-networkpolicy

@lzhecheng
Copy link
Contributor

windows-conformance is pending now but it has actually succeeded, checked on Jenkins side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Periodically verify that all required iptables rules are present in the Node
7 participants