Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ns: fix reading net namespace in multi-threaded processes #176

Merged
merged 1 commit into from
Apr 18, 2016

Conversation

dcbw
Copy link
Member

@dcbw dcbw commented Apr 6, 2016

/proc/self/ns/net gives the main thread's namespace, not necessarily
the namespace of the thread that's running the testcases. This causes
sporadic failures of the tests. For example, with a testcase reading
inodes after switching netns:

/proc/27686/task/27689/ns/net 4026532565
/proc/self/ns/net 4026531969
/proc/27686/task/27689/ns/net 4026532565

See also:
vishvananda/netns@008d17a

Running Suite: pkg/ns Suite
===========================
Random Seed: 1459953577
Will run 6 of 6 specs

• Failure [0.028 seconds]
Linux namespace operations
/cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:167
  WithNetNS
  /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:166
    executes the callback within the target network namespace [It]
    /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:97

    Expected
        <uint64>: 4026531969
    to equal
        <uint64>: 4026532565

    /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:96
------------------------------
•••••

Summarizing 1 Failure:

[Fail] Linux namespace operations WithNetNS [It] executes the callback within the target network namespace
/cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:96

Ran 6 of 6 Specs in 0.564 seconds
FAIL! -- 5 Passed | 1 Failed | 0 Pending | 0 Skipped --- FAIL: TestNs (0.56s)
FAIL

@dcbw
Copy link
Member Author

dcbw commented Apr 6, 2016

@steveej @jonboulle @rosenhouse took a while to debug these failures, but I think this is the fix.

@zachgersh
Copy link
Contributor

@dcbw thanks for this. One thing of note now, I think this means that this set of tests now can no longer be run in parallel due to locking to a particular thread.

@dcbw
Copy link
Member Author

dcbw commented Apr 6, 2016

@zachgersh yeah, we've got to take that hit unfortunately :(

@steveej
Copy link
Member

steveej commented Apr 7, 2016

Naiv question, why isn't it enough to put runtime.LockOSThread() in the preamble of the the testsuite? I imagine the suite will be run by the same goroutine, and if we lock that to a thread it should be safe, or not? Happy to receive a patient explanation :-)

@steveej
Copy link
Member

steveej commented Apr 8, 2016

Also, could we just use https://github.com/vishvananda/netns instead of maintaining something similar here?
/cc @rosenhouse

@dcbw
Copy link
Member Author

dcbw commented Apr 13, 2016

Naiv question, why isn't it enough to put runtime.LockOSThread() in the preamble of the the testsuite? I imagine the suite will be run by the same goroutine, and if we lock that to a thread it should be safe, or not? Happy to receive a patient explanation :-)

Locking to an OS thread doesn't guarantee which thread you're locked to. /proc/self/ns/net only tells you what the main thread's namespace is, but the current thread may not be the main thread, depending on where Go schedules you and whatever Ginkgo does.

@steveej
Copy link
Member

steveej commented Apr 14, 2016

@dcbw

Naiv question, why isn't it enough to put runtime.LockOSThread() in the preamble of the the testsuite? I imagine the suite will be run by the same goroutine, and if we lock that to a thread it should be safe, or not? Happy to receive a patient explanation :-)

Locking to an OS thread doesn't guarantee which thread you're locked to. /proc/self/ns/net only tells you what the main thread's namespace is, but the current thread may not be the main thread, depending on where Go schedules you and whatever Ginkgo does.

Thanks, sounds reasonable to assume Ginkgo runs things in parallel from the beginning. 👍
Any thoughts on using vishvanananda's netns package?

var err error
originalNetNS, err = os.Open(CurrentNetNS)
name := fmt.Sprintf("/proc/%d/task/%d/ns/net", os.Getpid(), syscall.Gettid())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you mind using a constant for this format string?

@dcbw
Copy link
Member Author

dcbw commented Apr 14, 2016

Any thoughts on using vishvanananda's netns package?

@steveej I looked into it yesterday and it is possible, but it makes a few things harder because it's not quite a match for the testing infrastructure PRs I've submitted. I'd like to explore doing that after this PR and the testing ones are merged if that's OK?

@dcbw
Copy link
Member Author

dcbw commented Apr 14, 2016

@steveej pulled out the function and repushed, PTAL thanks!

@dcbw
Copy link
Member Author

dcbw commented Apr 14, 2016

Another test flake?

Bad response status from coveralls: 422 - {"message":"Couldn't find a repository matching this job.","error":true}

@steveej
Copy link
Member

steveej commented Apr 14, 2016

Another test flake?

Bad response status from coveralls: 422 - {"message":"Couldn't find a repository matching this job.","error":true}

Yeah, we've been getting those but I haven't found the reason. Pressing rebuild on travis has always lead to a successful build.

@steveej
Copy link
Member

steveej commented Apr 14, 2016

LGTM

@rosenhouse would you please take a look at this too? Since you wrote the initial test.

@rosenhouse
Copy link
Contributor

@steveej Just seeing this now. Will take a look soon!

@zachgersh
Copy link
Contributor

hey @dcbw - you may want to have a look at the test-helpers I just merged from @rosenhouse.

@rosenhouse
Copy link
Contributor

rosenhouse commented Apr 18, 2016

@dcbw @steveej 👍

Sorry about the delay.

/proc/self/ns/net gives the main thread's namespace, not necessarily
the namespace of the thread that's running the testcases.  This causes
sporadic failures of the tests.

For example, with a testcase reading inodes after switching netns:

/proc/27686/task/27689/ns/net 4026532565
/proc/self/ns/net 4026531969
/proc/27686/task/27689/ns/net 4026532565

See also:
vishvananda/netns@008d17a

Running Suite: pkg/ns Suite
===========================
Random Seed: 1459953577
Will run 6 of 6 specs

• Failure [0.028 seconds]
Linux namespace operations
/cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:167
  WithNetNS
  /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:166
    executes the callback within the target network namespace [It]
    /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:97

    Expected
        <uint64>: 4026531969
    to equal
        <uint64>: 4026532565

    /cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:96
------------------------------
•••••

Summarizing 1 Failure:

[Fail] Linux namespace operations WithNetNS [It] executes the callback within the target network namespace
/cni/gopath/src/github.com/appc/cni/pkg/ns/ns_test.go:96

Ran 6 of 6 Specs in 0.564 seconds
FAIL! -- 5 Passed | 1 Failed | 0 Pending | 0 Skipped --- FAIL: TestNs (0.56s)
FAIL
@dcbw
Copy link
Member Author

dcbw commented Apr 18, 2016

@rosenhouse @steveej @zachgersh rebased on top of master and moved a few things around. PTAL, thanks!

@zachgersh zachgersh merged commit 7383809 into containernetworking:master Apr 18, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants