Discussion: removing lockThread from WithNetNS? #183

rosenhouse · 2016-04-18T06:31:25Z

Summary

I believe that we can provide WithNetNS and WithNetNSPath functions that remove the lockThread argument and completely hide from the caller the complexity of goroutine thread affinity. This could let us simplify a lot of our plugin and test code. I'm opening this issue to gather feedback before I assemble a PR.

Background

In our network namespace switching package, we provide WithNetNS

func WithNetNS(ns *os.File, lockThread bool, f func(*os.File) error) error

which runs a function f inside a network namespace ns.

The middle argument, lockThread is easily misused.

If the caller passes false then she must be sure that she has called runtime.LockOSThread beforehand. Many of the CNI plugins use this approach, but it requires care, especially when creating goroutines.

Conversely, if the caller passes true, then that must be the only call using true within the call-stack. This is because Lock and UnlockOSThread do not reference count, so a single Unlock will unlock all prior Lock calls. That can leave a caller unlocked without knowing it.

Incorrect usage either way can lead to all sorts of hard-to-debug undefined behavior as goroutines end up in unexpected namespaces.

Proposed change

Replace WithNetNS in pkg/ns with the following

// WithNetNS synchronously executes toRun within the given network namespace.
//
// It does not affect the network namespace of the calling goroutine
// nor does it modify the thread-affinity of the calling goroutine.
// The caller does not need to call runtime.LockOSThread()
func WithNetNS(netNS *os.File, toRun func() error) error {
    containedCall := func() error {
        originalNetNS, err := os.Open(fmt.Sprintf("/proc/%d/task/%d/ns/net", os.Getpid(), syscall.Gettid()))
        if err != nil {
            return fmt.Errorf("opening current netns: %s", err)
        }
        defer originalNetNS.Close()

        if err = SetNS(netNS, syscall.CLONE_NEWNET); err != nil {
            return fmt.Errorf("switching to netns %v: %v", netNS.Name(), err)
        }
        defer SetNS(originalNetNS, syscall.CLONE_NEWNET) // switch back

        return toRun()
    }

    var wg sync.WaitGroup
    wg.Add(1)

    var innerError error
    go func() {
        defer wg.Done()
        runtime.LockOSThread()
        innerError = containedCall()
    }()
    wg.Wait()

    return innerError
}

We would need to update all uses of WithNetNS and WithNetNSPath to drop the lockThread boolean.

To see this approach in a slightly different context, look at MakeNetworkNS in our new testhelpers package. There are some explanatory comments above that.

Feedback?

Please share why this won't work, would break your client, or other issues you forsee. I haven't looked carefully at all the plugins.

If there aren't objections, I can put together a PR. (I know there are a couple other PRs in flight addressing netns switching and test coverage of plugins, there may be some overlap here? #167 #176 )

Thanks to @zachgersh for the original insight here.
cc: @sykesm

The text was updated successfully, but these errors were encountered:

dcbw · 2016-04-18T14:40:08Z

Any chance we could postpone any PR from here until I've gotten #167 merged? That changes quite a few things in the NS package so that we can mock out namespace operations.

dcbw · 2016-04-18T17:27:43Z

So the current closures take a pointer to the current host netNS, which this proposal drops. Maybe that was an oversight? If so, one issue I see with the current code is that the current netns from the goroutine won't necessarily be the calling thread's namespace that the closure will expect. ptp.go's setupVeth() still uses that to make sure that the host-side is put in the right namespace. That might require grabbing hte current thread's netns twice, once outside the goroutine (to pass to the closure) and once inside (to kindly reset that thread's namespace).

This also removes the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

dcbw · 2016-04-18T19:33:35Z

@rosenhouse I've implemented the approach you've outlined here in #167 in the first commit "ns: make namespaces an interface and add NetNS creation capability", and since I moved the NetNS creation capability into the ns package itself the testhelpers stuff goes away (including the testcases, the unique ones I added to ns_test.go). Seems to work pretty well.

This also removes the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

rosenhouse · 2016-04-19T03:14:27Z

@dcbw Thanks for the comments, I'm glad you found this useful already for your PR.

Dropping the "host" netNS argument to toRun() was intentional and I should have said so. Here's why: It always struck me as unnecessarily complex to provide that argument, given that Go supports closures. The user can just close their callback function over a file handle to the calling namespace if they need it.

I would be totally in favor of exporting a function like

func CurrentNetNS() (*os.File, error) {
    return os.Open(fmt.Sprintf("/proc/%d/task/%d/ns/net", os.Getpid(), syscall.Gettid()))
}

to make the closure approach easy, e.g.

originalNS := ns.CurrentNetNS()
err := ns.WithNetNS(containerNS, func() error {
    // do stuff with containerNS
    innerErr := ns.WithNetNS(originalNS, func() error {
        /* do stuff in originalNS */
    })
    if innerErr != nil {
       return innerErr
    }
    // do more stuff in the containerNS
}

(And of course CurrentNetNS() could be made as a method on NetOps struct returning a NetNS interface, as in your #167)

Does that make sense? Am I missing a use-case for the function argument?

This also removes the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

dcbw · 2016-04-22T17:50:24Z

That makes sense, and I was originally removing the argument until I found the sole caller(s). I'll play around and see what I come up with.

This also removes the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

…sues Add a namespace object interface for somewhat cleaner code when creating and switching between network namespaces. All created namespaces are now mounted in /var/run/netns to ensure they have persistent inodes and paths that can be passed around between plugin components without relying on the current namespace being correct. Also remove the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

rosenhouse · 2016-05-19T16:49:25Z

This issue is even more complicated than I originally thought in the case of nested calls to the WithNetNS function above. The go runtime may create a new os thread when the goroutine is created, and this thread will inherit the namespace of the parent -- the "switched to" non-original namesapce. That means that when the inner call tries to restore the "original" netns before returning, it actually ends up restoring to the "wrong" namespace -- and you're left with a long-lived thread in the wrong namespace.

For this reason, the code above is dangerous -- specifically my follow-up comment showing nested calls. I'm closing this issue until we can figure out how to make this safer.

…sues Add a namespace object interface for somewhat cleaner code when creating and switching between network namespaces. All created namespaces are now mounted in /var/run/netns to ensure they have persistent inodes and paths that can be passed around between plugin components without relying on the current namespace being correct. Also remove the thread-locking arguments from the ns package per containernetworking#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

…sues Add a namespace object interface for somewhat cleaner code when creating and switching between network namespaces. All created namespaces are now mounted in /var/run/netns to ensure they have persistent inodes and paths that can be passed around between plugin components without relying on the current namespace being correct. Also remove the thread-locking arguments from the ns package per containernetworking/cni#183 by doing all the namespace changes in a separate goroutine that locks/unlocks itself, instead of the caller having to track OS thread locking.

rosenhouse changed the title ~~Discussion: removing lockThread from WithNS operations?~~ Discussion: removing lockThread from WithNetNS operations? Apr 18, 2016

rosenhouse changed the title ~~Discussion: removing lockThread from WithNetNS operations?~~ Discussion: removing lockThread from WithNetNS? Apr 18, 2016

steveej added kind/enhancement kind/question needs review labels Apr 27, 2016

rosenhouse mentioned this issue Apr 29, 2016

Add e2e testing of ipvlan, macvlan, and bridge plugins #167

Closed

rosenhouse closed this as completed May 19, 2016

rosenhouse mentioned this issue May 19, 2016

Fix namespace switch issues and add ipvlan, macvlan, and bridge e2e testing #211

Merged

rosenhouse mentioned this issue Jul 2, 2016

After changing namespace, other goroutines may switch to that namespace #262

Closed

bboreham mentioned this issue Aug 9, 2016

Use nsinit or runtime specific method instead of nsenter in kubenet kubernetes/kubernetes#30144

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: removing lockThread from WithNetNS? #183

Discussion: removing lockThread from WithNetNS? #183

rosenhouse commented Apr 18, 2016 •

edited

Loading

dcbw commented Apr 18, 2016

dcbw commented Apr 18, 2016

dcbw commented Apr 18, 2016

rosenhouse commented Apr 19, 2016 •

edited

Loading

dcbw commented Apr 22, 2016

rosenhouse commented May 19, 2016 •

edited

Loading

Discussion: removing lockThread from WithNetNS? #183

Discussion: removing lockThread from WithNetNS? #183

Comments

rosenhouse commented Apr 18, 2016 • edited Loading

Summary

Background

Proposed change

Feedback?

dcbw commented Apr 18, 2016

dcbw commented Apr 18, 2016

dcbw commented Apr 18, 2016

rosenhouse commented Apr 19, 2016 • edited Loading

dcbw commented Apr 22, 2016

rosenhouse commented May 19, 2016 • edited Loading

rosenhouse commented Apr 18, 2016 •

edited

Loading

rosenhouse commented Apr 19, 2016 •

edited

Loading

rosenhouse commented May 19, 2016 •

edited

Loading