Golang 1.10 - network namespaces from within a long-lived, multithreaded Go process #98

Closed
DennisDenuto opened this issue Nov 29, 2017 · 3 comments

@DennisDenuto
Contributor

Context

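Go 1.10 changed how M threads (the OS threads that run goroutines) are spawned. They can now be spawned from a template thread instead of being cloned from the current thread, so they do not inherit state, such as a network namespace, that a locked thread has changed.

Before this change, there was no safe way to change the namespace within a long-lived, multithreaded process.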
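Because a network namespace belongs to an individual OS thread rather than to the whole process, code that switches namespaces must keep its goroutine on a single thread. The Go sketch below (Linux-only; the helper tidStableWhileLocked is hypothetical and not taken from the CNI code) checks that runtime.LockOSThread pins a goroutine to one kernel thread ID even across scheduling points:

```go
// Linux-only sketch: per-thread state such as the network namespace is only
// usable from Go if the goroutine stays on one OS thread.
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// tidStableWhileLocked is a hypothetical helper: it runs a goroutine locked to
// its OS thread, yields to the scheduler repeatedly, and reports whether the
// kernel thread ID stayed the same throughout.
func tidStableWhileLocked(yields int) bool {
	result := make(chan bool, 1)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		start := syscall.Gettid()
		for i := 0; i < yields; i++ {
			runtime.Gosched() // scheduling point: the goroutine may be parked and resumed
			if syscall.Gettid() != start {
				result <- false
				return
			}
		}
		result <- true
	}()
	return <-result
}

func main() {
	fmt.Println("thread ID stable while locked:", tidStableWhileLocked(1000))
}
```

No privileges are needed; it should report that the thread ID stayed stable.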

Issue

We wrote a test to make sure the commit fixed the issue. As expected, it eventually fails when run with Go 1.9, and it passes with Go 1.10.

However, when we run the test repeatedly with the untilItFails flag on Go 1.10, it eventually errors: the test fails to open the network namespace because the thread's task directory under /proc is missing.

    Expected error:
        <*os.PathError | 0xc42319b590>: {
            Op: "open",
            Path: "/proc/29002/task/29015/ns/net",
            Err: 0x2,
        }
        open /proc/29002/task/29015/ns/net: no such file or directory
    not to have occurred

    /media/sf_dev_go/src/github.com/containernetworking/plugins/pkg/ns/ns_linux_test.go:112

Steps to Reproduce

  1. Run the test with untilItFails on Go 1.10. For us it consistently failed after about 300 iterations.
  2. See it fail.

Expected result

We expected it to succeed indefinitely.

Current result

It eventually fails after many iterations of the test.

Possible Fix

We found that adding a runtime.UnlockOSThread() call to the Do function prevents the test from failing (even when running with untilItFails).

We believe unlocking the OS thread should be safe in 1.10 because of the Go 1.10 change described above.

However, adding the unlock makes the test slow down significantly over time, for reasons that we don't understand.

@squeed
Member

squeed commented Nov 30, 2017

Interesting! Go 1.10 also adds another feature: if a G (goroutine) exits while locked to its M (OS thread), the M is terminated instead of being returned to the pool of idle Ms. I strongly suspect that's the cause of the error message you're seeing: as you walk /proc, you're racing against threads (tasks) being terminated.

So I'm not entirely sure that it's a... bug... but everything in this shadowy corner of Go and Linux is unclear.

@squeed
Member

squeed commented Nov 30, 2017

Also, I'd love to understand why you see slowdowns. In theory, by unlocking you're preserving Ms, which should be faster.

@bboreham
Contributor

bboreham commented Feb 7, 2018

Note containernetworking/cni#262


3 participants