syscall: NetlinkRIB's Recvfrom hangs #69797

Closed
quatre opened this issue Oct 7, 2024 · 4 comments
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
NeedsInvestigation: Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@quatre

quatre commented Oct 7, 2024

Go version

go version go1.22.5 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/abcd/.cache/go-build'
GOENV='/home/abcd/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/abcd/gocode/pkg/mod'
GOOS='linux'
GOPATH='/home/abcd/gocode'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/abcd/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/abcd/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.22.5'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/abcd/app/ovsstats-exporter-0.5.2/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3392846309=/tmp/go-build -gno-record-gcc-switches'

What did you do?

We have a custom exporter that periodically collects the IP addresses of interfaces in several network namespaces.
For each namespace, it:

  • spawns a goroutine and locks it to its thread
  • switches into the namespace
  • calls net.Interfaces()
  • iterates over the interfaces, calling iface.Addrs() on each one

What did you see happen?

In very rare cases, the call to net.(*Interface).Addrs hangs until I either kill the process or attach to it with dlv, at which point the syscall is interrupted.

I collected a stack trace from the blocked goroutine. Here is a partial extract:

 0  0x0000000000409d2e in runtime/internal/syscall.Syscall6
    at /usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36
 1  0x0000000000409d0d in syscall.RawSyscall6
    at /usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38
 2  0x00000000004cdf4a in syscall.Syscall6
    at /usr/local/go/src/syscall/syscall_linux.go:92
 3  0x00000000004cd994 in syscall.recvfrom
    at /usr/local/go/src/syscall/zsyscall_linux_amd64.go:1545
 4  0x00000000004ca373 in syscall.Recvfrom
    at /usr/local/go/src/syscall/syscall_unix.go:320
 5  0x00000000004c7dc9 in syscall.NetlinkRIB
    at /usr/local/go/src/syscall/netlink_linux.go:89
 6  0x0000000000578005 in net.interfaceAddrTable
    at /usr/local/go/src/net/interface_linux.go:124
 7  0x00000000005769e5 in net.(*Interface).Addrs
    at /usr/local/go/src/net/interface.go:81

Recvfrom on the netlink socket opened by syscall.NetlinkRIB seems to be the culprit here.

I also dumped NetlinkRIB's tab byte slice. It contains a few syscall.RTM_NEWADDR messages but, as expected, no syscall.NLMSG_DONE.
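
For anyone inspecting a similar dump, a minimal sketch of checking a raw netlink buffer for the terminating NLMSG_DONE message is shown below. `containsDone` and `fakeMessage` are hypothetical helpers; the buffers in `main` are synthetic header-only messages, not the data from this report. Only `syscall.ParseNetlinkMessage` and the `syscall` constants come from the standard library (Linux only).

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// containsDone reports whether a raw netlink receive buffer holds an
// NLMSG_DONE message, which marks the end of a multipart dump.
func containsDone(tab []byte) (bool, error) {
	msgs, err := syscall.ParseNetlinkMessage(tab)
	if err != nil {
		return false, err
	}
	for _, m := range msgs {
		if m.Header.Type == syscall.NLMSG_DONE {
			return true, nil
		}
	}
	return false, nil
}

// fakeMessage builds a header-only netlink message of the given type.
// It exists only to produce sample data for this sketch.
func fakeMessage(typ uint16) []byte {
	b := make([]byte, syscall.SizeofNlMsghdr)
	h := (*syscall.NlMsghdr)(unsafe.Pointer(&b[0]))
	h.Len = uint32(len(b)) // header only, no payload
	h.Type = typ
	return b
}

func main() {
	// A dump that stops after two address messages, like the one described.
	var partial []byte
	partial = append(partial, fakeMessage(syscall.RTM_NEWADDR)...)
	partial = append(partial, fakeMessage(syscall.RTM_NEWADDR)...)

	// The same dump followed by the terminating message.
	complete := append([]byte{}, partial...)
	complete = append(complete, fakeMessage(syscall.NLMSG_DONE)...)

	for _, buf := range [][]byte{partial, complete} {
		done, err := containsDone(buf)
		fmt.Println(len(buf), done, err)
	}
}
```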

What did you expect to see?

The call to iface.Addrs() should not hang.

Although I have no idea why it hangs, I've found issues here and there on different projects that look similar:

I'm no netlink specialist. Does anyone know whether the kernel can simply stop responding on a netlink socket? Should we make that socket non-blocking?
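
Since the socket is opened inside syscall.NetlinkRIB and cannot be configured by the caller, one caller-side option is to bound the wait instead. The sketch below is an assumption about how that could look, not something the net package provides: `callWithTimeout` and `errTimeout` are hypothetical names. It only stops the caller from waiting forever; the stuck goroutine and its thread remain blocked, so this helps detect the hang rather than fix it.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout is returned when fn does not finish within the deadline.
var errTimeout = errors.New("call timed out")

// callWithTimeout runs fn in a new goroutine and waits at most d for it.
// Because fn runs in its own goroutine, any thread locking or namespace
// switching has to happen inside fn itself.
func callWithTimeout(d time.Duration, fn func() error) error {
	done := make(chan error, 1) // buffered so a late fn can still send and exit
	go func() { done <- fn() }()

	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case err := <-done:
		return err
	case <-timer.C:
		return errTimeout
	}
}

func main() {
	// A call that returns promptly.
	fmt.Println(callWithTimeout(time.Second, func() error { return nil }))

	// A call that never returns, standing in for the blocked recvfrom.
	block := make(chan struct{})
	fmt.Println(callWithTimeout(50*time.Millisecond, func() error {
		<-block
		return nil
	}))
}
```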

gopherbot added the compiler/runtime label Oct 7, 2024
@cherrymui
Member

cc @mikioh for netlink.

cherrymui added the NeedsInvestigation label Oct 7, 2024
cherrymui added this to the Backlog milestone Oct 7, 2024
@quatre
Author

quatre commented Oct 14, 2024

Hello!
It looks like the problem was between the keyboard and the chair: somewhere in our code we were closing a file descriptor twice.
That second close was most likely closing the netlink socket every once in a while, which left recvfrom blocked.
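
For readers hitting the same thing: descriptor numbers are reused, so a second close of a stale number can land on an unrelated descriptor that was opened in the meantime. One way to guard against that is to make the close idempotent. The sketch below uses a hypothetical `onceCloser` wrapper; it is an illustration, not the fix we actually applied.

```go
package main

import (
	"fmt"
	"sync"
)

// onceCloser (hypothetical helper) wraps a close function so that the
// underlying close runs at most once, however often Close is called.
type onceCloser struct {
	once    sync.Once
	closeFn func() error
	err     error
}

func newOnceCloser(closeFn func() error) *onceCloser {
	return &onceCloser{closeFn: closeFn}
}

// Close runs the wrapped function on the first call and returns the same
// result on every later call without touching the descriptor again.
func (c *onceCloser) Close() error {
	c.once.Do(func() { c.err = c.closeFn() })
	return c.err
}

func main() {
	closes := 0
	c := newOnceCloser(func() error {
		closes++ // stand-in for syscall.Close(fd) on a raw descriptor
		return nil
	})

	c.Close()
	c.Close() // second call is a no-op

	fmt.Println("underlying closes:", closes)
}
```

With a raw descriptor, the wrapped function would be something like `func() error { return syscall.Close(fd) }`. An `*os.File` already guards against a second close on its own.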

quatre closed this as not planned Oct 14, 2024
@ianlancetaylor
Member

Thanks for following up.
