go-nats panic #367
I had fixed a similar issue in 1.3.0, so I will have another look. In the meantime, could you check the Go client details for me? There is no 1.3.1 release, and I looked at a few of the releases and the line numbers don't match any of them.
Thanks kozlovic,
package nats
import (
"bufio"
"bytes"
"crypto/tls"
"crypto/x509"
"encoding/json"
"errors"
"fmt"
"io/ioutil"
"math/rand"
"net"
"net/url"
"regexp"
"runtime"
"strconv"
"strings"
"sync"
"time"
"github.com/nats-io/go-nats/util"
"github.com/nats-io/nuid"
)
// Default Constants
const (
Version = "1.3.1"
DefaultURL = "nats://localhost:4222"
DefaultPort = 4222
DefaultMaxReconnect = 60
DefaultReconnectWait = 2 * time.Second
DefaultTimeout = 2 * time.Second
DefaultPingInterval = 2 * time.Minute
DefaultMaxPingOut = 2
DefaultMaxChanLen = 8192 // 8k
DefaultReconnectBufSize = 8 * 1024 * 1024 // 8MB
RequestChanLen = 8
LangString = "go"
)
// STALE_CONNECTION is for detection and proper handling of stale connections.
const STALE_CONNECTION = "stale connection"
... Later last night, when we were running stress tests on another imsrv, which uses the same copy of the nats client, there was another panic stack trace.
We will look into it for sure, but we do recommend using a formal release. If possible, use the latest release found here: https://github.com/nats-io/go-nats/releases/tag/v1.5.0
@shengery These are really strange crashes. Would you mind showing how you create the NATS connection and giving us as much information as possible about how you use it? Thanks!
@kozlovic sure, all the nats library is used in a type
func (proxy *Proxy) Register(prefix, selfName string, port, httpPort int) (err error) {
// selfName means the server name of the application, e.g. imsrv or pushsrv
proxy.selfSrvName = selfName
localIP, err := proxy.GetLocalAddr()
if err != nil {
log.Printf("proxy: register self fail, srvname:%s, error:%s\n", selfName, err.Error())
return
}
selfAddr := fmt.Sprintf("%s:%d", localIP, port)
// PeerKey is used within the same group of servers, e.g. pushsrv to pushsrv; it is a synchronous action via the Request method.
proxy.mqPeerKey = proxy.getPeerSubject(selfAddr)
// MsgKey is used for message routing between different server groups, such as pushsrv and imsrv
proxy.mqMsgKey = proxy.getMsgSubject(selfAddr)
httpAddr := ""
if httpPort != 0 {
httpAddr = fmt.Sprintf("%s:%d", localIP, httpPort)
}
proxy.selfAddr = selfAddr
proxy.selfKey = path.Clean(prefix + "/" + selfName + "/" + proxy.selfAddr)
d := etcdData{
Addr: selfAddr,
Status: 0,
HTTPAddr: httpAddr,
}
bts, _ := json.Marshal(&d)
err = proxy.etcdClient.Put(proxy.selfKey, string(bts))
if err != nil {
log.Printf("proxy: put etcd fail, key:%s, error:%s\n", proxy.selfKey, err.Error())
err = terror.New(terror.ErrCodeNetwork, err.Error())
return
}
enableMQ := false
// create the connection here
if proxy.natsOpt.natsURL != "" {
if proxy.natsCli, err = nats.Connect(proxy.natsOpt.natsURL); err != nil {
log.Printf("proxy: fail to connect mq(nats):%s\n", err.Error())
err = terror.New(terror.ErrCodeNetwork, err.Error())
return
}
enableMQ = true
}
if enableMQ && proxy.natsOpt.enablePeer {
if proxy.natsOpt.peerOnMsg == nil {
log.Println("proxy: mq address set but peerOnMsg is nil")
err = terror.New(terror.ErrCodeBadParam, "peerOnMsg is nil")
return
}
onMsg := proxy.natsOpt.peerOnMsg
cli := proxy.natsCli
cli.Subscribe(proxy.mqPeerKey, func(m *nats.Msg) {
data := onMsg(m.Data)
cli.Publish(m.Reply, data)
cli.Flush()
})
if err = proxy.ScanDest(prefix, selfName, 100, LoadBalanceHash, false); err != nil {
log.Printf("proxy : scan dest %s error:%s\n", selfName, err.Error())
return
}
}
if enableMQ && proxy.natsOpt.onMsg != nil {
onmsg := proxy.natsOpt.onMsg
proxy.natsCli.Subscribe(proxy.mqMsgKey, func(m *nats.Msg) {
data := m.Data
from := m.Reply
onmsg(data, from)
})
go func() {
// the flusher calls nats.Flush() periodically
proxy.flusher()
}()
}
return
}
The OnMsg handler receives the message and its reply key, from which we can tell where the message came from (the value is mqMsgKey or mqPeerKey) and how we can reply to it.
Thanks. Need to dig more into that, but still very confused with the stack. For instance, from your original report, we see that:
Yet you can see that, in the 1.3.0 tag, this line has nothing to do with the report: The line should point to something like this.
Hi @kozlovic, is there any further investigation?
Sorry for the delay, I was on vacation. I may refactor the code around that a bit as part of a fix for #368, but I am not sure what the problem in this issue is, since the rare panic due to the WaitGroup was already fixed (or at least there was an attempt to fix it) in 1.3.0.
@kozlovic thanks for your reply
Closing for now. We will re-open if you have some updates.
Versions of gnatsd and affected client libraries used:
OS/Container environment:
CentOS 6.8 64Bit bare VM
Steps or code to reproduce the issue:
gnatsd is run as a single node.
The Go *nats.Conn is used in multiple goroutines, each of which may publish messages, and pushsrv subscribes to 6 topics on the same connection.
When a lot of concurrent users log in to the service, pushsrv panics with the message below:
panic: sync: negative WaitGroup counter
goroutine 9291 [running]:
sync.(*WaitGroup).Add(0xc42121a530, 0xffffffffffffffff)
/usr/local/go/src/sync/waitgroup.go:75 +0x134
sync.(*WaitGroup).Done(0xc42121a530)
/usr/local/go/src/sync/waitgroup.go:100 +0x34
pushsrv/vendor/github.com/nats-io/go-nats.(*Conn).readLoop(0xc4200b6a00, 0xc42121a530)
/data/home/go_workspace/src/pushsrv/vendor/github.com/nats-io/go-nats/nats.go:1554 +0x238
created by pushsrv/vendor/github.com/nats-io/go-nats.(*Conn).spinUpGoRoutines
/data/home/go_workspace/src/pushsrv/vendor/github.com/nats-io/go-nats/nats.go:979 +0xb1