-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
nsqd: retry sub to ephemeral topic/channel which is Exiting #891
Conversation
nsqd/protocol_v2.go
Outdated
topic := p.ctx.nsqd.GetTopic(topicName) | ||
channel := topic.GetChannel(channelName) | ||
channel.AddClient(client.ID, client) | ||
var channel *Channel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment in here explaining why we need to loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment added (and insta-squashed because it's trivial)
@slayercat can you confirm this fixes the issue? |
quick hacky fix, instead of "proper" locking
Cool, just waiting for confirmation — have you tested this yourself @ploxiln? |
nope, I'm lazy. will do in the next day or so and report back |
I managed to reproduce the problem and show that this change fixes it (or makes it significantly harder to hit, anyway). First, start nsqd: Then, start publishing every half-second: while true ; do
curl -d 'whateva-really' 'http://localhost:4151/pub?topic=t1%23ephemeral'
echo " =$?"
sleep 0.5
done Then, run the test script #!/bin/sh
set -u
COUNT=20
LAST_PID=
start_client() {
python test_ephemeral.py &
LAST_PID=$!
}
start_client
sleep 1
for I in $(seq $COUNT); do
KILL_PID=$LAST_PID
start_client
sleep 0.08
kill $KILL_PID
wait $KILL_PID
sleep 1
done
kill $LAST_PID
wait $LAST_PID Which uses this test client import nsq
got_msg = 0
def handler(msg):
global got_msg
got_msg += 1
return True
reader = nsq.Reader("t1#ephemeral", "c1#ephemeral",
handler, nsqd_tcp_addresses=["127.0.0.1:4150"])
nsq.run()
if got_msg:
print("got %d messages" % got_msg)
else:
print("!! DID NOT GET MESSAGE") Notice that the test script has a "sleep 0.08", you may need to adjust that, or possibly even move the kill before the start, depending on your processor speed and such. With all that, I get a few "DID NOT GET MESSAGE" from master, and none from this pull-request, on each of several test script runs. |
note how it's a bit tricky to publish to an ephemeral topic over http, the |
It fixes the issue. thanks, @ploxiln |
Signed-off-by: lilinzhe <[email protected]>
Signed-off-by: lilinzhe <[email protected]>
Signed-off-by: lilinzhe <[email protected]>
quick hacky fix, instead of "proper" locking
alternative to #886 for fixing #883
not yet testedcc @slayercat