UnboundLocalError after receiving ZMQ corrupted message #2260
Comments
Please only file bugs for the latest version, 2.13 at the moment. Is it still an issue there?
That part of the code is the same
@solowalker27 Maybe you have a minute to look at this?
I've done some analysis of the problematic code, and to me it seems it does not work the way it's supposed to at all.
Case 1: Case 2:
Additionally, Locust doesn't have any protection against external communication on the bind port because of this code. I made a simple test -> 1 master + 1 worker, and sent a simple GET to the master's bind port; the whole ZMQ communication collapsed. In my opinion this code should be replaced with a simple warning in the exception handling, and the message should simply be ignored. Worker reconnection should only happen when a heartbeat is missing. What do you think?
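Something like this minimal sketch is what I mean (the helper names such as handle_message are made up; it only assumes the RPC server returns (client_id, msg) from recv_from_client() and raises RPCReceiveError when it cannot parse a message, as the recent code does):

```python
import logging

from locust.exception import RPCReceiveError  # assumed location, as in recent Locust versions

logger = logging.getLogger(__name__)


def receive_loop(server, handle_message):
    """Hypothetical master-side receive loop for the bind socket."""
    while True:
        try:
            client_id, msg = server.recv_from_client()
        except RPCReceiveError:
            # A message we cannot parse (e.g. a stray HTTP GET hitting the
            # bind port) does not mean any worker connection is broken, so
            # just warn and keep serving; reconnection stays heartbeat-driven.
            logger.warning("Failed to parse an incoming ZMQ message; ignoring it.")
            continue
        handle_message(client_id, msg)
```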
I opened a bug for pyzmq, since the address is not included in RPCReceiveError. If they fix it, we could think about more specific error handling.
Oh yea, I see now, that code looks very strange. Tbh, I'm not sure PR #2120 was a good idea at all.
Sorry for the bug. Perhaps it shouldn't try to reference the client_id? Or maybe still try if it's not None but give a warning that it might not be the right address. At any rate, the code worked for me in solving the issues I had. Perhaps a simpler try/catch with just a warning might work, too, without trying to reset any connections? I'm not going to have time to revisit this for a long while, though, so someone else would need to make a new PR to fix this better, if desired.
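A sketch of that "only if it's not None" idea, purely illustrative and not the actual Locust code (the loop shape and handle_message are assumptions):

```python
import logging

logger = logging.getLogger(__name__)


def client_listener(server, handle_message):
    """Illustrative only: keep client_id always bound and guard the except handler."""
    client_id = None  # ensure the name exists before the first receive
    while True:
        try:
            client_id, msg = server.recv_from_client()
        except Exception:
            if client_id is not None:
                # client_id is just the last sender we heard from, which may
                # not be the peer that produced the corrupted message.
                logger.warning(
                    "Failed to receive a message; last known sender was %s "
                    "(may not be the offending connection)", client_id
                )
            else:
                logger.warning("Failed to receive a message before any sender was seen")
            continue
        handle_message(client_id, msg)
```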
I would probably have some space to fix it in the near future.
The new/current implementation (as I understand it now) doesn't actually know which connection to reset - it just resets the last one it got a message from, so I'm not sure the current approach is salvageable.
I would just log a warning and simply ignore such messages - when the ZMQ implementation changes and provides address data, we can add a reconnect mechanism. What do you think?
Tbh, I don't know whether the old implementation was better or worse. Waiting for a change in the ZMQ implementation that can tell us which connection to reset might take a long time. I think the main change in 2120 (apart from attempting to reset worker connections) is that it no longer resets the rpc server on errors, which may be a good thing or a bad thing (I am not very familiar with this part of the code). Maybe @delulu (who made some changes in this area before) or @heyman has an opinion? I tried reverting the old PR and it was easy enough. I think that is the right thing to do here, but I'm not sure.
From my perspective, when you use Locust inside a company with security restrictions and periodic scans for open ports, resetting the whole RPC server is not a good thing. This is what we are facing now, and our results during such a scan are broken because of the bouncing RPC connection. That's why I'm pushing to have a simple warning as the exception handling, or at least the possibility to control this behavior via some kind of config entry (like in the case of HEARTBEAT_INTERVAL).
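For example, something like the following (the setting and environment variable names are invented here, just to show the kind of knob I mean, mirroring how constants such as HEARTBEAT_INTERVAL are defined at module level):

```python
import os

# Hypothetical setting: whether to reset a connection when an RPC receive error
# occurs. The name LOCUST_RESET_CONNECTION_ON_RPC_ERROR is made up for this sketch.
RESET_CONNECTION_ON_RPC_ERROR = os.getenv(
    "LOCUST_RESET_CONNECTION_ON_RPC_ERROR", "true"
).lower() in ("1", "true", "yes")
```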
But if we don't reset the rpc server, won't the connection just stay broken? This was introduced in #1280. What is your security scan actually doing? Just closing ports it doesn't like?
Catching RPCReceiveError doesn't mean the connection is broken - it means the server couldn't parse an incoming message, so imho resetting the connection is not needed here. As posted in the first message, you can send a simple GET to the server to trigger such an exception. Frankly, I don't know what such scans are actually doing - we're investigating it at the moment.
Oh, now I get it - it's just a port scan, I thought it was doing something more intrusive. Yea, restarting the server because of that makes no sense.
We can do a proper fix later. For now just changing it to a warning is good, looking forward to your PR.
Thanks for the quick release!
PR #2120 introduced a problem with wrong handling of corrupted ZMQ messages, which leads to:
UnboundLocalError: local variable 'client_id' referenced before assignment.
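For illustration, this is the classic pattern of a name assigned inside a try block but referenced in its except handler; a minimal, self-contained reproduction of the pattern (not the actual Locust code):

```python
def handle_incoming(raw: bytes) -> None:
    try:
        # If the unpacking fails, client_id is never bound in this scope...
        client_id, payload = raw.split(b":", 1)
        print("got message from", client_id)
    except ValueError:
        # ...so referencing it here raises
        # UnboundLocalError: local variable 'client_id' referenced before assignment
        print("resetting connection for", client_id)


handle_incoming(b"corrupted-message-without-separator")
```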
Locust: 2.10.1
Python: 3.9
OS: CentOS