Receptor does not handle exceptions properly when starting the server #133
Comments
The duplicate |
Unfortunately, this issue isn't solved in the latest version of receptor. If I start two receptor processes and tell them to bind on the same socket address (by default: 8888), then one will bind, and the other will emit an error about being unable to bind but continue running.
$ poetry run receptor --data-dir="$(mktemp --directory)" node
$ poetry run receptor --data-dir="$(mktemp --directory)" node
ERROR 2020-03-05 13:57:20,800 controller [Errno 98] error while attempting to bind on address ('0.0.0.0', 8888): address already in use
Traceback (most recent call last):
  File "/home/ichimonji10/code/receptor/receptor/controller.py", line 46, in exit_on_exceptions_in
    await task
  File "/usr/lib/python3.8/asyncio/streams.py", line 94, in start_server
    return await loop.create_server(factory, host, port, **kwds)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1459, in create_server
    raise OSError(err.errno, 'error while attempting '
OSError: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8888): address already in use
...unless this behavior is intentional? That doesn't seem to be the case, though. From #149:
|
I am seeing the same behavior as @Ichimonji10. One thing to add: when I tried to stop both nodes, I got the following stack trace:
Edit: I've submitted ansible/receptor#154 to fix that. |
If I start two receptor nodes and ask them to bind to the same port, then the second one will exit:
$ poetry run receptor --data-dir="$(mktemp --directory)" node
ERROR 2020-03-16 14:31:46,753 controller [Errno 98] error while attempting to bind on address ('0.0.0.0', 8888): address already in use
Traceback (most recent call last):
  File "/home/ichimonji10/code/receptor/receptor/controller.py", line 44, in exit_on_exceptions_in
    await task
  File "/usr/lib/python3.8/asyncio/streams.py", line 94, in start_server
    return await loop.create_server(factory, host, port, **kwds)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 1463, in create_server
    raise OSError(err.errno, 'error while attempting '
OSError: [Errno 98] error while attempting to bind on address ('0.0.0.0', 8888): address already in use
However, the receptor process which bails exits with a status code of zero. That seems like a bug. Would you like to open a new issue for incorrect exit status codes, or does this issue suffice? |
I agree it should set the exit code. I have no preference whether we track this on the current issue or a new issue. |
This issue seems to focus on the issue of "receptor should bail but doesn't," which is fixed, and which I'm therefore closing. #180 tracks "receptor is bailing with the wrong status code." |
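For readers following the exit-status discussion, here is a minimal sketch of how a bind failure can be turned into a non-zero exit code with plain asyncio. It is not receptor's actual code: the names `serve`, `main`, and `_handle` are hypothetical, and the only behavior taken from the thread is that `asyncio.start_server` raises `OSError` when the address is already in use.

```python
import asyncio
import sys


async def _handle(reader, writer):
    # Hypothetical placeholder handler; a real node would speak its protocol here.
    writer.close()
    await writer.wait_closed()


async def serve(host, port):
    # asyncio.start_server raises OSError (Errno 98 in the logs above)
    # when the address cannot be bound.
    server = await asyncio.start_server(_handle, host, port)
    async with server:
        await server.serve_forever()


def main(host="0.0.0.0", port=8888):
    # Returns the process exit status instead of calling sys.exit directly,
    # which keeps the function easy to test.
    try:
        asyncio.run(serve(host, port))
    except OSError as exc:
        print(f"unable to start listener: {exc}", file=sys.stderr)
        return 1  # non-zero: startup failed
    except KeyboardInterrupt:
        return 0  # normal shutdown via Ctrl-C
    return 0
```

An entry point would then call `sys.exit(main())`, so that a shell or supervisor sees the failure in the status code rather than only in the logs.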
Receptor ignores any exception raised when starting the server and continues as if everything were all right. It should check that the server started before trying to do anything else.
To reproduce this, suppose you have a controller running on port 9999, started with the following command:
Then try to start a node and also make it listen on port 9999:
The node tried to bind to an address that was already in use, but never checked that the bind failed, and continued as if everything was fine. When this happened, the following was shown in the controller logs:
So node-a connected to the controller and was considered a node of the network, since the connection to the controller succeeded, but it could not serve because the address was already in use.
In the scenario above you can actually send work to node-a, but you can't have nodes connect to it, since it failed to start the listener server. In other words, it handled the --peer option without a problem but failed to properly handle --listen.
With that said, should receptor fail if any of the expected behavior fails to start? Maybe failing only if --listen fails is the way to go, since --peer could point to a peer that is temporarily unavailable and will come back at some point. I'm not sure here, but letting receptor run even though it had initialization issues may bring headaches in the future, especially when trying to debug things.
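The policy suggested above (fail if the listener cannot start, but tolerate a peer that is temporarily unavailable) could look roughly like the following. This is a sketch under assumptions, not how receptor implements --listen or --peer: `start_listener`, `connect_peer`, and the `retries` and `delay` parameters are hypothetical names chosen for illustration.

```python
import asyncio


async def start_listener(host, port, handler):
    # No try/except on purpose: if the bind fails, OSError propagates
    # and the caller can abort startup.
    return await asyncio.start_server(handler, host, port)


async def connect_peer(host, port, retries=3, delay=1.0):
    # A peer may be down for a while, so retry a few times before giving up.
    for attempt in range(1, retries + 1):
        try:
            return await asyncio.open_connection(host, port)
        except OSError:
            if attempt < retries:
                await asyncio.sleep(delay)
    # Peer still unreachable; report that without treating it as fatal.
    return None
```

With this split, a failure in `start_listener` stops the node before it joins the network, while `connect_peer` returning `None` leaves the caller free to log the problem and try again later.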