Panic in libp2p-websocket
at 'SinkImpl::poll_ready called after error.'
#2598
Return an error instead. `quicksink` panics if you call a method after it returned an error once. Fixes libp2p#2598.
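A minimal sketch of the "return an error instead" idea, for illustration only. It uses a synchronous `std::io::Write` wrapper rather than the real `quicksink` or `Sink` API, and the names `FusedWriter` and `AlwaysFails` are hypothetical, not taken from any of the crates discussed here.

```rust
use std::io::{self, Write};

/// Hypothetical wrapper (not part of quicksink or libp2p): once the inner
/// writer has failed, every later call returns an error instead of panicking.
pub struct FusedWriter<W> {
    inner: W,
    failed: bool,
}

impl<W: Write> FusedWriter<W> {
    pub fn new(inner: W) -> Self {
        FusedWriter { inner, failed: false }
    }

    /// Reject the call if an earlier call already failed.
    fn ensure_usable(&self) -> io::Result<()> {
        if self.failed {
            Err(io::Error::new(io::ErrorKind::Other, "writer used after error"))
        } else {
            Ok(())
        }
    }
}

impl<W: Write> Write for FusedWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.ensure_usable()?;
        let result = self.inner.write(buf);
        if result.is_err() {
            // Remember the failure so later calls fail cleanly.
            self.failed = true;
        }
        result
    }

    fn flush(&mut self) -> io::Result<()> {
        self.ensure_usable()?;
        let result = self.inner.flush();
        if result.is_err() {
            self.failed = true;
        }
        result
    }
}

/// Hypothetical stand-in for a socket whose connection is gone.
pub struct AlwaysFails;

impl Write for AlwaysFails {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "connection closed"))
    }

    fn flush(&mut self) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "connection closed"))
    }
}
```

With this shape, the first failed call surfaces the underlying error and any later call gets a plain error value, so a caller that keeps using the writer after a failure cannot trigger a panic.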
I believe the issue might be here: rust-libp2p/misc/multistream-select/src/negotiated.rs Lines 117 to 127 in f701b24
For context, this object is a stream onto which a protocol has been negotiated. Because the negotiation is reported in the API as having succeeded before the underlying stream has been flushed, reading on the stream also has the side effect of flushing the stream. However, if an error happens when flushing, we don't keep track of it, meaning that if the API user then calls I don't know if this is the cause of this bug, but it is certainly a bug.
Actually, what I think happens is: The yamux muxer reads from the connection, which calls Yamux detects the errors, and when yamux detects an error when reading, it switches to "shutdown mode" and tries to close the connection in a clean way. This closing calls
I updated my parachain to Polkadot 0.9.26 dependencies and this issue is breaking the node after around 100 blocks.
In such case
@tomaka am I missing something?
@mxinden Thanks for helping, here are the logs with 10 seconds before: https://we.tl/t-6TuATzVqV4
A suspicion I have: When closing the https://github.com/libp2p/rust-libp2p/blob/master/muxers/yamux/src/lib.rs#L132-L150 This could result in both the https://github.com/libp2p/rust-yamux/blob/master/src/connection.rs#L492-L504 The I suggest I prepare a patch for rust-yamux. To validate the above assumption, could one of you (e.g. @NZT48 @hrxi @jasl @doutv) then run a version of Polkadot with that patch? You would simply need to build Polkadot with a dependency override.
> The `frame` future might be _ready_ with an `Error` from the underlying socket (i.e. here `libp2p-websocket`). Though given that the result of the `control_command` `Future` is handled first, `on_control_command` is called despite `frame` having returned an `Error`. `on_control_command` itself may try to write to the underlying socket, which will panic given that the socket returned an error earlier via the `frame` `Future`. Patch to validate suspicion in libp2p/rust-libp2p#2598.
@mxinden It works on my side also :)
Running for nearly a day with patched binaries, it hasn't crashed yet.
Will try to prepare a proper patch for
> The `frame` future might be _ready_ with an `Error` from the underlying socket (i.e. here `libp2p-websocket`). Though given that the result of the `control_command` `Future` is handled first, `on_control_command` is called despite `frame` having returned an `Error`. `on_control_command` itself may try to write to the underlying socket, which will panic given that the socket returned an error earlier via the `frame` `Future`. With this patch, once any of `next_stream_command`, `next_control_command` or `next_frame` `Future` is ready, the result is processed right away, without additionally polling the remaining pending `Future`s, thus surfacing errors as early as possible. See libp2p/rust-libp2p#2598 for details.
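A rough sketch of the polling order this commit message describes, under stated assumptions: the three sources are modeled as plain closures returning `std::task::Poll`, and `Source`, `Outcome` and `poll_next` are hypothetical names for illustration. This is not the actual rust-yamux code.

```rust
use std::task::Poll;

/// Which of the three sources produced the result (names follow the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    StreamCommand,
    ControlCommand,
    Frame,
}

/// Stand-in result type for this sketch; the real connection uses its own types.
pub type Outcome = Result<(), String>;

/// Poll the sources in order and return as soon as one is ready.
/// Sources after the first ready one are NOT polled in this round, so the
/// caller never acts on one result while another source has already handed
/// back an error that is still unprocessed.
pub fn poll_next<S, C, F>(
    mut next_stream_command: S,
    mut next_control_command: C,
    mut next_frame: F,
) -> Poll<(Source, Outcome)>
where
    S: FnMut() -> Poll<Outcome>,
    C: FnMut() -> Poll<Outcome>,
    F: FnMut() -> Poll<Outcome>,
{
    if let Poll::Ready(result) = next_stream_command() {
        return Poll::Ready((Source::StreamCommand, result));
    }
    if let Poll::Ready(result) = next_control_command() {
        return Poll::Ready((Source::ControlCommand, result));
    }
    if let Poll::Ready(result) = next_frame() {
        return Poll::Ready((Source::Frame, result));
    }
    Poll::Pending
}
```

In the failing scenario from this thread, the control command is ready while the socket is already broken. With the order above, the frame source is simply not polled in that round, so its error is not consumed and then ignored; it surfaces the next time the socket is touched.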
I prepared libp2p/rust-yamux#138, which in my eyes represents a valid bug fix, replacing libp2p/rust-yamux#137. libp2p/rust-yamux#138 still needs to pass review. Given that this is a very subtle issue, and given that libp2p/rust-yamux#138 differs from libp2p/rust-yamux#137, I would appreciate help testing libp2p/rust-yamux#138 as well. @NZT48 @hrxi @jasl @doutv @kpp in case one of you has more capacity to run tests, would you mind deploying libp2p/rust-yamux#138 on one of your test instances?
No problem.
libp2p/rust-yamux#138 is reviewed and approved thanks to @elenaf9 and @thomaseizinger. @jasl once I get the green light from you, I will cut a release.
Great! I can't wait. BTW, could you help backport the upgrade to polkadot-v0.9.27?
Just to make sure, do I understand correctly that you ran libp2p/rust-yamux#138 and that you are not seeing any panics @jasl?
I cannot, as I am no longer directly involved in the project. Maybe @kpp or @bkchr can help with the backport.
After running it for a night, I haven't seen a panic.
libp2p/rust-yamux#138 (comment) Thanks everyone for the help! I am closing here. Since this is a patch release, there is no need to update
No need to backport this. Parachains can just run `cargo update -p yamux` to get the latest version of the crate. As this is nothing critical and doesn't happen that often, there is no need for a full polkadot release. It can wait and go into the next polkadot release.
Still, the yamux dependency version in libp2p needs to be bumped in order to force people to download the new version upon upgrade.
I would expect downstream users to use tools like @dependabot and thus keep their dependencies up to date. That said, I am happy to merge a pull request updating the
There's no need to force people. People are expected to understand that they need to run
Update the `yamux` dependency from version 0.10.1 to version 0.10.2 to pull a fix for a call to `SinkImpl::poll_ready` after error as described in [this rust-libp2p issue](libp2p/rust-libp2p#2598). This fixes #1319.
I don't know how to reproduce the bug, but it seemed likely to be fixable without reproduction.
CC nimiq/core-rs-albatross#732