Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#1436 triggers assertion failure in libp2p-kad #1437

Closed
twittner opened this issue Feb 6, 2020 · 6 comments · Fixed by #1449
Closed

#1436 triggers assertion failure in libp2p-kad #1437

twittner opened this issue Feb 6, 2020 · 6 comments · Fixed by #1449
Assignees
Labels

Comments

@twittner
Copy link
Contributor

twittner commented Feb 6, 2020

Testing #1436 with substrate causes an assertion failure in libp2p-kad (when compiled in debug mode):

====================

Version: 2.0.0-faf608eee-x86_64-linux-gnu

stack backtrace:
   0: sp_panic_handler::panic_hook
             at primitives/panic-handler/src/lib.rs:148
   1: sp_panic_handler::set::{{closure}}
             at primitives/panic-handler/src/lib.rs:58
   2: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:475
   3: std::panicking::begin_panic
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:404
   4: libp2p_kad::behaviour::Kademlia<TSubstream,TStore>::connection_updated
             at <::std::macros::panic macros>:3
   5: <libp2p_kad::behaviour::Kademlia<TSubstream,TStore> as libp2p_swarm::behaviour::NetworkBehaviour>::inject_connected
             at /home/user/workspace/libp2p/rust-libp2p/protocols/kad/src/behaviour.rs:1070
   6: <sc_network::discovery::DiscoveryBehaviour<TSubstream> as libp2p_swarm::behaviour::NetworkBehaviour>::inject_connected
             at client/network/src/discovery.rs:257
   7: <sc_network::behaviour::Behaviour<B,S,H> as libp2p_swarm::behaviour::NetworkBehaviour>::inject_connected
             at client/network/src/behaviour.rs:35
   8: libp2p_swarm::ExpandedSwarm<TTransport,TBehaviour,TInEvent,TOutEvent,THandler,THandlerErr,TConnInfo>::poll_next_event
             at /home/user/workspace/libp2p/rust-libp2p/swarm/src/lib.rs:380
   9: libp2p_swarm::ExpandedSwarm<TTransport,TBehaviour,TInEvent,TOutEvent,THandler,THandlerErr,TConnInfo>::next_event::{{closure}}::{{closure}}
             at /home/user/workspace/libp2p/rust-libp2p/swarm/src/lib.rs:340
  10: <futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/poll_fn.rs:54
  11: std::future::poll_with_tls_context
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/future.rs:99
  12: libp2p_swarm::ExpandedSwarm<TTransport,TBehaviour,TInEvent,TOutEvent,THandler,THandlerErr,TConnInfo>::next_event::{{closure}}
             at /home/user/workspace/libp2p/rust-libp2p/swarm/src/lib.rs:340
  13: <std::future::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/future.rs:43
  14: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/future/future.rs:119
  15: futures_util::future::future::FutureExt::poll_unpin
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/future/mod.rs:510
  16: <sc_network::service::NetworkWorker<B,S,H> as core::future::future::Future>::poll
             at client/network/src/service.rs:796
  17: sc_service::build_network_future::{{closure}}
             at client/service/src/lib.rs:475
  18: <futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/poll_fn.rs:54
  19: futures_util::future::future::FutureExt::poll_unpin
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/future/mod.rs:510
  20: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/select.rs:62
  21: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.1/src/future/future/map.rs:40
  22: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/future/future.rs:119
  23: <futures_diagnose::fut_with_diag::DiagnoseFuture<T> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-diagnose-1.0.0/src/fut_with_diag.rs:74
  24: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/future/future.rs:119
  25: tokio::task::core::Core<T>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/core.rs:128
  26: tokio::task::harness::Harness<T,S>::poll::{{closure}}::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:120
  27: core::ops::function::FnOnce::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/ops/function.rs:232
  28: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:318
  29: std::panicking::try::do_call
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:292
  30: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  31: std::panicking::try
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:270
  32: std::panic::catch_unwind
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:394
  33: tokio::task::harness::Harness<T,S>::poll::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:101
  34: tokio::loom::std::causal_cell::CausalCell<T>::with_mut
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/loom/std/causal_cell.rs:41
  35: tokio::task::harness::Harness<T,S>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:100
  36: tokio::task::raw::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/raw.rs:162
  37: tokio::task::raw::RawTask::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/raw.rs:113
  38: tokio::task::Task<S>::run
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/mod.rs:381
  39: tokio::runtime::thread_pool::worker::GenerationGuard::run_task
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:448
  40: tokio::runtime::thread_pool::worker::GenerationGuard::process_available_work
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:306
  41: tokio::runtime::thread_pool::worker::GenerationGuard::run
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:271
  42: tokio::runtime::thread_pool::worker::Worker::run::{{closure}}::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:160
  43: std::thread::local::LocalKey<T>::try_with
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/local.rs:262
  44: std::thread::local::LocalKey<T>::with
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/local.rs:239
  45: tokio::runtime::thread_pool::worker::Worker::run::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:136
  46: tokio::runtime::thread_pool::current::set::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/current.rs:47
  47: std::thread::local::LocalKey<T>::try_with
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/local.rs:262
  48: std::thread::local::LocalKey<T>::with
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/local.rs:239
  49: tokio::runtime::thread_pool::current::set
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/current.rs:29
  50: tokio::runtime::thread_pool::worker::Worker::run
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/worker.rs:132
  51: tokio::runtime::thread_pool::Workers::spawn::{{closure}}::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/thread_pool/mod.rs:113
  52: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/blocking/task.rs:30
  53: tokio::task::core::Core<T>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/core.rs:128
  54: tokio::task::harness::Harness<T,S>::poll::{{closure}}::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:120
  55: core::ops::function::FnOnce::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/ops/function.rs:232
  56: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:318
  57: std::panicking::try::do_call
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:292
  58: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  59: std::panicking::try
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:270
  60: std::panic::catch_unwind
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:394
  61: tokio::task::harness::Harness<T,S>::poll::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:101
  62: tokio::loom::std::causal_cell::CausalCell<T>::with_mut
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/loom/std/causal_cell.rs:41
  63: tokio::task::harness::Harness<T,S>::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/harness.rs:100
  64: tokio::task::raw::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/raw.rs:162
  65: tokio::task::raw::RawTask::poll
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/raw.rs:113
  66: tokio::task::Task<S>::run
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/task/mod.rs:381
  67: tokio::runtime::blocking::pool::run_task
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/blocking/pool.rs:290
  68: tokio::runtime::blocking::pool::Inner::run
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/blocking/pool.rs:206
  69: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/blocking/pool.rs:186
  70: tokio::runtime::context::enter
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/context.rs:72
  71: tokio::runtime::handle::Handle::enter
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/handle.rs:34
  72: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /home/user/.cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-0.2.11/src/runtime/blocking/pool.rs:185
  73: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/sys_common/backtrace.rs:136
  74: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/mod.rs:469
  75: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:318
  76: std::panicking::try::do_call
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:292
  77: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:78
  78: std::panicking::try
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panicking.rs:270
  79: std::panic::catch_unwind
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/panic.rs:394
  80: std::thread::Builder::spawn_unchecked::{{closure}}
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libstd/thread/mod.rs:468
  81: core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/libcore/ops/function.rs:232
  82: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/liballoc/boxed.rs:1022
  83: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
             at /rustc/5e1a799842ba6ed4a57e91f7ab9435947482f7d8/src/liballoc/boxed.rs:1022
      std::sys_common::thread::start_thread
             at src/libstd/sys_common/thread.rs:13
      std::sys::unix::thread::Thread::new::thread_start
             at src/libstd/sys/unix/thread.rs:80
  84: start_thread
  85: clone


Thread 'main-tokio-' panicked at 'assertion failed: !self.connected_peers.contains(disconnected.preimage())', /home/user/workspace/libp2p/rust-libp2p/protocols/kad/src/behaviour.rs:647
@romanb
Copy link
Contributor

romanb commented Feb 10, 2020

Is there an easy way to reproduce this, i.e. does it occur pretty much every time #1436 is used with substrate in debug mode?

@twittner
Copy link
Contributor Author

Yes, it did happen to me every single time with #1436 after some minutes.

@romanb
Copy link
Contributor

romanb commented Feb 11, 2020

I managed to reproduce it three times. The cause seems to be outside of libp2p-kad and in particular related to the story with inline vs hashed peer IDs, i.e #555, #1230, #1237.

The summary is that libp2p-kad relies on the property

(peer_id_1 == peer_id_2) iff (peer_id_1.as_bytes() == peer_id_2.as_bytes())

because the Kademlia keyspace is essentially based upon SHA256(x.as_bytes()), where x is a peer ID, a multihash or anything else that is AsRef<[u8]>. This property no longer holds since #1413 due to the implementation of PartialEq. Now a single peer ID can have two IDs in the Kademlia keyspace and thus self.connected_peers which has peer IDs as keys and the k-buckets, which have Kademlia keys can appear to be out of sync w.r.t. the connection status of a peer. Specifically, since there can be two bucket entries for the same peer ID, one of them can be considered still disconnected and returned for a connection check when the bucket is full and another entry is inserted. However, because the preimage peer ID of that kbucket key compares equal to a peer ID that is in self.connected_peers (whose connected entry is a different entry in the k-buckets), the debug assertion is triggered about the mismatch.

Troubleshooting this can be quite confusing because peer IDs that appear visually different as per their Debug output can compare equal as per PartialEq.

I don't know yet how best to resolve this, but thought it makes sense to already share my findings here.

@romanb
Copy link
Contributor

romanb commented Feb 11, 2020

So this is most definitely unrelated to #1436 and thus needn't block these changes, as I initially feared.

@romanb
Copy link
Contributor

romanb commented Feb 11, 2020

I guess #1444 will temporarily address this issue as well, though not long-term. As far as I can currently tell, libp2p-kad needs some equal encoding of equal peer IDs to serve as the basis for integrating them into the Kademlia keyspace properly, because a single peer being represented multiple times in the Kademlia topology does not seem desirable. Maybe a PeerId could provide some canonical encoding other than what is offered by as_bytes() which guarantees that the encoding is the same for equal (as per PartialEq) peer IDs. That implementation would essentially be an encoding equivalent that always agrees with PartialEq, i.e. one that straightens out the differences between identity hashing and sha256.

@tomaka
Copy link
Member

tomaka commented Feb 11, 2020

Maybe a PeerId could provide some canonical encoding other than what is offered by as_bytes() which guarantees that the encoding is the same for equal (as per PartialEq) peer IDs. That implementation would essentially be an encoding equivalent that always agrees with PartialEq, i.e. one that straightens out the differences between identity hashing and sha256.

We could simply use the SHA256 hash all the time for comparison.
And for efficiency reasons, store that SHA256 within the PeerId struct even if it uses identity hashing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants