Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions when passing in a BMv2 JSON config #1243

Closed
matthewtlam opened this issue Apr 15, 2024 · 12 comments
Closed

Race conditions when passing in a BMv2 JSON config #1243

matthewtlam opened this issue Apr 15, 2024 · 12 comments

Comments

@matthewtlam
Copy link
Contributor

matthewtlam commented Apr 15, 2024

After using the Tsan tool on our tests with a simple grpc service with a BMV2 JSON config passed in with the command line arguments, I noticed that it resulted in some flaky tests with race conditions. The error follows as below:

ThreadSanitizer: data race [p4lang_behavioral_model/src/bm_sim/context.cpp:867:13] in bm::Context::do_swap()

I was wondering if this is the expected behavior?

@smolkaj @jonathan-dilorenzo

@smolkaj
Copy link
Member

smolkaj commented Apr 15, 2024

Pasting the line TSAN is complaining about here for convenience:

// src/bm_sim/context.cpp
int
Context::do_swap() {
  if (!swap_ordered) return 1;
  boost::unique_lock<boost::shared_mutex> lock(request_mutex);
  p4objects = p4objects_rt;  // <------------------ TSAN error
  swap_ordered = false;
  send_swap_status_notification(SwapStatus::SWAP_COMPLETED);
  return 0;
}

@matthewtlam
Copy link
Contributor Author

As a side note, running the TSAN tool while removing the BMv2 Config JSON in the command line arguments (and using the --no-p4 instead) provides no race conditions

@smolkaj
Copy link
Member

smolkaj commented Apr 15, 2024

I found the following context in switch.h:

//! Both switch classes support live swapping of P4-JSON configurations. To
//! enable it you need to provide the correct flag to the constructor (see
//! bm::SwitchWContexts::SwitchWContexts()). Swaps are ordered through the
//! runtime interfaces. We ensure that during the actual swap operation
//! (bm::SwitchWContexts::do_swap() method), there is no Packet instance
//! inflight, which we achieve using the process_packet_mutex mutex). The final
//! step of the swap is to call bm::SwitchWContexts::swap_notify_(), which
//! targets can override if they need to perform some operations as part of the
//! swap. Targets are guaranteed that no Packet instances exist as that
//! time. Note that swapping configurations may invalidate pointers that you are
//! still using, and it is your responsibility to refresh them.

@smolkaj
Copy link
Member

smolkaj commented Apr 16, 2024

Just an educated guess, but on the surface it seems like this may be a race between the configuration through the command line and the configuration through P4Runtime? If so, this could be fixed by not starting the P4Runtime server until after the initial configuration.

@smolkaj
Copy link
Member

smolkaj commented Apr 16, 2024

Here is the complete ThreadSanitizer report:

==================
WARNING: ThreadSanitizer: data race (pid=3936)
  Write of size 8 at 0x724c00015c50 by thread T62 (mutexes: write M0):
    #0 swap<bm::P4Objects *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__utility/swap.h:44:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 swap third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:697:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #2 operator= third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:655:21 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #3 bm::Context::do_swap() third_party/p4lang_behavioral_model/src/bm_sim/context.cpp:867:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x104b998)
    #4 bm::SwitchWContexts::do_swap() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:471:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1123127) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #5 bm::SwitchWContexts::swap_configs() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:368:17 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1122f02) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 _pi_update_device_end third_party/p4lang_behavioral_model/PI/src/pi_imp.cpp:112:38 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xfc6314) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 pi_update_device_end third_party/p4lang_PI/src/pi.c:221:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xfb9449) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 pi::fe::proto::DeviceMgrImp::pipeline_config_set(p4::v1::SetForwardingPipelineConfigRequest_Action, p4::v1::ForwardingPipelineConfig const&) third_party/p4lang_PI/proto/frontend/src/device_mgr.cpp:617:19 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf35667) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #9 pi::fe::proto::DeviceMgr::pipeline_config_set(p4::v1::SetForwardingPipelineConfigRequest_Action, p4::v1::ForwardingPipelineConfig const&) third_party/p4lang_PI/proto/frontend/src/device_mgr.cpp:3120:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf34d39) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #10 pi::server::(anonymous namespace)::P4RuntimeServiceImpl::SetForwardingPipelineConfig(grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*) third_party/p4lang_PI/proto/server/pi_server.cpp:459:31 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xf2e51e) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 grpc::Status std::__tsan::__function::__policy_invoker<grpc::Status (p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*)>::__call_impl<std::__tsan::__function::__default_alloc_func<p4::v1::grpc::P4Runtime::Service::Service()::$_2, grpc::Status (p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*)>>(std::__tsan::__function::__policy_storage const*, p4::v1::grpc::P4Runtime::Service*, grpc::ServerContext*, p4::v1::SetForwardingPipelineConfigRequest const*, p4::v1::SetForwardingPipelineConfigResponse*) p4runtime.grpc.pb.cc (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11c5743) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 grpc::internal::RpcMethodHandler<p4::v1::grpc::P4Runtime::Service, p4::v1::SetForwardingPipelineConfigRequest, p4::v1::SetForwardingPipelineConfigResponse, proto2::MessageLite, proto2::MessageLite>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&) <null> (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11c59d8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #13 grpc::Server::SyncRequest::ContinueRunAfterInterception() third_party/grpc/src/cpp/server/server_cc.cc:471:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f733ae) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #14 grpc::Server::SyncRequest::Run(std::__tsan::shared_ptr<grpc::Server::GlobalCallbacks> const&, bool) third_party/grpc/src/cpp/server/server_cc.cc:459:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f73052) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #15 grpc::Server::SyncRequestThreadManager::DoWork(void*, bool, bool) third_party/grpc/src/cpp/server/server_cc.cc:830:15 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f7287a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #16 grpc::ThreadManager::MainWorkLoop() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:214:9 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f8847f) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #17 Run third_party/grpc/src/cpp/thread_manager/thread_manager.cc:48:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #18 operator() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:69 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #19 grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #20 (anonymous namespace)::ThreadBodyWithCleanup(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*) third_party/grpc/google_specific/src/core/support/thd_google.cc:117:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8291) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #21 __invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #22 invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #23 operator()<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/bind_front.h:36:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #24 operator()<void> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/perfect_forward.h:77:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #25 __invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #26 invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #27 InvokeR<void, std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *>, void> third_party/absl/functional/internal/any_invocable.h:132:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #28 void absl::internal_any_invocable::RemoteInvoker<false, void, std::__tsan::__bind_front_t<void (*)(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*), void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*>&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:368:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #29 operator() third_party/absl/functional/internal/any_invocable.h:876:1 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #30 ClosureThread::Run() thread/thread.h:464:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644)
    #31 Thread::ThreadBody(void*) thread/thread.cc:1354:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d96d20) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Previous read of size 8 at 0x724c00015c50 by main thread:
    #0 operator-> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/shared_ptr.h:727:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/context.h:453:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #2 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/switch.h:165:32 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #3 field_exists third_party/p4lang_behavioral_model/include/bm/bm_sim/switch.h:984:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #4 SimpleSwitch::check_queueing_metadata() third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:442:26 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef3056)
    #5 SimpleSwitch::start_and_return_() third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:275:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef2c84) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 bm::SwitchWContexts::start_and_return() third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:87:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11209e3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 sswitch_grpc::SimpleSwitchGrpcRunner::init_and_start(bm::OptionsParser const&) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:670:18 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xede378) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:241:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb22a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Location is heap block of size 424 at 0x724c00015c40 allocated by main thread:
    #0 operator new(unsigned long) third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_new_delete.cpp:64:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xed857c) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 __libcpp_operator_new<unsigned long> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/new:271:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 __libcpp_allocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/new:295:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #3 allocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/allocator.h:125:32 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #4 __allocate_at_least<std::__tsan::allocator<bm::Context> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__memory/allocate_at_least.h:41:19 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #5 __vallocate third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:742:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #6 vector third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:1139:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #7 bm::SwitchWContexts::SwitchWContexts(unsigned long, bool) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:58:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x111fec5)
    #8 bm::Switch::Switch(bool) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:578:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x11242f8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #9 SimpleSwitch::SimpleSwitch(bool, unsigned int, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch/simple_switch.cpp:203:5 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xef1380) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #10 sswitch_grpc::SimpleSwitchGrpcRunner::SimpleSwitchGrpcRunner(bool, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::shared_ptr<sswitch_grpc::SSLOptions>, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:505:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedd8ac) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 sswitch_grpc::SimpleSwitchGrpcRunner::get_instance(bool, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>>, unsigned int, std::__tsan::shared_ptr<sswitch_grpc::SSLOptions>, unsigned long) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.h:67:35 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedbeb3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:233:18 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb1f4) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Mutex M0 (0x726400015b98) created at:
    #0 pthread_mutex_lock third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1341:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xe582fd) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 std::__tsan::__libcpp_mutex_lock(pthread_mutex_t*) third_party/llvm/llvm-project/libcxx/include/__thread/support/pthread.h:96:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x421ba49) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 std::__tsan::mutex::lock() third_party/llvm/llvm-project/libcxx/src/mutex.cpp:29:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x421ba49)
    #3 unique_lock third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__mutex/unique_lock.h:41:11 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x112117f) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #4 bm::SwitchWContexts::init_objects(std::__tsan::basic_string<char, std::__tsan::char_traits<char>, std::__tsan::allocator<char>> const&, unsigned long, std::__tsan::shared_ptr<bm::TransportIface>) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:195:34 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x112117f)
    #5 bm::SwitchWContexts::init_from_options_parser(bm::OptionsParser const&, std::__tsan::shared_ptr<bm::TransportIface>, std::__tsan::unique_ptr<bm::DevMgrIface, std::__tsan::default_delete<bm::DevMgrIface>>) third_party/p4lang_behavioral_model/src/bm_sim/switch.cpp:263:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1121760) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #6 sswitch_grpc::SimpleSwitchGrpcRunner::init_and_start(bm::OptionsParser const&) third_party/p4lang_behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:561:31 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xeddd8a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #7 main third_party/p4lang_behavioral_model/targets/simple_switch_grpc/main.cpp:241:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xedb22a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

  Thread T62 'grpcpp_sync_server' (tid=4043, running) created by thread T54 at:
    #0 pthread_create third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1022:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0xe56739) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #1 Thread::CreatePthread(pthread_attr_t&) thread/thread.cc:502:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d9660d) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #2 Thread::Start() thread/thread.cc:678:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d9739a) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #3 (anonymous namespace)::ThreadInternalsGoogleThread::Start() third_party/grpc/google_specific/src/core/support/thd_google.cc:106:36 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af83c3) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #4 Start third_party/grpc/src/core/lib/gprpp/thd.h:152:14 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #5 Start third_party/grpc/src/cpp/thread_manager/thread_manager.h:124:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8)
    #6 grpc::ThreadManager::MainWorkLoop() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:186:23 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f885e8)
    #7 Run third_party/grpc/src/cpp/thread_manager/thread_manager.cc:48:13 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #8 operator() third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:69 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #9 grpc::ThreadManager::WorkerThread::WorkerThread(grpc::ThreadManager*)::$_0::__invoke(void*) third_party/grpc/src/cpp/thread_manager/thread_manager.cc:40:7 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x1f89231)
    #10 (anonymous namespace)::ThreadBodyWithCleanup(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*) third_party/grpc/google_specific/src/core/support/thd_google.cc:117:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8291) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #11 __invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #12 invoke<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #13 operator()<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/bind_front.h:36:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #14 operator()<void> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/perfect_forward.h:77:12 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #15 __invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__type_traits/invoke.h:150:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #16 invoke<std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *> > third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:28:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #17 InvokeR<void, std::__tsan::__bind_front_t<void (*)(void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *), void (*)(void *), void *, (anonymous namespace)::ThreadInternalsGoogleThread *>, void> third_party/absl/functional/internal/any_invocable.h:132:3 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #18 void absl::internal_any_invocable::RemoteInvoker<false, void, std::__tsan::__bind_front_t<void (*)(void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*), void (*)(void*), void*, (anonymous namespace)::ThreadInternalsGoogleThread*>&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:368:10 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3af8468)
    #19 operator() third_party/absl/functional/internal/any_invocable.h:876:1 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644) (BuildId: d5ce5c58fe208b528b4cddb59015805a)
    #20 ClosureThread::Run() thread/thread.h:464:25 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x133a644)
    #21 Thread::ThreadBody(void*) thread/thread.cc:1354:16 (8e625be115533f7ede2ca241f73921059fdca1e02f696e3df5079051bcf7a941_02001134d160+0x3d96d20) (BuildId: d5ce5c58fe208b528b4cddb59015805a)

SUMMARY: ThreadSanitizer: data race third_party/p4lang_behavioral_model/src/bm_sim/context.cpp:867:13 in bm::Context::do_swap()
==================

@smolkaj
Copy link
Member

smolkaj commented Apr 16, 2024

Swapping the relative order of the calls to simple_switch->start_and_return() and PIGrpcServerRunV2 in SimpleSwitchGrpcRunner::init_and_start, such that the latter comes second, seems to do the trick. I suppose this makes it so that the P4Runtime server only comes up after the switch has been started and initialized.

@antoninbas
Copy link
Member

antoninbas commented Apr 16, 2024

Do you push the forwarding pipeline as soon as the P4Runtime server is started? It's strange that these 2 things happen concurrently.

I think it would be fine to swap these 2 function calls as you suggested. I don't see any potential drawback or issue.

What's interesting is that the function ultimately responsible for the issue (check_queueing_metadata) is not really relevant for simple_switch_grpc and P4Runtime:

SimpleSwitch::check_queueing_metadata() {
. It dates back to the days of a single static JSON pipeline.

^ I was wrong there. check_queueing_metadata is also called whenever there is a pipeline swap, which is the correct thing to do. Although in that context, the same race condition does not seem possible, as check_queueing_metadata is called from do_swap itself, and we hold a unique lock on process_packet_mutex for the entire duration.

I think the solution you propose (changing order of function calls) may not work in the end. I see that start_and_return waits for a P4 config to be available:

void
SwitchWContexts::start_and_return() {
{
std::unique_lock<std::mutex> config_lock(config_mutex);
if (!config_loaded && !enable_swap) {
Logger::get()->error(
"The switch was started with no P4 and config swap is disabled");
}
config_loaded_cv.wait(config_lock, [this]() { return config_loaded; });
}
start(); // DevMgr::start
start_and_return_();
// Starts any registered periodically-executing externs
PeriodicTaskList::get_instance().start();
}

If we start the switch without a P4 config and we wait for start_and_return to return before starting the P4Runtime server, I think we end up with a deadlock, as no config can be pushed until the server is started?

@smolkaj
Copy link
Member

smolkaj commented Apr 16, 2024

Do you push the forwarding pipeline as soon as the P4Runtime server is started?

Yes, pretty much. We're seeing this issue in a unit test which is pretty much straight line code without any delays.

If we start the switch without a P4 config and we wait for start_and_return to return before starting the P4Runtime server, I think we end up with a deadlock, as no config can be pushed until the server is started?

Ohhhh, good catch! Perhaps we can swap the calls only iff an initial config was specified?

@antoninbas
Copy link
Member

I would recommend just locking the config_mutex when calling start_and_return_:

start_and_return_();

Since targets are using this callback to validate the initial config, it makes sense to lock the mutex before calling it.

I hope I am not missing something, but this should work:

void
SwitchWContexts::start_and_return() {
  std::unique_lock<std::mutex> config_lock(config_mutex);
  if (!config_loaded && !enable_swap) {
    Logger::get()->error(
        "The switch was started with no P4 and config swap is disabled");
  }
  config_loaded_cv.wait(config_lock, [this]() { return config_loaded; });
  start();  // DevMgr::start
  start_and_return_();
  // Starts any registered periodically-executing externs
  PeriodicTaskList::get_instance().start();
}

After the condition variable's wait call returns, the lock will be held. Holding the lock while calling these other functions should be fine (I am not super familiar with PeriodicTaskList, but I would say it's probably fine). It's also possible to release the lock just before calling PeriodicTaskList::get_instance().start() if we want to be on the safe side.

@smolkaj
Copy link
Member

smolkaj commented Apr 16, 2024

Where does the initial config from the JSON file get set? I guess we would want to verify that at least that bit happens before the P4RT server is started, right?

@matthewtlam
Copy link
Contributor Author

matthewtlam commented Apr 16, 2024

I believe that the initial config from the JSON file gets set in the SwitchWContexts::init_from_options_parser function on line 263 in p4lang/behavioral_model/src/bm_sim/switch.cpp. That function gets called on p4lang/behavioral_model/targets/simple_switch_grpc/switch_runner.cpp:561, before starting PIGrpcServerRunV2 (P4RT server) on line 644.

@smolkaj
Copy link
Member

smolkaj commented Apr 17, 2024

Thanks for verifying that, that looks correct to me. And thanks @antoninbas for the suggestion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants