Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Build] Darwin tsan failure in chip-tool #27535

Closed
woody-apple opened this issue Jun 28, 2023 · 6 comments · Fixed by #29277
Closed

[Build] Darwin tsan failure in chip-tool #27535

woody-apple opened this issue Jun 28, 2023 · 6 comments · Fixed by #29277
Assignees
Milestone

Comments

@woody-apple
Copy link
Contributor

Build issue(s)

https://github.com/project-chip/connectedhomeip/actions/runs/5401576138/jobs/9811580025

ERROR   14:34:35.573 - APP  OUT  : [1687962875573] [27599:293692] [DIS] Mdns: OnRegisterRecord flags: 1
ERROR   14:34:35.573 - APP  OUT  : [1687962875573] [27599:293692] [DIS] Mdns: OnRegisterRecord flags: 0
ERROR   14:34:35.573 - APP  OUT  : [1687962875573] [27599:293692] [DIS] Mdns: OnRegisterRecord flags: 1
ERROR   14:34:35.573 - APP  OUT  : [1687962875573] [27599:293692] [DIS] Mdns: OnRegisterRecord flags: 0
ERROR   14:34:48.676 - TEST ERR  : ==================
ERROR   14:34:48.676 - TEST ERR  : WARNING: ThreadSanitizer: data race (pid=27621)
ERROR   14:34:48.676 - TEST ERR  :   Write of size 1 at 0x7b5c00000510 by main thread:
ERROR   14:34:48.676 - TEST ERR  :     #0 chip::Optional<unsigned long long>::ClearValue() <null>:246088793 (chip-tool:x86_64+0x1001b419f) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #1 void (anonymous namespace)::ResetOptionalArg<unsigned long long>(Argument const&) <null>:246088793 (chip-tool:x86_64+0x1011e1ab7) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #2 Command::ResetArguments() <null>:246088793 (chip-tool:x86_64+0x1011e03e8) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #3 CHIPCommand::Shutdown() <null>:246088793 (chip-tool:x86_64+0x1001871c5) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #4 CHIPCommand::Run() <null>:246088793 (chip-tool:x86_64+0x1011cc4a7) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #5 Command::RunAsInteractive(chip::Optional<char*> const&) <null>:246088793 (chip-tool:x86_64+0x1012050fd) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #6 Commands::RunCommand(int, char**, bool, chip::Optional<char*> const&) <null>:246088793 (chip-tool:x86_64+0x101202fe4) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #7 Commands::RunInteractive(char const*, chip::Optional<char*> const&) <null>:246088793 (chip-tool:x86_64+0x10120373a) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #8 InteractiveCommand::ParseCommand(char*, int*) <null>:246088793 (chip-tool:x86_64+0x10122f353) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #9 InteractiveServerCommand::OnWebSocketMessageReceived(char*) <null>:246088793 (chip-tool:x86_64+0x10122efb9) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #10 non-virtual thunk to InteractiveServerCommand::OnWebSocketMessageReceived(char*) <null>:246088793 (chip-tool:x86_64+0x10122f82c) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #11 WebSocketServer::OnWebSocketMessageReceived(char*) <null>:246088793 (chip-tool:x86_64+0x1016435c0) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.677 - TEST ERR  :     #12 (anonymous namespace)::OnWebSocketCallback(lws*, lws_callback_reasons, void*, void*, unsigned long) <null>:246088793 (chip-tool:x86_64+0x10164316b) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #13 user_callback_handle_rxflow <null>:246088793 (chip-tool:x86_64+0x10108c71f) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #14 lws_ws_frame_rest_is_payload <null>:246088793 (chip-tool:x86_64+0x1010b38a0) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #15 lws_parse_ws <null>:246088793 (chip-tool:x86_64+0x1010b2d01) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #16 lws_read_h1 <null>:246088793 (chip-tool:x86_64+0x10109b25f) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #17 rops_handle_POLLIN_ws <null>:246088793 (chip-tool:x86_64+0x1010afd71) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #18 lws_service_fd_tsi <null>:246088793 (chip-tool:x86_64+0x101085455) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #19 _lws_plat_service_forced_tsi <null>:246088793 (chip-tool:x86_64+0x1010b5d8a) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #20 _lws_plat_service_tsi <null>:246088793 (chip-tool:x86_64+0x1010b64fb) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #21 lws_plat_service <null>:246088793 (chip-tool:x86_64+0x1010b65e3) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #22 lws_service <null>:246088793 (chip-tool:x86_64+0x10108573e) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #23 WebSocketServer::Run(chip::Optional<unsigned short>, WebSocketServerDelegate*) <null>:246088793 (chip-tool:x86_64+0x101642c3d) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #24 InteractiveServerCommand::RunCommand() <null>:246088793 (chip-tool:x86_64+0x10122ec58) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #25 CHIPCommand::StartWaiting(std::__2::chrono::duration<unsigned int, std::__2::ratio<1l, 1000l> >) <null>:246088793 (chip-tool:x86_64+0x1011cc773) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #26 CHIPCommand::Run() <null>:246088793 (chip-tool:x86_64+0x1011cc458) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.678 - TEST ERR  :     #27 Commands::RunCommand(int, char**, bool, chip::Optional<char*> const&) <null>:246088793 (chip-tool:x86_64+0x101203261) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #28 Commands::Run(int, char**) <null>:246088793 (chip-tool:x86_64+0x101201a1f) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #29 main <null>:246088793 (chip-tool:x86_64+0x100181336) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  : 
ERROR   14:34:48.679 - TEST ERR  :   Previous read of size 1 at 0x7b5c00000510 by thread T1:
ERROR   14:34:48.679 - TEST ERR  :     #0 chip::Optional<unsigned long long>::HasValue() const <null>:246088793 (chip-tool:x86_64+0x1001fa109) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #1 CHIPCommand::GetIdentityNodeId(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >, unsigned long long*) <null>:246088793 (chip-tool:x86_64+0x1011cbfac) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #2 CHIPCommand::GetCommissioner(std::__2::basic_string<char, std::__2::char_traits<char>, std::__2::allocator<char> >) <null>:246088793 (chip-tool:x86_64+0x1011ce146) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #3 CHIPCommand::CurrentCommissioner() <null>:246088793 (chip-tool:x86_64+0x1011cdefa) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #4 WaitForCommissioneeCommand::RunCommand() <null>:246088793 (chip-tool:x86_64+0x101211a41) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #5 CHIPCommand::RunQueuedCommand(long) <null>:246088793 (chip-tool:x86_64+0x1011cf604) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #6 chip::DeviceLayer::Internal::GenericPlatformManagerImpl<chip::DeviceLayer::PlatformManagerImpl>::_DispatchEvent(chip::DeviceLayer::ChipDeviceEvent const*) <null>:246088793 (chip-tool:x86_64+0x101431e62) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #7 chip::DeviceLayer::PlatformManager::DispatchEvent(chip::DeviceLayer::ChipDeviceEvent const*) <null>:246088793 (chip-tool:x86_64+0x101432ae9) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.679 - TEST ERR  :     #8 invocation function for block in chip::DeviceLayer::PlatformManagerImpl::_PostEvent(chip::DeviceLayer::ChipDeviceEvent const*) <null>:246088793 (chip-tool:x86_64+0x101432a86) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.680 - TEST ERR  :     #9 __tsan::invoke_and_release_block(void*) <null>:246088793 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x7d1db) (BuildId: 4c4c44ce55553144a15e063c7d8e617d2400000010000000000a0a0000010c00)
ERROR   14:34:48.680 - TEST ERR  :     #10 _dispatch_client_callout <null>:246088793 (libdispatch.dylib:x86_64+0x3316) (BuildId: 1a04b38076e43e4bb0fc9837533d021d32000000200000000100000000060c00)
ERROR   14:34:48.680 - TEST ERR  : 
ERROR   14:34:48.680 - TEST ERR  :   Location is heap block of size 832 at 0x7b5c00000380 allocated by main thread:
ERROR   14:34:48.680 - TEST ERR  :     #0 operator new(unsigned long) <null>:246088793 (libclang_rt.tsan_osx_dynamic.dylib:x86_64+0x80d3d) (BuildId: 4c4c44ce55553144a15e063c7d8e617d2400000010000000000a0a0000010c00)
ERROR   14:34:48.680 - TEST ERR  :     #1 std::__2::unique_ptr<Command, std::__2::default_delete<Command> > make_unique<WaitForCommissioneeCommand, CredentialIssuerCommands*&>(CredentialIssuerCommands*&) <null>:246088793 (chip-tool:x86_64+0x100050106) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.680 - TEST ERR  :     #2 registerCommandsDelay(Commands&, CredentialIssuerCommands*) <null>:246088793 (chip-tool:x86_64+0x10004ff8c) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.680 - TEST ERR  :     #3 main <null>:246088793 (chip-tool:x86_64+0x10018126d) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00)
ERROR   14:34:48.680 - TEST ERR  : 
ERROR   14:34:48.681 - TEST ERR  :   Thread T1 (tid=293070, running) is a GCD worker thread
ERROR   14:34:48.681 - TEST ERR  : 
ERROR   14:34:48.681 - TEST ERR  : SUMMARY: ThreadSanitizer: data race (chip-tool:x86_64+0x1001b419f) (BuildId: 4c4c44f655553144a16f6698152c453332000000200000000100000000000b00) in chip::Optional<unsigned long long>::ClearValue()+0x3f
ERROR   14:34:48.681 - TEST ERR  : ==================
ERROR   14:34:48.685 - TEST ERR  : Traceback (most recent call last):
ERROR   14:34:48.685 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/pigweed-venv/lib/python3.9/site-packages/websockets/legacy/protocol.py", line 968, in transfer_data
ERROR   14:34:48.699 - TEST ERR  :     message = await self.read_message()
ERROR   14:34:48.699 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/pigweed-venv/lib/python3.9/site-packages/websockets/legacy/protocol.py", line 1038, in read_message
ERROR   14:34:48.699 - TEST ERR  :     frame = await self.read_data_frame(max_size=self.max_size)
ERROR   14:34:48.699 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/pigweed-venv/lib/python3.9/site-packages/websockets/legacy/protocol.py", line 1113, in read_data_frame
ERROR   14:34:48.699 - TEST ERR  :     frame = await self.read_frame(max_size)
ERROR   14:34:48.699 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/pigweed-venv/lib/python3.9/site-packages/websockets/legacy/protocol.py", line 1170, in read_frame
ERROR   14:34:48.699 - TEST ERR  :     frame = await Frame.read(
ERROR   14:34:48.700 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/pigweed-venv/lib/python3.9/site-packages/websockets/legacy/framing.py", line 69, in read
ERROR   14:34:48.701 - TEST ERR  :     data = await reader(2)
ERROR   14:34:48.701 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/cipd/packages/python/lib/python3.9/asyncio/streams.py", line 723, in readexactly
ERROR   14:34:48.702 - TEST ERR  :     await self._wait_for_data('readexactly')
ERROR   14:34:48.702 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/cipd/packages/python/lib/python3.9/asyncio/streams.py", line 517, in _wait_for_data
ERROR   14:34:48.703 - TEST ERR  :     await self._waiter
ERROR   14:34:48.703 - TEST ERR  :   File "/Users/runner/work/connectedhomeip/connectedhomeip/.environment/cipd/packages/python/lib/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
ERROR   14:34:48.705 - TEST ERR  :     data = self._sock.recv(self.max_size)

Platform

No response

Anything else?

No response

@arkq
Copy link
Contributor

arkq commented Jul 4, 2023

In the https://github.com/project-chip/connectedhomeip/actions/runs/5441510163/jobs/9908339747?pr=25311 run I've seen also something like that:

chip-tool(13125,0x11811e600) malloc: nano zone abandoned due to inability to preallocate reserved vm space.

I'm not familiar with Darwin platform, though, so I don't know whether malloc failure is fatal in that case...

@bzbarsky-apple
Copy link
Contributor

so I don't know whether malloc failure is fatal in that case...

It's not a failure in anything that matters: malloc is still succeeding.

@bzbarsky-apple
Copy link
Contributor

OK, so I think the TSAN failure is more or less a red herring. It's showing that we don't handle a command timing out correctly, but I think the real issue here is that the WaitForCommissionee times out. From https://github.com/project-chip/connectedhomeip/actions/runs/5401576138/jobs/9811580025:

ERROR   14:34:35.555 - TEST OUT  : 		***** Test Step 8 : DUT is commissioned by TH2 on Fabric ID2
...
ERROR   14:34:48.676 - TEST ERR  : WARNING: ThreadSanitizer: data race (pid=27621)

That's running a command with a 10-second timeout, right?

https://github.com/project-chip/connectedhomeip/actions/runs/5441510163/jobs/9908339747?pr=25311 seems unrelated to this issue.

@woody-apple
Copy link
Contributor Author

@bzbarsky-apple The TSAN failure is real right?

@bzbarsky-apple
Copy link
Contributor

@bzbarsky-apple The TSAN failure is real right?

It is. My point above is that if I understand the situation correctly and we just fixed the TSAN failure (which is highlighting that command timeout handling in chip-tool is kind of broken), the job would still have failed because the command in fact timed out.

@woody-apple woody-apple added this to the 1.2 milestone Jul 11, 2023
@woody-apple woody-apple changed the title [Build] Darwin tsan failure [Build] Darwin tsan failure in chip-tool Jul 11, 2023
@woody-apple woody-apple removed this from the 1.2 milestone Jul 11, 2023
@github-actions github-actions bot removed the darwin label Jul 11, 2023
@woody-apple woody-apple added this to the Future milestone Jul 11, 2023
bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Sep 15, 2023
Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes project-chip#29275
* Fixes project-chip#27535
@bzbarsky-apple
Copy link
Contributor

#29277 moves the shutdown-after-timeout to the Matter queue when in interactive mode, which should solve the threading issue here.

@mergify mergify bot closed this as completed in #29277 Sep 15, 2023
mergify bot pushed a commit that referenced this issue Sep 15, 2023
…29277)

* Fix chip-tool threading asserts if an interactive command times out.

Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes #29275
* Fixes #27535

* Address review comment.
@github-project-automation github-project-automation bot moved this from Todo to Done in [Build] Build Issues Sep 15, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in [Platform] Darwin Sep 15, 2023
HunsupJung pushed a commit to HunsupJung/connectedhomeip that referenced this issue Oct 23, 2023
…roject-chip#29277)

* Fix chip-tool threading asserts if an interactive command times out.

Since the Matter event loop is still running in interacive mode, we need to do
the command cleanup on that event loop.

* Fixes project-chip#29275
* Fixes project-chip#27535

* Address review comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants