
Investigate flaky test parallel/test-worker-message-port-transfer-terminate #30846

Closed

joaocgreis opened this issue Dec 8, 2019 · 5 comments

Labels: flaky-test (Issues and PRs related to tests with unstable failures on the CI), v8 platform (Issues and PRs related to Node's v8::Platform implementation), worker (Issues and PRs related to Worker support)

@joaocgreis (Member) commented:

  • Version: master
  • Platform: Windows
  • Subsystem: test
```
foo[7636]: C:\workspace\node-compile-windows\node\src\node_platform.cc:275: Assertion `!flush_tasks_' failed.
 1: 00007FF6FAA6E2BF v8::internal::wasm::DisjointAllocationPool::~DisjointAllocationPool+89903
 2: 00007FF6FAA0C2B6 v8::base::CPU::has_sse+35974
 3: 00007FF6FAA0C5D3 v8::base::CPU::has_sse+36771
 4: 00007FF6FA9B84D2 v8::internal::wasm::JSToWasmWrapperCompilationUnit::~JSToWasmWrapperCompilationUnit+148338
 5: 00007FF6FA9B977B v8::internal::wasm::JSToWasmWrapperCompilationUnit::~JSToWasmWrapperCompilationUnit+153115
 6: 00007FF6FA9902BB v8::internal::interpreter::BytecodeLabel::bind+4379
 7: 00007FF6FA9928E0 v8::internal::interpreter::BytecodeLabel::bind+14144
 8: 00007FF6FA98F169 v8::internal::LookupIterator::index+92601
 9: 00007FF6FAAA8F0D uv_poll_stop+765
10: 00007FF6FB728230 v8::internal::SetupIsolateDelegate::SetupHeap+1526544
11: 00007FF96DF57BD4 BaseThreadInitThunk+20
12: 00007FF96FB0CED1 RtlUserThreadStart+33
```

On: https://ci.nodejs.org/computer/test-azure_msft-win10_vcbt2015-x64-4/

Ref: https://ci.nodejs.org/job/node-test-binary-windows-js-suites/69/RUN_SUBSET=1,nodes=win10-COMPILED_BY-vs2019/testReport/junit/(root)/test/parallel_test_worker_message_port_transfer_terminate/

@joaocgreis added the flaky-test and worker labels on Dec 8, 2019
@joaocgreis (Member, Author) commented:

cc @nodejs/workers

@addaleax addaleax added this to the 14.0.0 milestone Dec 8, 2019
@addaleax (Member) commented Dec 8, 2019:

@joaocgreis By the way… any idea how to get proper stack traces in Windows release builds? 😄

I feel like this might have been caused by c712fb7 (the CHECK was added there). I’ll take a look.

@joaocgreis (Member, Author) commented:

@addaleax if you're debugging locally, the easiest way is to build node on the console, and then open node.sln in Visual Studio and launch it from there (by setting node as the startup project in the Solution Explorer and adding command line arguments in properties).

If you need the debugger to launch only when node crashes (to run in a loop for example) you can use this script:

jitdebug.cmd:

```bat
@echo off
setlocal
set _command=%1
set _image=%2

if /i "%_command%" == "" (
  set _command=on
) else if /i "%_command%" == "on" (
  set _command=on
) else if /i "%_command%" == "off" (
  set _command=off
) else (
  echo Invalid command: %_command%. Must be either "on" or "off".
  goto :eof
)

if "%_image%" == "" (
  set _image=node.exe
)

if /i "%_command%" == "on" (
  REG ADD "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%_image%" /v debugger /t REG_SZ /d vsjitdebugger.exe /f /reg:64
  echo Just-In-Time debugging enabled for %_image%
) else (
  REG DELETE "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\%_image%" /v debugger /f /reg:64
  echo Just-In-Time debugging disabled for %_image%
)
```

I think there was a discussion somewhere about generating a memory dump automatically on crash, but I can't find it. IIRC, that was not straightforward.

@addaleax (Member) commented Dec 11, 2019:

@bnoordhuis How consistently can you reproduce this (in the form of #30850)? Can you verify whether this makes a difference?

```diff
diff --git a/src/node_worker.cc b/src/node_worker.cc
index 9976ec3f0ac0..c93a6e5b5a9e 100644
--- a/src/node_worker.cc
+++ b/src/node_worker.cc
@@ -189,8 +189,8 @@ class WorkerThreadData {
         *static_cast<bool*>(data) = true;
       }, &platform_finished);
 
-      isolate->Dispose();
       w_->platform_->UnregisterIsolate(isolate);
+      isolate->Dispose();
 
       // Wait until the platform has cleaned up all relevant resources.
       while (!platform_finished)
```

My only theory so far is that there's a race condition window in which a new Isolate could end up being allocated at the same address as the previous one on a different thread. The new Isolate then fails to register with the platform instance, because the PerIsolatePlatformData for the old one is still in the table, and so it is destroyed immediately without ever having had Shutdown() called on it. That would be consistent with this only failing in tests that spawn multiple Workers.

Also /cc @codebytere, because we had a conversation about the order of these calls a while back, which ended up in #30181; I may have been mistaken there about the order of the calls being irrelevant.

@Trott (Member) commented Dec 12, 2019:

With @addaleax at Node+JS Interactive, I was able to reproduce this bug on my laptop infrequently but multiple times nonetheless. With the patch she proposes above, I am now unable to reproduce.

addaleax added a commit to addaleax/node that referenced this issue Dec 12, 2019
I previously thought the order of these calls was no longer
relevant. I was wrong.

This commit undoes the changes from 312c02d, adds a comment
explaining why I was wrong, and flips the order of the calls
elsewhere for consistency, the latter having been the goal
of 312c02d.

Fixes: nodejs#30846
Refs: nodejs#30181
@Trott Trott closed this as completed in 25447d8 Dec 14, 2019
MylesBorins pushed a commit that referenced this issue Dec 17, 2019
I previously thought the order of these calls was no longer
relevant. I was wrong.

This commit undoes the changes from 312c02d, adds a comment
explaining why I was wrong, and flips the order of the calls
elsewhere for consistency, the latter having been the goal
of 312c02d.

Fixes: #30846
Refs: #30181

PR-URL: #30909
Reviewed-By: Shelley Vohr <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Jiawen Geng <[email protected]>
Reviewed-By: Gus Caplan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: David Carlier <[email protected]>
Reviewed-By: Rich Trott <[email protected]>
targos pushed a commit that referenced this issue Jan 14, 2020
BethGriggs pushed a commit that referenced this issue Feb 6, 2020
@addaleax addaleax added the v8 platform Issues and PRs related to Node's v8::Platform implementation. label Feb 18, 2020