Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate flaky test-debug-port-from-cmdline #5368

Closed
Trott opened this issue Feb 22, 2016 · 19 comments · Fixed by #5904
Closed

Investigate flaky test-debug-port-from-cmdline #5368

Trott opened this issue Feb 22, 2016 · 19 comments · Fixed by #5904
Labels
test Issues and PRs related to the tests. v8 engine Issues and PRs related to the V8 dependency.

Comments

@Trott
Copy link
Member

Trott commented Feb 22, 2016

Example failure:

/cc the short list of folks who have touched the test in the last year or so, no reason to think they're responsible, but you know, gotta start somewhere... @misterdjules @joaocgreis @bnoordhuis

@Trott Trott added debugger test Issues and PRs related to the tests. labels Feb 22, 2016
@santigimeno
Copy link
Member

@Trott Did it timed out? If so, I was about to report it. I have seen it in on OS X and Jessie 64. I have investigated a little and don't understand what's going on, but by placing fprintf's in the code and when the test fails I have observed a couple of things that are different from when the test passes:

  • The execution in the child process does not seem to go further from a v8::Debug::DebugBreak(isolate); call at https://github.com/nodejs/node/blob/master/src/node.cc#L3524 so it looks like the main thread is never notified by the uv_async_send call.
  • Some printf's are located in the node::Init function. When the test passes, the lines from the parent process are printed before the ones from the child process whereas when it fails, the lines from the child process are printed before the ones from the parent process.

I hope it helps somehow

@Trott
Copy link
Member Author

Trott commented Feb 22, 2016

@santigimeno Yes, it times out:

not ok 168 test-debug-port-from-cmdline.js
# TIMEOUT

@santigimeno
Copy link
Member

It looks like there's a deadlock on the child process. I'm getting this backtrace when the test times out:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fc1bc3c44d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fc1bc3c4336 in __GI___pthread_mutex_lock (mutex=0x3096820) at ../nptl/pthread_mutex_lock.c:114
#3  0x0000000000ac2da5 in v8::internal::StackGuard::RequestInterrupt(v8::internal::StackGuard::InterruptFlag) ()
#4  0x0000000000e3832f in node::EnableDebugSignalHandler(int) ()
#5  <signal handler called>
#6  0x00007fc1bc3c4330 in __GI___pthread_mutex_lock (mutex=0x3096820) at ../nptl/pthread_mutex_lock.c:114
#7  0x0000000000ac2ce2 in v8::internal::StackGuard::PopPostponeInterruptsScope() ()
#8  0x0000000000c906c2 in v8::internal::RegExpImpl::CompileIrregexp(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, bool) ()
#9  0x0000000000c90c64 in v8::internal::RegExpImpl::IrregexpPrepare(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>) ()
#10 0x0000000000c90ecd in v8::internal::RegExpImpl::GlobalCache::GlobalCache(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, bool, v8::internal::Isolate*) ()
#11 0x0000000000d093d8 in v8::internal::Runtime_StringMatch(int, v8::internal::Object**, v8::internal::Isolate*) ()
#12 0x000009e40750b61b in ?? ()
#13 0x000009e40750b561 in ?? ()
#14 0x00007fff0754e040 in ?? ()
#15 0x00007fff0754e098 in ?? ()

@santigimeno
Copy link
Member

Also, when running the test with node compiled with --debug I get sometimes this stacktrace:

=== release test-debug-port-from-cmdline ===                     
Path: parallel/test-debug-port-from-cmdline
> #
> # Fatal error in ../deps/v8/src/execution.cc, line 292
> # Check failed: (thread_local_.interrupt_flags_ & top->intercept_mask_) == 0.
> #
> 
> ==== C stack trace ===============================
> 1: V8_Fatal
>  2: v8::internal::StackGuard::PopPostponeInterruptsScope()
>  3: v8::internal::PostponeInterruptsScope::~PostponeInterruptsScope()
>  4: v8::internal::RegExpImpl::Compile(v8::internal::Handle<v8::internal::JSRegExp>, v8::internal::Handle<v8::internal::String>, v8::internal::JSRegExp::Flags)
>  5: 0x1468ae0
>  6: v8::internal::Runtime_RegExpInitializeAndCompile(int, v8::internal::Object**, v8::internal::Isolate*)
>  7: 0x1c23ec70b61b
assert.js:89
  throw new assert.AssertionError({
  ^
AssertionError: 13 == 2
    at assertOutputLines (/home/sgimeno/node/node/test/parallel/test-debug-port-from-cmdline.js:45:10)
    at process.<anonymous> (/home/sgimeno/node/node/test/parallel/test-debug-port-from-cmdline.js:26:3)
    at emitOne (events.js:96:20)
    at process.emit (events.js:183:7)
Command: out/Release/node /home/sgimeno/node/node/test/parallel/test-debug-port-from-cmdline.js

@santigimeno
Copy link
Member

This is a backtrace from the main thread of the child process with node compiled in debug mode, when the test fails:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f6b253d94d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f6b253d9336 in __GI___pthread_mutex_lock (mutex=0x3af28f8) at ../nptl/pthread_mutex_lock.c:114
#3  0x00000000016eaba5 in v8::base::LockNativeHandle (mutex=0x3af28f8) at ../deps/v8/src/base/platform/mutex.cc:57
#4  0x00000000016eae96 in v8::base::RecursiveMutex::Lock (this=0x3af28f8) at ../deps/v8/src/base/platform/mutex.cc:163
#5  0x0000000000dbec2b in v8::internal::ExecutionAccess::Lock (isolate=0x3af10d0) at ../deps/v8/src/isolate.h:1434
#6  0x0000000000dbebeb in v8::internal::ExecutionAccess::ExecutionAccess (this=0x7ffc97307350, isolate=0x3af10d0) at ../deps/v8/src/isolate.h:1430
#7  0x00000000011a3f86 in v8::internal::StackGuard::RequestInterrupt (this=0x3af2930, flag=v8::internal::StackGuard::DEBUGBREAK) at ../deps/v8/src/execution.cc:307
#8  0x0000000000dbe1a5 in v8::internal::StackGuard::RequestDebugBreak (this=0x3af2930) at ../deps/v8/src/execution.h:121
#9  0x0000000000df0fd2 in v8::Debug::DebugBreak (isolate=0x3af10d0) at ../deps/v8/src/api.cc:7750
#10 0x0000000001628a14 in node::TryStartDebugger () at ../src/node.cc:3523
#11 0x0000000001628b8f in node::EnableDebugSignalHandler (signo=10) at ../src/node.cc:3557
#12 <signal handler called>
#13 __GI___pthread_mutex_lock (mutex=0x3af28f8) at ../nptl/pthread_mutex_lock.c:163
#14 0x00000000016eaba5 in v8::base::LockNativeHandle (mutex=0x3af28f8) at ../deps/v8/src/base/platform/mutex.cc:57
#15 0x00000000016eae96 in v8::base::RecursiveMutex::Lock (this=0x3af28f8) at ../deps/v8/src/base/platform/mutex.cc:163
#16 0x0000000000dbec2b in v8::internal::ExecutionAccess::Lock (isolate=0x3af10d0) at ../deps/v8/src/isolate.h:1434
#17 0x0000000000dbebeb in v8::internal::ExecutionAccess::ExecutionAccess (this=0x7ffc97307a60, isolate=0x3af10d0) at ../deps/v8/src/isolate.h:1430
#18 0x00000000011a3e6e in v8::internal::StackGuard::PopPostponeInterruptsScope (this=0x3af2930) at ../deps/v8/src/execution.cc:289
#19 0x000000000105d663 in v8::internal::PostponeInterruptsScope::~PostponeInterruptsScope (this=0x7ffc97307ad0, __in_chrg=<optimized out>) at ../deps/v8/src/isolate.h:1486
#20 0x000000000106505e in v8::internal::GetUnoptimizedCodeCommon (info=0x7ffc97307b90) at ../deps/v8/src/compiler.cc:743
#21 0x0000000001066135 in v8::internal::Compiler::GetLazyCode (function=...) at ../deps/v8/src/compiler.cc:985
#22 0x000000000141dbd5 in v8::internal::__RT_impl_Runtime_CompileLazy (args=..., isolate=0x3af10d0) at ../deps/v8/src/runtime/runtime-compiler.cc:38
#23 0x000000000141d9e0 in v8::internal::Runtime_CompileLazy (args_length=1, args_object=0x7ffc97307ee8, isolate=0x3af10d0) at ../deps/v8/src/runtime/runtime-compiler.cc:20
#24 0x0000339d7b60b61b in ?? ()
#25 0x00007ffc97307f28 in ?? ()
#26 0x0000339d7b60b561 in ?? ()

@santigimeno
Copy link
Member

@Trott
Copy link
Member Author

Trott commented Feb 26, 2016

So we have failures on fedora22 and centos5-64...

@Trott
Copy link
Member Author

Trott commented Feb 26, 2016

Debian and OS X too, according to @santigimeno.

Is this a deadlock in v8 or is the problem somewhere else? @nodejs/v8

@tunniclm
Copy link
Contributor

Typically pthread_mutex_lock is not async-signal-safe, and could deadlock if interrupted with a signal handler that also calls pthread_mutex_lock (possibly only on the same mutex). From the stack traces shown, that might be what is occurring here.

@santigimeno
Copy link
Member

Could this message from @bnoordhuis be related to the issue? . He says:

There's a little bit of additional magic in that sending a SIGUSR1 to
the process will start the debugger with a call to
v8::Debug::DebugBreak(). Caveat emptor: although it works for us, I
don't think V8 makes any guarantees about async signal-safety. That
it works may be an accident of the implementation.

@bnoordhuis
Copy link
Member

Yes, seems that's it. v8::Debug::DebugBreak() used to be async signal-safe but it no longer is. Looking at how it's currently implemented, I don't think it's going to be easy to change that back.

@Trott
Copy link
Member Author

Trott commented Mar 19, 2016

What's the correct path forward here? Modify the test? Eliminate the test? Change the underlying implementation? Deprecate/eliminate a particular aspect of the feature? Something else? Difficult to determine?

@bnoordhuis
Copy link
Member

I've been thinking about that but I'm not sure. The best I've been able to come up with so far is using a semaphore to wake up a watchdog thread from the signal handler (because sem_post() is async signal-safe), which then calls v8::Debug::DebugBreak().

@hashseed
Copy link
Member

Why would the signal handler for mutex lock attempt to issue a debug break?

@bnoordhuis
Copy link
Member

@hashseed I'm not sure I understand the question. We call v8::Debug::DebugBreak() on SIGUSR1. That used to work but now V8 tries to grab a mutex and that occasionally deadlocks.

@hashseed
Copy link
Member

Let me see if I understand the backtrace correctly:

  • V8 grabs the ExecutionAccess mutex to access the interrupt bitfield.
  • At that point a SIGUSR1 comes in, and we jump to the signal handler.
  • The signal handler attempts to call v8::Debug::DebugBreak.
  • v8::Debug::DebugBreak attempts to grab the ExecutionAccess mutex.
  • This causes the deadlock.

I suspect that it used to work in older V8 because we did not grab the mutex, but should have.

One solution I can think of is to introduce an API to probe the mutex to see whether it can be acquired. If not, the embedder needs to reschedule the signal.

@Trott
Copy link
Member Author

Trott commented Mar 24, 2016

Here it is being flaky for the Linux fips build too: https://ci.nodejs.org/job/node-test-commit-linux-fips/1368/nodes=ubuntu1404-64/console

@Trott
Copy link
Member Author

Trott commented Apr 20, 2016

@Trott
Copy link
Member Author

Trott commented May 4, 2016

@Trott Trott added the v8 engine Issues and PRs related to the V8 dependency. label May 4, 2016
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue May 10, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: nodejs#5368
PR-URL: nodejs#5904
Reviewed-By: Anna Henningsen <[email protected]>
evanlucas pushed a commit that referenced this issue May 17, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: #5368
PR-URL: #5904
Reviewed-By: Anna Henningsen <[email protected]>
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Jul 12, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: nodejs#5368
PR-URL: nodejs#5904
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: #5368
PR-URL: #5904
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 12, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: #5368
PR-URL: #5904
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: #5368
PR-URL: #5904
Reviewed-By: Anna Henningsen <[email protected]>
MylesBorins pushed a commit that referenced this issue Jul 14, 2016
Calling v8::Debug::DebugBreak() directly from the signal handler is
unsafe because said function tries to grab a mutex.  Work around that
by starting a watchdog thread that is woken up from the signal handler,
which then calls v8::Debug::DebugBreak().

Using a watchdog thread also removes the need to use atomic operations
so this commit does away with that.

Fixes: #5368
PR-URL: #5904
Reviewed-By: Anna Henningsen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Issues and PRs related to the tests. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants