
Segmentation fault in embedded NodeJS when exiting #43687

Closed
viferga opened this issue Jul 5, 2022 · 1 comment

Comments


viferga commented Jul 5, 2022

Version

v12.21.0 | v14.x | v15.x

Platform

Linux a8e75a23e0fd 5.11.0-49-generic #55-Ubuntu x86_64 GNU/Linux

Subsystem

node.h / node internals

What steps will reproduce the bug?

The setup for reproducing the bug is quite complex to implement, so the best way to reproduce it is to clone the MetaCall repository and run the tests through Docker (Compose):

git clone https://github.com/metacall/core.git
cd core
./docker-compose.sh test &> output

Then open the output file and you will see several failed tests. Some of them are false positives, because some runtimes need to be recompiled to fully support sanitizers. If you check the output of the failed node tests, you will see that most of them are due to this bug.

How often does it reproduce? Is there a required condition?

I think this bug is related to threading because it does not happen every time; it is basically unpredictable, but the failure point is always the same. For context, what I am doing here is embedding NodeJS into a C/C++ application. In order to control when to delete the queues that keep node alive, I maintain a counter of the asynchronous handles that are open. On each event loop iteration I check it, and when it drops to the number of asynchronous handles that MetaCall itself needs to run, I issue a destroy invoke that deletes all those queues, letting libuv die gracefully.
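
To illustrate the idea, here is a minimal sketch of that per-iteration check. The names, the struct layout, and the `destroy` callback are hypothetical (this is not the actual MetaCall code); it only assumes a libuv check handle whose callback compares the number of open async handles against the number MetaCall itself needs:

```cpp
/* Minimal sketch of the per-iteration check (hypothetical names; not the
 * actual MetaCall code). The check callback runs once per event loop
 * iteration and compares the number of open async handles against the
 * number of handles MetaCall itself keeps open. */
#include <uv.h>

struct loader_state {
    int async_handle_count;          /* async handles currently open, tracked by the embedder */
    int base_handle_count;           /* async handles MetaCall itself needs to run             */
    void (*destroy)(loader_state *); /* closes the queues so uv_run() can return               */
};

static void iteration_check_cb(uv_check_t *handle) {
    loader_state *state = static_cast<loader_state *>(handle->data);

    /* Only MetaCall's own handles remain, so nothing user-visible is pending:
     * stop the hook and issue the destroy that closes all the async queues,
     * letting libuv die gracefully. */
    if (state->async_handle_count == state->base_handle_count) {
        uv_check_stop(handle);
        state->destroy(state);
    }
}
```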

What is the expected behavior?

It should finalize gracefully without generating a segmentation fault.

What do you see instead?

NodeJS finalizes with a segmentation fault.

==7578==ERROR: AddressSanitizer: SEGV on unknown address 0x00021a000062 (pc 0x7fef1c8d1f99 bp 0x7fef19fc6e10 sp 0x7fef19fc6dd8 T2)
==7578==The signal is caused by a READ memory access.
    #0 0x7fef1c8d1f99  (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x7b4f99)
    #1 0x7fef1cc2df22 in v8::internal::GlobalHandles::PendingPhantomCallback::Invoke(v8::internal::Isolate*, v8::internal::GlobalHandles::PendingPhantomCallback::InvocationType) (/usr/lib/x86_64-linux-gnu/libnode.so.72+0xb10f22)
    #2 0x7fef1cc2dfac in v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacks() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0xb10fac)
    #3 0x7fef1cc2e031 in v8::internal::GlobalHandles::InvokeSecondPassPhantomCallbacksFromTask() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0xb11031)
    #4 0x7fef1c999ef4 in node::PerIsolatePlatformData::RunForegroundTask(std::unique_ptr<v8::Task, std::default_delete<v8::Task> >) (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x87cef4)
    #5 0x7fef1c99ab75 in node::PerIsolatePlatformData::FlushForegroundTasksInternal() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x87db75)
    #6 0x7fef1c0fbe7c  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0xee7c)
    #7 0x7fef1c10da22  (/usr/lib/x86_64-linux-gnu/libuv.so.1+0x20a22)
    #8 0x7fef1c0fc713 in uv_run (/usr/lib/x86_64-linux-gnu/libuv.so.1+0xf713)
    #9 0x7fef1c8c92df in node::Environment::CleanupHandles() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x7ac2df)
    #10 0x7fef1c8c968b in node::Environment::RunCleanup() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x7ac68b)
    #11 0x7fef1c87d620 in node::FreeEnvironment(node::Environment*) (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x760620)
    #12 0x7fef1c968cc8 in node::NodeMainInstance::Run() (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x84bcc8)
    #13 0x7fef1c8eb122 in node::Start(int, char**) (/usr/lib/x86_64-linux-gnu/libnode.so.72+0x7ce122)
    #14 0x7fef1e440bb3 in node_loader_impl_thread /usr/local/metacall/source/loaders/node_loader/source/node_loader_impl.cpp:4175
    #15 0x7fef24abdd7f in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7d7f)
    #16 0x7fef249d776e in clone (/lib/x86_64-linux-gnu/libc.so.6+0xfa76e)

Additional information

I know this is a complex bug that is probably not related to NodeJS itself, but I think this is a valid use case, and it would be great if you could give me some hints on how to solve it properly. My current system for detecting asynchronous handles is the only way I have found to finalize NodeJS gracefully, because the embedding API does not provide this. So I ended up with this kind of hack in order to end the event loop properly, but there still seems to be a race condition somewhere. Thanks for your time.

If you want to review the methodology I have used, here are the interesting parts:

This is the function that tries to perform the destroy:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L5161

This is the part that checks whether there are async handles still open; if there are, it hooks into the event loop so the check is repeated on each iteration:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4870

And from here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4803

...to here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4856

These are the hooks for the event loop; there are two of them, one in the prepare stage and one in the check stage (a rough registration sketch follows).
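
For reference, registering such hooks looks roughly like this. This is only an illustrative sketch with hypothetical names (`hook_event_loop`, `prepare_cb`, `check_cb`), not the code linked above: `uv_prepare_t` fires just before the loop blocks for I/O and `uv_check_t` fires just after, so the handle count is inspected at both points of every iteration:

```cpp
/* Illustrative sketch (hypothetical names): registering one prepare hook and
 * one check hook so the async handle count is inspected both before and after
 * the loop polls for I/O. */
#include <uv.h>

static void prepare_cb(uv_prepare_t *handle) {
    /* Run the same threshold check as in the sketch above. */
    (void)handle;
}

static void check_cb(uv_check_t *handle) {
    /* Run the same threshold check as in the sketch above. */
    (void)handle;
}

void hook_event_loop(uv_loop_t *loop, uv_prepare_t *prepare, uv_check_t *check,
                     void *state) {
    uv_prepare_init(loop, prepare);
    prepare->data = state;
    uv_prepare_start(prepare, prepare_cb);

    uv_check_init(loop, check);
    check->data = state;
    uv_check_start(check, check_cb);
}
```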

If everything goes well, all the async queues are destroyed here:
https://github.com/metacall/core/blob/af94689f9ba64f69af7eabba8a7097debcee80f5/source/loaders/node_loader/source/node_loader_impl.cpp#L4997
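
As a rough sketch of what "destroying the queues" means at the libuv level (hypothetical names, under the assumption that each queue is backed by a `uv_async_t`): every async handle has to be closed with `uv_close()`, and its memory may only be released after the close callback has run; once the last handle is closed, `uv_run()` can return and `node::FreeEnvironment()` can proceed.

```cpp
/* Sketch of tearing down one async queue (hypothetical names). A uv_async_t
 * keeps the loop alive until it is closed; the close callback is the earliest
 * point at which the handle's memory may be freed or reused. */
#include <uv.h>

static void on_queue_closed(uv_handle_t *handle) {
    /* Safe point to free or reuse the handle's memory. */
    (void)handle;
}

void destroy_async_queue(uv_async_t *queue) {
    uv_handle_t *handle = reinterpret_cast<uv_handle_t *>(queue);

    if (!uv_is_closing(handle)) {
        uv_close(handle, on_queue_closed);
    }
}
```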

Sorry for the spaghetti code; it will eventually be refactored. Even though the code is messy, it has a good test suite with Valgrind and sanitizers, so it can be refactored without fear in the future. My workforce is limited right now.

@bnoordhuis
Member

I'll convert this to a discussion; you'll probably get more responses that way than when it's languishing here as a "probably not our bug" type of issue.

@nodejs nodejs locked and limited conversation to collaborators Jul 20, 2022
@bnoordhuis bnoordhuis converted this issue into discussion #43914 Jul 20, 2022

