Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[6.x] Troubleshooting memory leaks #466

Closed
willingc opened this issue Apr 27, 2021 · 12 comments · Fixed by #631
Closed

[6.x] Troubleshooting memory leaks #466

willingc opened this issue Apr 27, 2021 · 12 comments · Fixed by #631

Comments

@willingc
Copy link
Contributor

willingc commented Apr 27, 2021

This is an issue to consolidate troubleshooting of memory leaks for greater visibility by others. Related #464

#464 (comment)

  • Memory leaks (GC failures). The tests for these fail:
  1) socket with inproc close
       in gc finalizer
         should release reference to context:
      AssertionError: expected false to equal true
      + expected - actual
      -false
      +true
      
      at Context.<anonymous> (test/unit/socket-close-test.ts:156:16)
  2) socket with tcp close
       in gc finalizer
         should release reference to context:
      AssertionError: expected false to equal true
      + expected - actual
      -false
      +true
      
      at Context.<anonymous> (test/unit/socket-close-test.ts:156:16)
  3) socket with ipc close
       in gc finalizer
         should release reference to context:
      AssertionError: expected false to equal true
      + expected - actual
      -false
      +true
      
      at Context.<anonymous> (test/unit/socket-close-test.ts:156:16)
@willingc
Copy link
Contributor Author

willingc commented Apr 27, 2021

Troubleshooting work
#464 (comment)

@aminya re memory leaks: tried my best, but I have no idea why it is leaking (either because of my C++ knowledge or the NAPI). I'll just provide what I tried and maybe it will be useful for you.

Something interesting I found here: https://github.com/zeromq/zeromq.js/blob/master/src/socket.cc#L79 which creates a strong reference to the context that was passed and when the socket is garbage collected, it unrefs the context, but the context destructor itself is being called after a few seconds (this, I don't understand why). Even if I don't reset the ref, it has the same behavior.

When I set the weak reference, it worked, but it's not the correct behavior, because if context will be garbage collected, socket.context will be also garbage collected.

Here's the sample code that reproduces it:

const weak = require("weak-napi")
const zmq = require("./lib")

const task = async () => {
  let context = new zmq.Context()
  new zmq.Dealer({context})

  weak(context, () => {
    console.info("memory freed")
  })

  context = undefined
  global.gc()
}

async function run() {
  await task()
  global.gc()

  /*
  * but if I run setImmediate(() => global.gc())
  * then it works.
  */

  // if I remove this, I get SIGSEGV
  setInterval(() => {}, 1000) // keep the event loop busy
}

run()

logs I get from SIGSEGV (if I won't keep event loop busy & call gc in the main loop)

* thread #1, name = 'node', stop reason = signal SIGSEGV: invalid address (fault address: 0x61)
  * frame #0: 0x0000000000000061
    frame #1: 0x00000000009b8de7 node`v8impl::(anonymous namespace)::RefBase::Finalize(bool) + 231
    frame #2: 0x00000000009d85cf node`node_napi_env__::~node_napi_env__() + 95
    frame #3: 0x00000000009a9fdb node`node::Environment::RunCleanup() + 619
    frame #4: 0x000000000096d547 node`node::FreeEnvironment(node::Environment*) + 71
    frame #5: 0x0000000000a4483f node`node::NodeMainInstance::Run() + 271
    frame #6: 0x00000000009d1e15 node`node::Start(int, char**) + 277
    frame #7: 0x00007ffff7a810b3 libc.so.6`__libc_start_main + 243
    frame #8: 0x00000000009694cc node`_start + 41

Originally posted by @overflowz in #464 (comment)

@overflowz
Copy link

The SIGSEGV part can be ignored, it was caused by the weak-napi library.

@aminya aminya pinned this issue Apr 28, 2021
@aminya aminya added the 6.x label Apr 28, 2021
@aminya aminya changed the title Troubleshooting memory leaks [6.x] Troubleshooting memory leaks Apr 28, 2021
@overflowz
Copy link

UPDATES: I made a test project (just for PoC) to try to reproduce the issue with a minimal example. I created two classes (Context & Socket) where Socket accepts the Context as a value and creates a persistent reference to it.

The results were the same. Either it's not a bug (because it gets freed later) or it's a NAPI issue (or I'm doing something wrong).

They look like this:

// xsocket.cc
#include "xsocket.h"

Napi::Object XSocket::Init(Napi::Env env, Napi::Object exports) {
  auto constructor = DefineClass(env, "XSocket", {});

  exports.Set("XSocket", constructor);
  return exports;
}

XSocket::XSocket(const Napi::CallbackInfo& info)
    : Napi::ObjectWrap<XSocket>(info) {

      auto options = info[0].As<Napi::Object>();
      if (options.Has("context")) {
        printf("got context option, creating reference\n");
        context_ref.Reset(options.Get("context").As<Napi::Object>(), 1);
      }
}

XSocket::~XSocket() {
  printf("XSocket: destroy\n");
  context_ref.Reset();
}
// context.cc
#include "xsocket.h"

Napi::Object XSocket::Init(Napi::Env env, Napi::Object exports) {
  auto constructor = DefineClass(env, "XSocket", {});

  exports.Set("XSocket", constructor);
  return exports;
}

XSocket::XSocket(const Napi::CallbackInfo& info)
    : Napi::ObjectWrap<XSocket>(info) {

      auto options = info[0].As<Napi::Object>();
      if (options.Has("context")) {
        printf("got context option, creating reference\n");
        context_ref.Reset(options.Get("context").As<Napi::Object>(), 1);
      }
}

XSocket::~XSocket() {
  printf("XSocket: destroy\n");
  context_ref.Reset();
}

The test case in JS looks like this:

const zmq = require("bindings")("gctest");

const task = async () => {
  let context = new zmq.Context();
  new zmq.XSocket({ context });

  context = undefined;
  global.gc();
};

async function run() {
  await task();
  global.gc();

  setInterval(() => {}, 100);
}

run();

output:

got context option, creating reference
XSocket: destroy
// and after ~10 seconds
Context: destroy

@willingc
Copy link
Contributor Author

Thanks @overflowz. It's helpful to have a use case to hunt down the root cause.

@aminya
Copy link
Member

aminya commented Apr 29, 2021

Yeah, maybe we just need to wait until it gets released? Unrefing a reference doesn't guarantee instant freeing of the memory.

@overflowz
Copy link

I asked the question here: nodejs/node-addon-api#985 and waiting for the reply for now.

@overflowz
Copy link

Got a reply, does not seem to be a leak (the GC might kick in later). We need to test this differently I believe.

nodejs/node-addon-api#985 (comment)

@aminya
Copy link
Member

aminya commented May 6, 2021

If we really need to test the garbage collection we need to increase the timeout.

@Bartel-C8
Copy link
Contributor

Bartel-C8 commented Dec 7, 2021

Any update on a new 6.x release? Thanks!
@aminya

@vafanassieff
Copy link

Hey !

Is the project still active ?
How can we help to debug the memory leak ?

Best !

@Bartel-C8
Copy link
Contributor

So @aminya ? Is it actually recommended to fall back to 5.x.x as this is still in beta?

The M1 support is highly wanted, for example Microsoft VSCode is using zeromq: microsoft/vscode-jupyter#10386 . Seems that they are willing to help contributing, big chance to grab, no?
Thanks!

@aminya
Copy link
Member

aminya commented Nov 17, 2022

I have opened #529 to track the blocking issues for the v6.0.0 release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants