Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diagnostic report behavior on fatal error #331

Closed
boneskull opened this issue Oct 8, 2019 · 14 comments
Closed

diagnostic report behavior on fatal error #331

boneskull opened this issue Oct 8, 2019 · 14 comments
Labels

Comments

@boneskull
Copy link

boneskull commented Oct 8, 2019

By "fatal error" I mean stuff like "out of memory", not exceptions thrown from JS.

  • In Node.js v12.11.0, passing --experimental-report to the node executable will cause a report to be written to disk
  • --report-on-fatalerror does not seem to have any practical effect that I'm aware of?

Furthermore, when a fatal error does occur, we get a stack-trace-ish thing (though not always--has something to do with where we are in the event loop), similar to the following:

<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x10092bed9]
Security context: 0x0f61e84c08a1 <JSObject>
    1: nextTick [0xf61a90422f9] [internal/process/task_queues.js:~101] [pc=0x181bf64721a](this=0x0f61a9042481 <process map = 0xf613daf6941>,0x0f61bcdd0341 <JSFunction push (sfi = 0xf61c6f6e841)>)
    2: processTicksAndRejections [0xf61a90425d9] [internal/process/task_queues.js:~65] [pc=0x181bf64609b](this=0x0f61a9042481 <process map = 0xf613daf6941>)
    3: Int...

The above information is not in the written diagnostic report in any form. My questions, then:

  1. Is the above information practically useful? It does not show the root cause of the OOM fatal, but only where the process ran OOM
  2. If it is useful, should it be in the report?
  3. If it should be in the report, how can it be trapped and put into the report?
  4. Is there a bug in the implementation wherein --report-on-fatalerror is always enabled?
@boneskull
Copy link
Author

cc @sam-github @richardlau

@sam-github
Copy link

@boneskull Do you have a reproduction? Something I can paste into my shell?

In OOM, the location of OOM is likely unrelated to the location of the memory leak. Automatic identification of the root cause of a memory leak is an unsolved problem, but the stack might be useful sometimes, its worth including.

Also, fatal errors have multiple possible causes, not just OOM, and in those cases the stack is more likely to be relevant.

@sam-github
Copy link

I found test/report/test-report-fatal-error.js and I'm running to see what the report looks like. Afterwards I'll run it without --report-on-fatalerror to see if the behaviour differs.

@richardlau
Copy link
Member

4. Is there a bug in the implementation wherein `--report-on-fatalerror` is always enabled?

Yes, see nodejs/node#29601.

With regards to a recreate, if I run on the current master:

bash-4.2$ ./node --experimental-report --max-old-space-size=20 -e 'const list = []; while(true) { list.push( new Record()); } function Record() { this.name = "foo"; }'

<--- Last few GCs --->

[35247:0x542cfc0]      391 ms: Mark-sweep 36.2 (55.0) -> 34.8 (65.7) MB, 97.6 / 0.0 ms  (average mu = 0.109, current mu = 0.070) allocation failure scavenge might not succeed
[35247:0x542cfc0]      455 ms: Mark-sweep 34.8 (65.7) -> 34.6 (67.5) MB, 63.7 / 0.0 ms  (average mu = 0.066, current mu = 0.000) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x13a6df9]
Security context: 0x1f4611f808d1 <JSObject>
    1: /* anonymous */ [0x10a6f3388ab9] [[eval]:~1] [pc=0x236ff6f436e3](this=0x3550aa6c2549 <JSGlobal Object>)
    2: InternalFrame [pc: 0x132a4fa]
    3: EntryFrame [pc: 0x132a2d8]
    4: builtin exit frame: runInThisContext(this=0x3550aa6f9c91 <ContextifyScript map = 0xdb8743d1f9>,0x011f225406e9 <false>,0x011f225406e9 <false>,0x011f22540639 <true>,-1,0x3550aa6f...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Writing Node.js report to file: report.20191008.134109.35247.0.001.json
Node.js report completed
 1: 0x9e4000 node::Abort() [./node]
 2: 0x9e60de node::OnFatalError(char const*, char const*) [./node]
 3: 0xb502de v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [./node]
 4: 0xb50657 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [./node]
 5: 0xd09a35  [./node]
 6: 0xd0a421 v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [./node]
 7: 0xd16d2b v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [./node]
 8: 0xd17b23 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [./node]
 9: 0xd1918a v8::internal::Heap::CollectAllAvailableGarbage(v8::internal::GarbageCollectionReason) [./node]
10: 0xd1a4e6 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [./node]
11: 0xce0bea v8::internal::Factory::NewUninitializedFixedArray(int, v8::internal::AllocationType) [./node]
12: 0xe5c92f  [./node]
13: 0xe5cbff  [./node]
14: 0x1009b6a v8::internal::Runtime_GrowArrayElements(int, unsigned long*, v8::internal::Isolate*) [./node]
15: 0x13a6df9  [./node]
Aborted (core dumped)
bash-4.2$

report.20191008.134109.35247.0.001.json contains the native stack but not the JavaScript stack:

  "javascriptStack": {
    "message": "No stack.",
    "stack": [
      "Unavailable."
    ]
  },
  "nativeStack": [
    {
      "pc": "0x0000000000b198a5",
      "symbol": "report::TriggerNodeReport(v8::Isolate*, node::Environment*, char const*, char const*, std::string const&, v8::Local<v8::String>) [./node]"
    },
    {
      "pc": "0x00000000009e60a9",
      "symbol": "node::OnFatalError(char const*, char const*) [./node]"
    },
    {
      "pc": "0x0000000000b502de",
      "symbol": "v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [./node]"
    },
    {
      "pc": "0x0000000000b50657",
      "symbol": "v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [./node]"
    },
    {
      "pc": "0x0000000000d09a35",
      "symbol": " [./node]"
    },
    {
      "pc": "0x0000000000d0a421",
      "symbol": "v8::internal::Heap::RecomputeLimits(v8::internal::GarbageCollector) [./node]"
    },
    {
      "pc": "0x0000000000d16d2b",
      "symbol": "v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [./node]"
    },
    {
      "pc": "0x0000000000d17b23",
      "symbol": "v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [./node]"
    },
    {
      "pc": "0x0000000000d1918a",
      "symbol": "v8::internal::Heap::CollectAllAvailableGarbage(v8::internal::GarbageCollectionReason) [./node]"
    },
    {
      "pc": "0x0000000000d1a4e6",
      "symbol": "v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [./node]"
    },
    {
      "pc": "0x0000000000ce0bea",
      "symbol": "v8::internal::Factory::NewUninitializedFixedArray(int, v8::internal::AllocationType) [./node]"
    },
    {
      "pc": "0x0000000000e5c92f",
      "symbol": " [./node]"
    },
    {
      "pc": "0x0000000000e5cbff",
      "symbol": " [./node]"
    },
    {
      "pc": "0x0000000001009b6a",
      "symbol": "v8::internal::Runtime_GrowArrayElements(int, unsigned long*, v8::internal::Isolate*) [./node]"
    },
    {
      "pc": "0x00000000013a6df9",
      "symbol": " [./node]"
    }
  ],

@boneskull
Copy link
Author

@sam-github I adapted the fatal error example from node-report. as written, that example shows no "JS Stacktrace", ostensibly because it's running blocking code--so I updated it to (ab)use process.tick:

// Example - generation of report on fatal error (JavaScript heap OOM)
var http = require('http');

function my_listener(request, response) {
  switch (request.url) {
  case '/':
    response.writeHead(200, "OK",{'Content-Type': 'text/html'});
    response.write('<html><head><title>node-report example</title></head><body style="font-family:arial;">');
    response.write('<h2>node-report example: report triggered on fatal error (heap OOM)</h2>');
    response.write('<p>Click on button below to initiate excessive memory usage.');
    response.write('<p>The application will fail when the heap is full, and the node-report module will produce a report.');
    response.write('<form enctype="application/x-www-form-urlencoded" action="/memory" method="post">');
    response.write('<button>Use memory</button></form>');
    response.write('</form></body></html');
    response.end();
    break;
  case '/memory':
    console.log('fatalerror.js: allocating excessive JavaScript heap memory....please wait');
    var list = [];
    function push() {
      list.push(new my_record());
      process.nextTick(push);
    }
    push();
  }
}

function my_record() {
  this.name = 'foo';
  this.id = 128;
  this.account = 98454324;
}

var http_server = http.createServer(my_listener);
http_server.listen(8080);

console.log('fatalerror.js: Node running');
console.log('fatalerror.js: Note: heap default is 1.4Gb, use --max-old-space-size=<size in Mb> to change');
console.log('fatalerror.js: Go to http://<machine>:8080/ or http://localhost:8080/');

setTimeout(function(){
  console.log('fatalerror.js: application timeout expired, exiting.');
  process.exit(0);
}, 60000);

run via node --experimental-report oom.js then load http://localhost:8080 in browser (I realize this is not strictly necessary, but it's a copypasta job)

@sam-github
Copy link

I ran ./out/Release/node --experimental-report test/report/test-report-fatal-error.js child, I see a node-report, the c++ stack in json is:

 "nativeStack": [                                                               
    {                                                                            
      "pc": "0x000055df792c8379",                                                
      "symbol": "report::TriggerNodeReport(v8::Isolate*, node::Environment*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator
    },                                                                           
    {                                                                            
      "pc": "0x000055df79196cf9",                                                
      "symbol": "node::OnFatalError(char const*, char const*) [./out/Release/node]"
    },                                                                           
    {                                                                            
      "pc": "0x000055df7932e192",                                                
      "symbol": "v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [./out/Release/node]"
    },                                                                                                                                                 

The stack reported to stderr is:

Writing Node.js report to file: report.20191008.104845.16723.0.001.json
Node.js report completed
 1: 0x55df791944d0 node::Abort() [./out/Release/node]
 2: 0x55df79196cae node::OnFatalError(char const*, char const*) [./out/Release/node]
 3: 0x55df7932e192 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [./out/Release/node]                                                               
 4: 0x55df7932e51f v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [./out/Release/node]                                                 
 5: 0x55df794ddb65  [./out/Release/node]

Which is basically the same. The first c++ stack is to the code writing the report, the second is to the Abort() call which is about to call abort() (the C library call).

I don't get a JS stack trace on stderr:

<--- JS stacktrace --->                                                
Cannot get stack trace in GC.
FATAL ERROR: MarkCompactCollector: semi-space copy, fallback in old gen Allocation failed - JavaScript heap out of memory

or in report:

  "javascriptStack": {
    "message": "No stack.",
    "stack": [
      "Unavailable."
    ]
  },

@boneskull I guess in your code you found that there was JS trace on stderr, but not in the report?

I'm rerunning without the --report-on-fatalerror to see if it causes the report not be generated, but it sure looks like its respected: https://github.com/nodejs/node/blob/cbd8d715b2286e5726e6988921f5c870cbf74127/src/node_errors.cc#L437-L440

@sam-github
Copy link

Ah, ignore above, @boneskull I think @richardlau answered your question?

@richardlau
Copy link
Member

I think the lack of a JavaScript stack is related to @addaleax's observation in nodejs/node#29601 (comment) that

env == nullptr here for fatal errors, because V8 has no JS Context entered when the fatal error comes from garbage collection, and we associated Node.js instances with JS Contexts

@boneskull
Copy link
Author

@boneskull I guess in your code you found that there was JS trace on stderr, but not in the report?

@sam-github Yes, in the sample code above it does print the "JS Stack Trace", but if I use the while (true) in the original node-report sample, it does not.

@gireeshpunathil
Copy link
Member

Trying to summarize the discussion so far:

  1. Is the above information practically useful? It does not show the root cause of the OOM fatal, but only where the process ran OOM

Yes, in a subset of the situations.

  1. If it is useful, should it be in the report?
    I guess it here refers to the root cause of the OOM, so the answer is Yes, but the info is not readily available anywhere, it requires analysis of heap residents and their inter-relations.

  2. If it should be in the report, how can it be trapped and put into the report?

see answer to 2

  1. Is there a bug in the implementation wherein --report-on-fatalerror is always enabled?

Yes. It is now resolved by nodejs/node#32207

@sam-github
Copy link

nodejs/node#32497 <--- these 3 options were not always respected and would drop back to defaults, fix is PRed.

@gireeshpunathil
Copy link
Member

just trying to figure out if there is anything outstanding.

@github-actions
Copy link

This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.

@zen0wu
Copy link

zen0wu commented Aug 23, 2020

Just tested on 14.8.0 with a OOM, there's still no JS stacktrace in the report.

Maybe this is related to nodejs/node#33532. Given both PR fixes are released only into v14, but v14's JS stacktrace is completely broken in OOM at least, so there's no JS stack trace reported anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants