Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible Segfault in HTTP2 module #48567

Closed
JoshuaWise opened this issue Jun 26, 2023 · 9 comments
Closed

Reproducible Segfault in HTTP2 module #48567

JoshuaWise opened this issue Jun 26, 2023 · 9 comments
Labels
http2 Issues or PRs related to the http2 subsystem.

Comments

@JoshuaWise
Copy link

Version

v18.16.1

Platform

Darwin Kernel Version 19.6.0

Subsystem

none

What steps will reproduce the bug?

I can reliably reproduce a Segmentation fault by using the core http2 module.

Step 1: Acquire a TLS/SSL public+private key pair

I followed this guide in the Node.js documentation to create a self-signed certificate.

Step 2: Run this script to start an HTTP/2 server:

'use strict';
const fs = require('fs');
const http2 = require('http2');

(async function main() {
	const sockets = new Set();
	const sessions = new Set();
	const server = http2.createSecureServer({
		cert: fs.readFileSync('test-cert.pem'),
		key: fs.readFileSync('test-key.pem'),
	});

	server.on('request', (req, res) => {
		const message = 'hello world.';
		res.setHeader('Content-Length', String(Buffer.byteLength(message)));
		res.setHeader('Content-Type', 'text/plain; charset=utf-8');
		res.writeHead(200);
		res.end(message);
	});
	server.on('connection', (socket) => {
		sockets.add(socket);
		socket.on('close', () => sockets.delete(socket));
		socket.setTimeout(2000, () => {
			console.log('socket timeout');
			socket.resetAndDestroy();
		});
	});
	server.on('session', (session) => {
		sessions.add(session);
		session.on('close', () => sessions.delete(session));
	});

	async function gracefullyClose() {
		for (const session of sessions) {
			session.close();
		}
		await new Promise(resolve => server.close(resolve));
	}

	await new Promise(resolve => server.listen(443, resolve));
	console.log('Listening at https://localhost:443');

	process.once('SIGINT', () => {
		console.log('closing...');
		gracefullyClose().then(() => console.log('done'));
	});
})();

Step 3:

Open up Google Chrome (or probably any browser or HTTP client) and open the URL: https://localhost:443. You should get a hello world. plain text response. The only purpose of this step is to open an HTTP/2 session on the server. You don't really need Google Chrome specifically.

Step 4:

Wait for the Node.js server to print socket timeout in the console. This means the there was no activity on the socket for 2 seconds and so the socket was destroyed.

Step 5:

Close the Node.js server by triggering a SIGINT (i.e., press ctrl+c in the console). This will initiate a graceful shutdown of the server, which involves calling session.close() on every open session. This will cause the program to crash and print Segmentation fault: 11.

How often does it reproduce? Is there a required condition?

I can reliably reproduce it 100% of the time.

What is the expected behavior? Why is that the expected behavior?

The expected behavior is for the session.close() to not cause a segfault. Additionally, when the socket times out and socket.resetAndDestroy() is called, the session's close event should be emitted (which is currently not happening). The current behavior contradicts the Node.js docs:

When either the Socket or the Http2Session are destroyed, both will be destroyed.

What do you see instead?

Segmentation fault: 11

Additional information

I know some people might suggest that I call setTimeout on the session instead of the socket, and that I use session.destroy() instead of socket.resetAndDestroy(). However, that isn't really an acceptable solution, because I need to close ALL connections that are established, even if an HTTP/2 session hasn't yet been established on that connection.

@JoshuaWise
Copy link
Author

JoshuaWise commented Jun 27, 2023

Instead of using Google Chrome to open an HTTP/2 session, you can paste this code into the same script (at the end of the main function):

const client = http2.connect('https://localhost:443', { rejectUnauthorized: false });
client.on('error', err => console.error(err));
const req = client.request({ ':path': '/' });
req.on('response', headers => console.log('response status ' + headers[':status']));
req.on('data', () => {});
req.on('end', () => console.log('response ended'));
req.end();

@atlowChemi atlowChemi added the http2 Issues or PRs related to the http2 subsystem. label Jun 27, 2023
@RaisinTen
Copy link
Contributor

I added the HTTP2 client request code at the end of main and replaced the process.once('SIGINT' with a setTimeout with a delay of 5000 and I was able to produce this stack trace using lldb:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
    frame #0: 0x000000010020a66c node`node::StreamBase::Write(this=0x0000000000000000, bufs=<unavailable>, count=<unavailable>, send_handle=0x0000000000000000, req_wrap_obj=(val_ = 0x0000000000000000), skip_try_write=<unavailable>) at stream_base.cc:0 [opt]
   354 	         !StringBytes::Size(isolate, string, enc).To(&storage_size)) ||
   355 	          !StringBytes::StorageSize(isolate, string, enc).To(&storage_size)) {
   356 	    return -1;
-> 357 	  }
   358
   359 	  if (storage_size > INT_MAX)
   360 	    return UV_ENOBUFS;
Note: this address is compiler-generated code in function node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool) that has no source code associated with it.
Target 0: (node) stopped.
warning: node was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  * frame #0: 0x000000010020a66c node`node::StreamBase::Write(this=0x0000000000000000, bufs=<unavailable>, count=<unavailable>, send_handle=0x0000000000000000, req_wrap_obj=(val_ = 0x0000000000000000), skip_try_write=<unavailable>) at stream_base.cc:0 [opt]
    frame #1: 0x00000001002932ff node`node::crypto::TLSWrap::EncOut(this=0x0000000127a047d0) at crypto_tls.cc:611:48 [opt]
    frame #2: 0x0000000100294be9 node`node::crypto::TLSWrap::DoWrite(this=0x0000000127a047d0, w=<unavailable>, bufs=<unavailable>, count=2, send_handle=<unavailable>) at crypto_tls.cc:1017:3 [opt]
    frame #3: 0x000000010020a7ab node`node::StreamBase::Write(this=0x0000000127a047d0, bufs=<unavailable>, count=<unavailable>, send_handle=0x0000000000000000, req_wrap_obj=(val_ = 0x000000010680cc50), skip_try_write=<unavailable>) at stream_base.cc:116:9 [opt]
    frame #4: 0x000000010014d4b7 node`node::http2::Http2Session::SendPendingData(this=0x000000012783c960) at node_http2.cc:1795:48 [opt]
    frame #5: 0x000000010014cf28 node`node::http2::Http2Session::Close(this=0x000000012783c960, code=0, socket_closed=<unavailable>) at node_http2.cc:693:5 [opt]
    frame #6: 0x0000000100346e5c node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) [inlined] v8::internal::FunctionCallbackArguments::Call(this=<unavailable>, handler=<unavailable>) at api-arguments-inl.h:146:3 [opt]
    frame #7: 0x0000000100346dd6 node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(isolate=0x0000000105d14000, new_target=<unavailable>, fun_data=<unavailable>, receiver=<unavailable>, argv=<unavailable>, argc=<unavailable>) at builtins-api.cc:113:36 [opt]
    frame #8: 0x00000001003464d9 node`v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) at builtins-api.cc:144:5 [opt]
    frame #9: 0x0000000100346452 node`v8::internal::Builtin_HandleApiCall(args_length=<unavailable>, args_object=<unavailable>, isolate=0x0000000105d14000) at builtins-api.cc:135:1 [opt]
    frame #10: 0x0000000100cbda36 node`Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit + 54
    frame #11: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #12: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #13: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #14: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #15: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #16: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #17: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #18: 0x0000000100c2f51c node`Builtins_InterpreterEntryTrampoline + 220
    frame #19: 0x0000000100c2d8dc node`Builtins_JSEntryTrampoline + 92
    frame #20: 0x0000000100c2d603 node`Builtins_JSEntry + 131
    frame #21: 0x0000000100450870 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long**>::Call(this=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>) at simulator.h:154:12 [opt]
    frame #22: 0x000000010045086e node`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000105d14000, params=<unavailable>)::InvokeParams const&) at execution.cc:427:33 [opt]
    frame #23: 0x0000000100450025 node`v8::internal::Execution::Call(isolate=0x0000000105d14000, callable=<unavailable>, receiver=<unavailable>, argc=1, argv=0x00007ff7bfefede8) at execution.cc:529:10 [opt]
    frame #24: 0x00000001002fbfa7 node`v8::Function::Call(this=0x000000010683e6c0, context=<unavailable>, recv=(val_ = 0x000000010683e5c0), argc=1, argv=0x00007ff7bfefede8) at api.cc:5547:7 [opt]
    frame #25: 0x00000001000b929e node`node::Environment::RunTimers(handle=0x000000010681ead0) at env.cc:1263:15 [opt]
    frame #26: 0x0000000100c0b746 node`uv__run_timers(loop=0x0000000104375e70) at timer.c:178:5 [opt]
    frame #27: 0x0000000100c0fa1f node`uv_run(loop=0x0000000104375e70, mode=UV_RUN_DEFAULT) at core.c:465:5 [opt]
    frame #28: 0x0000000100044dc3 node`node::SpinEventLoopInternal(env=0x000000010681ea00) at embed_helpers.cc:41:7 [opt]
    frame #29: 0x0000000100169018 node`node::NodeMainInstance::Run(this=<unavailable>, exit_code=0x00007ff7bfeff0ac, env=0x000000010681ea00) at node_main_instance.cc:106:9 [opt]
    frame #30: 0x0000000100168dac node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:86:3 [opt]
    frame #31: 0x00000001000e194d node`node::LoadSnapshotDataAndRun(snapshot_data_ptr=<unavailable>, result=<unavailable>) at node.cc:1208:29 [opt]
    frame #32: 0x00000001000e1ba2 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1259:10 [opt]
    frame #33: 0x00000001000e1a9c node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1266:27 [opt]
    frame #34: 0x00007ff811c3541f dyld`start + 1903

Since the crash site in the stack trace doesn't make a lot of sense, I added some logs and recompiled and this is the more understandable stack trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
    frame #0: 0x0000000100209645 node`node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool) [inlined] node::StreamBase::stream_env(this=0x0000000000000000) const at stream_base.h:336:44 [opt]
   333
   334 	  // This is named `stream_env` to avoid name clashes, because a lot of
   335 	  // subclasses are also `BaseObject`s.
-> 336 	  Environment* stream_env() const { return env_; }
   337
   338 	  // TODO(RaisinTen): Update the return type to a Maybe, so that we can indicate
   339 	  // if there is a pending exception/termination.
Target 0: (node) stopped.
warning: node was compiled with optimization - stepping may behave oddly; variables may not be available.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  * frame #0: 0x0000000100209645 node`node::StreamBase::Write(uv_buf_t*, unsigned long, uv_stream_s*, v8::Local<v8::Object>, bool) [inlined] node::StreamBase::stream_env(this=0x0000000000000000) const at stream_base.h:336:44 [opt]
    frame #1: 0x0000000100209645 node`node::StreamBase::Write(this=0x0000000000000000, bufs=0x00007ff7bfefdee0, count=1, send_handle=0x0000000000000000, req_wrap_obj=(val_ = 0x0000000000000000), skip_try_write=<unavailable>) at stream_base.cc:90:22 [opt]
    frame #2: 0x000000010029301f node`node::crypto::TLSWrap::EncOut(this=0x00000001279072a0) at crypto_tls.cc:611:48 [opt]
    frame #3: 0x0000000100294909 node`node::crypto::TLSWrap::DoWrite(this=0x00000001279072a0, w=<unavailable>, bufs=<unavailable>, count=2, send_handle=<unavailable>) at crypto_tls.cc:1017:3 [opt]
    frame #4: 0x0000000100209a78 node`node::StreamBase::Write(this=0x00000001279072a0, bufs=<unavailable>, count=<unavailable>, send_handle=0x0000000000000000, req_wrap_obj=(val_ = 0x000000012800a250), skip_try_write=<unavailable>) at stream_base.cc:124:9 [opt]
    frame #5: 0x000000010014c4b7 node`node::http2::Http2Session::SendPendingData(this=0x0000000105f314c0) at node_http2.cc:1795:48 [opt]
    frame #6: 0x000000010014bf28 node`node::http2::Http2Session::Close(this=0x0000000105f314c0, code=0, socket_closed=<unavailable>) at node_http2.cc:693:5 [opt]
    frame #7: 0x0000000100346b7c node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, unsigned long*, int) [inlined] v8::internal::FunctionCallbackArguments::Call(this=<unavailable>, handler=<unavailable>) at api-arguments-inl.h:146:3 [opt]
    frame #8: 0x0000000100346af6 node`v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(isolate=0x0000000105d14000, new_target=<unavailable>, fun_data=<unavailable>, receiver=<unavailable>, argv=<unavailable>, argc=<unavailable>) at builtins-api.cc:113:36 [opt]
    frame #9: 0x00000001003461f9 node`v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) at builtins-api.cc:144:5 [opt]
    frame #10: 0x0000000100346172 node`v8::internal::Builtin_HandleApiCall(args_length=<unavailable>, args_object=<unavailable>, isolate=0x0000000105d14000) at builtins-api.cc:135:1 [opt]
    frame #11: 0x0000000100cbd776 node`Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit + 54
    frame #12: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #13: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #14: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #15: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #16: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #17: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #18: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #19: 0x0000000100c2f25c node`Builtins_InterpreterEntryTrampoline + 220
    frame #20: 0x0000000100c2d61c node`Builtins_JSEntryTrampoline + 92
    frame #21: 0x0000000100c2d343 node`Builtins_JSEntry + 131
    frame #22: 0x0000000100450590 node`v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) [inlined] v8::internal::GeneratedCode<unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long**>::Call(this=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>, args=<unavailable>) at simulator.h:154:12 [opt]
    frame #23: 0x000000010045058e node`v8::internal::(anonymous namespace)::Invoke(isolate=0x0000000105d14000, params=<unavailable>)::InvokeParams const&) at execution.cc:427:33 [opt]
    frame #24: 0x000000010044fd45 node`v8::internal::Execution::Call(isolate=0x0000000105d14000, callable=<unavailable>, receiver=<unavailable>, argc=1, argv=0x00007ff7bfefede8) at execution.cc:529:10 [opt]
    frame #25: 0x00000001002fbcc7 node`v8::Function::Call(this=0x00000001060320c0, context=<unavailable>, recv=(val_ = 0x0000000106031fc0), argc=1, argv=0x00007ff7bfefede8) at api.cc:5547:7 [opt]
    frame #26: 0x00000001000b829e node`node::Environment::RunTimers(handle=0x000000010600fcd0) at env.cc:1263:15 [opt]
    frame #27: 0x0000000100c0b466 node`uv__run_timers(loop=0x0000000104375e70) at timer.c:178:5 [opt]
    frame #28: 0x0000000100c0f73f node`uv_run(loop=0x0000000104375e70, mode=UV_RUN_DEFAULT) at core.c:465:5 [opt]
    frame #29: 0x0000000100043dc3 node`node::SpinEventLoopInternal(env=0x000000010600fc00) at embed_helpers.cc:41:7 [opt]
    frame #30: 0x0000000100168018 node`node::NodeMainInstance::Run(this=<unavailable>, exit_code=0x00007ff7bfeff0ac, env=0x000000010600fc00) at node_main_instance.cc:106:9 [opt]
    frame #31: 0x0000000100167dac node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:86:3 [opt]
    frame #32: 0x00000001000e094d node`node::LoadSnapshotDataAndRun(snapshot_data_ptr=<unavailable>, result=<unavailable>) at node.cc:1208:29 [opt]
    frame #33: 0x00000001000e0ba2 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1259:10 [opt]
    frame #34: 0x00000001000e0a9c node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1266:27 [opt]
    frame #35: 0x00007ff811c3541f dyld`start + 1903

It means that when

StreamWriteResult res = underlying_stream()->Write(bufs, count);
executes
Environment* env = stream_env();
, the this value of the StreamBase instance is set to nullptr and so, accessing the
Environment* env_;
member variable causes a crash.

To reconfirm my claim, I tried printing the return value of underlying_stream() before

StreamWriteResult res = underlying_stream()->Write(bufs, count);
and it is indeed nullptr.

I tried adding an early return if underlying_stream() is nullptr and it fixed the issue for me although I'm not sure if this is the right fix:

diff --git a/src/crypto/crypto_tls.cc b/src/crypto/crypto_tls.cc
index cc2f6d60e7..a06ac7436a 100644
--- a/src/crypto/crypto_tls.cc
+++ b/src/crypto/crypto_tls.cc
@@ -608,6 +608,7 @@ void TLSWrap::EncOut() {
     buf[i] = uv_buf_init(data[i], size[i]);

   Debug(this, "Writing %zu buffers to the underlying stream", count);
+  if (underlying_stream() == nullptr) return;
   StreamWriteResult res = underlying_stream()->Write(bufs, count);
   if (res.err != 0) {
     InvokeQueued(res.err);

Feel free to send a PR with this diff (and tests) if you want!

@bnoordhuis
Copy link
Member

Before you send a PR, I think the right question to ask first is: why is it nullptr?

@JoshuaWise
Copy link
Author

Besides the segfault, another problem is that the Http2Session isn't being closed when the underlying socket is closed (contrary to what the Node.js docs say will happen). This can be considered a memory leak.

@ywave620
Copy link
Contributor

ywave620 commented Aug 11, 2023

@JoshuaWise The two issuses

  1. segfault when trying to close a HTTP2Session whose underlying TLSSocket has been destroyed
  2. the Http2Session isn't being closed when the underlying TLSSocket is closed

you are suffering are due to the minor misuse of the http2 server in your code.

You should not manipulate a socket emitted from the 'connection' event on a tls server. For your purpose, I suggest

server.on('secureConnection', (tlssock) => {
		sockets.add(tlssock);
		tlssock.on('close', () => sockets.delete(tlssock));
		tlssock.setTimeout(2000, () => {
			tlssock.destroy();
		});
	});

However, there is something node needs to improve. At least there should be no segfault

@JoshuaWise
Copy link
Author

@ywave620 your suggested solution doesn't really work for me, because clients can create idle TCP connections without completing the TLS handshake (in which case they won't be emitted by secureConnection), and I need to detect those connections as well. In fact, I've already observed that Chrome does exactly this when communicating with my web server: it opens multiple idle TCP connections and just lets them hang around, without using them. My guess is that it does this as an eager way of having connections ready in case it needs them, as an optimization. I need to destroy such connections.

@JoshuaWise
Copy link
Author

JoshuaWise commented Aug 18, 2023

@ywave620 This is for an http2.Http2SecureServer, not an https.Server. But even if I could use headersTimeout, that's not the desired behavior. I don't want to destroy idle connections while the server is running. I only want to destroy idle connections while gracefully shutting down the server. Even if I tried using headersTimeout, I would have to set the timeout to 20-30 seconds in order to avoid negatively affecting connections while the server is running, but during a graceful shutdown I want to close idle connections instantly, without waiting at all (by checking socket.bytesRead === 0 and calling socket.resetAndDestroy()).

@pimterry
Copy link
Member

pimterry commented Sep 1, 2023

I think this is fixed by #49327, which was just merged, and definitely does fix another very similar segfault: #49307. Can you try that out and see if it works for you @JoshuaWise?

@JoshuaWise
Copy link
Author

@pimterry Yup it looks like this issue is fixed in Node.js v20.8.0. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http2 Issues or PRs related to the http2 subsystem.
Projects
None yet
Development

No branches or pull requests

6 participants