Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on connection close #1452

Closed
marcioapm opened this issue Mar 9, 2016 · 8 comments
Closed

Segfault on connection close #1452

marcioapm opened this issue Mar 9, 2016 · 8 comments

Comments

@marcioapm
Copy link
Contributor

0.7.28
Ubuntu 64-bit
libevent2

Whenever a connection is closed, logs get spammed with the following forever:

[781D1FE4:00000000 2016.03.09 11:23:26.491 dia] Actively closing TCP connection
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
...

Will stay on 0.7.25 for now

@s-ludwig
Copy link
Member

s-ludwig commented Mar 9, 2016

Do you have a reproduction case for this? I couldn't directly reproduce it and there are several active connection closes in the test suite, but all seem to pass.

@marcioapm
Copy link
Contributor Author

Hmm, I don't have an isolated one, this is just running our main web app. It happens for example, if the server fails to listen on a port (because it's already in use), or if I CTRL+C while it's listening.

This started in 0.7.27, and was also one of the reasons I didn't upgrade then.

Basically we spawn a background thread sending HTTPS requests every 15 secs - because of leaks in 0.7.25 we are manually sending header Connection: close on these.
We load a bunch of data from the DB using mysql-lited before starting the webserver.
We finally open the listen socket.

I also tried to debug it myself but there are no callstacks...

Here is a full log - maybe it will ring a bell:

[C9FE3F56:00000000 2016.03.09 12:05:57.885 INF] DEBUG MODE *ON*
[C9FE3F56:00000000 2016.03.09 12:05:57.885 dbg] Initializing OpenSSL...
[C9FE3F56:00000000 2016.03.09 12:05:57.887 dbg] ... done.
[CA003956:00000000 2016.03.09 12:05:57.897 dia] libevent version: 2.0.21-stable
[CA003756:00000000 2016.03.09 12:05:57.897 dia] libevent version: 2.0.21-stable
[CA003A56:00000000 2016.03.09 12:05:57.897 dia] libevent version: 2.0.21-stable
[CA003B56:00000000 2016.03.09 12:05:57.897 dia] libevent version: 2.0.21-stable
[C9FE3F56:00000000 2016.03.09 12:05:57.897 dbg] creating new Connection!(VibeSocket, cast(ConnectionOptions)3)* connection, all 0 are in use
[CA003956:00000000 2016.03.09 12:05:57.898 dia] libevent is using epoll for events.
[CA003756:00000000 2016.03.09 12:05:57.898 dia] libevent is using epoll for events.
[CA003956:00000000 2016.03.09 12:05:57.898 dbg] entering worker thread
[CA003756:00000000 2016.03.09 12:05:57.898 dbg] entering worker thread
[C9FE3F56:00000000 2016.03.09 12:05:57.898 dbg] dnsresolve sql.ebookingservices.com
[C9FE3F56:00000000 2016.03.09 12:05:57.898 dbg] dnsresolve ret
[CA003B56:00000000 2016.03.09 12:05:57.898 dia] libevent is using epoll for events.
[CA003956:00000000 2016.03.09 12:05:57.898 dbg] running event loop
[CA003956:00000000 2016.03.09 12:05:57.899 dbg] Starting event loop.
[CA003956:CA005356 2016.03.09 12:05:57.899 dbg] worker thread enter
[CA003956:CA005356 2016.03.09 12:05:57.899 dbg] worker thread loop enter
[CA003956:CA005356 2016.03.09 12:05:57.899 dbg] worker thread check
[CA003956:CA005356 2016.03.09 12:05:57.899 dbg] worker thread got task
[CA003756:00000000 2016.03.09 12:05:57.899 dbg] running event loop
[CA003756:00000000 2016.03.09 12:05:57.899 dbg] Starting event loop.
[CA003756:CA005156 2016.03.09 12:05:57.899 dbg] worker thread enter
[CA003A56:00000000 2016.03.09 12:05:57.899 dia] libevent is using epoll for events.
[CA003B56:00000000 2016.03.09 12:05:57.899 dbg] entering worker thread
[CA003756:CA005156 2016.03.09 12:05:57.900 dbg] worker thread loop enter
[CA003756:CA005156 2016.03.09 12:05:57.900 dbg] worker thread check
[CA003A56:00000000 2016.03.09 12:05:57.900 dbg] entering worker thread
[CA003956:CA00FB56 2016.03.09 12:05:57.900 dbg] Create HTTP client pool metrics-api.librato.com:443 true proxy :0
[CA003956:CA00FB56 2016.03.09 12:05:57.900 dbg] creating new HTTPClient connection, all 0 are in use
[CA003B56:00000000 2016.03.09 12:05:57.900 dbg] running event loop
[CA003B56:00000000 2016.03.09 12:05:57.900 dbg] Starting event loop.
[CA003A56:CA00F556 2016.03.09 12:05:57.901 dbg] worker thread enter
[CA003A56:CA00F556 2016.03.09 12:05:57.901 dbg] worker thread loop enter
[CA003A56:CA00F556 2016.03.09 12:05:57.901 dbg] worker thread check
[CA003956:CA00FB56 2016.03.09 12:05:57.901 dbg]  ... 7F56CA00D700
[CA003956:CA00FB56 2016.03.09 12:05:57.901 dbg] Now got 1 connections
[CA003956:CA005356 2016.03.09 12:05:57.901 dbg] worker thread check
[C9FE3F56:00000000 2016.03.09 12:05:57.941 dbg] Socket event on fd 30: 128 (2B122E8 vs 2B122E8)
[C9FE3F56:00000000 2016.03.09 12:05:58.027 dbg]  ... 7F56CA002A00
[C9FE3F56:00000000 2016.03.09 12:05:58.027 dbg] Now got 1 connections
[C9FE3F56:00000000 2016.03.09 12:05:58.375 WRN] currencies loaded in 43ms
[C9FE3F56:00000000 2016.03.09 12:05:58.379 dbg] creating new MongoConnection connection, all 0 are in use
[C9FE3F56:00000000 2016.03.09 12:05:58.379 dbg] dnsresolve nosql.ebookingservices.com
[C9FE3F56:00000000 2016.03.09 12:05:58.379 dbg] dnsresolve ret
[C9FE3F56:00000000 2016.03.09 12:05:58.421 dbg] Socket event on fd 36: 128 (2B1CCA8 vs 2B1CCA8)
[C9FE3F56:00000000 2016.03.09 12:05:58.505 dbg]  ... 7F56C9FE9000
[C9FE3F56:00000000 2016.03.09 12:05:58.505 dbg] Now got 1 connections
[C9FE3F56:00000000 2016.03.09 12:05:58.556 dbg] Create HTTP client pool metrics-api.librato.com:443 true proxy :0
[C9FE3F56:00000000 2016.03.09 12:05:58.556 dbg] creating new HTTPClient connection, all 0 are in use
[C9FE3F56:00000000 2016.03.09 12:05:58.556 dbg]  ... 7F56C9FEE980
[C9FE3F56:00000000 2016.03.09 12:05:58.556 dbg] Now got 1 connections
[C9FE3F56:00000000 2016.03.09 12:05:58.556 dbg] dnsresolve metrics-api.librato.com
[C9FE3F56:00000000 2016.03.09 12:05:58.557 dbg] dnsresolve yield
[C9FE3F56:00000000 2016.03.09 12:05:58.741 dbg] dnsresolve ret
[C9FE3F56:00000000 2016.03.09 12:05:58.850 dbg] Socket event on fd 37: 128 (2AC25F8 vs 2AC25F8)
[C9FE3F56:00000000 2016.03.09 12:05:58.960 dbg] validate callback for /C=US/O=DigiCert Inc/CN=DigiCert SHA2 Secure Server CA
[C9FE3F56:00000000 2016.03.09 12:05:58.961 dbg] SSL cert initial error: unable to get local issuer certificate
[C9FE3F56:00000000 2016.03.09 12:05:58.961 dbg] SSL cert not trusted or unknown issuer: /C=US/O=DigiCert Inc/OU=www.digicert.com/CN=DigiCert Global Root CA
[C9FE3F56:00000000 2016.03.09 12:05:58.961 dbg] SSL validation result: 1 (0)
[C9FE3F56:00000000 2016.03.09 12:05:58.961 dbg] validate callback for /C=US/L=San Francisco/ST=California/O=Librato, Inc./CN=*.librato.com
[C9FE3F56:00000000 2016.03.09 12:05:58.961 dbg] SSL validation result: 1 (0)
[C9FE3F56:00000000 2016.03.09 12:05:59.195 dbg] add route GET *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route HEAD *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route PUT *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route POST *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route PATCH *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route DELETE *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route OPTIONS *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route TRACE *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route CONNECT *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route PROPFIND *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route PROPPATCH *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route MKCOL *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route COPY *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route MOVE *
[C9FE3F56:00000000 2016.03.09 12:05:59.196 dbg] add route LOCK *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route UNLOCK *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route VERSIONCONTROL *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route REPORT *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route CHECKOUT *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route CHECKIN *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route UNCHECKOUT *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route MKWORKSPACE *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route UPDATE *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route LABEL *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route MERGE *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route BASELINECONTROL *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route MKACTIVITY *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route ORDERPATCH *
[C9FE3F56:00000000 2016.03.09 12:05:59.197 dbg] add route ACL *
                  ... about 100 other routes here ...
[C9FE3F56:00000000 2016.03.09 12:05:59.210 dbg] dnsresolve 0.0.0.0
[C9FE3F56:00000000 2016.03.09 12:05:59.210 dbg] dnsresolve ret
[C9FE3F56:00000000 2016.03.09 12:05:59.210 INF] Listening for requests on http://0.0.0.0:8080/
[C9FE3F56:00000000 2016.03.09 12:05:59.210 dbg] Starting event loop.
[C9FE3F56:00000000 2016.03.09 12:06:05.334 INF] Received signal 2. Shutting down.
[CA003956:CA005356 2016.03.09 12:06:05.334 dbg] worker thread check
[CA003956:CA005356 2016.03.09 12:06:05.334 dbg] worker thread exit
[CA003956:CA005356 2016.03.09 12:06:05.334 dbg] Libevent2Driver.exitEventLoop called
[CA003B56:CA00F756 2016.03.09 12:06:05.334 dbg] worker thread check
[CA003B56:CA00F756 2016.03.09 12:06:05.334 dbg] worker thread exit
[CA003B56:CA00F756 2016.03.09 12:06:05.334 dbg] Libevent2Driver.exitEventLoop called
[CA003A56:CA00F556 2016.03.09 12:06:05.334 dbg] worker thread check
[CA003B56:00000000 2016.03.09 12:06:05.334 dbg] Event loop done.
[CA003B56:00000000 2016.03.09 12:06:05.334 dbg] Worker thread exit.
[CA003A56:CA00F556 2016.03.09 12:06:05.335 dbg] worker thread exit
[CA003A56:CA00F556 2016.03.09 12:06:05.335 dbg] Libevent2Driver.exitEventLoop called
[CA003A56:00000000 2016.03.09 12:06:05.335 dbg] Event loop done.
[CA003A56:00000000 2016.03.09 12:06:05.335 dbg] Worker thread exit.
[CA003956:00000000 2016.03.09 12:06:05.335 dbg] Event loop done.
[CA003956:00000000 2016.03.09 12:06:05.335 dbg] Worker thread exit.
[CA003756:CA005156 2016.03.09 12:06:05.335 dbg] worker thread check
[CA003756:CA005156 2016.03.09 12:06:05.335 dbg] worker thread exit
[CA003756:CA005156 2016.03.09 12:06:05.335 dbg] Libevent2Driver.exitEventLoop called
[CA003956:00000000 2016.03.09 12:06:05.335 dia] libevent version: 2.0.21-stable
[CA003B56:00000000 2016.03.09 12:06:05.336 dia] libevent version: 2.0.21-stable
[C9FE3F56:C9FEE356 2016.03.09 12:06:05.336 dbg] main thread exit
[C9FE3F56:C9FEE356 2016.03.09 12:06:05.336 dbg] Libevent2Driver.exitEventLoop called
[C9FE3F56:00000000 2016.03.09 12:06:05.336 dbg] Event loop done.
[C9FE3F56:00000000 2016.03.09 12:06:05.336 dia] Main thread still waiting for other threads: Vibe Task Worker #3 (worker thread)
[CA003B56:00000000 2016.03.09 12:06:05.336 dia] libevent is using epoll for events.
[CA003756:00000000 2016.03.09 12:06:05.336 dbg] Event loop done.
[CA003756:00000000 2016.03.09 12:06:05.336 dbg] Worker thread exit.
[CA003956:00000000 2016.03.09 12:06:05.336 dia] libevent is using epoll for events.
[CA003A56:00000000 2016.03.09 12:06:05.336 dia] libevent version: 2.0.21-stable
[CA003756:00000000 2016.03.09 12:06:05.337 dia] libevent version: 2.0.21-stable
[C9FE3F56:00000000 2016.03.09 12:06:05.337 dia] Main thread exiting
[CA003A56:00000000 2016.03.09 12:06:05.337 dia] libevent is using epoll for events.
[CA003756:00000000 2016.03.09 12:06:05.337 dia] libevent is using epoll for events.
[C9FE3F56:00000000 2016.03.09 12:06:05.340 dbg] TCP close request true open
[C9FE3F56:00000000 2016.03.09 12:06:05.340 dia] Actively closing TCP connection
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
core.exception.InvalidMemoryOperationError@src/core/exception.d(679): Invalid memory operation
----------------
...

Thanks Sönke!

@marcioapm
Copy link
Contributor Author

Also, the other obvious difference from my stable build is that it uses DMD 2.7.x instead of 2.69.2.
Though it's unlikely that would be the culprit here, right?

@s-ludwig
Copy link
Member

Interesting, so the connection close looks like it's done in a finalizer on program exit. It's really unfortunate that memory operation errors generally don't produce a stack trace. They are extremely involved to debug for that reason.

I wonder if the reason for the error appearing is that the TCPConnection is leaking whereas it wasn't with 0.7.25, or if there is something now that calls the GC that wasn't there before. Regarding that, do you have an idea who might be closing a TCP connection from either a finalizer, or from a module destructor? I did a quick search and couldn't find a place where this is done within vibe.d,

@marcioapm
Copy link
Contributor Author

mysql-lited's Connection struct closes it's TCPconnection in the finalizer. If I comment out this finalizer it no longer crashes. However, this doesn't seem right, and I am wondering if this will now break things during normal program execution i.e. leak socket descriptors?
Still wondering what changed, too...

s-ludwig added a commit that referenced this issue Mar 11, 2016
TCP connection should not crash the program if they are closed from within a finalizer. Instead, give a warning if such a class has leaked.
s-ludwig added a commit that referenced this issue Mar 11, 2016
.

Class instances that have leaked until Druntime shutdown and call Libevent2TCPConnection.close() would cause errors due to the event driver already being destroyed. This situation is detected now and an appropriate warning message is emitted to the console.
@s-ludwig
Copy link
Member

Okay, some more details: It happens because an assertion is fired due to the event driver already being shut down. I've proposed a fix in #1454, because this case seems rather easy to trigger, but at the same time the application needs to make sure that TCPConnections are not closed in GC finalizers, ideally all connections should be closed explicitly before shutdown (or not at all, letting the OS clean up). The same in theory applies to all other event primitives, they should never be used after shared static this ran.

@marcioapm
Copy link
Contributor Author

Nice! I can just safely drop the finalizer for now as it will this seems to fix my case.

See my comment on #1454 for additional concerns.

@s-ludwig
Copy link
Member

See my comment on #1454 for additional concerns.

Did you submit it? Can't see any comment over there.

marcioapm added a commit to eBookingServices/mysql-lited that referenced this issue Mar 14, 2016
s-ludwig added a commit that referenced this issue Apr 10, 2016
TCP connection should not crash the program if they are closed from within a finalizer. Instead, give a warning if such a class has leaked.

(cherry picked from commit 2c0adbc)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants