
Classloader leak can lead to PermGen/Metaspace OutOfMemoryError #104

Closed
artemy-osipov opened this issue Jan 23, 2017 · 9 comments
Labels
type: bug A general bug

Comments

@artemy-osipov

I'm using logstash-gelf 1.11.0, configured via biz.paluch.logging.gelf.log4j.GelfLogAppender and deployed in a WAR. The application server is WildFly 10.1, but the issue should be reproducible on any server that uses a worker thread pool.

After sending log messages and redeploying the application, its classes cannot be garbage collected because ThreadLocals hold references to a custom class:

Because of this, a PermGen/Metaspace OutOfMemoryError can occur.
Screenshot from Eclipse MAT:
[image: leak]
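
For illustration, a minimal sketch of the leak pattern (the class and field names here are made up for the example, not the actual logstash-gelf internals):

// Hypothetical class loaded by the webapp ClassLoader.
final class ReusableBuffer {
    final byte[] data = new byte[8192];
}

public final class MessageAssembler {

    // The value is stored in each worker thread's ThreadLocalMap. Application
    // server worker threads outlive a redeploy, so every thread that ever logged
    // keeps a strong reference to its ReusableBuffer; the buffer references its
    // Class, the Class references the old webapp ClassLoader, and none of it can
    // be garbage collected.
    private static final ThreadLocal<ReusableBuffer> BUFFER =
            ThreadLocal.withInitial(ReusableBuffer::new);

    public static byte[] buffer() {
        return BUFFER.get().data;
    }
}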

I think the simplest solution is to not use ThreadLocals for these cases and either always create new objects or implement an object pool.
I don't know which approach is more appropriate, so I haven't sent a pull request.

Without ThreadLocals there is some performance degradation (although the results below are not very representative because of the large error margins).

With ThreadLocals
# Run complete. Total time: 00:01:04

Benchmark                                                      Mode  Cnt     Score      Error  Units
GelfMessageAssemblerPerf.createMessage                         avgt    5    97.158 ±   24.867  ns/op
GelfMessageAssemblerPerf.createMessageToJSON                   avgt    5  1716.895 ± 2933.147  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledDirect       avgt    5   507.275 ±   66.002  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledHeap         avgt    5   584.729 ±   90.140  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBuffer              avgt    5  1676.081 ± 2678.656  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBufferPooledHeap    avgt    5   589.000 ±   62.012  ns/op
GelfMessageAssemblerPerf.createMessageToTCPPooledDirect        avgt    5   513.860 ±   42.689  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBuffer              avgt    5  9812.872 ± 3898.791  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledDirect  avgt    5  7447.911 ±  502.219  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledHeap    avgt    5  7433.450 ±  331.027  ns/op

Without ThreadLocals
# Run complete. Total time: 00:01:04

Benchmark                                                      Mode  Cnt     Score      Error  Units
GelfMessageAssemblerPerf.createMessage                         avgt    5   107.833 ±   39.569  ns/op
GelfMessageAssemblerPerf.createMessageToJSON                   avgt    5  1755.542 ± 3001.065  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledDirect       avgt    5   536.674 ±  147.928  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledHeap         avgt    5   581.223 ±  146.843  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBuffer              avgt    5  1632.048 ± 2789.088  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBufferPooledHeap    avgt    5   616.336 ±  137.137  ns/op
GelfMessageAssemblerPerf.createMessageToTCPPooledDirect        avgt    5   538.507 ±   84.502  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBuffer              avgt    5  9884.823 ± 4631.045  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledDirect  avgt    5  9906.995 ± 3867.064  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledHeap    avgt    5  9888.346 ± 4327.754  ns/op
@mp911de
Owner

mp911de commented Jan 23, 2017

Thanks for the bug report. Not using pooling causes a lot of garbage and negatively affects performance. Have you tried setting logstash-gelf.buffer.size to 0? (System property; it disables buffering.)
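
For anyone following along, the property can be set as a JVM option (-Dlogstash-gelf.buffer.size=0) or programmatically before logging initializes; a minimal sketch:

// Disable buffering; must run before the logging configuration is loaded.
// Equivalent to passing -Dlogstash-gelf.buffer.size=0 to the JVM.
System.setProperty("logstash-gelf.buffer.size", "0");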

@mp911de mp911de added the type: bug A general bug label Jan 23, 2017
@artemy-osipov
Author

With logstash-gelf.buffer.size=0 and logstash-gelf.message.pooling=false the leak no longer reproduces, but can setting these parameters lead to performance degradation and an increased number of allocated objects?
What do you think about using an object pool instead of ThreadLocals, or about a hook/method that clears all resources?

@mp911de
Owner

mp911de commented Jan 23, 2017

Using an object pool requires several invasive changes, and tbh runtime class reloading isn't in the primary usage scope. Disabling pooling/buffering falls back to the performance characteristics of 1.10, with the difference that 1.11 uses its own, improved JSON encoder.

@artemy-osipov
Author

OK. Thank you for the help and for the library.

mp911de added a commit that referenced this issue Mar 15, 2017
logstash-gelf now uses a PoolHolder instead of static final ThreadLocal instances to hold references to ThreadLocals and the associated objects. Using an instance to hold the ThreadLocal instances breaks the cycle, so objects can be collected by the GC once the ClassLoader is no longer in use (e.g. after a hot-redeploy without restarting the JVM).

The object root and the PoolHolder are referenced from PoolingGelfMessageBuilder which is created per log appender/handler instance or as static instance (default setting). Holding pools per instance can increase memory usage. Pooling mode can be controlled via logstash-gelf.message.pooling (Environment variable, System property).
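
A sketch of the idea (type names simplified from the commit above; the real implementation differs in detail):

// The ThreadLocal lives in an instance field instead of a static one. When the
// owning appender/handler is discarded, the ThreadLocal key becomes weakly
// reachable in each thread's ThreadLocalMap, stale entries can be expunged, and
// the webapp ClassLoader can be collected.
final class PoolHolder {

    private final ThreadLocal<byte[]> buffer = ThreadLocal.withInitial(() -> new byte[8192]);

    byte[] buffer() {
        return buffer.get();
    }
}

final class PoolingGelfMessageBuilder {

    // Held per appender/handler instance when logstash-gelf.message.pooling=true,
    // or as a single static instance in the default setting.
    private final PoolHolder pool = new PoolHolder();

    byte[] buffer() {
        return pool.buffer();
    }
}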
@mp911de
Owner

mp911de commented Mar 15, 2017

I pushed an updated version of the library with configurable pooling options via logstash-gelf.message.pooling (environment variable, system property). The default behavior holds ThreadLocals statically to retain the current behavior. You can set logstash-gelf.message.pooling to true to keep ThreadLocals per instance, i.e. per log appender/log handler. If the log handler is discarded, the instance is released and can be garbage collected.

Instance-held pools can consume more memory initially but the ClassLoader leak should be fixed. Care to give 1.11.1-SNAPSHOT a try?
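
As with the buffer-size property above, a hedged example of enabling per-instance pooling programmatically (the environment-variable form mentioned above works as well):

// Hold pools per appender/handler instance instead of statically.
// Equivalent to -Dlogstash-gelf.message.pooling=true on the command line.
System.setProperty("logstash-gelf.message.pooling", "true");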

@mp911de mp911de added this to the logstash-gelf 1.11.1 milestone Mar 15, 2017
@mp911de mp911de closed this as completed Mar 18, 2017
@ozeray

ozeray commented Apr 3, 2018

Although I set logstash-gelf.message.pooling to false, I am still having problems when redeploying in Tomcat (garbage collection fails because of GelfUDPSender pooling threads).

I explicitly call

// presumably org.slf4j.MDC and org.apache.logging.log4j.core.LoggerContext
GelfSenderFactory.removeAllAddedSenderProviders();
MDC.clear();
final LoggerContext loggerContext = LoggerContext.getContext();
loggerContext.close();

with no success.

Do you have any idea how I can overcome this?

Thanks in advance.

(I use version 1.11.2)

@mp911de
Owner

mp911de commented Apr 4, 2018

Seems that something is not cleaned up properly. @ozeray Care to file a new ticket along with a heap dump?

@ozeray

ozeray commented Apr 4, 2018

Hello Mark,

Thanks for your quick reply.

Our heap dumps run up to 200 MB. Sorry about that.

I am trying to inject my own implementation of GelfUDPSender. Is there a simple and elegant way to do that?

@mp911de
Owner

mp911de commented Apr 4, 2018

200MB doesn't really sound as if that would be the culprit. Senders do not hold any static references. That said, if the instances are shut down properly by your logging framework, all should be fine.
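
One way to make sure the logging framework shuts the senders down on undeploy is a ServletContextListener that stops logging when the webapp is destroyed. A sketch assuming Log4j 2 without the log4j-web module (which would otherwise handle this automatically):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.apache.logging.log4j.LogManager;

@WebListener
public class LoggingShutdownListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Nothing to do on startup.
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Stops the current LoggerContext, which closes appenders and their senders.
        LogManager.shutdown();
    }
}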
