
Classloader leak can lead to PermGen/Metaspace OutOfMemoryError #104

Closed
artemy-osipov opened this issue Jan 23, 2017 · 9 comments
Labels
type: bug A general bug

Comments

@artemy-osipov

I'm using logstash-gelf 1.11.0, configured via biz.paluch.logging.gelf.log4j.GelfLogAppender and deployed in a WAR. The application server is WildFly 10.1, but the issue should be reproducible on any server that uses a worker thread pool.

After sending log messages and redeploying the application, its classes cannot be garbage collected because ThreadLocals hold references to a custom class:

Because of this, a PermGen/Metaspace OutOfMemoryError can occur.
Screenshot from Eclipse MAT:
[image: leak]
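
For illustration, a minimal sketch of the leak pattern (the class and field names here are made up for the example, not the actual logstash-gelf internals):

// Hypothetical class loaded by the webapp ClassLoader.
final class ReusableBuffer {
    final byte[] data = new byte[8192];
}

public final class MessageAssembler {

    // The value is stored in each worker thread's ThreadLocalMap. Application
    // server worker threads outlive a redeploy, so every thread that ever logged
    // keeps a strong reference to its ReusableBuffer; the buffer references its
    // Class, the Class references the old webapp ClassLoader, and none of it can
    // be garbage collected.
    private static final ThreadLocal<ReusableBuffer> BUFFER =
            ThreadLocal.withInitial(ReusableBuffer::new);

    public static byte[] buffer() {
        return BUFFER.get().data;
    }
}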

I think the simplest solution is to not use ThreadLocals for these cases and either always create new objects or implement an object pool.
I don't know which approach is more appropriate, so I haven't sent a pull request.

Without ThreadLocals there is some performance degradation (although the results below are not very representative because of the large error margins).

With ThreadLocals
# Run complete. Total time: 00:01:04

Benchmark                                                      Mode  Cnt     Score      Error  Units
GelfMessageAssemblerPerf.createMessage                         avgt    5    97.158 ±   24.867  ns/op
GelfMessageAssemblerPerf.createMessageToJSON                   avgt    5  1716.895 ± 2933.147  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledDirect       avgt    5   507.275 ±   66.002  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledHeap         avgt    5   584.729 ±   90.140  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBuffer              avgt    5  1676.081 ± 2678.656  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBufferPooledHeap    avgt    5   589.000 ±   62.012  ns/op
GelfMessageAssemblerPerf.createMessageToTCPPooledDirect        avgt    5   513.860 ±   42.689  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBuffer              avgt    5  9812.872 ± 3898.791  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledDirect  avgt    5  7447.911 ±  502.219  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledHeap    avgt    5  7433.450 ±  331.027  ns/op

Without ThreadLocals
# Run complete. Total time: 00:01:04

Benchmark                                                      Mode  Cnt     Score      Error  Units
GelfMessageAssemblerPerf.createMessage                         avgt    5   107.833 ±   39.569  ns/op
GelfMessageAssemblerPerf.createMessageToJSON                   avgt    5  1755.542 ± 3001.065  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledDirect       avgt    5   536.674 ±  147.928  ns/op
GelfMessageAssemblerPerf.createMessageToJSONPooledHeap         avgt    5   581.223 ±  146.843  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBuffer              avgt    5  1632.048 ± 2789.088  ns/op
GelfMessageAssemblerPerf.createMessageToTCPBufferPooledHeap    avgt    5   616.336 ±  137.137  ns/op
GelfMessageAssemblerPerf.createMessageToTCPPooledDirect        avgt    5   538.507 ±   84.502  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBuffer              avgt    5  9884.823 ± 4631.045  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledDirect  avgt    5  9906.995 ± 3867.064  ns/op
GelfMessageAssemblerPerf.createMessageToUDPBufferPooledHeap    avgt    5  9888.346 ± 4327.754  ns/op
@mp911de
Owner

mp911de commented Jan 23, 2017

Thanks for the bug report. Not using pooling causes a lot of garbage and negatively affects performance. Have you tried setting logstash-gelf.buffer.size to 0? (System property; it disables buffering.)
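
For anyone following along, the property can be set as a JVM option (-Dlogstash-gelf.buffer.size=0) or programmatically before logging initializes; a minimal sketch:

// Disable buffering; must run before the logging configuration is loaded.
// Equivalent to passing -Dlogstash-gelf.buffer.size=0 to the JVM.
System.setProperty("logstash-gelf.buffer.size", "0");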

@mp911de mp911de added the type: bug A general bug label Jan 23, 2017
@artemy-osipov
Author

With logstash-gelf.buffer.size=0 and logstash-gelf.message.pooling=false the leak no longer reproduces, but can setting these parameters lead to performance degradation and an increased number of allocated objects?
What do you think about using an object pool instead of ThreadLocals, or about a hook/method that clears all resources?

@mp911de
Owner

mp911de commented Jan 23, 2017

Using an object pool requires several invasive changes, and tbh runtime class reloading isn't in the primary usage scope. Disabling pooling/buffering falls back to the performance characteristics of 1.10, with the difference that 1.11 uses its own, improved JSON encoder.

@artemy-osipov
Author

OK. Thank you for the help and for the library.

mp911de added a commit that referenced this issue Mar 15, 2017
logstash-gelf now uses a PoolHolder instead of static final ThreadLocal instances to hold references to ThreadLocals and the associated objects. Using an instance to hold the ThreadLocal instances breaks the cycle, so objects can be collected by the GC once the ClassLoader is no longer in use (e.g. after a hot-redeploy without restarting the JVM).

The object root and the PoolHolder are referenced from PoolingGelfMessageBuilder which is created per log appender/handler instance or as static instance (default setting). Holding pools per instance can increase memory usage. Pooling mode can be controlled via logstash-gelf.message.pooling (Environment variable, System property).
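
A sketch of the idea (type names simplified from the commit above; the real implementation differs in detail):

// The ThreadLocal lives in an instance field instead of a static one. When the
// owning appender/handler is discarded, the ThreadLocal key becomes weakly
// reachable in each thread's ThreadLocalMap, stale entries can be expunged, and
// the webapp ClassLoader can be collected.
final class PoolHolder {

    private final ThreadLocal<byte[]> buffer = ThreadLocal.withInitial(() -> new byte[8192]);

    byte[] buffer() {
        return buffer.get();
    }
}

final class PoolingGelfMessageBuilder {

    // Held per appender/handler instance when logstash-gelf.message.pooling=true,
    // or as a single static instance in the default setting.
    private final PoolHolder pool = new PoolHolder();

    byte[] buffer() {
        return pool.buffer();
    }
}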
@mp911de
Owner

mp911de commented Mar 15, 2017

I pushed an updated version of the library with configurable pooling options via logstash-gelf.message.pooling (environment variable, system property). The default behavior holds ThreadLocals statically to retain the current behavior. You can set logstash-gelf.message.pooling to true to keep ThreadLocals per instance, i.e. per log appender/log handler. If the log handler is discarded, the instance is released and can be garbage collected.

Instance-held pools can consume more memory initially but the ClassLoader leak should be fixed. Care to give 1.11.1-SNAPSHOT a try?
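
As with the buffer-size property above, a hedged example of enabling per-instance pooling programmatically (the environment-variable form mentioned above works as well):

// Hold pools per appender/handler instance instead of statically.
// Equivalent to -Dlogstash-gelf.message.pooling=true on the command line.
System.setProperty("logstash-gelf.message.pooling", "true");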

@mp911de mp911de added this to the logstash-gelf 1.11.1 milestone Mar 15, 2017
@mp911de mp911de closed this as completed Mar 18, 2017
@ozeray

ozeray commented Apr 3, 2018

Although I set logstash-gelf.message.pooling to false, I am still having problems when redeploying in Tomcat (garbage collection fails because of GelfUDPSender pooling threads).

I explicitly call

// presumably org.slf4j.MDC and org.apache.logging.log4j.core.LoggerContext
GelfSenderFactory.removeAllAddedSenderProviders();
MDC.clear();
final LoggerContext loggerContext = LoggerContext.getContext();
loggerContext.close();

with no success.

Do you have any idea how I can overcome this?

Thanks in advance.

(I use version 1.11.2)

@mp911de
Owner

mp911de commented Apr 4, 2018

Seems that something is not cleaned up properly. @ozeray Care to file a new ticket along with a heap dump?

@ozeray

ozeray commented Apr 4, 2018

Hello Mark,

Thanks for your quick reply.

Our heap dumps run up to 200 MB. Sorry about that.

I am trying to inject my own implementation of GelfUDPSender. Is there a simple and elegant way to do that?

@mp911de
Owner

mp911de commented Apr 4, 2018

200MB doesn't really sound as if that would be the culprit. Senders do not hold any static references. That said, if the instances are shut down properly by your logging framework, all should be fine.
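
One way to make sure the logging framework shuts the senders down on undeploy is a ServletContextListener that stops logging when the webapp is destroyed. A sketch assuming Log4j 2 without the log4j-web module (which would otherwise handle this automatically):

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import org.apache.logging.log4j.LogManager;

@WebListener
public class LoggingShutdownListener implements ServletContextListener {

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Nothing to do on startup.
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Stops the current LoggerContext, which closes appenders and their senders.
        LogManager.shutdown();
    }
}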
