Hello Mark,
Hope you can help us with the following issue with logstash-gelf. We use
it in an AWS environment to collect logs from a couple dozen instances; our
Graylog2 farm sits behind an AWS Elastic Load Balancer with TCP balancing.
For apps that log less often than the idle timeout configured on the ELB
(60 s by default), we experience total log event loss after the initial
batch of events sent at application startup goes through successfully;
sometimes we do see the next batch of events make it through, and
sometimes we don't. Playing with keepAlive and deliveryAttempts had no effect.
I can't claim to completely understand what's going on, but my current
hypothesis, supported by observing the network-level traffic with Wireshark,
is as follows (a small sketch reproducing the behaviour follows the list):
1. The appender establishes a TCP connection to the ELB and starts sending messages.
2. After 60 s of inactivity, the ELB sends us a FIN/ACK and the connection is dropped (as evidenced by the ACK from our side). For some reason, this fact doesn't get propagated to the SocketChannel used by the appender.
3. If the application logs an event after that, the appender tries to reuse the already dropped connection (as evidenced by a number of PSH/TCP retransmission messages). The ELB sends an RST in response, finally killing the connection.
4. However, very often the appender doesn't learn about that fact, as - I guess due to NIO - all the bytes it wanted to send have already been handed off to the OS buffer and the call to socketChannel.write() has returned by the time the RST arrives.
5. When the next event arrives, the connection failure is finally detected, a new connection is established, the event is logged, and the cycle repeats itself.
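The behaviour in points 2-4 can be reproduced with plain NIO, independent of the appender. Below is a minimal, hypothetical sketch (the class name and the local ServerSocketChannel standing in for the ELB are made up for illustration): the first write after the peer's FIN still appears to succeed because the bytes only reach the local OS send buffer, and the failure only surfaces on a later write, after the RST has come back.

```java
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction sketch, not logstash-gelf code.
public class WriteAfterPeerClose {

    public static void main(String[] args) throws Exception {
        // Stand-in for the ELB: accept a single connection, then close it,
        // which is roughly what the idle timeout does after 60 s.
        ServerSocketChannel server = ServerSocketChannel.open()
                .bind(new InetSocketAddress("localhost", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        SocketChannel client = SocketChannel.open(new InetSocketAddress("localhost", port));
        server.accept().close();   // peer sends FIN
        Thread.sleep(500);         // let the FIN reach the client

        // First write "succeeds": the bytes go into the local send buffer
        // and write() returns before the peer's RST arrives.
        ByteBuffer first = ByteBuffer.wrap("first event\n".getBytes(StandardCharsets.UTF_8));
        System.out.println("first write: " + client.write(first) + " bytes, no error");

        Thread.sleep(500);         // RST arrives in response to the first write
        try {
            client.write(ByteBuffer.wrap("second event\n".getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Only now does the failure become visible to the sender.
            System.out.println("second write fails: " + e);
        }

        client.close();
        server.close();
    }
}
```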
Reported by @vdenisov

NIO channels don't discover a disconnect without activity. logstash-gelf now performs a read operation before writing data; this way the socket can discover the connection state. The read is non-blocking, so the performance impact is minor.
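A minimal sketch of that approach, assuming a plain java.nio SocketChannel in non-blocking mode (the class and method names are hypothetical, not the actual logstash-gelf implementation):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

// Hypothetical sketch of a read-before-write probe; not logstash-gelf code.
public final class ProbeBeforeWrite {

    private ProbeBeforeWrite() {
    }

    // Writes the payload only after checking that the peer has not closed
    // the connection. The channel must be in non-blocking mode.
    public static void write(SocketChannel channel, ByteBuffer payload) throws IOException {
        ByteBuffer probe = ByteBuffer.allocate(1);

        // Non-blocking read: 0 means the connection is alive and idle,
        // -1 means the peer has sent FIN; a reset surfaces as an IOException.
        // A GELF server normally sends nothing back, so no data is expected.
        if (channel.read(probe) == -1) {
            throw new IOException("Connection closed by peer, reconnect required");
        }

        while (payload.hasRemaining()) {
            channel.write(payload); // may take several passes on a non-blocking channel
        }
    }
}
```

When the probe reports a closed connection, the appender can reconnect and resend instead of handing the event to a dead socket.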