Duplicate massages #128

shamil · 2017-05-11T20:02:53Z

Hi,

As I understand from docs, duplicate messages can happen for outputs that write in batches, like Kafka.
I'm using UDP output, and I still see lot's of duplicates.

Might be related, when I see duplicates I also see the following error:

ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large

What could be the reason?

The text was updated successfully, but these errors were encountered:

dwbutler · 2017-05-13T09:38:43Z

The UDP device also writes in batches because Ruby writes to sockets in batches. If you're seeing lots of duplicates, that means Ruby thought it couldn't write to the UDP socket and therefore threw an exception, and then LogStashLogger retried writing the whole batch. But in fact, the UDP packets were received on the listening side.

Errno::EMSGSIZE is a known issue with the UDP device which is documented in the troubleshooting section of the Readme

shamil · 2017-05-13T11:22:05Z

I'm confused with your answer. First I didn't see any exceptions regarding Ruby can't write to UDP socket. Secondly, even when I enabled autoflush mode, which effectively disables batch mode, I still get duplicates. Third, according to code, the Errno::EMSGSIZE is rescued and considered as unrecoverable error, therefore ignored and not causing retries.

Am I missing something?

dwbutler · 2017-05-13T11:53:13Z

Logging exceptions are written to STDERR by default. Are you checking there?

autoflush doesn't disable batching. It flushes the buffer after each log message is written. Log messages can still accumulate in the buffer if a flush takes a while. Anyway, even a single log message could be retried several times.

Errno::EMSGSIZE does indeed not cause retries. It's most likely a separate issue unrelated to the duplicate log messages.

I'll do some more research to see if it's normal/expected to get duplicate UDP packets on the receiving end even if the client only sent it once. I suspect that it is.

shamil · 2017-05-13T12:03:18Z

Yes I'm checking STDERR. The only error I see there is Errno::EMSGSIZE.

The problem that I can get thousands duplicate messages. I mean one message can get written thousand times. I did tcpdump and indeed those duplicate messages are coming out from the ruby (rails) process.

I know Errno::EMSGSIZE should not cause duplicates, but most of the time when I see duplicates I see a huge amount of Errno::EMSGSIZE messages in STDERR

dwbutler · 2017-05-13T16:37:47Z

That is indeed strange. It does appear that Errno::EMSGSIZE is not being correctly treated as unrecoverable. What version of LogStashLogger are you using? Dropping unrecoverable errors was released in 0.16.0.

shamil · 2017-05-13T16:56:15Z

I use 0.22.1 version

dwbutler · 2017-05-13T22:06:02Z

I tried a small test, and got the result I expected:

logger = LogStashLogger.new(type: :udp, port: 5228)
loop { 10.times { logger.info(Time.now); sleep(0.1) }; logger.info('1'*100_000); }
E, [2017-05-13T14:55:40.690990 #16663] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too long
W, [2017-05-13T14:55:40.691039 #16663]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write

I was not able to reproduce duplicate log messages being sent. Long messages were correctly discarded.

Can you post a few samples of the duplicate log messages?

dwbutler · 2017-05-13T22:10:00Z

What version of Ruby are you using? In MRI 2.4.1, I get the error message Message too long, not Message too large

dwbutler · 2017-05-13T22:15:34Z

Interesting... I just tried this test under JRuby and got E, [2017-05-13T15:12:57.713000 #17281] ERROR -- : [LogStashLogger::Device::UDP] SystemCallError - Unknown error - No message available. So it's not caught as an unrecoverable error and the whole batch is retried forever.

Looks like version-specific error handling will be required.

shamil · 2017-05-13T22:23:37Z

I'm using MRI 2.3.3. And strangely just see the message with the error, no warning like in your output (WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write)

dwbutler · 2017-05-13T22:35:22Z

I can't reproduce this behavior in MRI 2.3.3. I suspect the behavior might be platform specific. What OS are you using? I test on a Macbook Pro.

shamil · 2017-05-13T22:50:40Z

I run it in Docker container (Alpine Linux) on top of Debian distribution.

I ran the same test and I see that the message always get retried:

E, [2017-05-13T22:49:38.210640 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210715 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.210878 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210946 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.211094 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.211126 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
W, [2017-05-13T22:49:42.780342 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:47.804145 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:52.825159 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}

dwbutler · 2017-05-13T23:38:15Z

The test above writes multiple large messages. If you just write one message, you'll see that it fails and never retries again. The unrecoverable error, aborting write warning is the indicator that it will drop the message and not retry.

It looks to me like you're getting the expected behavior when running my small test, but unexpected behavior in your actual web app. Do you have any extra configuration, gems, etc. that might contribute to the issue?

shamil · 2017-05-14T00:16:19Z

Here are the configuration I use:

config.logstash.max_message_size = 4096
config.logstash.buffer_max_items = 8192
config.logstash.buffer_max_interval = 1

Also, I'm using lograge

dwbutler added the issue label May 13, 2017

dwbutler closed this as completed May 13, 2017

dwbutler reopened this May 13, 2017

dwbutler mentioned this issue May 14, 2017

JRuby raises SystemCallError instead of Errno::EMSGSIZE jruby/jruby#4606

Closed

dwbutler mentioned this issue May 31, 2017

Fix UDPSocket#(send|write) to raise EMSGSIZE jruby/jruby#4608

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Duplicate massages #128

Duplicate massages #128

shamil commented May 11, 2017 •

edited

Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

shamil commented May 13, 2017 •

edited

Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

dwbutler commented May 13, 2017 •

edited

Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

shamil commented May 13, 2017 •

edited

Loading

dwbutler commented May 13, 2017

shamil commented May 14, 2017

Duplicate massages #128

Duplicate massages #128

Comments

shamil commented May 11, 2017 • edited Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

shamil commented May 13, 2017 • edited Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

dwbutler commented May 13, 2017 • edited Loading

dwbutler commented May 13, 2017

shamil commented May 13, 2017

dwbutler commented May 13, 2017

shamil commented May 13, 2017 • edited Loading

dwbutler commented May 13, 2017

shamil commented May 14, 2017

shamil commented May 11, 2017 •

edited

Loading

shamil commented May 13, 2017 •

edited

Loading

dwbutler commented May 13, 2017 •

edited

Loading

shamil commented May 13, 2017 •

edited

Loading