Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate massages #128

Open
shamil opened this issue May 11, 2017 · 14 comments
Open

Duplicate massages #128

shamil opened this issue May 11, 2017 · 14 comments
Labels

Comments

@shamil
Copy link

shamil commented May 11, 2017

Hi,

As I understand from docs, duplicate messages can happen for outputs that write in batches, like Kafka.
I'm using UDP output, and I still see lot's of duplicates.

Might be related, when I see duplicates I also see the following error:

ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large

What could be the reason?

@dwbutler dwbutler added the issue label May 13, 2017
@dwbutler
Copy link
Owner

The UDP device also writes in batches because Ruby writes to sockets in batches. If you're seeing lots of duplicates, that means Ruby thought it couldn't write to the UDP socket and therefore threw an exception, and then LogStashLogger retried writing the whole batch. But in fact, the UDP packets were received on the listening side.

Errno::EMSGSIZE is a known issue with the UDP device which is documented in the troubleshooting section of the Readme

@shamil
Copy link
Author

shamil commented May 13, 2017

I'm confused with your answer. First I didn't see any exceptions regarding Ruby can't write to UDP socket. Secondly, even when I enabled autoflush mode, which effectively disables batch mode, I still get duplicates. Third, according to code, the Errno::EMSGSIZE is rescued and considered as unrecoverable error, therefore ignored and not causing retries.

Am I missing something?

@dwbutler dwbutler reopened this May 13, 2017
@dwbutler
Copy link
Owner

Logging exceptions are written to STDERR by default. Are you checking there?

autoflush doesn't disable batching. It flushes the buffer after each log message is written. Log messages can still accumulate in the buffer if a flush takes a while. Anyway, even a single log message could be retried several times.

Errno::EMSGSIZE does indeed not cause retries. It's most likely a separate issue unrelated to the duplicate log messages.

I'll do some more research to see if it's normal/expected to get duplicate UDP packets on the receiving end even if the client only sent it once. I suspect that it is.

@shamil
Copy link
Author

shamil commented May 13, 2017

Yes I'm checking STDERR. The only error I see there is Errno::EMSGSIZE.

The problem that I can get thousands duplicate messages. I mean one message can get written thousand times. I did tcpdump and indeed those duplicate messages are coming out from the ruby (rails) process.

I know Errno::EMSGSIZE should not cause duplicates, but most of the time when I see duplicates I see a huge amount of Errno::EMSGSIZE messages in STDERR

@dwbutler
Copy link
Owner

That is indeed strange. It does appear that Errno::EMSGSIZE is not being correctly treated as unrecoverable. What version of LogStashLogger are you using? Dropping unrecoverable errors was released in 0.16.0.

@shamil
Copy link
Author

shamil commented May 13, 2017

I use 0.22.1 version

@dwbutler
Copy link
Owner

I tried a small test, and got the result I expected:

logger = LogStashLogger.new(type: :udp, port: 5228)
loop { 10.times { logger.info(Time.now); sleep(0.1) }; logger.info('1'*100_000); }
E, [2017-05-13T14:55:40.690990 #16663] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too long
W, [2017-05-13T14:55:40.691039 #16663]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write

I was not able to reproduce duplicate log messages being sent. Long messages were correctly discarded.

Can you post a few samples of the duplicate log messages?

@dwbutler
Copy link
Owner

dwbutler commented May 13, 2017

What version of Ruby are you using? In MRI 2.4.1, I get the error message Message too long, not Message too large

@dwbutler
Copy link
Owner

Interesting... I just tried this test under JRuby and got E, [2017-05-13T15:12:57.713000 #17281] ERROR -- : [LogStashLogger::Device::UDP] SystemCallError - Unknown error - No message available. So it's not caught as an unrecoverable error and the whole batch is retried forever.

Looks like version-specific error handling will be required.

@shamil
Copy link
Author

shamil commented May 13, 2017

I'm using MRI 2.3.3. And strangely just see the message with the error, no warning like in your output (WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write)

@dwbutler
Copy link
Owner

I can't reproduce this behavior in MRI 2.3.3. I suspect the behavior might be platform specific. What OS are you using? I test on a Macbook Pro.

@shamil
Copy link
Author

shamil commented May 13, 2017

I run it in Docker container (Alpine Linux) on top of Debian distribution.

I ran the same test and I see that the message always get retried:

E, [2017-05-13T22:49:38.210640 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210715 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.210878 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.210946 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
E, [2017-05-13T22:49:38.211094 #13] ERROR -- : [LogStashLogger::Device::UDP] Errno::EMSGSIZE - Message too large
W, [2017-05-13T22:49:38.211126 #13]  WARN -- : [LogStashLogger::Device::UDP] unrecoverable error, aborting write
W, [2017-05-13T22:49:42.780342 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:47.804145 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}
W, [2017-05-13T22:49:52.825159 #13]  WARN -- : [LogStashLogger::Device::UDP] Buffer Full - {:pending=>50, :outgoing=>0}

@dwbutler
Copy link
Owner

The test above writes multiple large messages. If you just write one message, you'll see that it fails and never retries again. The unrecoverable error, aborting write warning is the indicator that it will drop the message and not retry.

It looks to me like you're getting the expected behavior when running my small test, but unexpected behavior in your actual web app. Do you have any extra configuration, gems, etc. that might contribute to the issue?

@shamil
Copy link
Author

shamil commented May 14, 2017

Here are the configuration I use:

config.logstash.max_message_size = 4096
config.logstash.buffer_max_items = 8192
config.logstash.buffer_max_interval = 1

Also, I'm using lograge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants