Logging Suddenly Stops on Production #17

Closed
pctj101 opened this issue Aug 1, 2014 · 17 comments

Comments

@pctj101

pctj101 commented Aug 1, 2014

So my production server is a puma cluster and it logs perfectly... until recently.

Currently, logging is successful for a number of hours, then suddenly stops.

I traced one cause back to Logstash::Event where Event#to_json was throwing errors on encoding to JSON, and usually that was the last log message. So I upgraded Logstash::Event to the latest master branch, fixed a couple logstash-event bugs, and the system is much more stable now.
[ Ref: https://github.com/pctj101/logstash/tree/event_timestamp ]

But... after a few more hours, logging still stops.

So I don't think it's a Logstash::Event#to_json problem any more.

Granted I use lograge to reduce my logs and format into json, but I don't see any problems there either.

And to be fair, I don't see any problems in logstash-logger either.

So it's not clear what to debug.

So while I'm looking into this I thought I'd ask a few simple questions:

Is there anything in general where a bug could cause some exception to be raised, and permanently disable logstash-logger or Rails.logger until the next server restart?

Any similar problems seen in the past?

Thanks!

@pctj101
Author

pctj101 commented Aug 1, 2014

FYI Sending my logs to redis. (instead of UDP directly to logstash)

Also making my puma cluster reconnect on worker launch:

on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
  logger_options = Rails.application.config.logstash
  logger = LogStashLogger.new(logger_options)
  Rails.application.config.logger = logger
end

So far, so good.

pctj101 changed the title from "Logging Stops on Production" to "Logging Suddenly Stops on Production" on Aug 1, 2014
@pctj101
Author

pctj101 commented Aug 1, 2014

Nope... still randomly stops logging :( Won't even send a message to redis.

@pctj101
Author

pctj101 commented Aug 1, 2014

So when logstash-logger stops logging my rails server output, I decided to make a new controller#action that uses logstash directly to log a message.

Such as:

def check_if_logstash_logger_still_alive
  # Placeholder message; any payload reproduces the problem.
  Rails.logger.info { "logstash-logger liveness check" }
end

However even that does not work.

Needless to say, no other logging works.

At this point, we're no longer using lograge. It's simply that logstash-logger will not send redis/udp messages outbound after a random amount of time.

@pctj101
Author

pctj101 commented Aug 1, 2014

For Redis: Currently trying to ask @io.client.reconnect if @io.connected? is not true.
But redis client should already autoreconnect.

...

Nope that doesn't matter. Reconnecting does not help.
https://github.com/pctj101/logstash-logger/blob/v060redis_reconnect/lib/logstash-logger/device/redis.rb
Line 34

@dwbutler
Owner

dwbutler commented Aug 1, 2014

If you could answer the following questions, that would help narrow down the possible problems.

  • Are you using MRI or JRuby? Which version? (You mentioned Puma is running in clustered mode, so I'm guessing MRI?)
  • Are you using any gems that monkey around with log levels? The way Rails silences logs is not thread safe, so the logger can easily be silenced forever if you're running a threaded server such as Puma. I've had the same issue as you when using the quiet_assets gem.
  • Which configuration(s) of logstash cause the problem? Any/all, or just certain ones?
  • Does the problem go away if you stop using LogStashLogger?
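
To make the point about thread-unsafe silencing concrete, here is a minimal sketch of how a save-and-restore style silencer can leave a shared logger stuck at a higher level. `naive_silence` is a hypothetical stand-in written for this illustration, not the actual Rails or quiet_assets code, and the queues exist only to force the unlucky interleaving so the result is reproducible.

```ruby
require "logger"
require "stringio"

# Hypothetical silencer: remember the current level, raise it, restore it afterwards.
def naive_silence(logger, temporary_level = Logger::ERROR)
  old_level = logger.level
  logger.level = temporary_level
  yield
ensure
  logger.level = old_level
end

shared_logger = Logger.new(StringIO.new)
shared_logger.level = Logger::INFO

# Queues used purely to force a specific interleaving of the two threads.
a_entered = Queue.new
b_entered = Queue.new
a_done    = Queue.new

thread_a = Thread.new do
  naive_silence(shared_logger) do
    a_entered << true   # A has saved INFO and raised the level to ERROR
    b_entered.pop       # wait until B has saved the (already raised) level
  end
  a_done << true        # A has restored INFO
end

thread_b = Thread.new do
  a_entered.pop
  naive_silence(shared_logger) do
    b_entered << true   # B saved ERROR as its "old" level
    a_done.pop          # wait until A has restored INFO
  end                   # B now "restores" ERROR, overwriting A's INFO
end

[thread_a, thread_b].each(&:join)
final_level = shared_logger.level
puts "final level: #{final_level} (Logger::ERROR is #{Logger::ERROR})"
```

After both threads finish, nothing is silencing the logger any more, yet it stays at ERROR until something resets it or the process restarts.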

@dwbutler
Owner

dwbutler commented Aug 1, 2014

Also, can you modify your controller to print out Rails.logger.level? That will help us figure out if the log level is the issue.
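
A rough sketch of what that check could look like, using only the standard library `Logger`. The helper name `describe_level` and the `LEVEL_NAMES` table are made up for this example; in a Rails controller you would presumably pass `Rails.logger` instead of the local logger used here.

```ruby
require "logger"

# Severity names in the order of their integer values (DEBUG = 0 ... UNKNOWN = 5).
LEVEL_NAMES = %w[DEBUG INFO WARN ERROR FATAL UNKNOWN].freeze

# Returns a human-readable description of the logger's current level.
def describe_level(logger)
  level = logger.level
  "#{LEVEL_NAMES.fetch(level, 'CUSTOM')} (#{level})"
end

logger = Logger.new($stdout)
logger.level = Logger::WARN
puts describe_level(logger)
```

Rendering this value in the response body, rather than logging it, matters here: if the level has been raised, a log call at INFO would be dropped along with everything else.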

@pctj101
Author

pctj101 commented Aug 1, 2014

Are you using MRI or JRuby? Which version? (You mentioned Puma is running in clustered mode, so I'm guessing MRI?)

MRI Ruby 2.1.2

Are you using any gems that monkey around with log levels? The way Rails silences logs is not thread safe, so the logger can easily be silenced forever if you're running a threaded server such as Puma.

Yes: https://github.com/roidrage/lograge
However, it shouldn't silence all logs. But I'll totally accept there might be something crazy in there. With that said, I've used lograge v0.3.0 for months without issue.

I've had the same issue as you when using the quiet_assets gem.

Which configuration(s) of logstash cause the problem? Any/all, or just certain ones?

UDP and Redis so far are all I tested. Both have the issue.

Does the problem go away if you stop using LogStashLogger?

I turned off logstash logger & lograge at the same time. Then put in an instrument listener on "process_action.action_controller" which directly logs to logstash. Works like a charm (minus all the console debug messages...)

Also, can you modify your controller to print out Rails.logger.level? That will help us figure out if the log level is the issue.

Totally can do after the weekend is over :)

From what you're saying, the idea of the logger.level getting changed unexpectedly sounds possible. I've grepped my code, and it's nothing I am doing. However... there is admittedly more than just my code. There are lots of gems too.

In a way, for me, logging the "process_action.action_controller" is all I really "need". The remaining console messages are helpful, but not my main goal.

However, the console messages become much more helpful when I really need to debug something, so there is value to getting this whole thing working.

More news next week!

Thanks for the insight!

@dwbutler
Owner

@pctj101 Did you find anything out yet?

@chopmo

chopmo commented Mar 20, 2015

@dwbutler It looks like I'm hitting the same issue.

I'm using MRI 2.2.0, Unicorn 4.5.0, lograge 0.3.1, and logstash-logger 0.8.0. Data is sent to logstash via UDP and also logged to a file:

  config.logstash = [
    {
      type: :file,
      path: 'log/logstash_production.log'
    },
    {
      type: :udp,
      port: 5228,
      host: ENV["LOGSTASH_HOST"]
    }
  ]

It looks like you're right about the log level being changed. Really good catch! Printing it reveals that it is raised to 2 (Logger::WARN) for more and more Unicorn workers over time. Restarting the server obviously fixes the problem temporarily.

So now I'll try to hunt down who is changing the log level. In case you're interested, here is our "impressive" Gemfile.lock: https://gist.github.com/chopmo/88f10097f94db452089f

@chopmo

chopmo commented Mar 20, 2015

Well that didn't take as long as expected. I'm 99% sure that we found the problem.

We're using the Savon gem, and when calling Savon.client it is possible to pass in a logger object and a log level as options.

We accidentally passed Rails.logger without duplicating it and the Logger::WARN level. Savon changed the level on the logger.

Simple as that :)
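
A small sketch of the failure mode and the `dup` workaround, using only the standard library. `ThirdPartyClient` is a hypothetical stand-in for any library that sets the level on the logger it is handed; it is not Savon's actual implementation, and `app_logger` plays the role of `Rails.logger`.

```ruby
require "logger"
require "stringio"

# Hypothetical library class that reconfigures whatever logger it receives.
class ThirdPartyClient
  def initialize(logger:, log_level:)
    @logger = logger
    @logger.level = log_level
  end
end

app_logger = Logger.new(StringIO.new)
app_logger.level = Logger::INFO

# Safer: the library gets its own copy, so the app logger keeps its level.
ThirdPartyClient.new(logger: app_logger.dup, log_level: Logger::WARN)
level_after_dup = app_logger.level

# Problematic: the library mutates the logger the whole app shares.
ThirdPartyClient.new(logger: app_logger, log_level: Logger::WARN)
level_after_shared = app_logger.level

puts "after passing a dup:          #{level_after_dup}"
puts "after passing the logger:     #{level_after_shared}"
```

The copy still writes to the same underlying device, so the library's messages end up in the same place; only the level setting is kept separate.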

@dwbutler
Owner

Ouch... It sounds like Savon is not being a good logging citizen. I'll add a note to the Troubleshooting section of the readme. Thanks for the tip!

@dwbutler
Owner

I added a general note about gems changing the log level to the Troubleshooting section and linked to this issue. I think it's been well-established that this issue is caused by other gems. I can't think of any good way to protect one's logger level from getting stomped. So I'm going to close this issue.

If anyone finds another gem causing a problem, or has any ideas about how to protect the log level from getting changed, feel free to add more comments to this issue.

@glaszig
Contributor

glaszig commented Dec 21, 2016

i'm also suffering this. i even followed instructions in a book.
it describes how to use lograge and logstash-logger to send logs off to logstash but logs are only being sent to logstash when restarting the server (apache + passenger).
this looks to me like some sort of buffering issue and indeed if i set the sync: true option in the LogStashLogger initializer, logs land in my logstash.

the buffer by default is supposed to collect up to 50 messages and flush them every 5 seconds. but it doesn't for my case. anything i'm missing?

Rails.application.configure do
  config.lograge.enabled = true
  config.lograge.custom_options = lambda do |event|
    exceptions = %w(controller action format id)
    {
      params:      event.payload[:params].except(*exceptions).to_json,
      type:        :rails,
      environment: Rails.env
    }
  end
  config.lograge.formatter = Lograge::Formatters::Logstash.new
  config.lograge.logger    = LogStashLogger.new uri: 'udp://logstash-server:1234'
end

@dwbutler
Owner

@glaszig I noticed that you are using UDP output. Based on the symptoms you're describing, I believe you are observing normal Ruby behavior. When Ruby writes to an IO device such as a file or socket, it doesn't write immediately. It keeps its own internal buffer and flushes it "periodically." If you write infrequently enough and don't have enough memory pressure, "periodically" could mean "never." Ruby will, of course, flush all buffers when you exit the program (which is what happens when you restart the server). The only way to control this behavior is to set sync=true, as you did. I'll make a note of this behavior in the readme.
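
For anyone who wants to see the effect of `sync` in isolation, here is a minimal sketch using temporary files rather than a UDP socket or logstash-logger itself. It only illustrates Ruby's IO-level buffering on a plain `File`; the variable names are arbitrary.

```ruby
require "tempfile"

buffered = Tempfile.new("buffered")
synced   = Tempfile.new("synced")
synced.sync = true # hand every write to the OS immediately

message = "hello\n"
buffered.write(message)
synced.write(message)

# Compare what has actually reached the file at this point.
puts "buffered, before flush: #{File.size(buffered.path)} bytes on disk"
size_synced = File.size(synced.path)
puts "synced:                 #{size_synced} bytes on disk"

# An explicit flush pushes Ruby's internal buffer out to the OS.
buffered.flush
size_after_flush = File.size(buffered.path)
puts "buffered, after flush:  #{size_after_flush} bytes on disk"

# Close and delete the temporary files.
[buffered, synced].each(&:close!)
```

On a low-traffic system the small writes never fill the buffer, which fits the symptom of messages only showing up when the process exits.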

@glaszig
Contributor

glaszig commented Dec 21, 2016

sorry, i didn't know enough about these kinds of ruby internals but i was imagining something like that. there is indeed very low traffic on this system. thanks for getting back.

@glaszig
Contributor

glaszig commented Dec 22, 2016

if anybody else comes across this and is interested in ruby's io buffering, watch this little screencast by @jstorimer.

@lloydwatkin

Hi, I believe I'm hitting this issue now myself (boo!). Everything is fine on staging, but as soon as I get to production, nada.

Are there any tips/tools that would allow me to search through all my gem dependencies to find the culprit or is it just a case of lots of searching?
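
One possible debugging aid, offered as a sketch rather than a tested recipe: wrap `level=` on the logger instance so every change records who made it. `LevelChangeTracer` and `noisy_gem_setup` are hypothetical names invented for this example. In a Rails app one might try `Rails.logger.extend(LevelChangeTracer)` in an initializer, but that is an assumption and wrapped or broadcast loggers may behave differently.

```ruby
require "logger"
require "stringio"

# Records every call to level= together with the top of the call stack.
module LevelChangeTracer
  def self.changes
    @changes ||= []
  end

  def level=(new_level)
    LevelChangeTracer.changes << { from: level, to: new_level, caller: caller(1, 5) }
    super
  end
end

logger = Logger.new(StringIO.new)
logger.level = Logger::INFO
logger.extend(LevelChangeTracer) # only this instance is affected

# Hypothetical stand-in for gem code that raises the level behind your back.
def noisy_gem_setup(logger)
  logger.level = Logger::WARN
end

noisy_gem_setup(logger)

change = LevelChangeTracer.changes.last
puts "level changed from #{change[:from]} to #{change[:to]}"
puts change[:caller]
```

The recorded backtrace points at the file and method that changed the level, which narrows the search to one gem. It will not catch code that bypasses `level=`, for example by setting the instance variable directly.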
