handler-slack.rb: implement a retry-timeout strategy for contacting slack webhook

In certain scenarios, delivery to the Slack webhook can fail for several reasons:
- network issues
- rate limit exceeded
- internal server errors on the Slack API side

In those cases the webhook call can fail and our message is not delivered; worse, the call can leave the handler hanging for too long.
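The three failure modes above surface differently in Ruby's Net::HTTP, which matters when deciding what to retry. This is a minimal sketch, assuming a Net::HTTP client. Network issues arrive as exceptions. Rate limiting (HTTP 429) and Slack-side server errors (5xx) come back as ordinary response objects, because Net::HTTP#request does not raise on an error status. The names TRANSIENT_ERRORS, retryable_error? and retryable_response? are hypothetical and not part of handler-slack.rb, and the exception list is illustrative rather than exhaustive.

```ruby
require 'net/http'
require 'socket'
require 'timeout'

# Hypothetical list of exceptions treated as transient network problems.
# Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
TRANSIENT_ERRORS = [
  Timeout::Error,
  SocketError,          # e.g. DNS resolution failures
  Errno::ECONNREFUSED,
  Errno::ECONNRESET,
  Errno::EHOSTUNREACH,
  EOFError              # connection closed mid-response
].freeze

# Network issues: raised as exceptions by Net::HTTP.
def retryable_error?(error)
  TRANSIENT_ERRORS.any? { |klass| error.is_a?(klass) }
end

# Rate limiting (429) and server errors (5xx): returned as response objects.
# Other client errors (4xx) usually point at a bad request or webhook URL,
# so retrying them is unlikely to help.
def retryable_response?(response)
  response.code == '429' || response.is_a?(Net::HTTPServerError)
end

# Usage:
retryable_response?(Net::HTTPServiceUnavailable.new('1.1', '503', 'Service Unavailable')) # => true
```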

This commit implements a customizable retry strategy that tries to deliver the message to the webhook several times, with a timeout for each try and a sleep between retries.
All three settings (webhook_retries, webhook_timeout, webhook_retry_sleep) can be customized in the handler's JSON config. The defaults are 5 retries, a 5-second sleep between retries, and a 10-second timeout per try.
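The settings lookup and its defaults can be sketched outside the handler. This sketch assumes the handler's settings live under a top-level "slack" key of the Sensu JSON config, which is the plugin's default config name as far as I know. The helper and constant names are hypothetical, and the webhook URL is a placeholder. The fallback mirrors the diff's get_setting(...) || default pattern, so a missing or null key falls back to the default while an explicit 0 is kept.

```ruby
require 'json'

# Defaults stated in the commit message.
WEBHOOK_DEFAULTS = {
  'webhook_retries' => 5,     # number of tries
  'webhook_timeout' => 10,    # seconds allowed per try
  'webhook_retry_sleep' => 5  # seconds to wait between tries
}.freeze

# Hypothetical helper: reads the three retry settings from a Sensu-style
# JSON config, falling back to the defaults like `get_setting(...) || default`.
# Assumes the handler section is named "slack".
def webhook_settings(json_text, config_name = 'slack')
  section = JSON.parse(json_text).fetch(config_name, {})
  WEBHOOK_DEFAULTS.map { |key, default| [key, section[key] || default] }.to_h
end

# Usage (placeholder URL):
config = <<~JSON
  { "slack": { "webhook_url": "https://hooks.slack.com/services/EXAMPLE", "webhook_retries": 3 } }
JSON
settings = webhook_settings(config)
settings['webhook_retries'] # => 3 (overridden)
settings['webhook_timeout'] # => 10 (default)
```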

This should incidentally solve issue sensu-plugins#15.
kali-brandwatch committed Apr 10, 2019
1 parent 5ac1f1c commit 1fcd5d3
Showing 1 changed file with 59 additions and 20 deletions.
bin/handler-slack.rb: 79 changes (59 additions, 20 deletions)
@@ -30,6 +30,21 @@ def slack_webhook_url
get_setting('webhook_url')
end

def slack_webhook_retries
# The number of retries to deliver the payload to the slack webhook
get_setting('webhook_retries') || 5
end

def slack_webhook_timeout
# The amount of time (in seconds) to give for the webhook request to complete before failing it
get_setting('webhook_timeout') || 10
end

def slack_webhook_retry_sleep
# The amount of time (in seconds) to wait in between webhook retries
get_setting('webhook_retry_sleep') || 5
end

def slack_icon_emoji
get_setting('icon_emoji')
end
@@ -149,27 +164,51 @@ def post_data(body)
end
http.use_ssl = true

req = Net::HTTP::Post.new("#{uri.path}?#{uri.query}", 'Content-Type' => 'application/json')

if payload_template.nil?
text = slack_surround ? slack_surround + body + slack_surround : body
req.body = payload(text).to_json
else
req.body = body
end

response = http.request(req)
verify_response(response)
end
# Implement a retry-timeout strategy to handle slack api issues like network. Should solve #15
begin # retries loop
tries = slack_webhook_retries
Timeout.timeout(slack_webhook_timeout) do

begin # main loop for trying to deliver the message to slack webhook
req = Net::HTTP::Post.new("#{uri.path}?#{uri.query}", 'Content-Type' => 'application/json')

if payload_template.nil?
text = slack_surround ? slack_surround + body + slack_surround : body
req.body = payload(text).to_json
else
req.body = body
end

response = http.request(req)

# replace verify_response with a rescue within the loop
rescue Net::HTTPServerException => error
if (tries -= 1) > 0
sleep slack_webhook_retry_sleep
puts "retrying incident #{incident_key}... #{tries} left"
retry
else
# raise error for sensu-server to catch and log
puts 'slack api failed (retries) ' + incident_key + ' : ' + error.response.code + ' ' + error.response.message + ': sending to channel "' + slack_channel + '" the message: ' + body
exit 1
end
end # of main loop for trying to deliver the message to slack webhook

end # of timeout:do loop

# if the timeout is exceeded, consider this try failed
rescue Timeout::Error
if (tries -= 1) > 0
puts "timeout hit, retrying... #{tries} left"
retry
else
# raise error for sensu-server to catch and log
puts 'slack webhook failed (timeout) ' + incident_key + ' : sending to channel "' + slack_channel + '" the message: ' + body
exit 1
end
end # of retries loop

def verify_response(response)
case response
when Net::HTTPSuccess
true
else
raise response.error!
end
end
end # of post_data

def payload(notice)
client_fields = []
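Below is a minimal sketch of the retry-timeout strategy described in the commit message: a timeout per try, a sleep between tries, and a bounded number of tries. It is written as a standalone helper, not as the handler's actual code, and with_retries, deliver and RetryableResponse are hypothetical names. Two choices differ from the diff above.

First, the attempt counter is initialized outside the begin block that retry re-runs. In the diff, tries = slack_webhook_retries sits inside the outer begin, so every retry after a Timeout::Error resets the counter.

Second, Timeout.timeout wraps a single attempt. In the diff it wraps the inner retry loop, sleeps included, so the 10-second budget covers all attempts together.

The sketch also raises explicitly for 429 and 5xx responses. Net::HTTP#request returns error responses instead of raising them, so the diff's rescue Net::HTTPServerException only fires once something calls response.error! or response.value. Finally, the sketch re-raises the last error rather than calling exit 1, which leaves that decision to the caller.

```ruby
require 'net/http'
require 'timeout'
require 'uri'

# Hypothetical marker for responses worth retrying (429 and 5xx).
class RetryableResponse < StandardError; end

# Runs the block up to `retries` times. Each attempt gets at most `timeout`
# seconds, with `retry_sleep` seconds between attempts. Returns the block's
# value on success and re-raises the last error once attempts are exhausted.
def with_retries(retries:, timeout:, retry_sleep:, retry_on: [StandardError])
  attempt = 0 # outside the retried begin block, so `retry` does not reset it
  begin
    attempt += 1
    Timeout.timeout(timeout) { yield attempt } # the timeout applies to this attempt only
  rescue Timeout::Error, *retry_on => error
    raise if attempt >= retries
    warn "attempt #{attempt} failed (#{error.class}), #{retries - attempt} left"
    sleep retry_sleep
    retry
  end
end

# Hypothetical delivery function built on with_retries.
def deliver(uri, json_body, retries: 5, timeout: 10, retry_sleep: 5)
  with_retries(retries: retries, timeout: timeout, retry_sleep: retry_sleep,
               retry_on: [RetryableResponse, SocketError, SystemCallError]) do
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = uri.scheme == 'https'
    req = Net::HTTP::Post.new(uri.request_uri, 'Content-Type' => 'application/json')
    req.body = json_body
    response = http.request(req)
    if response.code == '429' || response.is_a?(Net::HTTPServerError)
      raise RetryableResponse, "HTTP #{response.code}"
    end
    response.value # raises for other non-2xx responses (e.g. 4xx), which are not retried
    response
  end
end

# Usage (not run here; the URL is a placeholder):
#   deliver(URI('https://hooks.slack.com/services/EXAMPLE'), '{"text":"hello"}')
```

Net::HTTP's own open_timeout and read_timeout settings are an alternative to wrapping the request in Timeout.timeout. They bound connection setup and each socket read, rather than the attempt as a whole.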