
# CloudWatch alarms

Michael Barton edited this page Jan 17, 2017 · 5 revisions

The CloudFormation template defines alarms for monitoring. When an alarm fires, it pings the webhook specified by the AlertWebhook parameter, allowing integration with tools like PagerDuty.
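
The template itself isn't reproduced on this page. As a minimal sketch, assuming the webhook is reached through an SNS topic with an HTTPS subscription (a common CloudFormation pattern, not confirmed here), the wiring could look like the following Python, which builds the JSON fragment. `AlertTopic` is a hypothetical logical ID; only `AlertWebhook` comes from the template.

```python
import json


def alert_topic_fragment():
    """Hypothetical CloudFormation fragment routing alarms to a webhook.

    Assumption: alarms notify an SNS topic ("AlertTopic", an illustrative
    logical ID) whose HTTPS subscription points at the AlertWebhook URL.
    """
    return {
        "Parameters": {
            "AlertWebhook": {
                "Type": "String",
                "Description": "HTTPS endpoint notified when an alarm fires",
            }
        },
        "Resources": {
            "AlertTopic": {
                "Type": "AWS::SNS::Topic",
                "Properties": {
                    "Subscription": [
                        # SNS POSTs each alarm notification to this URL
                        {"Endpoint": {"Ref": "AlertWebhook"}, "Protocol": "https"}
                    ]
                },
            }
        },
    }


print(json.dumps(alert_topic_fragment(), indent=2))
```

SNS sends a subscription confirmation request to an HTTPS endpoint before it delivers any notifications, so the receiving endpoint has to handle that confirmation.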

## MediaAtomMaker5XXAlarm

Fires if any 5XX responses are seen within a 5-minute period.

Original problem: issues with auth and the internal Dynamo schemas.

Rationale: It's a good idea to investigate every unexpected exception!

If these alerts become a regular occurrence and start being ignored, the alarm should be updated to be less trigger-happy.
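
A minimal sketch of how this alarm might be declared, written as Python that emits a CloudFormation `AWS::CloudWatch::Alarm` resource. The metric (`HTTPCode_Backend_5XX` from a Classic ELB in the `AWS/ELB` namespace) and the logical IDs `LoadBalancer` and `AlertTopic` are assumptions, not taken from the actual template.

```python
import json


def media_atom_maker_5xx_alarm(load_balancer="LoadBalancer", topic="AlertTopic"):
    """Sketch of MediaAtomMaker5XXAlarm as a CloudFormation resource.

    Assumptions: a Classic ELB with logical ID `load_balancer`, the
    HTTPCode_Backend_5XX metric (5XX responses returned by the instances),
    and an SNS topic `topic` that forwards to the alert webhook.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": "Any 5XX responses within a 5-minute period",
            "Namespace": "AWS/ELB",
            "MetricName": "HTTPCode_Backend_5XX",
            "Dimensions": [
                {"Name": "LoadBalancerName", "Value": {"Ref": load_balancer}}
            ],
            "Statistic": "Sum",
            "Period": 300,  # seconds
            "EvaluationPeriods": 1,
            "Threshold": 0,
            "ComparisonOperator": "GreaterThanThreshold",  # fires on a single 5XX
            "TreatMissingData": "notBreaching",  # assumption: no datapoints means no errors
            "AlarmActions": [{"Ref": topic}],
        },
    }


print(json.dumps({"MediaAtomMaker5XXAlarm": media_atom_maker_5xx_alarm()}, indent=2))
```

Classic ELB only publishes the HTTPCode metrics when the count is non-zero, so `TreatMissingData: notBreaching` keeps quiet periods in the OK state rather than INSUFFICIENT_DATA.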

## MediaAtomMaker4XXAlarm

Fires on repeated 4XX responses, defined as more than 15 within a 10-minute period.

Rationale: Repeated client error responses probably indicate a bug in a client.

The alarm criteria were copied somewhat arbitrarily from The Grid and should be reviewed against application usage over time.
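
One way to review the criteria is to replay historical per-minute 4XX counts (for example, exported from CloudWatch) against candidate thresholds. The Python below is a hypothetical helper, not part of the project; it approximates CloudWatch's fixed-period `Sum` with non-overlapping 10-minute windows, and the sample counts are invented for illustration.

```python
from typing import List


def windows_that_would_fire(
    per_minute_4xx: List[int], window_minutes: int = 10, threshold: int = 15
) -> List[int]:
    """Return the start minute of each window whose 4XX total exceeds threshold.

    Windows are consecutive, non-overlapping blocks of `window_minutes`,
    approximating how CloudWatch sums a metric over a fixed period.
    A trailing partial window is ignored.
    """
    fired = []
    for start in range(0, len(per_minute_4xx) - window_minutes + 1, window_minutes):
        total = sum(per_minute_4xx[start:start + window_minutes])
        if total > threshold:  # "more than 15", i.e. GreaterThanThreshold
            fired.append(start)
    return fired


# Invented sample data: 20 minutes of per-minute 4XX counts.
counts = [0, 1, 0, 2, 0, 0, 3, 0, 1, 0,   # minutes 0-9: total 7
          5, 4, 0, 0, 6, 0, 1, 0, 0, 0]   # minutes 10-19: total 16
print(windows_that_would_fire(counts))                 # [10]
print(windows_that_would_fire(counts, threshold=20))   # []
```

Running a few weeks of real counts through this with different thresholds gives a rough idea of how often each candidate would have paged someone.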

## MediaAtomMakerLatency

Fires if requests take more than 1 second on average over a 5-minute period.

Rationale: Unexpectedly slow requests need investigation!
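
A sketch of the latency alarm as a CloudFormation resource, generated from Python. It assumes a Classic ELB, whose `Latency` metric in `AWS/ELB` is reported in seconds, so the 1 second threshold is simply `1`; the logical IDs `LoadBalancer` and `AlertTopic` are hypothetical.

```python
import json


def media_atom_maker_latency_alarm(load_balancer="LoadBalancer", topic="AlertTopic"):
    """Sketch of MediaAtomMakerLatency as a CloudFormation resource.

    Assumptions: a Classic ELB with logical ID `load_balancer` and an
    SNS topic `topic` that forwards to the alert webhook.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": "Average latency above 1 second over 5 minutes",
            "Namespace": "AWS/ELB",
            "MetricName": "Latency",  # Classic ELB reports this in seconds
            "Dimensions": [
                {"Name": "LoadBalancerName", "Value": {"Ref": load_balancer}}
            ],
            "Statistic": "Average",
            "Period": 300,  # 5 minutes
            "EvaluationPeriods": 1,
            "Threshold": 1,  # 1 second
            "ComparisonOperator": "GreaterThanThreshold",
            "AlarmActions": [{"Ref": topic}],
        },
    }


print(json.dumps({"MediaAtomMakerLatency": media_atom_maker_latency_alarm()}, indent=2))
```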

## MediaAtomMakerUnhealthy

Fires if any of the instances behind the ELB are considered unhealthy.

Original problem: The application went down without firing any of the existing alarms, because they were all request-based.

Rationale: A quick, raw indicator of degraded service (not necessarily a full failure).
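
A sketch of the health alarm under the same kind of assumptions (Classic ELB, hypothetical `LoadBalancer` and `AlertTopic` logical IDs). It watches `UnHealthyHostCount` and fires when the maximum over a one-minute period is above zero; the one-minute period is an assumption, since this page doesn't state one.

```python
import json


def media_atom_maker_unhealthy_alarm(load_balancer="LoadBalancer", topic="AlertTopic"):
    """Sketch of MediaAtomMakerUnhealthy as a CloudFormation resource.

    Assumptions: a Classic ELB with logical ID `load_balancer`, an SNS
    topic `topic` that forwards to the alert webhook, and a 60 s period.
    """
    return {
        "Type": "AWS::CloudWatch::Alarm",
        "Properties": {
            "AlarmDescription": "One or more instances behind the ELB are unhealthy",
            "Namespace": "AWS/ELB",
            "MetricName": "UnHealthyHostCount",
            "Dimensions": [
                {"Name": "LoadBalancerName", "Value": {"Ref": load_balancer}}
            ],
            "Statistic": "Maximum",  # worst moment in the period
            "Period": 60,  # assumption: not stated on this page
            "EvaluationPeriods": 1,
            "Threshold": 0,
            "ComparisonOperator": "GreaterThanThreshold",  # any unhealthy instance
            "AlarmActions": [{"Ref": topic}],
        },
    }


print(json.dumps({"MediaAtomMakerUnhealthy": media_atom_maker_unhealthy_alarm()}, indent=2))
```

Unlike the request-based alarms, Classic ELB reports `UnHealthyHostCount` whenever instances are registered, whether or not traffic is flowing, which is what lets this alarm catch the kind of outage described above.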
