🔍 Investigation Needed: Incident #386206 and Auto-Resolution #3193
Comments
Summary
Control Panel's RDS instance is …
The alarm is configured in cloudwatch-alarms.tf (https://github.com/ministryofjustice/data-platform/blob/4b9253cd26e7f494b3374abc1064ee543941dc6e/terraform/aws/analytical-platform-production/cluster/cloudwatch-alarms.tf#L223-L240) to alert when freeable memory drops below 128MB (threshold value: https://github.com/ministryofjustice/data-platform/blob/4b9253cd26e7f494b3374abc1064ee543941dc6e/terraform/aws/analytical-platform-production/cluster/terraform.tfvars#L102). 128MB is a sufficient threshold to alert on for a database with 1GB RAM.
Suggested Action
Migrate to …
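As a quick cross-check of that alarm outside of Terraform, something like the following boto3 sketch could read back the configured threshold and current state. This is illustrative only: the alarm name is taken from the incident below, and credentials/region for the production account are assumed.

```python
# Sketch only: read back the CloudWatch alarm's threshold and state with boto3.
# Assumes credentials and region for the production account are already configured.
import boto3

cloudwatch = boto3.client("cloudwatch")

alarm_name = "rds-eks-production-control-panel-psg-db-encrypted-low-freeable-memory"
alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]

for alarm in alarms:
    print("Alarm:    ", alarm["AlarmName"])
    print("Metric:   ", alarm["Namespace"], alarm["MetricName"])
    print("Threshold:", alarm["Threshold"], "bytes")  # 128MB would be 134217728 bytes
    print("State:    ", alarm["StateValue"])          # OK / ALARM / INSUFFICIENT_DATA
```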
Looking at the documentation for an RDS PostgreSQL instance, there do not appear to be any memory structures that could be tuned to help with this alert when it occurs. If we did want to fix this (not currently occurring) issue, we would have to increase the instance size as suggested.
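If a resize were pursued, a sensible first step would be confirming the current instance class. A minimal sketch, assuming a DB instance identifier that is not stated in this issue:

```python
# Sketch only: confirm the current instance class before deciding on a resize.
# The DB instance identifier below is an assumption, not taken from this issue.
import boto3

rds = boto3.client("rds")

response = rds.describe_db_instances(
    DBInstanceIdentifier="control-panel-db"  # hypothetical identifier
)

for db in response["DBInstances"]:
    print("Instance class:", db["DBInstanceClass"])
    print("Engine:        ", db["Engine"], db["EngineVersion"])
    print("Multi-AZ:      ", db["MultiAZ"])
```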
Looking at the incidents raised in PagerDuty, I can see that this issue first alerted on 17/01/24.
As there is currently no issue, I would advise taking no action and investigating if it recurs in the future. I will move the ticket into review and, if the team agrees, I will close the ticket. If not, I will increase the instance size in the dev cluster as a test before implementing it in prod.
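To verify that freeable memory currently has headroom, and to re-check quickly if the alert fires again, a sketch along these lines could pull the recent minimum FreeableMemory for the instance. The DB instance identifier is again an assumption:

```python
# Sketch only: pull the last 7 days of minimum FreeableMemory for the instance.
# The DB instance identifier is an assumption, not taken from this issue.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="FreeableMemory",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "control-panel-db"}],  # hypothetical
    StartTime=now - timedelta(days=7),
    EndTime=now,
    Period=3600,              # hourly datapoints
    Statistics=["Minimum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Minimum"] / 1024 / 1024:.0f} MB free')
```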
Closing this ticket; a new issue has been raised to look at log retention here.
Description
When I was on support on 02/02/24, I consistently received emails from PagerDuty indicating that I had an open incident assigned to me:
INCIDENT: #386206
Cluster: rds-eks-production-control-panel-psg-db-encrypted-low-freeable-memory
Issue Details
However, upon checking the incident details, it appears to be resolved. I suspect there might have been a memory issue in one of the clusters, and the system may have automatically resolved it.
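One way to confirm whether the alarm briefly fired and then recovered on its own (which would explain the PagerDuty incident auto-resolving) is to look at the alarm's state-change history around that date. A sketch, again assuming production credentials are configured:

```python
# Sketch only: list the alarm's state changes around 02/02/24 to see whether it
# went ALARM -> OK on its own, which would auto-resolve the PagerDuty incident.
from datetime import datetime, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

history = cloudwatch.describe_alarm_history(
    AlarmName="rds-eks-production-control-panel-psg-db-encrypted-low-freeable-memory",
    HistoryItemType="StateUpdate",
    StartDate=datetime(2024, 2, 1, tzinfo=timezone.utc),
    EndDate=datetime(2024, 2, 3, tzinfo=timezone.utc),
)

for item in history["AlarmHistoryItems"]:
    print(item["Timestamp"], "-", item["HistorySummary"])
```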
Steps to Reproduce
N/A (incident was automatically resolved)
Expected Behavior
Provide insights or investigation on the potential memory issue in the mentioned cluster.
Additional Information