🔍 Investigation Needed: Incident #386206 and Auto-Resolution #3193

Closed
murad-ali-MoJ opened this issue Feb 5, 2024 · 6 comments
Labels
data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools

Comments

@murad-ali-MoJ
Contributor

Description

While I was on support on 02/02/24, I repeatedly received emails from PagerDuty indicating that I had an open incident assigned to me:

INCIDENT: #386206
Cluster: rds-eks-production-control-panel-psg-db-encrypted-low-freeable-memory

Issue Details

However, upon checking the incident details, it appears to be resolved. I suspect there might have been a memory issue in one of the clusters, and the system may have automatically resolved it.

Steps to Reproduce

N/A (incident was automatically resolved)

Expected Behavior

Investigate the potential memory issue in the cluster mentioned above and share the findings.

Additional Information

@murad-ali-MoJ murad-ali-MoJ added the data-platform-apps-and-tools This issue is owned by Data Platform Apps and Tools label Feb 5, 2024
@murad-ali-MoJ murad-ali-MoJ moved this to 🧐 To Do in Analytical Platform Feb 5, 2024
@murad-ali-MoJ murad-ali-MoJ changed the title Investigation Needed: Incident #386206 and Auto-Resolution 🔍 Investigation Needed: Incident #386206 and Auto-Resolution Feb 5, 2024
@jacobwoffenden
Member

jacobwoffenden commented Feb 12, 2024

@BrianEllwood BrianEllwood self-assigned this Feb 12, 2024
@BrianEllwood BrianEllwood moved this from 🧐 To Do to 💨 In Progress in Analytical Platform Feb 12, 2024
@BrianEllwood
Contributor

BrianEllwood commented Feb 12, 2024

Looking at the logs, this issue appears to have stopped on 5/2/24, when the freeable memory jumped from an average of around 127M to 212M.

Image

I have not been able to find a reason for this in any logs still available.
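
If this needs re-checking later, a minimal sketch along the following lines could pull the same metric from CloudWatch and locate the day of the jump. It assumes boto3 with working AWS credentials and a default region; `DB_INSTANCE_ID` is a hypothetical placeholder (the real instance identifier is not given in this issue), and treating "M" as 1,000,000 bytes is also an assumption. The sample values at the bottom are only the two averages quoted above, not real metric data.

```python
from datetime import datetime, timezone
from statistics import mean

# Hypothetical placeholder: the real DB instance identifier is not stated in this issue.
DB_INSTANCE_ID = "example-control-panel-db"


def fetch_freeable_memory(db_instance_id, start, end, period_seconds=3600):
    """Return CloudWatch datapoints for the RDS FreeableMemory metric (in bytes)."""
    import boto3  # imported lazily so the helpers below work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="FreeableMemory",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    return response["Datapoints"]


def daily_average_mb(datapoints):
    """Group datapoints by calendar day and return {date: average in MB}."""
    by_day = {}
    for point in datapoints:
        by_day.setdefault(point["Timestamp"].date(), []).append(point["Average"])
    # Assumption: 1 MB = 1,000,000 bytes.
    return {day: mean(values) / 1_000_000 for day, values in sorted(by_day.items())}


def largest_jump(daily):
    """Return (day, previous_mb, current_mb) for the biggest day-over-day rise."""
    days = sorted(daily)
    best = None
    for prev, cur in zip(days, days[1:]):
        rise = daily[cur] - daily[prev]
        if best is None or rise > best[0]:
            best = (rise, cur, daily[prev], daily[cur])
    return None if best is None else best[1:]


# Illustrative sample only, using the two averages quoted in the comment above.
sample = [
    {"Timestamp": datetime(2024, 2, 4, 12, tzinfo=timezone.utc), "Average": 127_000_000.0},
    {"Timestamp": datetime(2024, 2, 5, 12, tzinfo=timezone.utc), "Average": 212_000_000.0},
]
print(largest_jump(daily_average_mb(sample)))
```

Against the real metric, the call would look something like `fetch_freeable_memory(DB_INSTANCE_ID, datetime(2024, 1, 17, tzinfo=timezone.utc), datetime(2024, 2, 12, tzinfo=timezone.utc))`, with the result passed through the same two helpers.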

@BrianEllwood
Contributor

Looking at the documentation for an RDS PostgreSQL instance, there do not appear to be any memory structures that could be tuned to help with this alert when it occurs.

If we did want to fix this issue (which is not currently occurring), we would have to increase the instance size as suggested.

@BrianEllwood
Contributor

Looking at the incidents raised in PagerDuty, I can see that this issue first alerted on 17/1/24.

@BrianEllwood
Contributor

BrianEllwood commented Feb 14, 2024

As there is currently no issue, I would advise taking no action and investigating if it recurs in future. I will move the ticket into review, and if the team agrees I will close the ticket.

If not, I will increase the instance size in the dev cluster as a test before implementing it in prod.

@BrianEllwood BrianEllwood moved this from 💨 In Progress to 👀 In Review in Analytical Platform Feb 14, 2024
@BrianEllwood
Contributor

Closing this ticket; a new issue has been raised to look at log retention here
