Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle RDS restarts (PP-991) #1689

Merged
merged 2 commits into from
Feb 21, 2024

Conversation

jonathangreen
Copy link
Member

@jonathangreen jonathangreen commented Feb 21, 2024

Description

This adds the pool_pre_ping arg to our DB connections. This should let us deal more gracefully with RDS restarts that happen automatically during maintenance windows.

See SQL alchemy docs here:
https://docs.sqlalchemy.org/en/14/core/pooling.html#dealing-with-disconnects

This also updates the ErrorHandler function to:

  • Also handle ProblemError exceptions, logging them as a warning, and returning the problem detail doc. We should handle these at another level of the stack, but its good to handle them correctly.
  • Log OperationalError as warnings. Since pool_pre_ping should reduce the number we see, but we will still get some happening when a restart happens in the middle of a transaction. pool_pre_ping will just make sure the rest of the connection pool doesn't experience the error because its holding on to a old connection.
  • Remove the debug functionality from ErrorHandler since its correctly flagged as a possible security issue. I think its unexpected to possibly leak stack traces if we were to set debug logging on a production instance. This is better handled by having the stack traces go to log, which they were already.
  • I believe that pool_pre_ping removes the need to kill the entire process when we have a database error. Since the next connection should be refreshed from the connection pool.

Motivation and Context

Looking at the top unhandled exceptions in our app in PP-991. I'm seeing this as the second most common, after the exception that is sorted out in #1688.

How Has This Been Tested?

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

core/app_server.py Fixed Show fixed Hide fixed
Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1be90db) 90.04% compared to head (fcb8aa5) 90.08%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1689      +/-   ##
==========================================
+ Coverage   90.04%   90.08%   +0.03%     
==========================================
  Files         245      245              
  Lines       28167    28162       -5     
  Branches     6423     6423              
==========================================
+ Hits        25364    25369       +5     
+ Misses       1855     1848       -7     
+ Partials      948      945       -3     
Flag Coverage Δ
Api 75.78% <18.75%> (-0.01%) ⬇️
Core 59.80% <100.00%> (+0.02%) ⬆️
migration 26.88% <0.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jonathangreen jonathangreen force-pushed the feature/better-handle-rds-restarts branch from aca047c to fcb8aa5 Compare February 21, 2024 16:12
@jonathangreen jonathangreen requested a review from a team February 21, 2024 16:21
@jonathangreen jonathangreen marked this pull request as ready for review February 21, 2024 16:21
Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🎸

@jonathangreen jonathangreen merged commit ea51ddc into main Feb 21, 2024
28 checks passed
@jonathangreen jonathangreen deleted the feature/better-handle-rds-restarts branch February 21, 2024 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants