Better handle RDS restarts (PP-991) #1689

jonathangreen · 2024-02-21T15:03:16Z

Description

This adds the pool_pre_ping arg to our DB connections. This should let us deal more gracefully with RDS restarts that happen automatically during maintenance windows.

See SQL alchemy docs here:
https://docs.sqlalchemy.org/en/14/core/pooling.html#dealing-with-disconnects

This also updates the ErrorHandler function to:

Also handle ProblemError exceptions, logging them as a warning, and returning the problem detail doc. We should handle these at another level of the stack, but its good to handle them correctly.
Log OperationalError as warnings. Since pool_pre_ping should reduce the number we see, but we will still get some happening when a restart happens in the middle of a transaction. pool_pre_ping will just make sure the rest of the connection pool doesn't experience the error because its holding on to a old connection.
Remove the debug functionality from ErrorHandler since its correctly flagged as a possible security issue. I think its unexpected to possibly leak stack traces if we were to set debug logging on a production instance. This is better handled by having the stack traces go to log, which they were already.
I believe that pool_pre_ping removes the need to kill the entire process when we have a database error. Since the next connection should be refreshed from the connection pool.

Motivation and Context

Looking at the top unhandled exceptions in our app in PP-991. I'm seeing this as the second most common, after the exception that is sorted out in #1688.

How Has This Been Tested?

Checklist

I have updated the documentation accordingly.
All new and existing tests passed.

core/app_server.py

codecov · 2024-02-21T15:06:24Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1be90db) 90.04% compared to head (fcb8aa5) 90.08%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1689      +/-   ##
==========================================
+ Coverage   90.04%   90.08%   +0.03%     
==========================================
  Files         245      245              
  Lines       28167    28162       -5     
  Branches     6423     6423              
==========================================
+ Hits        25364    25369       +5     
+ Misses       1855     1848       -7     
+ Partials      948      945       -3

Flag	Coverage Δ
Api	`75.78% <18.75%> (-0.01%)`	⬇️
Core	`59.80% <100.00%> (+0.02%)`	⬆️
migration	`26.88% <0.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tdilauro

Looks good! 🎸

Better handle RDS restarts by setting pool_pre_ping arg to sqlalchemy.

2c157ed

github-advanced-security bot found potential problems Feb 21, 2024

View reviewed changes

core/app_server.py Fixed Show fixed Hide fixed

Update error handler

fcb8aa5

jonathangreen force-pushed the feature/better-handle-rds-restarts branch from aca047c to fcb8aa5 Compare February 21, 2024 16:12

jonathangreen requested a review from a team February 21, 2024 16:21

jonathangreen marked this pull request as ready for review February 21, 2024 16:21

tdilauro approved these changes Feb 21, 2024

View reviewed changes

jonathangreen merged commit ea51ddc into main Feb 21, 2024
28 checks passed

jonathangreen deleted the feature/better-handle-rds-restarts branch February 21, 2024 17:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better handle RDS restarts (PP-991) #1689

Better handle RDS restarts (PP-991) #1689

jonathangreen commented Feb 21, 2024 •

edited

Loading

codecov bot commented Feb 21, 2024 •

edited

Loading

tdilauro left a comment

Better handle RDS restarts (PP-991) #1689

Better handle RDS restarts (PP-991) #1689

Conversation

jonathangreen commented Feb 21, 2024 • edited Loading

Description

Motivation and Context

How Has This Been Tested?

Checklist

codecov bot commented Feb 21, 2024 • edited Loading

Codecov Report

tdilauro left a comment

Choose a reason for hiding this comment

jonathangreen commented Feb 21, 2024 •

edited

Loading

codecov bot commented Feb 21, 2024 •

edited

Loading