#679 included a change from `pingDB` to `getDBStats`. One purpose of both routines was to verify a good connection to the database and, failing that, to kill the core-service process so that the Kubernetes manager could restart it and (hopefully) restore the database connection. `pingDB` used `PingContext` to verify connectivity, whereas `getDBStats` checked `TotalConns == 0`. We verified that `TotalConns == 0` occurs when the connection to the database is broken, but it turns out that it also occurs when the connection to the database is merely idle. Because of this, core-service dies after tens of minutes of inactivity even though everything is still working. The simplest fix is to only warn when `TotalConns == 0` and not attempt to kill core-service upon a bad database connection. Instead, if the motivating issue is detected again, the database client should be improved to attempt to restore the connection upon initial failure without returning an error (this will be necessary at higher request volume anyway).
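
A minimal sketch of the warn-only behavior proposed above, not the actual core-service code. The `connCounter` interface, the `fixedStats` helper, and the function name `warnIfNoConns` are hypothetical names introduced for illustration; the real `getDBStats` and the pool statistics type it reads from may look different.

```go
package main

import "log"

// connCounter is a hypothetical interface over whatever pool statistics
// type the database client exposes. It is an assumption for this sketch.
type connCounter interface {
	TotalConns() int32
}

// fixedStats is a hypothetical stand-in that reports a fixed connection
// count, so the check can be exercised without a real database.
type fixedStats int32

func (f fixedStats) TotalConns() int32 { return int32(f) }

// warnIfNoConns logs a warning when the pool reports zero connections and
// returns false in that case. It never exits the process, because zero
// connections can mean the pool is idle rather than broken.
func warnIfNoConns(stats connCounter) bool {
	if stats.TotalConns() == 0 {
		log.Println("warning: database pool reports TotalConns == 0 (connection may be idle or broken)")
		return false
	}
	return true
}
```

The return value lets a caller record the condition (for example in a metric) without treating it as fatal, which keeps an idle service from being restarted by Kubernetes.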