core-service panics when database connection is still good #691

BenjaminPelletier · 2022-01-31T22:38:36Z

#679 included a change from pingDB to getDBStats. One purpose of both routines was to verify a good connection to the database, or else kill the core-service process so that the Kubernetes manager could restart it and (hopefully) restore the database connection. pingDB used PingContext to verify connectivity, whereas getDBStats checked TotalConns == 0. We verified that TotalConns == 0 occurs when the connection to the database is broken, but it turns out that it also occurs when the connection to the database is idle. Because of this, core-service dies after tens of minutes of inactivity even though everything is still working.

The simplest fix is to only warn when TotalConns == 0 and not attempt to kill core-service upon a bad database connection. Instead, if the motivating issue is detected again, the database client should be improved to attempt to restore the connection upon initial failure without returning an error (this will be necessary at higher request volume anyway).


BenjaminPelletier added the labels P0 (Highest priority; blocking usage or development) and bug (Software behaves incorrectly because of this issue) on Jan 31, 2022

This was referenced Jan 31, 2022

[core-service] Do not panic when TotalConns==0 #692 (Merged)

Ensure robust database connectivity #694 (Open)

BenjaminPelletier closed this as completed in #692 Feb 1, 2022
