Skip to content
This repository has been archived by the owner on May 19, 2020. It is now read-only.

Fix redis health check ping. #1109

Merged
merged 1 commit into from
Jun 1, 2017
Merged

Conversation

jcscottiii
Copy link
Contributor

@jcscottiii jcscottiii commented Jun 1, 2017

Previous to this PR, we witnessed a failed redis ping but after redis
recovered, the health check /ping never corrected itself.
Incident: https://alerts.newrelic.com/accounts/907948/incidents/6561272

Test Case added: To prove this out, I initially added a test case where
the redis instance came back online and expected the health check to
recover. However it did not.

This is because previously, the /ping endpoint the redis client instance
for the ping was using the 1) default config and was a single Go obj.

The single Go obj did not know how to reconnect after a failed
connection.
Fix: this was fixed by using the same redis pool as the session store.

The default config did not use any time outs so any requests to redis
would wait forever.
Fix: Added timeouts for the read, write and connection attempts.

Also, during the debug, went back through the logs to see what was the exact error. Added more debug statements upon failure to make it easier when reviewing logs. https://github.com/18F/cg-dashboard/pull/1109/files#diff-661525df772e761479305d75690a20adR192

Previous to this PR, we witnessed a failed redis ping but after redis
recovered, the health check /ping never corrected itself.
Incident: https://alerts.newrelic.com/accounts/907948/incidents/6561272

Test Case added: To prove this out, I initially added a test case where
the redis instance came back online and expected the health check to
recover. However it did not.

This is because previously, the /ping endpoint the redis client instance
for the ping was using the 1) default config and was a single Go obj.

The single Go obj did not know how to reconnect after a failed
connection.
Fix: this was fixed by using the same redis pool as the session store.

The default config did not use any time outs so any requests to redis
would wait forever.
Fix: Added timeouts for the read, write and connection attempts.
@rememberlenny rememberlenny merged commit dd3c824 into master Jun 1, 2017
@rememberlenny rememberlenny deleted the js-fix-redis-health-check branch June 1, 2017 20:06
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants