Configure keystonemiddleware/oslo to deal with memcached pods failures #447

lmiccini · 2024-12-05T08:10:22Z

Whenever one of the memcached pods disappears, whether because of a rolling restart during a minor update or as a result of a failure, the APIs can take a long time to detect that the pod went away, and they keep trying to reconnect in the meantime.

From a quick round of tests we saw downtimes up to ~150s.

Tuning memcache_pool_dead_retry and memcache_pool_conn_get_timeout in keystonemiddleware makes the behavior seem much more acceptable.
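
For illustration, a minimal sketch of what such tuning could look like in a service's `[keystone_authtoken]` section. The values are placeholders chosen for the example, not necessarily the ones this PR applies; both options are expressed in seconds.

```ini
[keystone_authtoken]
# Seconds a memcached server is considered dead before it is tried again.
# Illustrative value only; a lower value means a recovered pod is retried sooner.
memcache_pool_dead_retry = 10
# Seconds an operation waits to get a client connection from the pool.
# Illustrative value only.
memcache_pool_conn_get_timeout = 2
```

Shorter values mean a vanished pod blocks requests for less time, at the cost of more frequent reconnection attempts.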

Since neutron also uses memcached directly, we also need to tweak its [cache] section, enabling the retry mechanism in the client and applying similar defaults.
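
As a rough sketch of the `[cache]` side, assuming the oslo.cache option names below are available in the release in use (worth checking against the installed oslo.cache version and the configured backend). The values are again illustrative, not taken from this PR.

```ini
[cache]
# Wrap the memcached client with retry handling (assumed to be off by default).
enable_retry_client = true
# Number of attempts and delay in seconds between them; illustrative values.
retry_attempts = 2
retry_delay = 1
# Seconds before a dead memcached server is tried again; illustrative value.
memcache_dead_retry = 10
# Seconds to wait for a connection from the client pool; illustrative value.
memcache_pool_connection_get_timeout = 2
```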

Jira: https://issues.redhat.com/browse/OSPRH-11935

openshift-ci · 2024-12-12T06:00:37Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lmiccini
Once this PR has been reviewed and has the lgtm label, please assign abays for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot requested review from lewisdenny and viroel December 5, 2024 08:10

openshift-merge-robot added the needs-rebase label Dec 10, 2024

lmiccini force-pushed the memcached-failover branch from 55c38e6 to 8b5705f Compare December 12, 2024 06:00

openshift-merge-robot removed the needs-rebase label Dec 12, 2024
