We were running into some intermittent test failures when running the following:
make test-envoy-integ ENVOY_VERSIONS="1.9.1" FILTER_TESTS="centralconf" STOP_ON_FAIL=1
Sometimes the agent failed to provide Envoy with a proxy configuration in a timely manner. After an extensive search, I believe the problem lies in the agent cache, specifically in the interaction between the Cache.getWithIndex and Cache.fetch functions.
Race Condition in Agent Cache
getWithIndex
Validate the request.
Acquire a read lock on the cache entries.
Get the current cache entry.
Release the read lock.
Check whether the entry is valid, along with other conditions, to determine whether a refetch is needed.
Issue a fetch for the new value (the minIndex is not passed through).
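The getWithIndex steps amount to a check-then-act sequence: the entry is copied under a read lock, the lock is released, and the refetch decision is made on that copy. The Go sketch below is a simplified model, not Consul's actual code. The cacheEntry and cache types, the snapshot and needsFetch helpers, and the rule that the cached index must be strictly greater than minIndex are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheEntry is a hypothetical, simplified stand-in for an agent cache entry.
type cacheEntry struct {
	Value    string
	Index    uint64
	Valid    bool
	Fetching bool
}

// cache guards its entries with a RWMutex, mirroring the locking described above.
type cache struct {
	mu      sync.RWMutex
	entries map[string]cacheEntry
}

// snapshot copies the current entry under the read lock and releases it.
// Anything can change between this return and the caller's next action.
func (c *cache) snapshot(key string) (cacheEntry, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[key]
	return e, ok
}

// needsFetch assumes a blocking read is satisfied once the cached index is
// strictly greater than minIndex; minIndex == 0 means "any valid value".
func needsFetch(e cacheEntry, ok bool, minIndex uint64) bool {
	if !ok || !e.Valid {
		return true
	}
	if minIndex == 0 {
		return false
	}
	return e.Index <= minIndex
}

func main() {
	c := &cache{entries: map[string]cacheEntry{
		"proxy-config": {Value: "v5", Index: 5, Valid: true},
	}}
	e, ok := c.snapshot("proxy-config")
	fmt.Println(needsFetch(e, ok, 0)) // false: any valid value will do
	fmt.Println(needsFetch(e, ok, 5)) // true: the caller already has index 5
	fmt.Println(needsFetch(e, ok, 4)) // false: index 5 is newer than 4
}
```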
fetch
Look up the type and validate the request.
Acquire a write lock on the cache entries.
Defer unlocking until the end of the function.
Look up the current entry.
If the entry is already being fetched, return the current waiter.
If there is no entry and new entries are allowed, create an invalid entry.
Set the entry's Fetching field to true.
Put the entry back into the entries map.
Spawn a goroutine to perform the real cache request.
Return the waiter channel.
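The fetch steps can be modeled roughly as follows. This is a hypothetical, simplified sketch: the Waiter channel field, the backend callback standing in for the real blocking query, and the fetches counter are illustrative names, not Consul's API. It shows the key property that concurrent callers share a single in-flight fetch through the same waiter channel.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheEntry is a hypothetical, simplified cache entry.
type cacheEntry struct {
	Value    string
	Index    uint64
	Valid    bool
	Fetching bool
	Waiter   chan struct{} // closed when the in-flight fetch completes
}

type cache struct {
	mu      sync.Mutex
	entries map[string]cacheEntry
	fetches int                                   // background fetches started (illustration only)
	backend func(minIndex uint64) (string, uint64) // stands in for the blocking query
}

// fetch holds the write lock for its whole body, so the lookup and the
// Fetching update are atomic with respect to other fetch calls.
func (c *cache) fetch(key string) <-chan struct{} {
	c.mu.Lock()
	defer c.mu.Unlock()

	e, ok := c.entries[key]
	if ok && e.Fetching {
		// A fetch is already in flight: share its waiter.
		return e.Waiter
	}
	if !ok {
		// No entry yet: start from an invalid placeholder.
		e = cacheEntry{}
	}
	e.Fetching = true
	e.Waiter = make(chan struct{})
	c.entries[key] = e
	c.fetches++

	// The real request runs in the background; MinIndex is the entry's index.
	go func(minIndex uint64, waiter chan struct{}) {
		value, index := c.backend(minIndex)
		c.mu.Lock()
		c.entries[key] = cacheEntry{Value: value, Index: index, Valid: true}
		c.mu.Unlock()
		close(waiter)
	}(e.Index, e.Waiter)

	return e.Waiter
}

func main() {
	release := make(chan struct{})
	c := &cache{
		entries: map[string]cacheEntry{},
		backend: func(minIndex uint64) (string, uint64) {
			<-release // simulate a blocking query that has not returned yet
			return "config", minIndex + 1
		},
	}
	w1 := c.fetch("proxy-config")
	w2 := c.fetch("proxy-config")
	fmt.Println(w1 == w2, c.fetches) // true 1

	close(release)
	<-w1
	c.mu.Lock()
	fmt.Println(c.entries["proxy-config"].Index) // 1
	c.mu.Unlock()
}
```

Because fetch holds the write lock from start to finish, two fetch calls cannot both start a request for the same key. The race described in this issue happens earlier, in the gap before fetch is entered.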
Goroutine performing the real request
Set up a timer that zeroes out the RefreshLostContact time after 31 seconds. We assume that if we had actually lost contact, yamux keep-alives would have killed the connection by then.
Set up the blocking query, using the current entry's index as the MinIndex.
How Things Go Wrong
If an earlier call to fetch finishes after getWithIndex has released the read lock on the entries, but before the next invocation of fetch (in particular, before that second fetch acquires the write lock), we miss the update. Worse, because getWithIndex does not pass its minIndex into fetch, the second fetch starts a blocking query using the freshly updated entry's index as its MinIndex. It therefore waits for yet another update, even though the currently cached value is already new enough to satisfy the caller.
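The interleaving can be reproduced deterministically by injecting a hook into the window between releasing the read lock and entering fetch. In this hypothetical model, afterRead simulates an earlier fetch completing inside that window, and backendCalls counts the blocking queries that would be issued. One possible fix, selected with fixed=true, passes minIndex into fetch and re-checks the entry under the write lock; this is a sketch of an approach, not necessarily the change that was eventually made.

```go
package main

import (
	"fmt"
	"sync"
)

// cacheEntry is a hypothetical, simplified cache entry.
type cacheEntry struct {
	Value string
	Index uint64
	Valid bool
}

type cache struct {
	mu      sync.RWMutex
	entries map[string]cacheEntry
	// afterRead is a hypothetical hook that runs after the read lock is
	// released and before fetch takes the write lock.
	afterRead    func()
	backendCalls int // blocking queries that would be issued
}

// fetch with fixed=true re-checks the entry against minIndex under the write
// lock. With fixed=false it ignores minIndex, as described in the issue.
func (c *cache) fetch(key string, minIndex uint64, fixed bool) (cacheEntry, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if fixed && ok && e.Valid && e.Index > minIndex {
		return e, true // already new enough: serve from cache
	}
	// A real implementation would start a blocking query with
	// MinIndex = e.Index here and wait for an even newer index.
	c.backendCalls++
	return cacheEntry{}, false
}

func (c *cache) getWithIndex(key string, minIndex uint64, fixed bool) (cacheEntry, bool) {
	c.mu.RLock()
	e, ok := c.entries[key]
	c.mu.RUnlock()
	if ok && e.Valid && e.Index > minIndex {
		return e, true
	}
	if c.afterRead != nil {
		c.afterRead() // the window in which the race occurs
	}
	return c.fetch(key, minIndex, fixed)
}

func newRacyCache() *cache {
	c := &cache{entries: map[string]cacheEntry{
		"proxy-config": {Value: "v5", Index: 5, Valid: true},
	}}
	c.afterRead = func() {
		// Simulate an in-flight fetch completing inside the window.
		c.mu.Lock()
		c.entries["proxy-config"] = cacheEntry{Value: "v6", Index: 6, Valid: true}
		c.mu.Unlock()
	}
	return c
}

func main() {
	buggy := newRacyCache()
	_, hit := buggy.getWithIndex("proxy-config", 5, false)
	fmt.Println(hit, buggy.backendCalls) // false 1: a new query despite index 6 > 5

	fixed := newRacyCache()
	e, hit := fixed.getWithIndex("proxy-config", 5, true)
	fmt.Println(hit, e.Index, fixed.backendCalls) // true 6 0
}
```

The re-check is cheap because fetch already holds the write lock, so no extra locking is needed to close the window.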
I'm looking into how to fix this, but wanted to track it here in case someone else on the team picks it up.