Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[x/cache] Fix LRU cache mem leak (when used with no loader) #3806

Merged
merged 7 commits into from
Oct 4, 2021
15 changes: 10 additions & 5 deletions src/x/cache/lru_cache.go
Original file line number Diff line number Diff line change
Expand Up @@ -229,7 +229,7 @@ func (c *LRU) PutWithTTL(key string, value interface{}, ttl time.Duration) {
c.mut.Lock()
defer c.mut.Unlock()

_, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil)
_, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil, true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i hate to over optimize, but I do wonder if scanning the entire cache for evictions on every put is going to be too much. I guess we can try this and if it causes performance issues, we can add some kind of eviction every N puts.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should not be a problem - reserveCapacity removes entries while scanning them, so we either free up one slot for the current Put value, or free up more and then some subsequent Puts will be handled for free.

}

// Get returns the value associated with the key, optionally
Expand Down Expand Up @@ -390,7 +390,7 @@ func (c *LRU) cacheLoadComplete(
return c.handleCacheLoadErrorWithLock(key, expiresAt, err)
}

return c.updateCacheEntryWithLock(key, expiresAt, value, err)
return c.updateCacheEntryWithLock(key, expiresAt, value, err, false)
}

// handleCacheLoadErrorWithLock handles the results of an error from a cache load. If
Expand All @@ -402,15 +402,15 @@ func (c *LRU) handleCacheLoadErrorWithLock(
// If the loader is telling us to cache this error, do so unconditionally
var cachedErr CachedError
if errors.As(err, &cachedErr) {
return c.updateCacheEntryWithLock(key, expiresAt, nil, cachedErr.Err)
return c.updateCacheEntryWithLock(key, expiresAt, nil, cachedErr.Err, false)
}

// If the cache is configured to cache errors by default, do so unless
// the loader is telling us not to cache this one (e.g. it's transient)
var uncachedErr UncachedError
isUncachedError := errors.As(err, &uncachedErr)
if c.cacheErrors && !isUncachedError {
return c.updateCacheEntryWithLock(key, expiresAt, nil, err)
return c.updateCacheEntryWithLock(key, expiresAt, nil, err, false)
}

// Something happened during load, but we don't want to cache this - remove the entry,
Expand All @@ -430,10 +430,15 @@ func (c *LRU) handleCacheLoadErrorWithLock(
// updateCacheEntryWithLock updates a cache entry with a new value or cached error,
// and marks it as the most recently accessed and most recently loaded entry
func (c *LRU) updateCacheEntryWithLock(
key string, expiresAt time.Time, value interface{}, err error,
key string, expiresAt time.Time, value interface{}, err error, enforceLimit bool,
) (interface{}, error) {
entry := c.entries[key]
if entry == nil {
if enforceLimit && c.reserveCapacity(1) != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe check the returned error type is ErrCacheFull in case a future contributor changes reserveCapacity to return some other kind of error. or change the signature of reserveCapacity to return a bool instead to be more explicit it's not really an error.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, replacing return value error with bool for reserveCapacity makes lots of sense. Should have thought of this myself.

// Silently skip adding the new entry if we fail to free up space for it
// (which should never be happening).
return value, err
}
Copy link
Collaborator

@robskillington robskillington Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return the error from the reserve capacity call yeah? Otherwise the err would be nil yeah?

Suggested change
if enforceLimit && c.reserveCapacity(1) != nil {
// Silently skip adding the new entry if we fail to free up space for it
// (which should never be happening).
return value, err
}
if enforceLimit {
if err := c.reserveCapacity(1); err != nil {
// Silently skip adding the new entry if we fail to free up space for it
// (which should never be happening).
return value, err
}
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this aspect is a bit vague. The contract of updateCacheEntryWithLock is to return its arguments value interface{}, err error unmodified, and I've tried to preserve that, commenting that we "silently skip" such situation (being unable to evict from cache - which from what I see would only happen in case there is a code bug).
Also, in the only place that calls updateCacheEntryWithLock with enforceLimit = true, those return values are ignored completely:

m3/src/x/cache/lru_cache.go

Lines 223 to 233 in c5f6237

func (c *LRU) PutWithTTL(key string, value interface{}, ttl time.Duration) {
var expiresAt time.Time
if ttl > 0 {
expiresAt = c.now().Add(ttl)
}
c.mut.Lock()
defer c.mut.Unlock()
_, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil, true)
}

So I'm not really sure if I should change this (but don't have a strong opinion either, even after jumping around this code for quire a long time). I guess that's the support of two modes (with/without loader) in a single data structure that makes it hard to come up with an elegant implementation. Perhaps @ryanhall07 will chime in with his insights as well.

Copy link
Collaborator

@robskillington robskillington Oct 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see yeah, that makes sense. Feel free to ignore my suggestion then 👍

The other question is, this only happens when there is a code bug yeah?

If so.. does that mean we have a remaining code bug we're not aware of? Which is fine if so, we can chase this up after the fact. Just wanted to understand current state of the world (post-merging this change).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, what I meant, reserveCapacity is not supposed to return errors, unless there is some bug that we are not yet aware of.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, that makes sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left another comment. maybe it's just best to change reserveCapacity to return a bool so it's clear it can't be an error.

entry = &lruCacheEntry{}
c.entries[key] = entry
}
Expand Down
8 changes: 6 additions & 2 deletions src/x/cache/lru_cache_prop_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,10 +37,10 @@ import (

func TestLRUPropertyTest(t *testing.T) {
testLRUPropFunc := func(input propTestMultiInput) (bool, error) {
size := 1000
maxSize := 1000

cacheOpts := &LRUOptions{
MaxEntries: size,
MaxEntries: maxSize,
Now: time.Now,
}

Expand Down Expand Up @@ -98,6 +98,10 @@ func TestLRUPropertyTest(t *testing.T) {

t.Logf("found=%d, not_found=%d\n", found.Load(), notFound.Load())

if actualSize := len(lru.entries); actualSize > maxSize {
return false, fmt.Errorf("cache size %d exceeded MaxSize %d", actualSize, maxSize)
}

return true, nil
}

Expand Down
14 changes: 14 additions & 0 deletions src/x/cache/lru_cache_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ import (
"context"
"errors"
"fmt"
"strconv"
"sync"
"sync/atomic"
"testing"
Expand Down Expand Up @@ -636,6 +637,19 @@ func TestLRU_TryGetExpired(t *testing.T) {
require.False(t, ok)
}

func TestLRU_EnforceMaxEntries(t *testing.T) {
var (
maxEntries = 2
lru = NewLRU(&LRUOptions{MaxEntries: maxEntries, TTL: time.Second, Now: time.Now})
)

for i := 0; i <= maxEntries; i++ {
lru.Put(strconv.Itoa(i), "foo")
}

assert.Len(t, lru.entries, maxEntries)
}

var defaultKeys = []string{
"key-0", "key-1", "key-2", "key-3", "key-4", "key-5", "key-6", "key-7", "key-8", "key-9", "key10",
}
Expand Down