[x/cache] Fix LRU cache mem leak (when used with no loader) #3806
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3806 +/- ##
======================================
Coverage 56.8% 56.8%
======================================
Files 552 552
Lines 63077 63077
======================================
Hits 35883 35883
Misses 23996 23996
Partials 3198 3198
Flags with carried forward coverage won't be shown. Continue to review the full report at Codecov.
src/x/cache/lru_cache.go
Outdated
if enforceLimit && c.reserveCapacity(1) != nil {
    // Silently skip adding the new entry if we fail to free up space for it
    // (which should never be happening).
    return value, err
}
Should this return the error from the reserveCapacity call, yeah? Otherwise the err here would be nil, yeah?
Suggested change, from:

if enforceLimit && c.reserveCapacity(1) != nil {
    // Silently skip adding the new entry if we fail to free up space for it
    // (which should never be happening).
    return value, err
}

to:

if enforceLimit {
    if err := c.reserveCapacity(1); err != nil {
        // Silently skip adding the new entry if we fail to free up space for it
        // (which should never be happening).
        return value, err
    }
}
Well, this aspect is a bit vague. The contract of updateCacheEntryWithLock is to return its arguments value interface{}, err error unmodified, and I've tried to preserve that, commenting that we "silently skip" such a situation (being unable to evict from the cache, which from what I see would only happen if there is a code bug). Also, in the only place that calls updateCacheEntryWithLock with enforceLimit = true, those return values are ignored completely:
Lines 223 to 233 in c5f6237
func (c *LRU) PutWithTTL(key string, value interface{}, ttl time.Duration) {
    var expiresAt time.Time
    if ttl > 0 {
        expiresAt = c.now().Add(ttl)
    }
    c.mut.Lock()
    defer c.mut.Unlock()

    _, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil, true)
}
So I'm not really sure whether I should change this (but I don't have a strong opinion either, even after jumping around this code for quite a long time). I guess it's the support of two modes (with/without a loader) in a single data structure that makes it hard to come up with an elegant implementation. Perhaps @ryanhall07 will chime in with his insights as well.
I see, yeah, that makes sense. Feel free to ignore my suggestion then 👍
The other question is: this only happens when there is a code bug, yeah?
If so, does that mean we have a remaining code bug we're not aware of? Which is fine if so, we can chase it up after the fact. Just wanted to understand the current state of the world (post-merging this change).
No, what I meant is that reserveCapacity is not supposed to return errors, unless there is some bug that we are not yet aware of.
Ah ok, that makes sense.
Left another comment. Maybe it's just best to change reserveCapacity to return a bool so it's clear it can't be an error.
LGTM
src/x/cache/lru_cache.go
Outdated
) (interface{}, error) {
    entry := c.entries[key]
    if entry == nil {
        if enforceLimit && c.reserveCapacity(1) != nil {
Nit: maybe check that the returned error type is ErrCacheFull, in case a future contributor changes reserveCapacity to return some other kind of error. Or change the signature of reserveCapacity to return a bool instead, to be more explicit that it's not really an error.
Indeed, replacing the error return value with a bool for reserveCapacity makes a lot of sense. Should have thought of this myself.
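For concreteness, a bool-returning reserveCapacity could look roughly like the self-contained sketch below. Every name in it (sketchLRU, order, and so on) is made up for illustration; it is not the m3 implementation.

package lrucache

import "container/list"

// sketchLRU is a hypothetical, stripped-down cache, used only to illustrate
// the suggested signature change; none of these names come from the m3 code.
type sketchLRU struct {
    maxEntries int
    entries    map[string]*list.Element
    order      *list.List // front = most recently used, back = next eviction candidate
}

// reserveCapacity evicts least-recently-used entries until there is room for
// n more, and reports success as a bool instead of an error. It can only
// return false if there is nothing left to evict, which should never happen.
func (c *sketchLRU) reserveCapacity(n int) bool {
    for len(c.entries)+n > c.maxEntries {
        oldest := c.order.Back()
        if oldest == nil {
            return false
        }
        delete(c.entries, oldest.Value.(string)) // list elements are assumed to hold the keys
        c.order.Remove(oldest)
    }
    return true
}

The call site in updateCacheEntryWithLock would then read along the lines of:

if enforceLimit && !c.reserveCapacity(1) {
    // Silently skip adding the new entry if we fail to free up space for it
    // (which should never be happening).
    return value, err
}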
src/x/cache/lru_cache.go
Outdated
if enforceLimit && c.reserveCapacity(1) != nil {
    // Silently skip adding the new entry if we fail to free up space for it
    // (which should never be happening).
    return value, err
}
Left another comment. Maybe it's just best to change reserveCapacity to return a bool so it's clear it can't be an error.
@@ -229,7 +229,7 @@ func (c *LRU) PutWithTTL(key string, value interface{}, ttl time.Duration) {
     c.mut.Lock()
     defer c.mut.Unlock()

-    _, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil)
+    _, _ = c.updateCacheEntryWithLock(key, expiresAt, value, nil, true)
I hate to over-optimize, but I do wonder if scanning the entire cache for evictions on every put is going to be too much. I guess we can try this and, if it causes performance issues, we can add some kind of eviction every N puts.
That should not be a problem: reserveCapacity removes entries while scanning them, so we either free up just one slot for the current Put value, or we free up more, and then some subsequent Puts are handled for free.
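If the per-Put scan ever did become a measurable cost, the "eviction every N puts" idea mentioned above might look roughly like the following sketch. The type and constant names here are assumptions made up for a runnable example, not code from this repository.

package lrucache

import "sync"

// amortizedLRU and evictEvery are hypothetical; this only illustrates
// amortizing the eviction scan across Puts instead of running it every time.
const evictEvery = 64

type amortizedLRU struct {
    mu         sync.Mutex
    maxEntries int
    putCount   int
    entries    map[string]interface{}
}

func (c *amortizedLRU) Put(key string, value interface{}) {
    c.mu.Lock()
    defer c.mu.Unlock()

    c.putCount++
    if c.putCount%evictEvery == 0 {
        // Trim back down to the limit only every evictEvery puts, accepting a
        // temporary overshoot of up to evictEvery-1 entries in between. A real
        // LRU would evict the least recently used entries; deleting arbitrary
        // keys keeps this sketch short.
        for k := range c.entries {
            if len(c.entries) < c.maxEntries {
                break
            }
            delete(c.entries, k)
        }
    }
    c.entries[key] = value
}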
What this PR does / why we need it:
A regression in the LRU cache was introduced here:
m3/src/x/cache/lru_cache.go
Lines 356 to 366 in e2c6903
Cache entries were never being evicted when the cache was used with no loader. Because of the early return under getWithNoLoader = true, the code that invokes reserveCapacity was unreachable, and that was the only place where eviction would happen.
Special notes for your reviewer:
I have added a call to reserveCapacity from Put (which assumes no loader is in use).
Another possibility would be to fix the tryCache implementation to reserve the space on a cache miss, but then the Put would not happen under the same lock, creating a race between Gets (freeing up space in advance) and Puts, which would permit exceeding the cache limit (while still being eventually consistent). I chose doing the eviction from Put as this seemed more semantically correct.
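For illustration only, here is a minimal sketch of that locking argument, with made-up names rather than the actual m3 cache: eviction and insertion share one critical section inside Put, so a concurrent Put cannot take the slot that was just freed.

package lrucache

import "sync"

// boundedMap is a deliberately tiny, hypothetical stand-in used only to show
// the locking point; it is not the m3 implementation.
type boundedMap struct {
    mu         sync.Mutex
    maxEntries int
    entries    map[string]interface{}
}

func (c *boundedMap) Put(key string, value interface{}) {
    c.mu.Lock()
    defer c.mu.Unlock()

    // If the space had instead been reserved earlier, on a Get miss, another
    // Put could consume the freed slot before this one runs, letting the cache
    // exceed its limit until a later eviction caught up.
    if len(c.entries) >= c.maxEntries {
        for k := range c.entries {
            delete(c.entries, k) // a real LRU evicts the least recently used entry
            break
        }
    }
    c.entries[key] = value
}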
Does this PR introduce a user-facing and/or backwards incompatible change?:
NONE
Does this PR require updating code package or user-facing documentation?:
NONE