
LFU Cache Implementation #7439

Merged: 24 commits merged into vitessio:master on Feb 5, 2021
Conversation

@vmg (Collaborator) commented Feb 3, 2021

Hi everyone!

Here's the first version of the new LFU Cache implementation ready to review.

For context: the cache for planned queries, which exists in both the vtgates and the vttablets, has not been behaving well. We have had to disable these caches altogether because they were causing OOM errors. We believe the main trigger is bulk insert queries: they arrive with widely varying numbers of inserted rows per statement, so even after normalization they occupy too much cache space.

In order to fix this, and several other issues I've found while reviewing the code, I'm bringing a brand new cache implementation. There are 3 key features of this new cache implementation which are strict improvements over the old one:

  1. The cache keeps track of its total memory usage in bytes, so production users can configure it to occupy a given amount of memory in the server, unlike the old cache, which could only be configured with a maximum number of entries. The engineering effort required to track the total memory size of the cached objects was significant and has already been merged into master.
  2. The cache is more performant for both reads and writes: its storage layer is a sharded hash map, so it no longer grabs a global lock on each read, greatly increasing throughput. Its writes are performed asynchronously, so storing new data in the cache does not block the main request path.
  3. The cache is resilient to pollution, which is the critical feature that fixes the overflow issues we were seeing with bulk inserts. With a TinyLFU admission policy, rare elements are not always inserted into the cache when it reaches capacity. This keeps one-off queries, like a stream of bulk inserts, from taking over the cache the way they would with a naive LRU algorithm (a minimal sketch of the admission idea follows this list).
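
To make the admission idea concrete, here is a minimal, self-contained sketch of a TinyLFU-style policy (my illustration, not the actual Ristretto/Vitess code): a compact count-min sketch approximates how often each key has been accessed, and a new entry is only admitted when it is estimated to be hotter than the entry it would evict.

```go
import "hash/fnv"

// frequencySketch approximates per-key access counts with a few rows of small
// counters instead of one counter per key (a count-min sketch).
type frequencySketch struct {
	rows [4][1024]uint8
}

// index hashes the key with a per-row salt so the rows stay roughly independent.
func (s *frequencySketch) index(row int, key string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(len(s.rows[row])))
}

// Increment is called on every cache access for the key.
func (s *frequencySketch) Increment(key string) {
	for row := range s.rows {
		if c := &s.rows[row][s.index(row, key)]; *c < 255 {
			*c++
		}
	}
}

// Estimate returns the minimum counter across all rows, an upper bound on how
// often the key was really seen.
func (s *frequencySketch) Estimate(key string) uint8 {
	min := uint8(255)
	for row := range s.rows {
		if c := s.rows[row][s.index(row, key)]; c < min {
			min = c
		}
	}
	return min
}

// Admit is the admission decision: when the cache is full, a candidate only
// displaces the would-be eviction victim if it has been seen more often, so
// one-off queries (e.g. a stream of bulk inserts) bounce off a warm cache.
func (s *frequencySketch) Admit(candidateKey, victimKey string) bool {
	return s.Estimate(candidateKey) > s.Estimate(victimKey)
}
```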

In order to bring the new cache implementation into the codebase, I've refactored all cache access to go through interfaces. This required some minor changes to the LRUCache API, although its behavior should be identical to what it was before (the tests verify exactly that). Afterwards, I brought in a new cache implementation that satisfies the same unified cache interface (a rough sketch of that interface follows below). The new implementation is a heavily modified version of Ristretto, an Apache v2 licensed library. I briefly considered bringing in the cache as an external dependency, but it required so many modifications to match the exact behaviors Vitess expects that I leaned towards importing it whole (it clocks in at a few thousand lines of code, including extensive tests).
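
For reference, here is a rough sketch of what such a unified interface can look like; the method set is an approximation for illustration, and the authoritative version lives in go/cache/cache.go:

```go
// Cache is the (approximate) interface both the legacy LRUCache and the new
// LFU implementation satisfy, so call sites don't care which one they hold.
type Cache interface {
	Get(key string) (interface{}, bool)
	Set(key string, val interface{}) bool
	ForEach(callback func(interface{}) bool)
	Delete(key string)
	Clear()
	// Wait blocks until pending asynchronous writes have been applied; mostly
	// useful in tests, since the LFU cache stores entries off the request path.
	Wait()
	Len() int
	UsedCapacity() int64
	MaxCapacity() int64
	SetCapacity(int64)
}
```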

Lastly, I updated the places where caches are created so that it's always safe to swap between the LRU and LFU cache implementations. To ensure backwards compatibility, this PR does not enable the new cache by default, so its behavior should be identical to what's in the trunk right now. A global switch in cache.go controls the default value for cache capacity: changing that default from entries to memory bytes instantly switches the default configuration arguments, command-line arguments, tests, etc. to the LFU cache (an illustrative sketch of how the capacity type can drive that choice follows below). Of course, regardless of the default, users can explicitly specify a memory usage in bytes to opt in to the new cache behavior.
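
To illustrate how the capacity's type can drive the implementation choice, here is a hypothetical sketch (building on the Cache interface sketched above; SizeInBytes and the constructors are placeholder names, with SizeInEntries borrowed from the snippet quoted later in this conversation):

```go
// SizeInEntries and SizeInBytes are distinct types, so the type of the
// configured default capacity is enough to pick an implementation.
type SizeInEntries int64
type SizeInBytes int64

// Placeholder constructors; the real ones live in go/cache.
func newLRUCache(maxEntries int64) Cache { return nil }
func newLFUCache(maxBytes int64) Cache   { return nil }

// newCache is a hypothetical factory: a capacity expressed in entries keeps
// the legacy LRU behavior, while a capacity expressed in bytes opts into the
// new LFU cache.
func newCache(capacity interface{}) Cache {
	switch c := capacity.(type) {
	case SizeInEntries:
		return newLRUCache(int64(c))
	case SizeInBytes:
		return newLFUCache(int64(c))
	default:
		panic("unsupported cache capacity type")
	}
}
```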

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

vmg added 17 commits February 2, 2021 11:27
There is no need to look up a given plan in the cache twice (once under its
normalized key and once under its non-normalized key): if the plan is
normalized, it'll be stored in the cache in its normalized form; if it's not,
it'll be stored in its original form. Either way, the initial lookup with the
non-normalized form is redundant.

The raw `sql` content of a query only changes if the query has been
normalized. In cases where it hasn't, there is no need to look up the
same key twice in the cache.

Signed-off-by: Vicent Marti <[email protected]>
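
A hypothetical sketch of the single-lookup pattern described above (Plan, normalizeQuery, buildPlan and the Cache parameter are illustrative stand-ins, not the real vtgate code): the cache key is the normalized SQL when normalization ran, and the raw SQL otherwise, so one lookup suffices.

```go
// Plan and the helpers below are stand-ins so the sketch compiles on its own.
type Plan struct{}

func normalizeQuery(sql string) (normalized string, changed bool) { return sql, false }
func buildPlan(sql string) *Plan                                  { return &Plan{} }

// getPlan looks the plan up exactly once, under whichever key it would have
// been stored with.
func getPlan(sql string, planCache Cache) *Plan {
	key := sql
	if normalized, changed := normalizeQuery(sql); changed {
		key = normalized // normalized plans are stored under their normalized SQL
	}
	if cached, ok := planCache.Get(key); ok {
		return cached.(*Plan)
	}
	plan := buildPlan(key)
	planCache.Set(key, plan)
	return plan
}
```
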
We have experienced caching issues with batch inserts in Vitess
clusters, whose plans were polluting the shared plan cache. Before we
can consider the trivial fix for this issue, which would be simply
disabling caching for `INSERT` statements, we need to find out what the
impact of disabling caching for these plans would be.

Unfortunately, it looks like there isn't a significant performance
difference between preparing a plan for an INSERT statement vs a SELECT
one. Here's the output of two comparisons with a random sample of 32 of
each statement:

BenchmarkSelectVsDML/DML_(random_sample,_N=32)
BenchmarkSelectVsDML/DML_(random_sample,_N=32)-16          766    1640575 ns/op    511073 B/op    6363 allocs/op
BenchmarkSelectVsDML/Select_(random_sample,_N=32)
BenchmarkSelectVsDML/Select_(random_sample,_N=32)-16       746    1479792 ns/op    274486 B/op    7730 allocs/op

BenchmarkSelectVsDML/DML_(random_sample,_N=32)
BenchmarkSelectVsDML/DML_(random_sample,_N=32)-16          823    1540039 ns/op    496079 B/op    5949 allocs/op
BenchmarkSelectVsDML/Select_(random_sample,_N=32)
BenchmarkSelectVsDML/Select_(random_sample,_N=32)-16       798    1526661 ns/op    275016 B/op    7730 allocs/op

There is not a noticeable performance difference when preparing the
INSERT statements. The only consistent metric is that INSERT statement
plans allocate more memory than SELECT plans.

Signed-off-by: Vicent Marti <[email protected]>
The current public API for the cache makes some assumptions that do not
hold for more efficient cache implementations with admission policies.

The following APIs have been replaced with equivalent ones or removed
altogether:

- `LRUCache.SetIfAbsent`: removed, not used
- `LRUCache.Peek`: replaced with `LRUCache.ForEach`, since the original
  `Peek` was only used to iterate through the contents of the cache
- `LRUCache.Keys`: likewise replaced with `ForEach`, since the keys were
  only being accessed for iteration

Signed-off-by: Vicent Marti <[email protected]>
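
To illustrate the migration (the `ForEach` callback signature shown here is approximate, and `Cache`/`inspectPlan` are the assumed interface from the sketch above and a hypothetical caller, not actual Vitess code):

```go
// dumpCachedPlans walks the cache contents; it replaces the old pattern of
// collecting lru.Keys() and Peek()ing each key just to iterate.
func dumpCachedPlans(lru Cache, inspectPlan func(interface{})) {
	lru.ForEach(func(value interface{}) bool {
		inspectPlan(value)
		return true // keep iterating; returning false would stop the walk early
	})
}
```
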
The `cache.LRUCache` struct has now been abstracted behind a generic
Cache interface so that it can be swapped with more complex Cache
implementations.

Signed-off-by: Vicent Marti <[email protected]>
The existing pattern for vttablet/vtgate cache configuration is a
dangerous practice, because it lets the user configure the number of
items that can be stored in the cache, as opposed to the total amount of
memory (approximately) that the cache will consume. This makes tuning
production systems complicated, and will skew more complex cache
implementations that use size-aware eviction policies.

To fix this, we're deprecating the original config settings for cache
tuning, and introducing new ones where the total size of the cache is
defined in BYTES as opposed to ENTRIES.

To maintain backwards compatibility, if the user supplies the legacy
config options with number of ENTRIES, we'll calculate an approximate
total size for the cache based on the average size of a cache entry for
each given cache.

Signed-off-by: Vicent Marti <[email protected]>
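
A hedged sketch of that backwards-compatibility rule; the average entry size below is purely illustrative, since the real code derives a per-cache estimate:

```go
// averagePlanSizeBytes is a hypothetical estimate of how large one cached
// entry is; each cache in Vitess would use its own measured average.
const averagePlanSizeBytes = 1024

// legacyCapacityInBytes converts a legacy "max entries" setting into an
// approximate "max bytes" setting so existing configs keep working.
func legacyCapacityInBytes(maxEntries int64) int64 {
	return maxEntries * averagePlanSizeBytes
}
```
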
Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Vicent Marti <[email protected]>
// Regardless of the default value used here, the user can always override Vitess' configuration to
// force a specific cache type (e.g. when passing a value in ENTRIES to vtgate, the service will use
// an LRU cache).
const DefaultCacheSize = SizeInEntries(5000)

@vmg (Collaborator, Author):

This is the global toggle that will allow us to switch the default cache implementation.

@deepthi (Member):

This is clever! Correct me if I'm wrong, but it seems to me that if we toggle this, then the code in vtgate.go would have to change to allow use of the Entries option.

@vmg (Collaborator, Author):

Correct! I should probably flag that. The code in vtgate.go is written so that when the LRU cache is the default, setting an LFU cache flag overrides it. The opposite needs to be true when the LFU cache is the default.

@vmg (Collaborator, Author):

@deepthi Fixed in a3feeb8 so we don't have to change this in the future 👌

@vmg (Collaborator, Author) commented Feb 3, 2021

Open question: how do we ensure that the new cache implementation is green in CI? I've manually verified that switching to the LFU cache by default keeps all the tests green (as it's fully backwards compatible), but ideally there'd be a separate test suite runner that used the new cache by default. The alternative would be to simply duplicate all cache-related unit tests so they run in a single suite with both cache implementations: I don't quite like this idea because there are many side-effects of cache behavior that are tested in end2end tests and that wouldn't be run constantly if we simply add some extra unit tests for the new cache.

@deepthi (Member) commented Feb 4, 2021

In order to ensure backwards compatibility, this PR does not enable the new cache by default so its behavior should be identical to what's in the trunk right now.

If we think this is a better implementation, it is perfectly reasonable to default to the new cache while allowing users to choose the old cache through a command-line option.

@deepthi (Member) commented Feb 4, 2021

Open question: how do we ensure that the new cache implementation is green in CI? I've manually verified that switching to the LFU cache by default keeps all the tests green (as it's fully backwards compatible), but ideally there'd be a separate test suite runner that used the new cache by default. The alternative would be to simply duplicate all cache-related unit tests so they run in a single suite with both cache implementations: I don't quite like this idea because there are many side-effects of cache behavior that are tested in end2end tests and that wouldn't be run constantly if we simply add some extra unit tests for the new cache.

There is no perfect answer to this. If we agree that we should make the new cache the default, then the compromise solution would be

  • run all unit tests with both implementations
  • change endtoend tests to run with new cache
  • duplicate a small number of endtoend tests to run with old cache. If and when we delete the old cache implementation, these tests can then be deleted as well.

@vmg (Collaborator, Author) commented Feb 4, 2021

If we think this is a better implementation,

I sure hope so! Otherwise this project would have been kind of pointless. :)

@vmg (Collaborator, Author) commented Feb 4, 2021

Merged master which was conflicting after some stats changes.

@vmg (Collaborator, Author) commented Feb 4, 2021

If we think this is a better implementation,

Good news everyone! I brought The Good Graphs (TM) to prove this is a better implementation.

I've put together a synthetic test environment to reproduce the kind of cache pollution we've been seeing in production. It consists of two query streams: one performs queries sampled from a fixed subset (i.e. it plays the same queries over and over again in a random sequence, similar to a production system, which has a fixed set of SQL queries being performed), and the other performs essentially random SQL queries which would, in theory, clutter our cache. In an ideal system, the hit rate for the first stream would be near 100%, whilst the hit rate for the second one must necessarily be 0, since random queries will never hit the cache twice.

Let's take a look at the behavior of our hit rates with the old and new cache implementations:

Old cache implementation (LRU)

[plot: hit rates, response times, and query stream progress with the old LRU cache]

The orange line represents the stream of polluting queries ramping up. Its effect on the hit rate for "good queries" (the purple line) is immediately noticeable. As soon as we start executing polluting queries, the hit rate for the good queries takes a nosedive: from 90%+ down to 40%. The cache has been taken over by the random queries. As soon as we're done executing the polluting queries (when the orange line reaches 100%), the good queries start taking over the cache again, but do so much more slowly than at the beginning. The cache, at that point, is a random mix of "good" production queries and pollution, and as more "good" queries come in, our LRU algorithm doesn't know which queries to evict to make room for them, so sometimes it evicts pollution and sometimes it evicts good queries, further delaying the recovery of our cache back to 90%+ hit rates.

Also worth noting: the response times for the "good queries" are through the roof: they're always higher than the response times for polluting queries, even though the polluting queries have a 0% hit rate. This is because the good queries (in this synthetic benchmark composed of SELECTs) take longer to plan than the polluting queries, which are simple inserts. As soon as all the polluting queries are done executing, the response time for good queries improves dramatically, because they're no longer being planned; they're being served from cache. This is also evidenced by the progress of the two query streams (blue and orange lines): you can see how, as soon as the polluting query stream starts ramping up, the "good queries" slow down significantly, a direct result of the reduced cache hit rates.

New cache implementation (LFU)

[plot: hit rates, response times, and query stream progress with the new LFU cache]

Now this is podracing. Things worth pointing out: the LFU cache handles the onslaught of polluting queries like a champ! Our hit rate for good queries never goes below 85%! It degrades very slowly over time, since the LFU admission algorithm is probabilistic and hence makes mistakes, but the degradation is extremely slow: very few polluting queries make it into the cache. Likewise, look at the time it takes to execute the bulk of polluting queries (as measured by the slope of the orange line). You can see that the execution time of good queries (blue line) is not affected by the polluting query stream, and that the polluting stream ramps up significantly more slowly than with the LRU cache. If we look at the scatter points for response times, we can see how the performance of good vs. polluting queries has swapped places. The good queries are now consistently faster than the polluting ones, as they should be, because they're always being served from cache.

Overall, I think these graphs prove the new implementation is a big win and meets all the success criteria I had designed beforehand. Really happy with the results here -- definitely good enough to switch the new LFU cache to be the new default in this PR, which I intend to do tomorrow morning. 👌

Review thread on go/cache/cache.go (outdated, resolved)
@systay (Collaborator) commented Feb 4, 2021

is go/hack/runtime.s meant to be there?

@systay (Collaborator) left a comment:

Really nice stuff. Could only find minor details and nitpicks.

Review threads (resolved): go/cache/cache.go, go/vt/vttablet/endtoend/config_test.go, go/vt/vtgate/planbuilder/plan_test.go, go/hack/runtime.go
@vmg (Collaborator, Author) commented Feb 5, 2021

is go/hack/runtime.s meant to be there?

Yes! This empty Go assembly file tricks the Go compiler into accepting that the other Go files in the package have external linkage dependencies (which they do -- they link against the local runtime). I've updated the file with a comment that explains why it's there.
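
For readers unfamiliar with the trick, a minimal sketch of the pattern (not the exact contents of go/hack/runtime.go): the package ships an otherwise empty runtime.s, whose mere presence lets the compiler accept body-less Go function declarations, and //go:linkname then binds such a declaration to an unexported function inside the Go runtime.

```go
// file: runtime.go -- runtime.s in the same package is intentionally empty.
package hack

import "unsafe" // importing unsafe is required for //go:linkname to work

// strhash is implemented inside the Go runtime; the directive below links this
// body-less declaration to it, which only compiles because the package also
// contains an assembly file.
//
//go:linkname strhash runtime.strhash
func strhash(p unsafe.Pointer, h uintptr) uintptr
```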

@vmg (Collaborator, Author) commented Feb 5, 2021

@systay Just handled all the feedback and improved the way the cache is configured. I would like to merge this as-is, with the legacy LRU cache as the default, and then open a separate PR to switch to the new cache, if that seems reasonable to you.

@deepthi deepthi merged commit bca4d98 into vitessio:master Feb 5, 2021
@ben-manes:

This would make for a very cool blog post 😉
