Support Least Frequently Used (LFU) based cache evictions #1110

Closed
gissuebot opened this issue Oct 31, 2014 · 13 comments

@gissuebot

Original issue created by chirino on 2012-08-15 at 09:27 PM


LFU-based evictions are sometimes a better option than LRU evictions.

@gissuebot

Original comment posted by wasserman.louis on 2012-08-15 at 09:46 PM


I'm having difficulty thinking of an actually efficient way to support that?

@gissuebot

Original comment posted by chirino on 2012-08-16 at 12:58 AM


Perhaps this paper can supply some inspiration:
http://dhruvbird.com/lfu.pdf
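
For reference, the core idea in that paper is to keep keys grouped by access frequency so that both a touch and an eviction are constant time. A minimal sketch of that structure (the names and the bucket-map simplification are mine, not the paper's):

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

/** Simplified O(1) LFU: one bucket of keys per frequency, plus the minimum frequency. */
final class LfuCache<K, V> {
  private final int maximumSize;                                   // assumed > 0
  private final Map<K, V> values = new HashMap<>();
  private final Map<K, Integer> frequencies = new HashMap<>();
  private final Map<Integer, LinkedHashSet<K>> buckets = new HashMap<>();
  private int minFrequency;

  LfuCache(int maximumSize) {
    this.maximumSize = maximumSize;
  }

  V get(K key) {
    if (!values.containsKey(key)) {
      return null;
    }
    touch(key);
    return values.get(key);
  }

  void put(K key, V value) {
    if (values.containsKey(key)) {
      values.put(key, value);
      touch(key);
      return;
    }
    if (values.size() >= maximumSize) {
      // Evict the oldest key among those with the lowest frequency.
      K victim = buckets.get(minFrequency).iterator().next();
      buckets.get(minFrequency).remove(victim);
      values.remove(victim);
      frequencies.remove(victim);
    }
    values.put(key, value);
    frequencies.put(key, 1);
    buckets.computeIfAbsent(1, f -> new LinkedHashSet<>()).add(key);
    minFrequency = 1;
  }

  /** Moves the key from its current frequency bucket into the next higher one. */
  private void touch(K key) {
    int frequency = frequencies.get(key);
    buckets.get(frequency).remove(key);
    if (frequency == minFrequency && buckets.get(frequency).isEmpty()) {
      minFrequency++;
    }
    frequencies.put(key, frequency + 1);
    buckets.computeIfAbsent(frequency + 1, f -> new LinkedHashSet<>()).add(key);
  }
}
```

Note that a plain LFU like this never ages its counts, so formerly popular entries can linger indefinitely; a production design would need some form of decay.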

@gissuebot

Original comment posted by wasserman.louis on 2012-08-20 at 08:42 PM


...We may investigate this. That said, CacheBuilder does not specify its eviction ordering, and leaves it open for implementations to decide: "As the cache size grows close to the maximum, the cache evicts entries that are less likely to be used again. For example, the cache may evict an entry because it hasn't been used recently or very often."
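
For context, the current API only lets callers bound the size; the eviction ordering itself is left unspecified, e.g.:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

final class BoundedCacheExample {
  public static void main(String[] args) {
    // The contract promises only that entries "less likely to be used again" are
    // evicted as the maximum size is approached; it commits to neither LRU nor LFU.
    Cache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(10_000)
        .build();
    cache.put("key", "value");
    System.out.println(cache.getIfPresent("key"));
  }
}
```

So an LFU-flavored policy could, in principle, be adopted without a public API change.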

@gissuebot

Original comment posted by [email protected] on 2012-10-23 at 04:16 PM


Before we play with alternate eviction policies, we need to develop a means for evaluating the impact of the different policies.


Status: Research
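
One rough yardstick that is available today, assuming a recorded key trace exists: replay it against caches built with recordStats() under different settings and compare the resulting hit rates. Trace capture is the part that is still missing. A small sketch:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

final class TraceReplay {
  /** Replays a key trace against a bounded cache and returns its hit rate. */
  static double hitRate(Iterable<Long> trace, long maximumSize) {
    Cache<Long, Long> cache = CacheBuilder.newBuilder()
        .maximumSize(maximumSize)
        .recordStats()
        .build();
    for (long key : trace) {
      if (cache.getIfPresent(key) == null) {
        cache.put(key, key);              // treat every miss as a load
      }
    }
    return cache.stats().hitRate();       // fraction of lookups that hit
  }
}
```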

@gissuebot

Original comment posted by [email protected] on 2012-10-23 at 04:19 PM


(That note is just to explain why we won't be getting to this for a little while.)

@gissuebot

Original comment posted by anthony.musyoki on 2012-10-24 at 07:01 AM


Maybe the article below would speed up the research.

https://tech.dropbox.com/2012/10/caching-in-theory-and-practice/

@gissuebot

Original comment posted by [email protected] on 2012-10-24 at 09:06 PM


Awesome, thank you.

@gissuebot

Original comment posted by [email protected] on 2013-03-12 at 06:43 PM


(No comment entered for this change.)


CC: [email protected]

@ben-manes

@kluever @lowasser @kevinb9n

Before we play with alternate eviction policies, we need to develop a means for evaluating the impact of the different policies.

I've been working on a simulator so that an evaluation can be made. What I primarily need help with is acquiring trace data so that we can make an informed decision.

The simulator is multi-threaded, uses either a synthetic or trace file data source, has simple interfaces for adding new policies and trace formats, writes a report of the stats to the console or file, and is driven entirely through a configuration file.

The simple eviction and admission policies are implemented and I hope to start on the more advanced ones soon. It is easy to capture additional statistics, and for now I've kept it to the simple case of a maximum size. Adding weights, expiration, and coherence support shouldn't be too difficult, but I thought they should be added after most of the policies were complete. It would also be useful to track the average eviction penalty, e.g. CLOCK-based policies have O(n) worst-case evictions.
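
To make that concrete, a sketch of what such plug-in interfaces might look like (illustrative names only, not the simulator's actual API): a trace format turns its source into a key stream, and a policy consumes keys and reports statistics, so the two vary independently of the report writer.

```java
import java.util.stream.LongStream;

interface TraceReader {
  LongStream keys();          // parses one trace file format into a key stream
}

interface EvictionPolicy {
  void record(long key);      // applies one access to the policy under test
  PolicyStats stats();        // hits, misses, and evictions accumulated so far
}

final class PolicyStats {
  long hitCount;
  long missCount;
  long evictionCount;

  double hitRate() {
    long requests = hitCount + missCount;
    return requests == 0 ? 1.0 : (double) hitCount / requests;
  }
}
```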

For example, using the Wikipedia request trace yields the report below. First we see the upper bound due to compulsory misses (Unbounded), followed by the theoretical optimum (Clairvoyant). We see that using an admission policy (TinyLfu) can cheaply give us near-optimal efficiency, but other traces will highlight its weak temporal locality (making it good for secondary caches, though not as a primary cache's policy). We then see that 2Q-style policies, which most advanced policies tend to follow, provide a close approximation. Finally, we see that the sampled policies can, by mere luck, sometimes beat their strict versions. We can conclude that a member of the 2Q family (which includes LIRS and ARC) will probably be the approach to adopt.

Kevin's original proposal was to generalize this as a tool that developers could run to size their caches. This was back during the transition from soft reference collection to a maximum size as the preferred approach. I'm not sure how useful that would be, but it would be a fairly trivial enhancement.

╔═════════════════════════════╤══════════╤════════════╤═══════════╤═════════╗
║ Policy                      │ Hit rate │ Requests   │ Evictions │ Time    ║
╠═════════════════════════════╪══════════╪════════════╪═══════════╪═════════╣
║ opt.Unbounded               │ 85.90 %  │ 10,430,564 │ 0         │ 6.759 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ opt.Clairvoyant             │ 59.05 %  │ 10,430,564 │ 4,270,817 │ 12.57 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Lfu_TinyLfu         │ 57.07 %  │ 10,430,564 │ 4,477,281 │ 22.83 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Lru_TinyLfu         │ 57.07 %  │ 10,430,564 │ 4,477,778 │ 23.13 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Random_TinyLfu      │ 57.06 %  │ 10,430,564 │ 4,478,042 │ 19.18 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Clock_TinyLfu        │ 57.05 %  │ 10,430,564 │ 4,479,825 │ 13.01 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.SegmentedLru_TinyLfu │ 57.05 %  │ 10,430,564 │ 4,479,941 │ 12.41 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Lru_TinyLfu          │ 57.04 %  │ 10,430,564 │ 4,480,610 │ 12.94 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ irr.JackrabbitLirs          │ 56.44 %  │ 10,430,564 │ 4,543,447 │ 11.65 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Mru_TinyLfu         │ 56.08 %  │ 10,430,564 │ 4,580,282 │ 23.46 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Fifo_TinyLfu        │ 55.92 %  │ 10,430,564 │ 4,597,168 │ 21.98 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ two-queue.TwoQueue          │ 55.57 %  │ 10,430,564 │ 4,601,748 │ 6.642 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Mfu_TinyLfu         │ 55.06 %  │ 10,430,564 │ 4,687,117 │ 23.03 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Mru_TinyLfu          │ 54.06 %  │ 10,430,564 │ 4,791,403 │ 13.56 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Lfu                 │ 53.80 %  │ 10,430,564 │ 4,818,750 │ 16.07 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.SegmentedLru         │ 53.30 %  │ 10,430,564 │ 4,870,549 │ 6.710 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Clock                │ 47.80 %  │ 10,430,564 │ 5,444,026 │ 6.721 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Fifo_TinyLfu         │ 46.86 %  │ 10,430,564 │ 5,541,789 │ 12.36 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Lru                  │ 46.62 %  │ 10,430,564 │ 5,567,011 │ 7.074 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Lru                 │ 46.46 %  │ 10,430,564 │ 5,583,946 │ 17.31 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Fifo                 │ 41.87 %  │ 10,430,564 │ 6,062,391 │ 6.105 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Fifo                │ 41.87 %  │ 10,430,564 │ 6,062,579 │ 18.82 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Random              │ 41.87 %  │ 10,430,564 │ 6,062,860 │ 12.26 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Mfu                 │ 31.57 %  │ 10,430,564 │ 7,137,013 │ 20.46 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ sampled.Mru                 │ 26.75 %  │ 10,430,564 │ 7,639,960 │ 21.53 s ║
╟─────────────────────────────┼──────────┼────────────┼───────────┼─────────╢
║ linked.Mru                  │ 25.38 %  │ 10,430,564 │ 7,782,358 │ 7.025 s ║
╚═════════════════════════════╧══════════╧════════════╧═══════════╧═════════╝

Executed in 30.36 s
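
To illustrate why the 2Q family does well here, a bare-bones sketch of the segmented idea (keys only, simplified, and not taken from any of the policies above): new keys must prove themselves in a probation segment before being promoted, so a one-time scan cannot flush the hot working set.

```java
import java.util.LinkedHashMap;

/** Simplified SegmentedLru: probation for new keys, a hot segment for re-referenced ones. */
final class SegmentedLru {
  private final int maximumSize;
  private final int protectedCap;
  // Access-ordered LinkedHashMaps double as LRU queues (oldest access iterates first).
  private final LinkedHashMap<Long, Boolean> probation = new LinkedHashMap<>(16, 0.75f, true);
  private final LinkedHashMap<Long, Boolean> hot = new LinkedHashMap<>(16, 0.75f, true);

  SegmentedLru(int maximumSize) {
    this.maximumSize = maximumSize;
    this.protectedCap = (int) (maximumSize * 0.8);   // an assumed 80/20 split
  }

  /** Records an access and returns true if the key was resident. */
  boolean record(long key) {
    if (hot.containsKey(key)) {
      hot.get(key);                                  // refresh its LRU position
      return true;
    }
    if (probation.remove(key) != null) {
      promote(key);                                  // second hit: move to the hot segment
      return true;
    }
    if (probation.size() + hot.size() >= maximumSize) {
      evict();
    }
    probation.put(key, Boolean.TRUE);
    return false;
  }

  private void promote(long key) {
    hot.put(key, Boolean.TRUE);
    if (hot.size() > protectedCap) {
      Long demoted = hot.keySet().iterator().next(); // the hot segment's LRU entry
      hot.remove(demoted);
      probation.put(demoted, Boolean.TRUE);          // give it one more chance
    }
  }

  private void evict() {
    LinkedHashMap<Long, Boolean> source = probation.isEmpty() ? hot : probation;
    source.remove(source.keySet().iterator().next());
  }
}
```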

@ben-manes

@kevinb9n
After exhaustive evaluation, the two best algorithms are LIRS and EdenQueue. The problem with LIRS is that it requires 3x the cache size to reach its full potential, of which two-thirds are non-resident entries (evicted keys). This adds a lot of complexity, on top of LIRS being a non-trivial algorithm to implement correctly.

EdenQueue (name in flux) is my design based on TinyLfu. It uses a small LRU window that evicts to a main LRU queue, guarded by an optimized TinyLfu admission policy. I wrote a 4-bit CountMinSketch that is both memory- and CPU-efficient. This is used for retaining the frequency history and requires one long per unit of capacity (16 counters per long). The hit rate is near optimal across the board; the policy is simple, has a low footprint, is O(1), and does not require retaining evicted keys.
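
A sketch of the admission check being described, with plain int arrays standing in for the packed 4-bit counters (illustrative only, not the actual implementation, and counter aging is omitted): the sketch estimates recent popularity, and a candidate evicted from the small window only replaces the main queue's victim when it looks more popular.

```java
import java.util.Random;

final class TinyLfuAdmittor {
  private static final int DEPTH = 4;
  private final long[] seeds = new long[DEPTH];
  private final int[][] counts;

  TinyLfuAdmittor(int width) {
    counts = new int[DEPTH][width];
    Random random = new Random(1);
    for (int i = 0; i < DEPTH; i++) {
      seeds[i] = random.nextLong() | 1L;
    }
  }

  /** Records one access for the key in every row of the sketch. */
  void recordAccess(long key) {
    for (int i = 0; i < DEPTH; i++) {
      int slot = index(key, i);
      if (counts[i][slot] < 15) {          // saturate like a 4-bit counter would
        counts[i][slot]++;
      }
    }
  }

  /** Estimated frequency: the minimum counter across the rows. */
  int frequency(long key) {
    int min = Integer.MAX_VALUE;
    for (int i = 0; i < DEPTH; i++) {
      min = Math.min(min, counts[i][index(key, i)]);
    }
    return min;
  }

  /** The admission decision: keep whichever of the two looks more popular. */
  boolean admit(long windowCandidate, long mainVictim) {
    return frequency(windowCandidate) > frequency(mainVictim);
  }

  private int index(long key, int row) {
    long hash = (key ^ seeds[row]) * 0x9E3779B97F4A7C15L;
    return (int) ((hash >>> 32) % counts[row].length);
  }
}
```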

Having talked to Doug about this, we think it's the right approach to move forward on.

@Jaskey

Jaskey commented Aug 1, 2018

I need LFU in Guava. Is this implemented in the latest Guava version?

@cpovirk

cpovirk commented Jul 25, 2019

We recommend that people use Caffeine.
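
For anyone landing here: Caffeine's size-based eviction is built on the TinyLfu admission design discussed above, so the frequency-aware behavior requested in this issue comes without extra configuration, e.g.:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

final class CaffeineExample {
  public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder()
        .maximumSize(10_000)      // evicts using the TinyLfu-based policy
        .build();
    cache.put("key", "value");
    System.out.println(cache.getIfPresent("key"));
  }
}
```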
