Feature Description

In #7439, @vmg copied a TinyLfu cache into the repository, based on Ristretto (a project that is likely inactive due to a leadership change).
Caffeine's original frequency sketch was often ported directly, or else used as a reference by those wanting to work it out themselves. In all of these implementations the counters are distributed uniformly across the table. This has the benefit of giving the greatest opportunity to avoid hash collisions, which would otherwise increase the error rate of the popularity estimates. However, it also means that each counter is at an unpredictable location, so reading it is very likely to cause a hardware cache miss. This is somewhat mitigated by the maintenance performing a batch of work, so temporal locality for popular items may help.
The improved sketch uniformly selects a 64-byte block in which all of an item's counters reside. This matches the size of an L1 cache line, so only one memory access is needed. The counters are then selected from 16-byte segments using a secondary hash. In my benchmarks this improves the frequency estimation and increment throughput by up to 2.5x. The eviction throughput is 17% faster, but the effect on get/put is more difficult to observe since the maintenance is non-blocking (observable on a low-core CI machine, less so elsewhere).
Java does not yet support using SIMD instructions (the Vector API is still in preview), but those could provide an additional speedup. A .NET developer ported an earlier prototype in bitfaster/BitFaster.Caching#271, where the operation times were cut in half.
It is unlikely that users will observe a speedup, but it was a fun optimization that I forgot to leverage until reviewing an AVX2 port.
Use Case(s)
Improved cache speed and CPU resource utilization.
What a fantastic suggestion, Ben. Thanks for all your hard work and research around caching!
I'm not sure how fast I can get to this but I'm going to pin it on my TODO list. There's a Vitess release shipping this week so hopefully I can improve this before our next release. 🥲