Add immutable LIRS Cache implementation #155

Merged 22 commits on Dec 16, 2013
Changes from 2 commits
storehaus-cache/src/main/scala/com/twitter/storehaus/cache/LIRSCache.scala
@@ -0,0 +1,290 @@
/*
* Copyright 2013 Twitter Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License. You may obtain
* a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package com.twitter.storehaus.cache

import scala.collection.SortedMap
import scala.annotation.tailrec

object Stack {
  def apply[K](maxSize:Int,
Contributor:
Given that Stack has a maxSize, I wonder if it makes sense to not have an ever-increasing index value in backingIndexMap and backingKeyMap, but instead re-use the integers within (0 to maxSize) to avoid overflow for very large maps.

Contributor Author:
I thought about this, and I definitely agree. There's a bit more complication to it than that, though.
You could have something with inc 1, then another thing with inc 2, then your thing with inc 2 starts cycling, so it goes up to 3, then 4, then 5, then it rolls over... how do you know that it's safe to roll over?

I've seen general schemes for this from the world of threading. It's not too hard, but it will add a bit of complexity to the code. If it's worth it, I'll do it (I think it's worth it).

Contributor:
Yes, I thought of that too. It's a bit tricky. I'm also curious what the Clojure implementation of this does. Also, do we want to make this cache thread-safe? We could use the spin-lock implementation in Atomic.scala (same package) if we needed to.

Contributor:
Also, it looks like the existing LRUCache is using an unbounded index as well (although it's a Long, not an Int), so this might not be something that needs to be changed before this cache gets merged.

Contributor Author:
I added a CyclicIncrementer which solves this problem in a pretty generic way. Want me to use it for LRUCache? Also, thread safe in what way? It's immutable! ;)
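
A minimal sketch of the wrap-around idea (hypothetical names; this is not the CyclicIncrementer added in the PR, which also needs the extra bookkeeping discussed above):

// Hypothetical sketch: a counter that stays within [0, maxValue] instead of growing
// without bound. Knowing when it is safe to roll over while older values are still
// live (the tricky part discussed above) still needs extra bookkeeping on top of this.
case class WrappingCounter(maxValue: Int, current: Int = 0) {
  def next: (Int, WrappingCounter) = {
    val n = if (current == maxValue) 0 else current + 1
    (n, copy(current = n))
  }
}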

Contributor:
Cool! I'll check out the CyclicIncrementer now. It's up to Oscar whether he wants to use it for LRUCache. :) Also, sorry, I was thinking of the thread-safety issue from the consumer's perspective, but that's probably outside the scope of what storehaus is providing. For example, if I want to use this LIRSCache in a system where items are inserted by multiple threads, I have to use something like Atomic (i.e., java.util.concurrent.AtomicReference) to ensure that multiple puts/writes are not "lost" among the new instances of the cache returned by each put operation.
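
For illustration, a consumer-side sketch of that pattern (an assumption-laden example, not part of this PR; it presumes the Cache trait exposes the same get/put signatures as the LIRSCache below):

import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

// Share one AtomicReference to the immutable cache and retry on contention,
// so concurrent puts are not lost between old and new cache instances.
class SharedCache[K, V](initial: Cache[K, V]) {
  private val ref = new AtomicReference(initial)

  @tailrec
  final def put(kv: (K, V)): Set[K] = {
    val current = ref.get
    val (evicted, updated) = current.put(kv)
    if (ref.compareAndSet(current, updated)) evicted
    else put(kv) // another thread won the race; retry against its result
  }

  def get(k: K): Option[V] = ref.get.get(k)
}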

               backingIndexMap:SortedMap[Int, K] = SortedMap.empty[Int, K],
               backingKeyMap:Map[K, Int] = Map.empty[K, Int]) =
    new Stack[K](maxSize, backingIndexMap, backingKeyMap)
}

sealed class Stack[K](maxSize:Int, backingIndexMap:SortedMap[Int, K], backingKeyMap:Map[K, Int]) {
Contributor:
Minor, but did you mean to say final here instead of sealed? I don't see any subclasses of Stack in this file. I believe it's the same case for LIRSStacks.

Contributor Author:
I misunderstood what sealed does. What's the preferred way to make a class private? Just to make it final? Make the constructor private?

Contributor:
Sealed is usually used when you have a fixed number of implementations, and they must be defined within the same source file. In this case maybe you'd want to just say private[storehaus]. I'm not sure what the other caches do, but at least you're signaling that it's private to the storehaus package and isn't meant for public consumption.
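
A tiny sketch of the distinction (hypothetical names, not code from this PR):

package com.twitter.storehaus

sealed trait Node                  // sealed: all subclasses must be defined in this source file
final class Leaf extends Node      // final: cannot be subclassed at all
private[storehaus] class Internals // package-private: not visible outside com.twitter.storehaus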

Contributor Author:
Oscar didn't love all the privates. He suggested tossing it all in a private class if I really care, otherwise just leaving it. What do you think is the best way? I just don't want people to depend on internals.

Contributor:
Actually, it's funny that you mention that. Today I saw a pull request come in for Scala's new pickling support that attempted to make classes private that were not intended for external consumption. The initial approach was to mark things as "private[pickling]", but someone commented that you can actually still get access to them if you do something like "import pickling._" (there's some esoteric bug that was linked to). So, instead he suggested putting these private classes in an "internal" package like this:
https://github.com/scala/async/tree/master/src/main/scala/scala/async/internal

  /**
   * Adds k to the top of the stack. If k is already in the Stack,
   * it will be put on the top. If k was not in the Stack and the
   * Stack was full, returns the evicted member.
   */
  def putOnTop(k: K): (Option[K], Stack[K]) = {
    val newInc = (if (backingIndexMap.isEmpty) 0 else backingIndexMap.last._1) + 1
    backingKeyMap.get(k) match {
      case Some(oldInc) => {
        val newBackingIndexMap = backingIndexMap - oldInc + (newInc->k)
        val newBackingKeyMap = backingKeyMap + (k->newInc)
        (None, new Stack(maxSize, newBackingIndexMap, newBackingKeyMap))
      }
      case None => {
        if (isFull) {
          val (oldInc, lastK) = backingIndexMap.head
          val newBackingIndexMap = backingIndexMap - oldInc + (newInc->k)
          val newBackingKeyMap = backingKeyMap - lastK + (k->newInc)
          (Some(lastK), new Stack(maxSize, newBackingIndexMap, newBackingKeyMap))
        } else {
          val newBackingIndexMap = backingIndexMap + (newInc->k)
          val newBackingKeyMap = backingKeyMap + (k->newInc)
          (None, new Stack(maxSize, newBackingIndexMap, newBackingKeyMap))
        }
      }
    }
  }

  def dropOldest: (Option[K], Stack[K]) = {
    if (isEmpty) {
      (None, this)
    } else {
      val (lastInc, lastK) = backingIndexMap.head
      (Some(lastK), new Stack(maxSize, backingIndexMap - lastInc, backingKeyMap - lastK))
    }
  }

  def remove(k: K): (Option[K], Stack[K]) = {
    backingKeyMap.get(k) match {
      case Some(inc) => (Some(k), new Stack(maxSize, backingIndexMap - inc, backingKeyMap - k))
      case None => (None, this)
    }
  }

  def contains(k: K): Boolean = backingKeyMap.contains(k)

  def isOldest(k: K): Boolean = if (!isEmpty) { backingIndexMap.head._2 == k } else false

  def size = backingIndexMap.size

  def isFull = size >= maxSize

  def isEmpty = size <= 0

  def empty = new Stack[K](maxSize, backingIndexMap.empty, backingKeyMap.empty)

  override def toString = backingIndexMap.map { _._2 }.mkString(",")
}
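
As a quick aside on the contract above (an editor's sketch, not part of the diff): putOnTop hands back the evicted key, if any, together with the updated immutable stack. For example, in a REPL:

val s0 = Stack[String](2)            // capacity of two keys
val (none1, s1) = s0.putOnTop("a")   // none1 == None; stack is now: a
val (none2, s2) = s1.putOnTop("b")   // none2 == None; stack is now: a,b
val (evicted, s3) = s2.putOnTop("c") // full, so evicted == Some("a"); stack: b,c
val (bumped, s4) = s3.putOnTop("b")  // "b" already present, so bumped == None; stack: c,b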

object LIRSStacks {
  def apply[K](sSize:Int, qSize:Int) = new LIRSStacks[K](Stack[K](sSize), Stack[K](qSize))
}

sealed class LIRSStacks[K](val stackS: Stack[K], val stackQ: Stack[K]) {
  @tailrec
  final def prune: LIRSStacks[K] = {
    val (oldK, newStackS) = stackS.dropOldest
    oldK match {
      case Some(k) if stackQ.contains(k) => new LIRSStacks(newStackS, stackQ).prune
      case _ => this // We don't need to remove as there is either nothing to remove, or an LIR block is on the bottom
    }
  }

  def putOnTopOfS(k: K): (Option[K], LIRSStacks[K]) = {
    val (optK, newStackS) = stackS.putOnTop(k)
    (optK, new LIRSStacks(newStackS, stackQ))
  }

  def putOnTopOfQ(k: K): (Option[K], LIRSStacks[K]) = {
    val (optK, newStackQ) = stackQ.putOnTop(k)
    (optK, new LIRSStacks(stackS, newStackQ))
  }

  def dropOldestInS: (Option[K], LIRSStacks[K]) = {
    val (optK, newStackS) = stackS.dropOldest
    (optK, new LIRSStacks(newStackS, stackQ))
  }

  def dropOldestInQ: (Option[K], LIRSStacks[K]) = {
    val (optK, newStackQ) = stackQ.dropOldest
    (optK, new LIRSStacks(stackS, newStackQ))
  }

  def removeFromS(k: K): (Option[K], LIRSStacks[K]) = {
    val (optK, newStackS) = stackS.remove(k)
    (optK, new LIRSStacks(newStackS, stackQ))
  }

  def removeFromQ(k: K): (Option[K], LIRSStacks[K]) = {
    val (optK, newStackQ) = stackQ.remove(k)
    (optK, new LIRSStacks(stackS, newStackQ))
  }

  def isOldestInS(k: K): Boolean = stackS.isOldest(k)

  def isOldestInQ(k: K): Boolean = stackQ.isOldest(k)

  def isSFull: Boolean = stackS.isFull

  def isQFull: Boolean = stackQ.isFull

  def isInS(k: K): Boolean = stackS.contains(k)

  def isInQ(k: K): Boolean = stackQ.contains(k)

  def evict(k: K): LIRSStacks[K] = new LIRSStacks(stackS.remove(k)._2, stackQ.remove(k)._2)

  def empty = new LIRSStacks(stackS.empty, stackQ.empty)

  override def toString = "S:["+stackS+"] Q:["+stackQ+"]"
}

object LIRSCache {
  def apply[K, V](maxSize:Int, sPercent:Double, backingMap:Map[K, V] = Map.empty[K,V]) = {
    val sSize = (maxSize * sPercent).toInt
    val qSize = maxSize - sSize
    require(sSize > 0, "Size of S stack in cache must be >0")
    require(qSize > 0, "Size of Q stack in cache must be >0")
    new LIRSCache[K, V](LIRSStacks[K](sSize, qSize), backingMap)
  }
}

/**
 * This is an implementation of an immutable LIRS Cache based on the LIRS Cache implementation
 * in Clojure's core.cache:
 * https://github.com/clojure/core.cache/blob/master/src/main/clojure/clojure/core/cache.clj.
 * The cache is described in this paper:
 * http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=EA23F554FDF98A258C6FDF0C8E98BFD1?doi=10.1.1.116.2184&rep=rep1&type=pdf
 */

class LIRSCache[K, V](lirsStacks:LIRSStacks[K], backingMap: Map[K, V]) extends Cache[K, V] {
  def get(k: K): Option[V] = backingMap.get(k)

  def put(kv: (K, V)): (Set[K], Cache[K, V]) = {
    val (k, v) = kv
    def miss:(Set[K], Cache[K, V]) = {
      if (!lirsStacks.isSFull) {
        // We know that S is not full, so we should not need to worry about eviction.
        // In this case, S is not full, so we just add our key to S, and put the kv in the cache
        val (older, newLirsStacks) = lirsStacks.putOnTopOfS(k)
        older match {
          case None => (Set.empty[K], new LIRSCache(newLirsStacks, backingMap + kv))
          case _ => throw new IllegalStateException("Stack was not full, yet evicted when element was added")
        }
      } else {
        val (evictedFromS, newLirsStacks1) = lirsStacks.putOnTopOfS(k)
        evictedFromS match {
          case Some(oldK) => {
            // We know that S is full, and that k is not in S. We add our thing to Q, and take out of the cache anything that comes out.
            // We then add it to S, and if we get something out, take it out of the cache if it is also not in Q. We add kv to the cache.
            val (evictedFromQ, newLirsStacks2) = newLirsStacks1.putOnTopOfQ(k)
            val (evicted1, newBackingMap1) = evictedFromQ match {
              case Some(oldK) => (Set(oldK), backingMap - oldK)
              case None => (Set.empty[K], backingMap)
            }
            val (evicted2, newBackingMap2) =
              if (newLirsStacks2.isInQ(oldK)) {
                (evicted1, newBackingMap1)
              } else {
                (evicted1 + oldK, newBackingMap1 - oldK)
              }
            (evicted2, new LIRSCache(newLirsStacks2, newBackingMap2 + kv))
          }
          case None => {
            // We know that S is full, and that k is in S. This is a non-resident HIR block. We have bumped k in S and got nothing
            // back. Then we take the oldest element of S and add it to Q. If we get something back, we take it out of the cache.
            // Then we prune.
            val (oldestInS, newLirsStacks2) = newLirsStacks1.dropOldestInS
            oldestInS match {
              case Some(oldKinS) => {
                val (evictedFromQ, newLirsStacks3) = newLirsStacks2.putOnTopOfQ(oldKinS)
                val (evicted, newBackingMap1) = evictedFromQ match {
                  case Some(oldKinQ) => (Set(oldKinQ), backingMap - oldKinQ)
                  case None => (Set.empty[K], backingMap)
                }
                (evicted, new LIRSCache(newLirsStacks3.prune, newBackingMap1))
              }
              case None => (Set.empty[K], new LIRSCache(newLirsStacks2.prune, backingMap))
            }
          }
        }
      }
    }
    get(k) match {
      case Some(oldV) => if (oldV != v) miss else (Set.empty[K], this)
      case None => miss
    }
  }

  def hit(k: K): Cache[K, V] =
    get(k).map { v =>
      if (lirsStacks.isInS(k) && !lirsStacks.isInQ(k)) {
        // In the case where k is in S but not in Q, it is a LIR block. We push it to the top of S, and
        // prune if it was the oldest in S.
        val (evictedFromS, newLirsStacks) = lirsStacks.putOnTopOfS(k)
        if (evictedFromS.isDefined) {
          throw new IllegalStateException("Nothing should have been evicted from S when k was bumped as it was already present")
        }
        new LIRSCache(if (lirsStacks.isOldestInS(k)) newLirsStacks.prune else newLirsStacks, backingMap)
      } else if (lirsStacks.isInS(k) && lirsStacks.isInQ(k)) {
        // In the case where k is in S and Q, it is an HIR block. We bump k to the top of S and remove it from Q.
        // We then move the oldest value in S to Q. Then we prune.
        val (evictedFromS, newLirsStacks) = lirsStacks.putOnTopOfS(k)
        if (evictedFromS.isDefined) {
          throw new IllegalStateException("Nothing should have been evicted from S when k was bumped as it was already present")
        }
        val (evictedFromQ, newLirsStacks2) = newLirsStacks.removeFromQ(k)
        if (!evictedFromQ.isDefined) {
          throw new IllegalStateException("Key was not evicted from Q despite it being present")
        }
        val (oldestInS, newLirsStacks3) = newLirsStacks2.dropOldestInS
        oldestInS match {
          case Some(oldK) => {
            val (evictedFromQ2, newLirsStacks4) = newLirsStacks3.putOnTopOfQ(oldK)
            if (evictedFromQ2.isDefined) {
              throw new IllegalStateException("Nothing should have been evicted when we put the value from S on top of Q")
            }
            new LIRSCache(newLirsStacks4.prune, backingMap)
          }
          case None => throw new IllegalStateException("We dropped the oldest value in S but got nothing back")
        }
      } else if (!lirsStacks.isInS(k) && lirsStacks.isInQ(k)) {
        // In the case where k is not in S but it is in Q it is a non-resident HIR block. We bump it to the top of Q, and put it in
        // S. If anything is evicted from S in the process, we check if it is in Q. If it is not, we remove it from the cache.
        val (evictedFromQ, newLirsStacks) = lirsStacks.putOnTopOfQ(k)
        if (evictedFromQ.isDefined) {
          throw new IllegalStateException("We bumped the value to the top of Q, so nothing should have come back")
        }
        val (evictedFromS, newLirsStacks2) = newLirsStacks.putOnTopOfS(k)
        val newBackingMap = evictedFromS match {
          case Some(oldK) if !newLirsStacks2.isInQ(k) => backingMap - oldK
          case _ => backingMap
        }
        new LIRSCache(newLirsStacks2, newBackingMap)
      } else {
        throw new IllegalStateException("Key in cache, but not in Stack S or Stack Q. Key: " + k)
      }
    }.getOrElse(this)

  def evict(k: K): (Option[V], Cache[K, V]) =
    (get(k), new LIRSCache(lirsStacks.evict(k), backingMap - k))

  def iterator: Iterator[(K, V)] = backingMap.iterator

  def empty: Cache[K, V] = new LIRSCache(lirsStacks.empty, backingMap.empty)

  override def toString = {
    val pairStrings = iterator.map { case (k, v) => k + " -> " + v }
    "LIRSCache(" + pairStrings.toList.mkString(", ") + ")"
  }
}
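
To tie the pieces together, a usage sketch (an editor's illustration based on the constructor and methods above; the exact eviction behavior depends on the LIRS state):

// maxSize = 10 with sPercent = 0.8 gives an S stack of 8 keys and a Q stack of 2
val cache0 = LIRSCache[String, Int](10, 0.8)

// put returns the set of evicted keys along with the new (immutable) cache
val (evicted1, cache1) = cache0.put("a" -> 1)
val (evicted2, cache2) = cache1.put("b" -> 2)

cache2.get("a")              // Some(1)
val cache3 = cache2.hit("a") // record the access, getting back an updated cache

val (removed, cache4) = cache3.evict("b") // removed == Some(2)
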
@@ -61,5 +61,6 @@ object CacheProperties extends Properties("Cache") {
}

property("LRUCache obeys the cache laws") = cacheLaws(LRUCache[String,Int](10))
property("LIRSCache obeys the cache laws") = cacheLaws(LIRSCache[String,Int](10, .8))
property("TTLCache obeys the cache laws") = cacheLaws(TTLCache[String, Int](10))
}