Add immutable LIRS Cache implementation #155
Conversation
```scala
    new Stack[K](maxSize, backingIndexMap, backingKeyMap)
}

sealed class Stack[K](maxSize: Int, backingIndexMap: SortedMap[Int, K], backingKeyMap: Map[K, Int]) {
```
Minor, but did you mean to say `final` here instead of `sealed`? I don't see any subclasses of `Stack` in this file. I believe it's the same case for `LIRSStacks`.
I misunderstood what `sealed` does. What's the preferred way to make a class private? Just to make it `final`? Make the constructor private?
`sealed` is usually used when you have a fixed number of implementations, and they must be defined within the same source file. In this case maybe you'd want to just say `private[storehaus]`. I'm not sure what the other caches do, but at least you're signaling that it's private to the storehaus package and isn't meant for public consumption.
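For reference, a minimal sketch of the distinctions being discussed (class names hypothetical):

```scala
// `sealed` restricts subclassing to this source file; it does not hide the class.
sealed class SealedStack
class LocalChild extends SealedStack // legal only because it's in the same file

// `final` forbids subclassing anywhere, but the class remains publicly visible.
final class FinalStack

// Package-private visibility is the separate axis suggested above:
//   private[storehaus] class Stack[K](...)
// hides the class from code outside the storehaus package entirely.

val child = new LocalChild
```

So `sealed`/`final` control extension, while `private[pkg]` controls who can see the class at all.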
Oscar didn't love all the privates. He suggested tossing it all in a private class if I really care, otherwise just leaving it. What do you think is the best way? I just don't want people to depend on internals.
Actually, it's funny that you mention that. Today I saw a pull request come in for Scala's new pickling support that attempted to make classes private that were not intended for external consumption. The initial approach was to mark things as `private[pickling]`, but someone commented that you can actually still get access to them if you do something like `import pickling._` (there's some esoteric bug that was linked to). So, instead he suggested putting these private classes in an "internal" package like this: https://github.com/scala/async/tree/master/src/main/scala/scala/async/internal
(Sent by email in reply to Jonathan Coveney's comment of Fri, Oct 11, 2013 on storehaus-cache/src/main/scala/com/twitter/storehaus/cache/LIRSCache.scala.)
Hey @jcoveney, so I took a look at this. Here's an alternative immutable `BoundedStack` backed by a `Vector`:

```scala
object BoundedStack {
  def apply[A](maxSize: Int): BoundedStack[A] =
    new BoundedStack[A](maxSize, Vector.empty[A], Map.empty[A, Int])
}

class BoundedStack[A](
  maxSize: Int,
  backingStack: Vector[A] = Vector.empty,
  backingIndexMap: Map[A, Int] = Map.empty) {

  def putOnTop(elem: A): (Option[A], BoundedStack[A]) =
    backingIndexMap.get(elem) match {
      case Some(idx) =>
        // Already present: move it to the top.
        (None, rebuilt(dropFromStack(idx) :+ elem))
      case None if isFull =>
        // Full: evict the oldest (bottom) element.
        val oldElem = backingStack.head
        (Some(oldElem), rebuilt(dropFromStack(0) :+ elem))
      case None =>
        (None, rebuilt(backingStack :+ elem))
    }

  def dropOldest: (Option[A], BoundedStack[A]) =
    backingStack.headOption match {
      case e @ Some(_) => (e, rebuilt(dropFromStack(0)))
      case None => (None, this)
    }

  def remove(elem: A): (Option[A], BoundedStack[A]) =
    backingIndexMap.get(elem) match {
      case Some(idx) => (Some(elem), rebuilt(dropFromStack(idx)))
      case None => (None, this)
    }

  def contains(elem: A): Boolean = backingIndexMap.contains(elem)
  def isOldest(elem: A): Boolean = backingStack.headOption.exists { _ == elem }
  def size: Int = backingStack.size
  def isFull: Boolean = size >= maxSize
  def isEmpty: Boolean = size == 0
  def empty: BoundedStack[A] = new BoundedStack[A](maxSize, Vector.empty, Map.empty)
  override def toString = backingStack.mkString(",")

  private def dropFromStack(idx: Int): Vector[A] = backingStack.patch(idx, Seq.empty, 1)

  // Removing an element shifts the index of everything after it, so the
  // index map is rebuilt from the new vector rather than patched in place.
  private def rebuilt(newStack: Vector[A]): BoundedStack[A] =
    new BoundedStack(maxSize, newStack, newStack.iterator.zipWithIndex.toMap)
}
```
Hey, I really like this idea. I actually originally implemented it with a Vector as well; they are a sweet, underused data type. The one thing I'm not sure about is how `patch` is implemented. It seems like vectors are optimized for efficient operations on the ends, but I'm not sure how they handle random removals. If that is efficient, then it is definitely a great data structure to use. I just need to look at the implementation of `patch`. Do you know if they can drop things efficiently? Because if not, that is the big loss here.
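For what it's worth, here is what `Vector#patch` computes for a single-element removal (as far as I know this is linear-time in general on `Vector`, not logarithmic):

```scala
// Remove the single element at index 2 from an immutable Vector:
// replace 1 element starting at index 2 with nothing.
val v = Vector(1, 2, 3, 4, 5)
val dropped = v.patch(2, Seq.empty, 1)
// dropped == Vector(1, 2, 4, 5); v itself is unchanged
```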
So, I did a bit of digging into this and came across this thread: https://groups.google.com/forum/#!topic/scala-user/fZ1TTNgneW4 It looks like some people there suggest alternatives. However, I wonder if we are just being overly paranoid in general? I think we could probably safely just convert your original implementation to use a Long instead of an Int.

Let's think about it. Assume a LIRSCache gets accessed roughly 100K times a second (very aggressive for a single machine), and that each access causes the index to increment. If the cache consistently gets hit 100K times a second for an entire year, that's 3153600000000 increments (100K * number of seconds in a year = 100K * 31536000). Long.MaxValue is 9223372036854775807. So, at that aggressive rate of 100K hits/s you would end up with Long.MaxValue / 3153600000000 = 2924712 years before it overflows!

I am leaning towards switching to `Long`.
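The arithmetic above checks out; as a quick sanity check:

```scala
// Back-of-the-envelope check of the overflow argument above.
val hitsPerSecond = 100000L        // aggressive: 100K accesses/s
val secondsPerYear = 31536000L     // 365 * 24 * 3600
val hitsPerYear = hitsPerSecond * secondsPerYear
// 3,153,600,000,000 increments per year

val yearsToOverflow = Long.MaxValue / hitsPerYear
// yearsToOverflow == 2924712, i.e. roughly 2.9 million years
```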
How about we split the difference:

```scala
trait IdProvider[T] {
  def next: (T, IdProvider[T])
  def cull(t: T): IdProvider[T]
}

// Constructor parameter renamed from `next` to `current`,
// since it would otherwise clash with the `next` method.
case class LongProvider(current: Long) extends IdProvider[Long] {
  def next = (current, LongProvider(current + 1L))
  def cull(l: Long) = this
}

object IdProvider {
  implicit def longCounter: IdProvider[Long] = LongProvider(0L)
}
```

Then implement this typeclass (which you have already done) for CyclicIncrement[T], so we have:

```scala
object CyclicIncrement {
  implicit def idProvider[T](implicit /* anything needed here, or just fix T = Int */): IdProvider[CyclicIncrement[T]] = ??? // wrap your CyclicProvider
}
```

Lastly, Stack can be made to use a generic provider with the typeclass pattern; Ryan can inject the LongProvider by using the Id type == Long, and you can use Cyclic with Id = CyclicIncrement[Int].
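A runnable sketch of the Long half of that typeclass, with a hypothetical generic consumer (`freshIds` is not part of the PR, just an illustration of threading the provider through):

```scala
trait IdProvider[T] {
  def next: (T, IdProvider[T])
  def cull(t: T): IdProvider[T]
}

case class LongProvider(current: Long) extends IdProvider[Long] {
  def next = (current, LongProvider(current + 1L))
  def cull(l: Long) = this // nothing to reclaim: Longs effectively never wrap
}

// Any structure needing fresh ids can thread the provider through,
// independent of the concrete id type.
def freshIds[T](p: IdProvider[T], n: Int): (List[T], IdProvider[T]) =
  (0 until n).foldLeft((List.empty[T], p)) { case ((acc, prov), _) =>
    val (id, nextProv) = prov.next
    (acc :+ id, nextProv)
  }

val (ids, _) = freshIds(LongProvider(0L), 3)
// ids == List(0L, 1L, 2L)
```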
That's a really nice way to split the difference and a sweet use of the typeclass pattern. I really wish Scala had an immutable `LinkedHashMap`. The mutable version makes this very simple:

```scala
import scala.collection.mutable

object BoundedStack {
  def apply[A](maxSize: Int): BoundedStack[A] = new BoundedStack[A](maxSize)
}

class BoundedStack[A](maxSize: Int) {
  // LinkedHashMap iterates in insertion order, so the head is the oldest entry.
  private val backingMap = mutable.LinkedHashMap.empty[A, A]

  def putOnTop(elem: A): Option[A] =
    backingMap.get(elem) match {
      case Some(_) =>
        // Remove and re-insert to move the element to the newest end.
        backingMap -= elem
        insert(elem)
        None
      case None =>
        if (isFull) {
          val oldElem = dropOldest
          insert(elem)
          oldElem
        } else {
          insert(elem)
          None
        }
    }

  def dropOldest: Option[A] =
    backingMap.headOption match {
      case Some((elem, _)) =>
        backingMap -= elem
        Some(elem)
      case None => None
    }

  def remove(elem: A): Option[A] =
    backingMap.get(elem) match {
      case s @ Some(_) =>
        backingMap -= elem
        s
      case None => None
    }

  def contains(elem: A): Boolean = backingMap.contains(elem)
  def isOldest(elem: A): Boolean = backingMap.keys.headOption.exists { _ == elem }
  def size: Int = backingMap.size
  def isFull: Boolean = size >= maxSize
  def isEmpty: Boolean = size == 0
  def empty: BoundedStack[A] = new BoundedStack[A](maxSize)
  override def toString = backingMap.keys.mkString(",")

  private def insert(elem: A): Unit = { backingMap += elem -> elem }
}
```
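The core trick here is that `LinkedHashMap` preserves insertion order, so "oldest" is just the head; a quick demonstration:

```scala
import scala.collection.mutable

val m = mutable.LinkedHashMap.empty[String, String]
m += ("a" -> "a")
m += ("b" -> "b")

// Updating an existing key in place keeps its position, so to implement
// putOnTop we remove first and re-insert, which moves "a" to the newest end.
m -= "a"
m += ("a" -> "a")

// "b" is now the oldest element (the head of the iteration order).
```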
Another data structure that could work is the one that Clojure uses, the priority map: https://github.com/clojure/data.priority-map
As for my thoughts: I agree that going with the trait is the way to go; good suggestion, Oscar. That said, I don't think we should go with the Vector approach. And yeah, I'd be curious to compare the immutable implementation with the mutable one, though the wins from making it immutable, of course, transcend performance concerns.
Indeed. :) Agree with everything!
Looks like we are converging here. Nice. One thought: rather than `IdProvider`, this could be framed as a kind of `Clock`. Just an idea. Maybe IdProvider is more to the point, but I liked the analogy to a kind of abstract Time.
I incorporated the clock idea, added some tests, and I think it's good to go. If someone (maybe me!) implements a priority map, we can use that in these cache implementations (and in Scala, period, hah) and it'd be really cool.
Nice! I like it. Great work. Immutable data structures are so much fun. I had fun writing an immutable Patricia trie (prefix map) over the weekend: https://gist.github.com/ryanlecompte/7287415
```scala
}

trait IdProvider[@specialized(Int, Long) T] extends Clock[T, IdProvider[T]] {
  def cull(oldId: T): IdProvider[T]
```
Comment on the meaning of `cull` here. (A bit like "free", right? In a malloc sense.)
Note that this cannot be included until a new Algebird is pushed, because it uses Successible.
With the new Algebird release, this now passes and can be merged. :)
```scala
object CyclicIncrementProvider {
  def intIncrementer: CyclicIncrementProvider[Int] = CyclicIncrementProvider(0)

  def apply[@specialized(Int, Long) K: Ordering: Successible](zero: K): CyclicIncrementProvider[K] =
```
doesn't Successible have an ordering? You could weaken this to Successible and just get the ordering from there, right?
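The suggested weakening, sketched generically; `HasOrdering` is a hypothetical stand-in for Successible here, since this assumes (as the comment does) that the typeclass exposes its own ordering:

```scala
// Instead of requiring both `Ordering` and the richer typeclass as separate
// context bounds, require only the richer one and recover the ordering from it.
trait HasOrdering[T] { def ordering: Ordering[T] }

object HasOrdering {
  implicit val intHasOrdering: HasOrdering[Int] =
    new HasOrdering[Int] { val ordering = Ordering.Int }
}

def sortedFirst[T: HasOrdering](xs: List[T]): Option[T] = {
  // The Ordering is derived from the typeclass, not demanded from the caller.
  implicit val ord: Ordering[T] = implicitly[HasOrdering[T]].ordering
  xs.sorted.headOption
}

val first = sortedFirst(List(3, 1, 2))
// first == Some(1)
```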
By the way, it would be nice to have some benchmarking code for cache hit-rate under a variety of distributions on the key space (e.g., probability of seeing key i proportional to 1/i for N keys).
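A minimal sketch of such a harness. Everything here is hypothetical: `SimpleCache` stands in for whatever interface the cache under test exposes, and `MapCache` is a tiny FIFO stand-in so the harness runs on its own; keys are drawn with probability proportional to 1/i as suggested above:

```scala
import scala.util.Random

// Hypothetical interface standing in for the cache under test.
trait SimpleCache[K, V] {
  def get(k: K): Option[V]
  def put(k: K, v: V): Unit
}

// Tiny FIFO stand-in: LinkedHashMap's head is the oldest inserted key.
class MapCache[K, V](maxSize: Int) extends SimpleCache[K, V] {
  private val m = scala.collection.mutable.LinkedHashMap.empty[K, V]
  def get(k: K): Option[V] = m.get(k)
  def put(k: K, v: V): Unit = {
    if (!m.contains(k) && m.size >= maxSize) m -= m.keys.head
    m(k) = v
  }
}

// Sample key i in 1..n with probability (1/i) / H(n), by linear inverse-CDF scan.
def zipfKey(rng: Random, n: Int, norm: Double): Int = {
  var x = rng.nextDouble() * norm
  var i = 1
  while (x > 1.0 / i && i < n) { x -= 1.0 / i; i += 1 }
  i
}

def hitRate(cache: SimpleCache[Int, Int], n: Int, trials: Int, seed: Long): Double = {
  val rng = new Random(seed)
  val norm = (1 to n).map(1.0 / _).sum // harmonic number H(n)
  var hits = 0
  (1 to trials).foreach { _ =>
    val k = zipfKey(rng, n, norm)
    cache.get(k) match {
      case Some(_) => hits += 1
      case None => cache.put(k, k) // miss: load the key
    }
  }
  hits.toDouble / trials
}

val rate = hitRate(new MapCache[Int, Int](100), 1000, 10000, 42L)
```

Swapping in the LIRS implementation for `MapCache` and varying the skew would give the hit-rate comparison asked for above.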
This fixes #97