Add sketch package #210
Conversation
Pull Request Test Coverage Report for Build 1487
💛 - Coveralls
General question: why is a probabilistic data structure needed? In particular, given that we're using Redis, I would expect a sliding window counter built on sorted sets to result in a more precise and straightforward implementation.
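For reference, a sliding-window counter of this kind is usually built by adding a timestamped member per event, trimming members older than the window, and counting what remains. A minimal sketch with redigo; the key layout, window, and limit are illustrative and not taken from this PR:

```go
package ratelimit

import (
	"strconv"
	"time"

	"github.com/gomodule/redigo/redis"
)

// Allow records one event for client and reports whether the number of
// events seen inside the sliding window is still within limit.
func Allow(conn redis.Conn, client string, window time.Duration, limit int) (bool, error) {
	key := "ratelimit:" + client
	now := time.Now().UnixNano()
	cutoff := now - window.Nanoseconds()

	// Trim expired members, record the new event, count what's left.
	// Pipelined in a transaction to keep it to one round trip.
	conn.Send("MULTI")
	conn.Send("ZREMRANGEBYSCORE", key, 0, cutoff)
	conn.Send("ZADD", key, now, strconv.FormatInt(now, 10))
	conn.Send("ZCARD", key)
	conn.Send("EXPIRE", key, int64(window.Seconds())+1)
	replies, err := redis.Values(conn.Do("EXEC"))
	if err != nil {
		return false, err
	}
	count, err := redis.Int(replies[2], nil) // reply of ZCARD
	if err != nil {
		return false, err
	}
	return count <= limit, nil
}
```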
Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)
sketch/interface.go
line 9 at r2 (raw file):
Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values.
Source. Should the interface be in the consumer?
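As a hypothetical illustration of that guideline (names invented here, not taken from this PR), the interface would live in the package that calls it, and the Redis-backed implementation would satisfy it implicitly:

```go
// Package limiter is the consumer: it declares only the behaviour it needs.
package limiter

import "context"

// Counter is defined where it is used, so limiter depends on an abstraction
// rather than on the sketch package, and any implementation (Redis-backed,
// or an in-memory fake in tests) satisfies it implicitly.
type Counter interface {
	Incr(ctx context.Context, key string) error
	Count(ctx context.Context, key string) (uint64, error)
}
```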
sketch/countmin_test.go
line 12 at r2 (raw file):
func setupTestRedis(t *testing.T) (*redis.Pool, func()) {
This is helpful 👍.
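The body of the helper isn't shown in this excerpt; one common shape for this kind of fixture, assuming miniredis as the in-memory test backend (the actual test may start Redis differently), is:

```go
package sketch

import (
	"testing"

	"github.com/alicebob/miniredis/v2"
	"github.com/gomodule/redigo/redis"
)

// setupTestRedis starts an in-memory Redis server and returns a pool
// connected to it plus a cleanup function for the caller to defer.
func setupTestRedis(t *testing.T) (*redis.Pool, func()) {
	t.Helper()
	mr, err := miniredis.Run()
	if err != nil {
		t.Fatalf("failed to start miniredis: %v", err)
	}
	pool := &redis.Pool{
		Dial: func() (redis.Conn, error) {
			return redis.Dial("tcp", mr.Addr())
		},
	}
	return pool, func() {
		pool.Close()
		mr.Close()
	}
}
```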
I expanded a bit more on Slack, but this was my original reasoning:
- Memory usage: the size of a count-min sketch does not depend on how many entries are added
- It's relatively easy for an abusive client to randomize the User-Agent, not only making the rate limiting useless but also causing an explosion in the number of keys
- The individual Redis operations used in this implementation are O(1) -- turns out this is misleading, see below
As I found after re-implementing this with ZSETs and benchmarking both solutions:
- This mapping of the CM Sketch semantics onto Redis hashmaps is quite inefficient: with depth=7, we need 7 operations to increment and 7 operations to count (see the sketch after this list)
- The memory usage benefit isn't really there unless the number of keys is in the millions.
- ZSETs are drastically faster (~4x) and the memory usage is actually lower for the scale we operate at. We likely won't have enough entries to start seeing the benefits of a CM sketch.
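To make the overhead concrete, here is roughly what the hash-backed layout implies (hypothetical key/field scheme and hash function, not the exact code from this PR): every increment touches one bucket per row, so with depth=7 a single Incr fans out into 7 HINCRBY calls, and a Count needs 7 HGETs plus a client-side minimum.

```go
package sketch

import (
	"hash/fnv"
	"strconv"

	"github.com/gomodule/redigo/redis"
)

const (
	depth = 7    // rows, each with an independent hash function
	width = 2048 // buckets per row
)

// rowIndex picks the bucket for item in row i. FNV with a per-row salt
// stands in for whatever hash family the real implementation uses.
func rowIndex(item string, i int) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(i)})
	h.Write([]byte(item))
	return h.Sum32() % width
}

// Incr updates one bucket per row: depth commands per increment, even
// when pipelined into a single flush.
func Incr(conn redis.Conn, item string) error {
	for i := 0; i < depth; i++ {
		conn.Send("HINCRBY", "cms:row:"+strconv.Itoa(i), rowIndex(item, i), 1)
	}
	if err := conn.Flush(); err != nil {
		return err
	}
	for i := 0; i < depth; i++ {
		if _, err := conn.Receive(); err != nil {
			return err
		}
	}
	return nil
}

// Count reads one bucket per row and returns the minimum estimate.
func Count(conn redis.Conn, item string) (int64, error) {
	min := int64(-1)
	for i := 0; i < depth; i++ {
		n, err := redis.Int64(conn.Do("HGET", "cms:row:"+strconv.Itoa(i), rowIndex(item, i)))
		if err != nil && err != redis.ErrNil {
			return 0, err
		}
		if min < 0 || n < min {
			min = n
		}
	}
	return min, nil
}
```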
I would probably still advocate for a CM Sketch if we had access to Redis Enterprise, which provides a native implementation written in C with O(1) operations and a much lower memory footprint, but this is not the case.
Considering the above, I'm closing this PR to propose the ZSET-based implementation instead. Thanks!
Reviewable status: 0 of 1 approvals obtained
This PR adds a new package, sketch, with a Count-Min Sketch implementation backed by Redis. It's the first part of multiple PRs that will: