Skip to content

Latest commit

 

History

History
227 lines (142 loc) · 6 KB

README.md

File metadata and controls

227 lines (142 loc) · 6 KB

Introduction

This gem makes it easier to do bit-wise operations on large Redis bitsets, usually called bitmaps, with a natural expression syntax. It also supports huge sparse bitmaps by storing data in multiple keys, called chunks, per bitmap.

The typical use is real-time web analytics where each bit in a bitmap/bitset corresponds to a user (introductory article here). This library isn't an analytic package though, it's more low level than that and you can use it for anything.

This library is under development and its interface might change.

Quick start/pitch

Why use the library?

require 'redis/bitops'
redis = Redis.new

b1 = redis.sparse_bitmap("b1")
b1[128_000_000] = true
b2 = redis.sparse_bitmap("b2")
b2[128_000_000] = true
result = b1 & b2

Memory usage: about 20kb because it uses a sparse bitmap implementation using chunks of data.

Let's go crazy with super-complicated expressions:

...
result = (b1 & ~b2) | b3 (b4 | (b5 & b6 & b7 & ~b8))

Imagine writing this expression using Redis#bitop!

Installation

To install the gem:

gem install redis-bitops

To use it in your code:

require 'redis/bitops'

Usage

Reference: here

Basic example

An example is often better than theory so here's one. Let's create a few bitmaps and set their individual bits; we'll use those bitmaps in the examples below:

redis = Redis.new

a = redis.bitmap("a")
b = redis.bitmap("b")
result = redis.bitmap("result")

b[0] = true; b[2] = true; b[7] = true # 10100001
a[0] = true; a[1] = true; a[7] = true # 11000001

So, now here's a very simple expression:

c = a & b

You may be surprised but the above statement does not query Redis at all! The expression is lazy-evaluated when you access the result:

puts c.bitcount # => 2
puts c[0] # => true
puts c[1] # => false
puts c[2] # => false
puts c[7] # => false

So, in the above example, the call to c.bitcount happens to be the first moment when Redis is queried. The result is stored under a temporary unique key.

puts c.root_key # => "redis:bitops:8eef38u9o09334"

Let's delete the temporary result:

c.delete!

If you want to store the result directly under a specific key:

result << c

Or, more adventurously, we can use the following more complex one-liner:

result << (~c & (a | b))

**Note: ** expressions are optimized by reducing the number of Redis commands and using as few temporary keys to hold intermediate values as possible. See below for details.

Sparse bitmaps

Usage

You don't have to do anything special, simply use Redis#sparse_bitmap instead of Redis#bitmap:

a = redis.sparse_bitmap("a")
b = redis.sparse_bitmap("b")
result = redis.sparse_bitmap("result")

b[0] = true; b[2] = true; b[7] = true # 10100001
a[0] = true; a[1] = true; a[7] = true # 11000001

c = a & b

result << c

or just:

result << (a & b)

You can specify the chunk size (in bytes).

Use the size consistently. Note that it cannot be re-adjusted for data already saved to Redis:

x = redis.sparse_bitmap("x", 1024 * 1024) # 1 MB per chunk.
x[0] = true
x[1000] = true

Important: Do not mix sparse bitmaps with regular ones and never mix sparse bitmaps with different chunk sizes in the same expressions.

Rationale

If you want to store a lot of huge but sparse bitsets, with not many bits set, using regular Redis bitmaps doesn't work very well. It wastes a lot of space. In analytics, it's a reasonable requirement, to be able to store data about several million users. A bitmap for 10 million users weights over 1MB! Imagine storing hourly statistics and using up memory at a rate of 720MB per month.

For, say, 100 million users it becomes outright prohibitive!

But even with a fairly popular websites, I dare say, you don't often have one million users per hour :) This means that the majority of those bits is never sets and a lot of space goes wasted.

Enter sparse bitmaps. They divide each bitmap into chunks thus minimizing memory use (chunks' size can be configured, see Configuration below).

Creating and using sparse bitmaps is identical to using regular bitmaps:

huge = redis.sparse_bitmap("huge_bitmap")
huge[128_000_000] = true

The only difference in the above example is that it will allocate two 32kb chunks as opposed to 1MB that would be allocated if we used a regular bitmap (Redis#bitmap). In addition, setting the bit is nearly instantaneous.

Compare:

puts Benchmark.measure {
  sparse = redis.sparse_bitmap("huge_sparse_bitmap")
  sparse[500_000_000] = true
}

which on my machine this generates:

0.000000   0.000000   0.000000 (  0.000366)

It uses just 23kb memory as opposed to 120MB (megabytes!) to store the bit using a regular Redis bitmap:

regular = redis.bitmap("huge_regular_bitmap")
regular[500_000_000] = true

Configuration

Here's how to configure the gem:

Redis::Bitops.configure do |config|
  config.default_bytes_per_chunk = 8096 # Eight kilobytes.
  config.transaction_level = :bitmap # allowed values: :bitmap or :none.
end

Implementation & efficiency

Optimization phase

Prior to evaluation, the expression is optimized by combining operators into single BITOP commands and reusing temporary keys (required to store intermediate results) as much as possible.

This silly example:

result << (a & b & c | a | b)

translates into simply:

BITOP AND result a b c
BITOP OR result result a b

and doesn't create any temporary keys at all!

Materialization phase

At this point, the calculations are carried out and the result is saved under the destination key. Note that, for sparse bitmaps, multiple keys may be created.

Transaction levels

TBD

Contributing/feedback

Please send in your suggestions to [email protected]. Pull requests, issues, comments are more than welcome.