Add sketch package #210

robertodauria · 2025-02-07T18:24:58Z

This PR adds a new package, sketch, with a Count-Min Sketch implementation backed by Redis. It's the first part of multiple PRs that will:

Use this data structure for rate limiting based on IP addresses or IP+user-agent
Add rate limiting to the Locate handlers

This change is

coveralls · 2025-02-07T18:29:00Z

Pull Request Test Coverage Report for Build 1487

Details

74 of 78 (94.87%) changed or added relevant lines in 1 file are covered.
4 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.3%) to 96.733%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
sketch/countmin.go	74	78	94.87%

Files with Coverage Reduction	New Missed Lines	%
cmd/heartbeat/main.go	4	83.21%

Totals
Change from base Build 1473:	-0.3%
Covered Lines:	1954
Relevant Lines:	2020

💛 - Coveralls

cristinaleonr

General question, why is a probabilistic data structure needed? In particular, given we're using Redis, I would expect a sliding window counter with sorted sets in Redis to result in a more precise and straightforward implementation.

Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)

sketch/interface.go line 9 at r2 (raw file):

Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values.

Source. Should the interface be in the consumer?

sketch/countmin_test.go line 12 at r2 (raw file):

)

func setupTestRedis(t *testing.T) (*redis.Pool, func()) {

This is helpful 👍.

robertodauria

I expanded a bit more on Slack, but this was my original reasoning:

Memory usage: a count-min sketch size does not depend on how many entries are added
It's relatively easy for an abusive client to randomize the UA, not only making the rate limiting useless but also causing an explosion in the number of keys
The individual Redis operations used in this implementation are O(1) -- turns out this is misleading, see below

As I found after re-implementing this with ZSETs and benchmarking both solutions:

This mapping of the CM Sketch semantics on Redis hashmaps is quite inefficient (with depth=7, we need 7 operations to increment and 7 operations to count)
The memory usage benefit isn't really there unless the number of keys is in the millions.
ZSETs are drastically faster (~4x) and the memory usage is actually lower for the scale we operate at. We likely won't have enough entries to start seeing the benefits of a CM sketch.

I would probably still advocate for a CM Sketch if we had access to Redis Enterprise, which provides a native implementation written in C with O(1) operations and much lower memory footprint, but this is not the case.

Considering the above, I'm closing this PR to propose the ZSET-based implementation instead. Thanks!

Reviewable status: 0 of 1 approvals obtained

robertodauria added 2 commits February 7, 2025 16:58

Implement CMSketch based on Redis

a97deae

Update go.mod/sum

be1298a

robertodauria requested a review from cristinaleonr February 7, 2025 18:24

Update comment

38d882a

cristinaleonr reviewed Feb 7, 2025

View reviewed changes

robertodauria commented Feb 9, 2025

View reviewed changes

robertodauria closed this Feb 9, 2025

robertodauria mentioned this pull request Feb 9, 2025

Implement rate limiting based on IP+UA #211

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add sketch package #210

Add sketch package #210

robertodauria commented Feb 7, 2025 •

edited by stephen-soltesz

Loading

coveralls commented Feb 7, 2025 •

edited

Loading

cristinaleonr left a comment

robertodauria left a comment

Add sketch package #210

Add sketch package #210

Conversation

robertodauria commented Feb 7, 2025 • edited by stephen-soltesz Loading

coveralls commented Feb 7, 2025 • edited Loading

Pull Request Test Coverage Report for Build 1487

Details

💛 - Coveralls

cristinaleonr left a comment

Choose a reason for hiding this comment

robertodauria left a comment

Choose a reason for hiding this comment

robertodauria commented Feb 7, 2025 •

edited by stephen-soltesz

Loading

coveralls commented Feb 7, 2025 •

edited

Loading