Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sketch package #210

Closed
wants to merge 3 commits into from
Closed

Add sketch package #210

wants to merge 3 commits into from

Conversation

robertodauria
Copy link
Contributor

@robertodauria robertodauria commented Feb 7, 2025

This PR adds a new package, sketch, with a Count-Min Sketch implementation backed by Redis. It's the first part of multiple PRs that will:

  1. Use this data structure for rate limiting based on IP addresses or IP+user-agent
  2. Add rate limiting to the Locate handlers

This change is Reviewable

@coveralls
Copy link
Collaborator

coveralls commented Feb 7, 2025

Pull Request Test Coverage Report for Build 1487

Details

  • 74 of 78 (94.87%) changed or added relevant lines in 1 file are covered.
  • 4 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.3%) to 96.733%

Changes Missing Coverage Covered Lines Changed/Added Lines %
sketch/countmin.go 74 78 94.87%
Files with Coverage Reduction New Missed Lines %
cmd/heartbeat/main.go 4 83.21%
Totals Coverage Status
Change from base Build 1473: -0.3%
Covered Lines: 1954
Relevant Lines: 2020

💛 - Coveralls

Copy link
Contributor

@cristinaleonr cristinaleonr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question, why is a probabilistic data structure needed? In particular, given we're using Redis, I would expect a sliding window counter with sorted sets in Redis to result in a more precise and straightforward implementation.

Reviewable status: 0 of 1 approvals obtained (waiting on @robertodauria)


sketch/interface.go line 9 at r2 (raw file):

Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values.

Source. Should the interface be in the consumer?


sketch/countmin_test.go line 12 at r2 (raw file):

)

func setupTestRedis(t *testing.T) (*redis.Pool, func()) {

This is helpful 👍.

Copy link
Contributor Author

@robertodauria robertodauria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded a bit more on Slack, but this was my original reasoning:

  • Memory usage: a count-min sketch size does not depend on how many entries are added
  • It's relatively easy for an abusive client to randomize the UA, not only making the rate limiting useless but also causing an explosion in the number of keys
  • The individual Redis operations used in this implementation are O(1) -- turns out this is misleading, see below

As I found after re-implementing this with ZSETs and benchmarking both solutions:

  • This mapping of the CM Sketch semantics on Redis hashmaps is quite inefficient (with depth=7, we need 7 operations to increment and 7 operations to count)
  • The memory usage benefit isn't really there unless the number of keys is in the millions.
  • ZSETs are drastically faster (~4x) and the memory usage is actually lower for the scale we operate at. We likely won't have enough entries to start seeing the benefits of a CM sketch.

I would probably still advocate for a CM Sketch if we had access to Redis Enterprise, which provides a native implementation written in C with O(1) operations and much lower memory footprint, but this is not the case.

Considering the above, I'm closing this PR to propose the ZSET-based implementation instead. Thanks!

Reviewable status: 0 of 1 approvals obtained

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants