Add digest scores for faster deletes in sorted sets #835

ezekg · 2024-02-13T22:08:36Z

Closes #668. First pass on my idea of using scores to "skip ahead" so that we don't have to iterate the entire sorted set when a unique job is deleted. Let me know what you think. I'm sure I'm missing some edge cases, since I don't have the domain knowledge you do here. This should be backwards compatible, reverting back to previous behavior if a score doesn't exist.

I still want to add a performance test with a large schedule queue to make sure this actually works.

lib/sidekiq_unique_jobs/locksmith.rb

lib/sidekiq_unique_jobs/key.rb

mhenrixon · 2024-02-14T16:36:27Z

Let's think about this for a while. We already store a score when the digest is created. It has both a score and a job_id. We could simplify this by looking up the current digest's score and job id.

We also have this issue: Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.

ezekg · 2024-02-14T17:14:55Z

We already store a score when the digest is created. It has both a score and a job_id.

Ah, I wasn't aware we were already storing the score. I looked around but I guess I missed it. That makes things easier.

Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.

What does Sidekiq's heartbeat system do? I'm not familiar with it and what "missing" means here.

mhenrixon · 2024-02-14T17:23:59Z

We already store a score when the digest is created. It has both a score and a job_id.

Ah, I wasn't aware we were already storing the score. I looked around but I guess I missed it. That makes things easier.

Sidekiq only checks with the heartbeat method every ten seconds. This means that for ten seconds, things can be missing.

What does Sidekiq's heartbeat system do? I'm not familiar with it and what "missing" means here.

Both of your comments can be explained here: https://github.com/mhenrixon/sidekiq-unique-jobs/pull/830/files.

Here, I am optimizing the ruby reaper to use the digest score (zrange byscore) to limit the number of jobs we look through and prevent removing locks that appear to have no job because of being missing for ten seconds.

There are some linked issues (I hope) that talk about this, and somewhere, there is a link to a sidekiq issue where @mperham explains this.

We could apply similar thinking to your issue.

EDIT: @ezekg this is the sidekiq issue I was talking about: sidekiq/sidekiq#6153

lib/sidekiq_unique_jobs/lua/shared/_delete_from_sorted_set.lua

ezekg · 2024-02-16T22:23:19Z

@mhenrixon I added a rudimentary performance test that fails for the old algorithm.

For a schedule queue of 100,000 jobs, we go from this:

1708122488.686405 [0 lua] "ZRANGE" "schedule" "0" "49"
1708122488.686549 [0 lua] "ZRANGE" "schedule" "50" "99"
1708122488.686651 [0 lua] "ZRANGE" "schedule" "100" "149"
1708122488.686728 [0 lua] "ZRANGE" "schedule" "150" "199"
1708122488.686830 [0 lua] "ZRANGE" "schedule" "200" "249"
... 1,991 lines omitted
1708122488.824228 [0 lua] "ZRANGE" "schedule" "99800" "99849"
1708122488.824349 [0 lua] "ZRANGE" "schedule" "99850" "99899"
1708122488.824406 [0 lua] "ZRANGE" "schedule" "99900" "99949"
1708122488.824463 [0 lua] "ZRANGE" "schedule" "99950" "99999"
1708122488.824502 [0 lua] "ZRANGE" "schedule" "100000" "100049"
1708122488.824523 [0 lua] "ZREM" "schedule" "{\"retry\":true,\"queue\":\"customqueue\",\"lock\":\"until_executing\",\"on_conflict\":\"replace\",\"class\":\"UniqueJobOnConflictReplace\",\"args\":[100000,{\"type\":\"extremely unique\"}],\"jid\":\"4f2abef9a3c66954b92499c8\",\"created_at\":1708122488.6721134,\"lock_timeout\":0,\"lock_ttl\":null,\"lock_prefix\":\"uniquejobs\",\"lock_args\":[100000,{\"type\":\"extremely unique\"}],\"lock_digest\":\"uniquejobs:6a7e9a8bcee1870891c2e9b633fb4f86\"}

To this:

1708122430.856488 [0 lua] "ZRANGE" "schedule" "1710714430.8422594" "+inf" "BYSCORE" "LIMIT" "0" "50"
1708122430.856549 [0 lua] "ZREM" "schedule" "{\"retry\":true,\"queue\":\"customqueue\",\"lock\":\"until_executing\",\"on_conflict\":\"replace\",\"class\":\"UniqueJobOnConflictReplace\",\"args\":[100000,{\"type\":\"extremely unique\"}],\"jid\":\"49d4e63dcb6d767c17af0470\",\"created_at\":1708122430.8423865,\"lock_timeout\":0,\"lock_ttl\":null,\"lock_prefix\":\"uniquejobs\",\"lock_args\":[100000,{\"type\":\"extremely unique\"}],\"lock_digest\":\"uniquejobs:6a7e9a8bcee1870891c2e9b633fb4f86\"}"

mhenrixon · 2024-02-17T06:34:54Z

I really appreciate you taking a stab at this @ezekg! You spotted a couple of bugs!

Unnecessary looping due to missing break.
zrange is super slow in this case; we should be using zscan, of course.

Can you check if the linked PR fixes your problem, too? I prefer a simpler fix than adding these extra keys. If it still isn't good enough, I'm happy to pair on sorting this out!

spec/performance/unique_job_on_conflict_replace_spec.rb

mhenrixon · 2024-02-20T14:20:53Z

If you have a look here: https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L66 we already add a score for the digest, we also store the timestamp in he hash with digest + job_id (this is to be able to allow concurrent jobs of a specified number): https://github.com/mhenrixon/sidekiq-unique-jobs/blob/main/lib/sidekiq_unique_jobs/lua/lock.lua#L70

Between those two, it is a hard sell to add more timestamps. I am more inclined to attempt to reduce the number of commands than to increase them.

Could you find a way to make do with what is there or do a great job at selling the extra key to me @ezekg? I want to help you, I really do!

ezekg · 2024-02-21T16:48:57Z

@mhenrixon the current implementation does not store the actual score of the job in the sorted set (i.e. the timestamp at which a scheduled job is scheduled to run at), but rather the current time at which the job is added to the sorted set. I updated the implementation to now store the job's score when available. I needed this to match the job's score in Sidekiq's schedule sorted set, otherwise it's useless in performing a divide-and-conquer search.

Let me know when you have time to review the new approach. I don't know if the previous timestamp was actually used for anything, but it didn't look like it at first glance.

ezekg · 2024-02-21T17:27:25Z

lib/sidekiq_unique_jobs/lua/lock.lua

@@ -62,8 +63,16 @@ if lock_type == "until_expired" and pttl and pttl > 0 then
  log_debug("ZADD", expiring_digests, current_time + pttl, digest)
  redis.call("ZADD", expiring_digests, current_time + pttl, digest)


I left this alone because I don't fully understand what the expiring digests set is used for and if this optimization would be applicable to jobs utilizing an until_expired lock strategy.

spec/support/sidekiq_unique_jobs/testing.rb

mhenrixon

This is exactly what I had in mind. Sorry that I haven't been able to get to it.

Your assumption about the job score in the hash is correct, I just wanted something in the hash and wasn't clear with how to use it. It was one of those: "I believe I will need this".

As soon as the German IRS is off my back (they want money I don't have), I'll see about greatly optimizing the gem.

I never got around to looking at the performance.

Perhaps this would be better solved in ruby (like the reaper).

I'm definitely not opposed to use some batching from the ruby layer if there are more than n number in a sorted set.

I believe there are plenty to optimize and for the performance tests I need to remember that my machine is as fast as a laptop comes.

Not fair to compare locally, should probably write a bunch of these performance tests and have them run on GitHub actions.

ezekg · 2024-02-21T20:05:57Z

Do you want me to loop in the changes from #837? That had improvements to delete_from_queue as well, which I didn't touch here. Let me know if you want me to cherry-pick, or if you want to do that separately.

mhenrixon · 2024-02-21T21:49:20Z

Do you want me to loop in the changes from #837?

That would be lovely, I forgot about the queue! I'm too exhausted to cut a release and test this tonight but I will do it first thing tomorrow morning.

Much appreciated!!

ezekg · 2024-02-21T22:37:41Z

Done. Let me know what you find whenever you have a chance to test.

mhenrixon · 2024-02-22T08:52:55Z

Looks like it will do the job just fine! Can't wait to optimize everything!

ezekg commented Feb 13, 2024

View reviewed changes

lib/sidekiq_unique_jobs/locksmith.rb Outdated Show resolved Hide resolved

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from e2e1b02 to 565f1c4 Compare February 13, 2024 22:41

ezekg commented Feb 14, 2024

View reviewed changes

lib/sidekiq_unique_jobs/locksmith.rb Outdated Show resolved Hide resolved

ezekg commented Feb 14, 2024

View reviewed changes

lib/sidekiq_unique_jobs/key.rb Outdated Show resolved Hide resolved

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 565f1c4 to 60e380f Compare February 14, 2024 17:20

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 4 times, most recently from 664c259 to c1f52e8 Compare February 16, 2024 22:20

ezekg commented Feb 16, 2024

View reviewed changes

lib/sidekiq_unique_jobs/lua/shared/_delete_from_sorted_set.lua Show resolved Hide resolved

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 4 times, most recently from 851a8f5 to bed17eb Compare February 17, 2024 03:25

mhenrixon mentioned this pull request Feb 17, 2024

fix(perf): improve delete_from_sorted_set speed #837

Closed

ezekg commented Feb 17, 2024

View reviewed changes

spec/performance/unique_job_on_conflict_replace_spec.rb Outdated Show resolved Hide resolved

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from bed17eb to 203f087 Compare February 17, 2024 20:32

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 203f087 to 13d562f Compare February 21, 2024 15:19

add performance test for replace conflict strategy

3f473f7

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 13d562f to f53f91c Compare February 21, 2024 16:44

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 3 times, most recently from 2548e5c to 0fb19b6 Compare February 21, 2024 17:25

ezekg commented Feb 21, 2024

View reviewed changes

spec/support/sidekiq_unique_jobs/testing.rb Show resolved Hide resolved

mhenrixon approved these changes Feb 21, 2024

View reviewed changes

ezekg marked this pull request as ready for review February 21, 2024 18:33

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch 2 times, most recently from 11f17a8 to 8131bed Compare February 21, 2024 22:30

fix performance issue when deleting from large sets

5da7d08

ezekg force-pushed the feature/add-score-to-unique-jobs-for-sorted-sets branch from 8131bed to 5da7d08 Compare February 21, 2024 22:39

mhenrixon enabled auto-merge (squash) February 22, 2024 07:14

mhenrixon disabled auto-merge February 22, 2024 07:15

mhenrixon merged commit 1bfba2f into mhenrixon:main Feb 22, 2024
18 checks passed

mjooms mentioned this pull request Mar 18, 2024

Read Timeouts and high CPU spike after upgrading from 8.0.5 to 8.0.10 #840

Open

ezekg mentioned this pull request May 27, 2024

Add Sidekiq Enterprise/Pro keygen-sh/keygen-api#798

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add digest scores for faster deletes in sorted sets #835

Add digest scores for faster deletes in sorted sets #835

ezekg commented Feb 13, 2024 •

edited

Loading

mhenrixon commented Feb 14, 2024 •

edited

Loading

ezekg commented Feb 14, 2024

mhenrixon commented Feb 14, 2024 •

edited

Loading

ezekg commented Feb 16, 2024 •

edited

Loading

mhenrixon commented Feb 17, 2024 •

edited

Loading

mhenrixon commented Feb 20, 2024

ezekg commented Feb 21, 2024 •

edited

Loading

ezekg Feb 21, 2024

mhenrixon left a comment

ezekg commented Feb 21, 2024

mhenrixon commented Feb 21, 2024

ezekg commented Feb 21, 2024

mhenrixon commented Feb 22, 2024

		@@ -62,8 +63,16 @@ if lock_type == "until_expired" and pttl and pttl > 0 then
		log_debug("ZADD", expiring_digests, current_time + pttl, digest)
		redis.call("ZADD", expiring_digests, current_time + pttl, digest)

Add digest scores for faster deletes in sorted sets #835

Add digest scores for faster deletes in sorted sets #835

Conversation

ezekg commented Feb 13, 2024 • edited Loading

mhenrixon commented Feb 14, 2024 • edited Loading

ezekg commented Feb 14, 2024

mhenrixon commented Feb 14, 2024 • edited Loading

ezekg commented Feb 16, 2024 • edited Loading

mhenrixon commented Feb 17, 2024 • edited Loading

mhenrixon commented Feb 20, 2024

ezekg commented Feb 21, 2024 • edited Loading

ezekg Feb 21, 2024

Choose a reason for hiding this comment

mhenrixon left a comment

Choose a reason for hiding this comment

ezekg commented Feb 21, 2024

mhenrixon commented Feb 21, 2024

ezekg commented Feb 21, 2024

mhenrixon commented Feb 22, 2024

ezekg commented Feb 13, 2024 •

edited

Loading

mhenrixon commented Feb 14, 2024 •

edited

Loading

mhenrixon commented Feb 14, 2024 •

edited

Loading

ezekg commented Feb 16, 2024 •

edited

Loading

mhenrixon commented Feb 17, 2024 •

edited

Loading

ezekg commented Feb 21, 2024 •

edited

Loading