Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace dict with new hashtable for sets datatype #1176

Merged
merged 2 commits into from
Dec 14, 2024

Conversation

SoftlyRaining
Copy link
Contributor

@SoftlyRaining SoftlyRaining commented Oct 16, 2024

The new hashtable provides faster lookups and uses less memory than dict.

A TCL test case "SRANDMEMBER with a dict containing long chain" is deleted because it's covered by a hashtable unit test "test_random_entry_with_long_chain", which is already present.

This change also moves some logic from dismissMemory (object.c) to zmadvise_dontneed (zmalloc.c), so the hashtable implementation which needs the dismiss functionality doesn't need to depend on object.c and server.h.

This PR follows #1186.

Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 75.96567% with 56 lines in your changes missing coverage. Please review.

Project coverage is 66.64%. Comparing base (5f7fe9e) to head (1f51a2f).
Report is 10 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 0.00% 25 Missing ⚠️
src/debug.c 0.00% 14 Missing ⚠️
src/defrag.c 68.18% 7 Missing ⚠️
src/rdb.c 85.71% 3 Missing ⚠️
src/server.c 0.00% 3 Missing ⚠️
src/db.c 92.85% 2 Missing ⚠️
src/t_set.c 97.29% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1176      +/-   ##
============================================
- Coverage     70.90%   66.64%   -4.27%     
============================================
  Files           119      119              
  Lines         64631    64693      +62     
============================================
- Hits          45828    43116    -2712     
- Misses        18803    21577    +2774     
Files with missing lines Coverage Δ
src/hashtable.c 75.88% <100.00%> (+1.67%) ⬆️
src/lazyfree.c 86.11% <100.00%> (ø)
src/object.c 82.13% <100.00%> (+1.46%) ⬆️
src/server.h 100.00% <ø> (ø)
src/t_zset.c 95.66% <100.00%> (-0.01%) ⬇️
src/zmalloc.c 82.74% <100.00%> (+0.13%) ⬆️
src/db.c 89.55% <92.85%> (+0.04%) ⬆️
src/t_set.c 97.84% <97.29%> (+0.33%) ⬆️
src/rdb.c 76.74% <85.71%> (+0.28%) ⬆️
src/server.c 10.44% <0.00%> (-77.02%) ⬇️
... and 3 more

... and 14 files with indirect coverage changes

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review. I'll look more later.

@zuiderkwast
Copy link
Contributor

Sorry for force-pushing the hashset branch again, to fix a DCO issue.

Can you rebase and force-push again? (I guess it's better than merge when we have a DCO issue following us.)

@zuiderkwast zuiderkwast linked an issue Oct 22, 2024 that may be closed by this pull request
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good. Just a few nits.

@SoftlyRaining
Copy link
Contributor Author

Thanks Viktor for taking care of the rebase - your changes look good to me. :)

@zuiderkwast
Copy link
Contributor

You're welcome. :) The main thing I don't like right now is OBJ_ENCODING_HT for dict vs OBJ_ENCODING_HASHTABLE for hashtable. Maybe we should rename the former to OBJ_ENCODING_DICT and the latter to OBJ_ENCODING_HT instead?

@SoftlyRaining
Copy link
Contributor Author

That's a fair point, but OBJECT_ENCODING_HT will be removed completely when all data types have been converted to hashset, so it is only an issue short term, I think. I don't want to touch extra files just to rename it temporarily - I feel like that might make the PR more confusing and harder to review. :-/

@zuiderkwast zuiderkwast force-pushed the hashset branch 6 times, most recently from c957864 to d6da54f Compare December 2, 2024 21:41
@zuiderkwast
Copy link
Contributor

Are you updating this? It becomes very hard to rebase it if you wait this long between each rebase.

@zuiderkwast zuiderkwast changed the title [hashset feature] Convert SET datatype to use hashset instead of dict [hashtable feature] Convert SET datatype to use hashtable instead of dict Dec 6, 2024
@SoftlyRaining
Copy link
Contributor Author

SoftlyRaining commented Dec 7, 2024 via email

@zuiderkwast zuiderkwast force-pushed the hashset branch 2 times, most recently from d090f8e to 3559ba8 Compare December 10, 2024 12:55
@zuiderkwast zuiderkwast changed the base branch from hashset to unstable December 10, 2024 20:34
@zuiderkwast
Copy link
Contributor

I rebased it (started from unstable and applied commit with cherry-pick) but then I forgot git cherry-pick --continue after solving the merge conflict. When I pushed, I just pushed unstable, causing the diff to become empty and then the PR closed. Now I can't push to your branch to fix it. 😢

@SoftlyRaining SoftlyRaining reopened this Dec 11, 2024
@SoftlyRaining
Copy link
Contributor Author

I got the PR fixed and back up - ready to review and merge! Thanks @zuiderkwast for your work rebasing this again. 😊

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nits.

Changes to the merged code after code review include avoiding the cast to void **. (Related to the strict aliasing rule or just nicer code?) We should do the corresponding changes here.

When I rebased, I just made it compile. I didn't update anything else, like hs variable names.

@zuiderkwast zuiderkwast added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Dec 11, 2024
@SoftlyRaining SoftlyRaining force-pushed the set-datatype branch 2 times, most recently from 1138fac to ded690a Compare December 13, 2024 05:40
@SoftlyRaining
Copy link
Contributor Author

It seems like most of these check failures are also happening in unstable? :(

@zuiderkwast
Copy link
Contributor

It seems like most of these check failures are also happening in unstable? :(

Yes, it's not a blocker for this PR. We should fix them in unstable.

@zuiderkwast zuiderkwast changed the title [hashtable feature] Convert SET datatype to use hashtable instead of dict Replace dict with new hashtable for sets datatype Dec 14, 2024
The variable just means "a hashtable-backed set". It was used for dict and it's not wrong to use it for the new hashtable.

Signed-off-by: Viktor Söderqvist <[email protected]>
@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes performance labels Dec 14, 2024
@zuiderkwast zuiderkwast merged commit 88942c8 into valkey-io:unstable Dec 14, 2024
50 of 56 checks passed
@SoftlyRaining SoftlyRaining deleted the set-datatype branch December 17, 2024 19:34
kronwerk pushed a commit to kronwerk/valkey that referenced this pull request Jan 27, 2025
The new `hashtable` provides faster lookups and uses less memory than
`dict`.

A TCL test case "SRANDMEMBER with a dict containing long chain" is
deleted because it's covered by a hashtable unit test
"test_random_entry_with_long_chain", which is already present.

This change also moves some logic from dismissMemory (object.c) to
zmadvise_dontneed (zmalloc.c), so the hashtable implementation which
needs the dismiss functionality doesn't need to depend on object.c and
server.h.

This PR follows valkey-io#1186.

---------

Signed-off-by: Rain Valentine <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use new hashtable for sets
2 participants