Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor improvement to zerotrie hash function #5106

Merged
merged 1 commit into from
Jun 25, 2024

Conversation

sffc
Copy link
Member

@sffc sffc commented Jun 23, 2024

I reviewed our little perfect hash function with @Amanieu who highlighted the modulo operator as likely the performance bottleneck. We brainstormed some various approaches including whether we could clamp the hash table size to a power of 2 so that we could use a bitmask instead of a division. However, there were a few downsides to that approach:

  1. This would make the function non-minimal, meaning it would take more storage (mainly directly, but also indirectly by pushing indices to be bigger varints).
  2. The non-minimal hash function may require either extra bits to store whether entries exist, or an unfortunate workaround such as storing present keys as the values in the wrong buckets to prevent those entries from being found.
  3. A division operator on small integers might not be too expensive depending on the processor.
  4. It wasn't clear that this hash function was the bottleneck. Loads of bytes in the zerotrie are what I believe to be the main bottleneck. I've already focused on improving data locality.

What I did in this PR is change the usize division to u8 division. It results in slightly smaller assembly:

Old:
	cmp	rcx, 1
	adc	rcx, 0
	movzx	eax, dl
	cmp	rax, rcx
	jb	.LBB8_9
	xor	edx, edx
	div	ecx

New:

	movzx	eax, cl
	div	dl
	movzx	edi, ah

@Victoronz also pointed me to https://orlp.net/blog/worlds-smallest-hash-table/ which is a good read about perfect hash functions.

@sffc sffc marked this pull request as ready for review June 25, 2024 14:06
@sffc sffc requested a review from a team as a code owner June 25, 2024 14:06
@sffc sffc merged commit a4895e1 into unicode-org:main Jun 25, 2024
28 checks passed
@sffc sffc deleted the small-zerotrie-improvement branch June 25, 2024 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants