Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Token::make_word #1588

Open
davisp opened this issue Dec 11, 2024 · 1 comment
Open

Optimize Token::make_word #1588

davisp opened this issue Dec 11, 2024 · 1 comment

Comments

@davisp
Copy link
Member

davisp commented Dec 11, 2024

While working on #1587 I noticed that Instruments is showing Token::make_word as the second hottest single function, right after alloc::raw_vec::finish_grow.

Looking into the implementation I saw that its just doing a binary search across all keywords to find if its a known keyword or not. This is a fairly classical case where we have a known set of strings and want to check if a given string is in that list. There are a bunch of ways that we could speed this up. This issue is to figure out a good compromise between those possible speedups and other project constraints like maintaining a no_std ability.

My first approach at speeding this up was to create a table for the first byte in every keyword to reduce the number of entries that need to be searched. This small optimization managed to shave off about 400ms of time (of the 1.4ish seconds total).

However, there are other approaches that could speed this up even more. Either by generating parsing/lookup tables or using something like phf to do the heavy lifting for us.

@LorrensP-2158466
Copy link
Contributor

I don't know if this is useful, but I remember watching a video from Strager: Perfect Hash Tables. He had the same kind of problem and created a "custom" hash table by using the information he already has, the known keywords

In short, because he knew which words are keywords, he built his hash table around that by combining the first 2 and last 2 bytes and hashing that. And after that still compare. (This is of the top of my head)

I did notice there are about ~800 keywords, so I don't know if it's feasible, but maybe it's worth looking into it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants