Optimize `Token::make_word` #1588

davisp · 2024-12-11T18:22:05Z

While working on #1587 I noticed that Instruments is showing `Token::make_word` as the second hottest single function, right after `alloc::raw_vec::finish_grow`.

Looking into the implementation, I saw that it's just doing a binary search across all keywords to find out whether the word is a known keyword. This is a fairly classic case where we have a known set of strings and want to check whether a given string is in that set. There are a bunch of ways we could speed this up. This issue is to figure out a good compromise between those possible speedups and other project constraints, like maintaining `no_std` support.

My first approach at speeding this up was to create a table for the first byte in every keyword to reduce the number of entries that need to be searched. This small optimization managed to shave off about 400ms of time (of the 1.4ish seconds total).
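As a rough illustration of the first-byte idea (not the actual patch), here is a minimal sketch. The tiny `KEYWORDS` list and the names `build_first_byte_table` and `lookup` are made up for this example, and it assumes the word has already been uppercased. The table maps each possible first byte to the range of the sorted keyword list that starts with that byte, so the binary search only has to cover that range.

```rust
/// Hypothetical, tiny stand-in for the real keyword list. Must be sorted.
const KEYWORDS: &[&str] = &[
    "ALTER", "AND", "AS", "BY", "CREATE", "FROM", "SELECT", "SET", "TABLE", "WHERE",
];

/// For every possible first byte, record the half-open range `(start, end)`
/// of entries in `KEYWORDS` that begin with that byte. Bytes that start no
/// keyword keep the empty range `(0, 0)`.
fn build_first_byte_table() -> [(u16, u16); 256] {
    let mut table = [(0u16, 0u16); 256];
    let mut i = 0;
    while i < KEYWORDS.len() {
        let first = KEYWORDS[i].as_bytes()[0];
        let start = i;
        // Because the list is sorted, keywords sharing a first byte are adjacent.
        while i < KEYWORDS.len() && KEYWORDS[i].as_bytes()[0] == first {
            i += 1;
        }
        table[first as usize] = (start as u16, i as u16);
    }
    table
}

/// Return the index of `word` in `KEYWORDS`, searching only the entries
/// that share its first byte. Assumes `word` is already uppercased.
fn lookup(table: &[(u16, u16); 256], word: &str) -> Option<usize> {
    let first = *word.as_bytes().first()?;
    let (start, end) = table[first as usize];
    let bucket = &KEYWORDS[start as usize..end as usize];
    bucket
        .binary_search(&word)
        .ok()
        .map(|offset| start as usize + offset)
}
```

This only uses arrays and slices, so a table like this should not get in the way of `no_std`. Words whose first byte starts no keyword hit an empty range and skip the search entirely.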

However, other approaches could speed this up even more, either by generating parsing/lookup tables or by using something like `phf` to do the heavy lifting for us.

LorrensP-2158466 · 2025-01-12T13:01:29Z

I don't know if this is useful, but I remember watching a video from Strager: Perfect Hash Tables. He had the same kind of problem and created a "custom" hash table by using the information he already had: the known keywords.

In short, because he knew which words are keywords, he built his hash table around that by combining the first 2 and last 2 bytes and hashing them. After that, he still compares the full string. (This is off the top of my head.)
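To make that concrete, here is a minimal sketch of the idea as I understand it, not the code from the video. The eight-word `KEYWORDS` list, the multiply-and-shift hash, and the names `selector`, `slot`, `build_table` and `is_keyword` are all assumptions made up for this example. It searches for a seed that places every keyword in its own slot, then a lookup hashes the word and does one full comparison.

```rust
/// Hypothetical, tiny keyword set; the real list is far larger.
const KEYWORDS: [&str; 8] = [
    "SELECT", "FROM", "WHERE", "AND", "AS", "BY", "TABLE", "CREATE",
];
const TABLE_BITS: u32 = 5;
const TABLE_SIZE: usize = 1 << TABLE_BITS;

/// Pack the first two and last two bytes of a non-empty word into a u32.
/// Short words reuse bytes (e.g. "AS" becomes A, S, A, S).
fn selector(word: &[u8]) -> u32 {
    let n = word.len();
    let second = word[if n > 1 { 1 } else { 0 }];
    let second_last = word[n.saturating_sub(2)];
    u32::from_le_bytes([word[0], second, second_last, word[n - 1]])
}

/// Multiply-and-shift hash: keep the top `TABLE_BITS` bits of the product.
fn slot(word: &[u8], seed: u32) -> usize {
    (selector(word).wrapping_mul(seed) >> (32 - TABLE_BITS)) as usize
}

struct KeywordTable {
    seed: u32,
    slots: [Option<&'static str>; TABLE_SIZE],
}

/// Try odd seeds until every keyword lands in a distinct slot.
/// Returns `None` if no seed in the searched range is collision-free.
fn build_table() -> Option<KeywordTable> {
    'seeds: for seed in (1u32..1 << 24).step_by(2) {
        let mut slots = [None; TABLE_SIZE];
        for kw in KEYWORDS {
            let i = slot(kw.as_bytes(), seed);
            if slots[i].is_some() {
                continue 'seeds; // collision, try the next seed
            }
            slots[i] = Some(kw);
        }
        return Some(KeywordTable { seed, slots });
    }
    None
}

/// One hash, one slot read, then the full comparison that rejects
/// non-keywords that happen to hash to an occupied slot.
fn is_keyword(table: &KeywordTable, word: &str) -> bool {
    let bytes = word.as_bytes();
    if bytes.is_empty() {
        return false;
    }
    table.slots[slot(bytes, table.seed)] == Some(word)
}
```

The final comparison matters: a word like "SECT" has the same first two and last two bytes as "SELECT", so it lands in the same slot and is only rejected by the string compare. With a much larger keyword list, two keywords could share those four bytes, in which case something extra (the length, for instance) would have to be mixed into the selector. A real version would also want the table generated at build time rather than searched for at startup.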

I did notice there are about 800 keywords, so I don't know if it's feasible, but maybe it's worth looking into?

This was referenced Dec 11, 2024

POC to show performance improvements of not copying token #1561

Draft

Find keywords using perfect hashing #1590

Draft
