[WIP] free speed/mem optimizations with ahash, dary_heap, and compact_str #1618
## Summary

Given that this library is largely an interface to hash maps of strings in Rust, we can get "free" 5-25% speedups by using stable, well-tested drop-in replacements like `ahash::HashMap`, `dary_heap::NHeap`, and `CompactString`. The improvements span both training and subsequent encode/decode.
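The hasher swap in particular is a purely type-level change. As a minimal std-only sketch of the mechanism (using a toy FNV-1a hasher in place of `ahash`, which is not assumed to be available here; `Vocab` and `build_vocab` are illustrative names, not this library's API):

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Toy FNV-1a hasher standing in for ahash's hasher; the point is that
// swapping hashers happens entirely behind the BuildHasher parameter.
#[derive(Default)]
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn finish(&self) -> u64 {
        self.0
    }
    fn write(&mut self, bytes: &[u8]) {
        // Seed with the FNV offset basis on first write.
        if self.0 == 0 {
            self.0 = 0xcbf2_9ce4_8422_2325;
        }
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }
}

// Only the alias picks the hasher; call sites are unchanged.
type Vocab = HashMap<String, u32, BuildHasherDefault<Fnv1a>>;

fn build_vocab(tokens: &[&str]) -> Vocab {
    let mut vocab = Vocab::default();
    for (id, tok) in tokens.iter().enumerate() {
        vocab.insert((*tok).to_string(), id as u32);
    }
    vocab
}

fn main() {
    let vocab = build_vocab(&["hello", "world"]);
    assert_eq!(vocab.get("hello"), Some(&0));
    assert_eq!(vocab.len(), 2);
}
```

With `ahash` the same effect is typically achieved by aliasing to its `HashMap`/`RandomState`; nothing downstream of the alias needs to change.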
Notes
smol
or a custom Huffman encoding for shorter lengths, using a BiHashMap like frombimap
, etc. This was the best performing.benches
look good (shown below).valgrind
(although there was already ~420K leaked withcargo bench
on HEAD).Issue
Because of the way the interface is organized across the core Rust library and the Python/Node bindings, there isn't an easy way to merge this with support for encode/decode. For example, because `Model` is defined on the Rust side and `Vocab` traits are used differently between different models, we'd have to use `pyo3` within the Rust library for `PyFromObject`. In theory, we could implement these changes only within the trainer, but the real user-facing/environmental impact would come from implementing them in the encode/decode bindings, where most usage probably occurs.
## Choices

Assuming you want to merge something like this, I think we have a few choices, e.g. limiting the changes to the `Trainers`.

## Example Benchmark (i7-12700K)
NB: we replaced `data/big.txt` with a much larger text corpus (271M vs 6.2M), but results were comparable for the original `data/big.txt`.
Results:
Before
After
`time -v` comparisons (new vs old):