Improve performance of hashing and reduce memory #489
Merged
👋 I had a look at the use of hashing here for detecting duplicate images. Currently, this allocates an array of bytes per pixel row of each image, which can add up to a significant amount of memory usage.
Instead, we can rely on `Span<byte>` to remove the allocations altogether. Benchmarking a 3024 × 4032 image (the size my iPhone currently takes, so it seems representative):

Before

After
This effectively makes the memory usage 1.2 kilobytes regardless of the image size, down from 46 megabytes.
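The PR's actual diff isn't reproduced here, but the idea can be sketched as follows. This is an illustrative example, not the PR's code: `RowAccessor` and `HashImage` are hypothetical names, and SHA-256 is assumed only for concreteness. The point is that `IncrementalHash.AppendData` has a `ReadOnlySpan<byte>` overload, so each row can be fed to the hasher without allocating a `byte[]` for it:

```csharp
using System;
using System.Security.Cryptography;

internal static class ImageHashing
{
    // `RowAccessor` is a stand-in for however the image exposes its row
    // memory, e.g. a span over a locked bitmap buffer.
    internal delegate ReadOnlySpan<byte> RowAccessor(int y);

    internal static string HashImage(int height, RowAccessor getRow)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        for (int y = 0; y < height; y++)
        {
            // AppendData accepts a ReadOnlySpan<byte>, so no per-row
            // byte[] is allocated regardless of image size.
            hasher.AppendData(getRow(y));
        }
        return Convert.ToHexString(hasher.GetHashAndReset());
    }
}
```

Because the hasher consumes the rows in place, the working set stays constant no matter how large the image is.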
A smaller optimization is to place the hash in a stack buffer before converting it to hex; that saves one small array allocation for the hash itself. This uses `Convert.ToHexString`, since it can natively operate on a `ReadOnlySpan<byte>` and also produces upper-case lettering, but if you prefer I can create an overload of your extension method that works off of a span as well.

Additionally, this fixes a tiny issue where `IncrementalHash` is not being disposed. This does not leak memory, but it results in finalizers being run during garbage collection.
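Both of these smaller fixes can be sketched together. Again this is a hypothetical illustration rather than the PR's code (SHA-256 and the method name `HashToHex` are assumptions): the `using` declaration guarantees `Dispose` runs so no finalizer is left for the GC, and `stackalloc` plus `TryGetHashAndReset` keeps the digest off the heap before `Convert.ToHexString` formats it:

```csharp
using System;
using System.Security.Cryptography;

internal static class HashFormatting
{
    internal static string HashToHex(ReadOnlySpan<byte> data)
    {
        // `using` ensures the IncrementalHash is disposed deterministically,
        // so its finalizer never has to run during garbage collection.
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);
        hasher.AppendData(data);

        // Stack buffer for the digest: no small byte[] allocation.
        Span<byte> digest = stackalloc byte[32]; // SHA-256 digest is 32 bytes
        hasher.TryGetHashAndReset(digest, out int written);

        // Convert.ToHexString operates directly on a ReadOnlySpan<byte>
        // and emits upper-case hex.
        return Convert.ToHexString(digest.Slice(0, written));
    }
}
```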