Investigate switching global allocator to mimalloc #81

Closed
bradlarsen opened this issue Sep 25, 2023 · 8 comments
Labels
performance Related to runtime performance

Comments

@bradlarsen
Collaborator

Using musl instead of glibc when building Nosey Parker results in a significant drop in scan performance, presumably because musl's allocator does not handle multithreaded workloads well (see here).

It may be possible to sidestep this by using a different global allocator in Nosey Parker. jemalloc, for example, does not appear to build with musl, but mimalloc does, and there is already a Rust crate for it.
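For context on the mechanism: a Rust binary can replace its heap allocator by marking a `static` whose type implements `std::alloc::GlobalAlloc` with `#[global_allocator]`, and the mimalloc crate's `MiMalloc` type is one such implementation. The sketch below is hypothetical and not taken from Nosey Parker: `CountingSystem` forwards every request to `std::alloc::System` (the platform `malloc`, which is musl's allocator on Alpine) while counting allocation calls, to show how little code sits between a program and its allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to the platform allocator
/// (`System`, i.e. libc `malloc` on Linux) and counts allocation calls.
/// Replacing `System` with another `GlobalAlloc` type (for example
/// `mimalloc::MiMalloc`) is all it takes to change the backing allocator.
struct CountingSystem;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingSystem {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which are built on `alloc`/`dealloc` above, so they are counted too.
}

// Every heap allocation in this binary now goes through `CountingSystem`.
#[global_allocator]
static GLOBAL: CountingSystem = CountingSystem;

/// Number of `alloc` calls observed so far, across all threads.
fn alloc_calls() -> usize {
    ALLOC_CALLS.load(Ordering::Relaxed)
}
```

Only one `#[global_allocator]` may exist per binary, which is why switching allocators is a whole-program decision rather than a per-module one.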

Is it easy to switch Nosey Parker to use mimalloc as its global allocator?

How does switching affect the performance of native-code builds? How does it affect the performance of Docker-based builds, particularly the Alpine-based build in #77?

bradlarsen added the performance (Related to runtime performance) label Sep 25, 2023
bradlarsen self-assigned this Sep 25, 2023
@bradlarsen
Collaborator Author

So far, switching to mimalloc as the global allocator seems easier than I expected: it is a simple drop-in replacement, plus this addition to main.rs:

```rust
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;
```

Further performance investigation and validation on various platforms will be required.
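One way to isolate the allocator's contribution is a small multithreaded micro-benchmark, built once with the default allocator and once with the `#[global_allocator]` lines above. The sketch below is hypothetical (it is not Nosey Parker's benchmark): `alloc_stress` spawns `threads` workers that each run `iters` allocate/free cycles of small, variably sized buffers and returns the wall-clock time. The thread count and size range are arbitrary assumptions.

```rust
use std::hint::black_box;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical allocator stress test: `threads` workers each perform
/// `iters` allocate/free cycles. Returns the wall-clock time for all
/// workers to finish.
fn alloc_stress(threads: usize, iters: usize) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            thread::spawn(move || {
                let mut checksum = 0usize;
                for i in 0..iters {
                    // Sizes cycle between 16 and 1039 bytes to hit several size classes.
                    let size = 16 + (i * 31 + t) % 1024;
                    // `black_box` keeps the optimizer from eliding the allocation.
                    let buf = black_box(vec![0u8; size]);
                    checksum = checksum.wrapping_add(buf.len());
                    // `buf` is dropped (freed) at the end of each iteration.
                }
                checksum
            })
        })
        .collect();

    let mut total = 0usize;
    for handle in handles {
        total = total.wrapping_add(handle.join().expect("worker thread panicked"));
    }
    black_box(total);
    start.elapsed()
}
```

For example, `println!("{:?}", alloc_stress(32, 1_000_000));` could be run in both builds, matching the 32-thread configuration discussed in this issue. A synthetic loop like this only approximates a real scan's allocation pattern, so end-to-end scan throughput remains the number that matters.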

@bradlarsen
Collaborator Author

bradlarsen commented Sep 26, 2023

The performance difference between mimalloc and the default allocator in an Alpine Docker-based build is significant: roughly a 5x increase in scan throughput with 32 worker threads on an Intel-based Ubuntu 22.04 machine (from ~500MB/s with the default allocator to ~2.6GB/s with mimalloc).
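The speedup figure is simply the ratio of the two throughputs. A pair of hypothetical helpers makes the arithmetic explicit, assuming decimal units (1 MB = 10^6 bytes) and throughput measured as bytes scanned divided by wall-clock time:

```rust
use std::time::Duration;

/// Scan throughput in MB/s (decimal megabytes) for `bytes` scanned in `elapsed`.
fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / 1e6 / elapsed.as_secs_f64()
}

/// Ratio of new to old throughput; a value of 5.2 means "5.2x faster".
fn speedup(old_mb_per_s: f64, new_mb_per_s: f64) -> f64 {
    new_mb_per_s / old_mb_per_s
}
```

With the reported figures, `speedup(500.0, 2600.0)` gives 5.2, consistent with the "roughly 5x" above.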

@bradlarsen
Collaborator Author

Eyeballing the output from a few runs under /usr/bin/time, it looks like peak memory use when running 32-way parallel on big inputs may also be reduced when using mimalloc instead of the system allocator (less heap fragmentation?).
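GNU `/usr/bin/time -v` reports this figure as "Maximum resident set size". To record the same number from inside a run instead of eyeballing it, a Linux-only sketch can read the `VmHWM` (RSS high-water mark) line from `/proc/self/status`. The helpers `parse_vmhwm_kib` and `peak_rss_kib` are hypothetical and not part of Nosey Parker.

```rust
use std::fs;

/// Extracts the `VmHWM` (peak resident set size) value, in KiB, from the
/// text of a Linux `/proc/<pid>/status` file.
fn parse_vmhwm_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmHWM:"))
        // The line looks like "VmHWM:\t   123456 kB"; take the number.
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|value| value.parse::<u64>().ok())
}

/// Peak RSS of the current process in KiB; `None` on non-Linux systems
/// or if `/proc/self/status` cannot be read or parsed.
fn peak_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vmhwm_kib(&status)
}
```

Because RSS includes fragmentation and memory the allocator caches but has not returned to the OS, comparing `peak_rss_kib()` at the end of otherwise identical runs is one way to check the fragmentation hypothesis.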

@munntjlx
Contributor

munntjlx commented Oct 3, 2023

Thanks for the work on this. For me, the 'pull' time saved versus the few extra seconds of scanning (since it's only a single repo) almost makes the non-Debian image attractive, even with the 3x slowdown, since most repos we scan are smallish and scan times are measured in seconds.

@bradlarsen
Collaborator Author

@munntjlx I have a branch I'll be pushing and merging "soon" that switches the allocator to mimalloc, sidesteps the Alpine performance issue, and seems to work all around. I'm thinking the next Nosey Parker release could provide both the glibc-based Docker image and an Alpine-flavored one.

@munntjlx
Contributor

munntjlx commented Oct 5, 2023

Great news! It's funny how much difference an allocator can make!

praetorian-inc deleted a comment Oct 10, 2023
@bradlarsen
Collaborator Author

This has been merged back to main, and mimalloc is used in the v0.15.0 release.

@munntjlx
Contributor

I leave an emoticon with my thanks and gratitude:

😍


2 participants