A simple solution for the One Billion Row Challenge (1BRC).
I restricted myself to the standard library only, with no external crates.
The solution has a multithreaded mode (the default, 8 threads) and a single-threaded mode (enabled with the -s flag). Printing the result can be disabled with the -q flag. Because I'm not using any crates, CLI flag processing is very basic.
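A minimal sketch of what basic flag handling with only the standard library might look like. The `Options` struct and the `parse_options` function are hypothetical names chosen for illustration and are not taken from the repository; only the `-s` and `-q` flags come from the description above.

```rust
/// Run options selected on the command line (hypothetical struct, for illustration).
#[derive(Debug, Default, PartialEq)]
pub struct Options {
    /// `-s`: run on a single thread instead of the default multithreaded mode.
    pub single_thread: bool,
    /// `-q`: do not print the result.
    pub quiet: bool,
}

/// Parses flags from an iterator of arguments (program name already skipped).
/// Any argument other than `-s` or `-q` is reported as an error.
pub fn parse_options<I>(args: I) -> Result<Options, String>
where
    I: IntoIterator<Item = String>,
{
    let mut options = Options::default();
    for arg in args {
        match arg.as_str() {
            "-s" => options.single_thread = true,
            "-q" => options.quiet = true,
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    Ok(options)
}
```

In a binary this could be called as `parse_options(std::env::args().skip(1))`.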
On my Mac, the single-threaded solution takes 80 seconds and the multithreaded solution takes 12 seconds (the input file is 14 GB).
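One way a multithreaded mode like this could be structured with only the standard library: split the input into chunks that end on line boundaries, aggregate each chunk on its own thread, and merge the partial results. This is a sketch under assumptions, not the repository's actual code. The names `Stats`, `split_chunks`, `process_chunk` and `aggregate` are made up here, values are parsed as `f64` for simplicity, and the input is assumed to follow the challenge's `station;temperature` line format.

```rust
use std::collections::HashMap;
use std::thread;

/// Running statistics for one station (hypothetical type, for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Stats {
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub count: u64,
}

impl Stats {
    fn new(v: f64) -> Self {
        Stats { min: v, max: v, sum: v, count: 1 }
    }
    fn add(&mut self, v: f64) {
        self.min = self.min.min(v);
        self.max = self.max.max(v);
        self.sum += v;
        self.count += 1;
    }
    fn merge(&mut self, other: &Stats) {
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
        self.sum += other.sum;
        self.count += other.count;
    }
    pub fn mean(&self) -> f64 {
        self.sum / self.count as f64
    }
}

/// Splits `data` into at most `parts` slices; every slice except possibly
/// the last ends right after a newline, so no line is cut in half.
pub fn split_chunks(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let target = data.len() / parts.max(1) + 1;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        // Extend the chunk until it ends just after a newline or at end of input.
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Aggregates one chunk; keys borrow from the input, so no names are copied.
/// Malformed lines are skipped.
fn process_chunk(chunk: &[u8]) -> HashMap<&[u8], Stats> {
    let mut map: HashMap<&[u8], Stats> = HashMap::new();
    for line in chunk.split(|&b| b == b'\n') {
        let Some(sep) = line.iter().position(|&b| b == b';') else { continue };
        let Ok(text) = std::str::from_utf8(&line[sep + 1..]) else { continue };
        let Ok(v) = text.trim().parse::<f64>() else { continue };
        map.entry(&line[..sep])
            .and_modify(|s| s.add(v))
            .or_insert(Stats::new(v));
    }
    map
}

/// Processes the chunks on scoped threads and merges the per-thread maps.
pub fn aggregate(data: &[u8], threads: usize) -> HashMap<String, Stats> {
    let chunks = split_chunks(data, threads);
    let partials: Vec<HashMap<&[u8], Stats>> = thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|&chunk| scope.spawn(move || process_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    let mut total: HashMap<String, Stats> = HashMap::new();
    for partial in partials {
        for (name, stats) in partial {
            let key = String::from_utf8_lossy(name).into_owned();
            total.entry(key).and_modify(|s| s.merge(&stats)).or_insert(stats);
        }
    }
    total
}
```

With the input bytes in memory, `aggregate(&bytes, 8)` would roughly correspond to the default mode and `aggregate(&bytes, 1)` to the -s mode.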
- Map file with data into memory (memmap2 crate is required)
- Faster hashmap (hashbrown crate is required)
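For the hashmap item, much of the speedup from alternative hashmaps tends to come from a cheaper hash function rather than from the table itself, since the standard `HashMap` uses the DoS-resistant but comparatively slow SipHash by default. The sketch below shows a std-only stand-in: a hand-written 64-bit FNV-1a hasher plugged into `std::collections::HashMap`. It is a swapped-in technique, not hashbrown and not something the repository is known to do; `FnvHasher` and `FastMap` are hypothetical names, and whether it helps on this workload would need measuring.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// 64-bit FNV-1a: a simple non-cryptographic hash (not DoS-resistant).
pub struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis.
        FnvHasher(0xcbf29ce484222325)
    }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            // FNV 64-bit prime.
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A standard `HashMap` that hashes keys with FNV-1a instead of SipHash.
pub type FastMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;
```

A map is created with `FastMap::default()` and then used like any other `HashMap`.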
The file measurements.txt must be in the repo to measure performance.
It can be generated by following the steps in 1brc's instructions.
I used Docker because I don't have Java on my Mac:
git clone https://github.com/gunnarmorling/1brc/
cd 1brc
docker run -it --mount source=$(pwd),target=/home eclipse-temurin:21 /bin/bash
# inside docker:
cd home
./mvnw clean verify
./create_measurements.sh 1000000000
exit
# on host
mv ./measurements.txt <1brc_rust path>
- @timClicks for the video showing the simplest solution
- @RagnarGrootKoerkamp for the article (webarchive) showing probably all the possible optimisations