1br

1 Billion Row challenge with R:

This is the repo inspired by Gunnar Morlng's 1 billion row challenge to see which functions / libraries are quickest in summarizing the mean, min and max of a 1 billion rows of record
This work is based on alejandrohagan/1br and #5, but I've only used 1e8 rows.
I added some duckdb options and polars scan option. In order to do it I've added a file copy and file reading steps in each benchmark method to be sure to compare the pipelines without caching and a maximum of 8 threads.
If you see any issues or have suggestions of improvements, please let me know.

If you need, execute install_required_packages(install = TRUE) from install.R file.
Generate 1e5, 1e6, 1e7, 1e8, 1e9 data running: ./generate_data.sh
Run the benchmark running: Rscript run.R or Rscript run.1e9.R
Check the generated plots and the results.

readr::read_rds("2024-02-29_all.rds")  %>% 
  group_split(n)

If you want, you have time and you have enough memory available in your computer, then you can try get the results for 1e9 rows:

Uncomment 1e9 lines on ./generate_data.sh
Comment run.R:25 and uncomment run.R:26
Generate 1e6, 1e7, 1e8 and 1e9 data running: ./generate_data.sh
Run the benchmark running: Rscript run.R
Check the generated plots.
Compare with other languages and solutions (Look at compare.php or onebrc for for rust)

Feedback is welcome. You can open an issue in this repo.

Provide feedback