Add h2o bench groupby queries #2881
Codecov Report
@@            Coverage Diff             @@
##           master    #2881      +/-   ##
==========================================
+ Coverage   85.25%   85.26%   +0.01%
==========================================
  Files         275      277       +2
  Lines       49010    49371     +361
==========================================
+ Hits        41782    42095     +313
- Misses       7228     7276      +48
benchmarks/src/bin/h2o.rs
let start = Instant::now();
let path = config.path.to_str().unwrap();
let ctx = SessionContext::new();
ctx.register_csv("x", path, CsvReadOptions::default())
The h2o benchmark allows reading the data into memory first (otherwise it would mostly be measuring the CSV parser).
Wow, the results were already looking pretty great, but that would be even better. Do you have a link to where they mention this?
nm, I see it now
Solutions are using in-memory data storage to achieve best timing.
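To illustrate the point about timing, here is a minimal std-only sketch that loads the data into memory first and then times only the group-by. It does not use the DataFusion API; `load_csv`, `sum_by_id`, and `timed_query` are hypothetical helpers, and the `id1`/`v1` column names and sample rows are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Parse CSV text with an `id1,v1` header into two in-memory columns.
/// Rows that do not parse are skipped.
fn load_csv(text: &str) -> (Vec<String>, Vec<i64>) {
    let mut ids = Vec::new();
    let mut vals = Vec::new();
    for line in text.lines().skip(1) {
        if let Some((id, v)) = line.split_once(',') {
            if let Ok(v) = v.trim().parse::<i64>() {
                ids.push(id.to_string());
                vals.push(v);
            }
        }
    }
    (ids, vals)
}

/// Equivalent of `SELECT id1, SUM(v1) FROM x GROUP BY id1` over the columns.
fn sum_by_id(ids: &[String], vals: &[i64]) -> HashMap<String, i64> {
    let mut sums = HashMap::new();
    for (id, v) in ids.iter().zip(vals) {
        *sums.entry(id.clone()).or_insert(0) += *v;
    }
    sums
}

/// Load once, then time the query separately so that CSV parsing
/// is not counted in the query time.
fn timed_query(text: &str) -> (HashMap<String, i64>, Duration, Duration) {
    let load_start = Instant::now();
    let (ids, vals) = load_csv(text);
    let load_time = load_start.elapsed();

    let query_start = Instant::now();
    let sums = sum_by_id(&ids, &vals);
    let query_time = query_start.elapsed();

    (sums, load_time, query_time)
}

fn main() {
    let csv = "id1,v1\nid001,1\nid002,5\nid001,3\n";
    let (sums, load_time, query_time) = timed_query(csv);
    println!("load: {:?}, query: {:?}", load_time, query_time);
    println!("id001 -> {}", sums["id001"]);
}
```

Reporting the two durations separately makes it visible how much of a "read from CSV" run is parsing rather than aggregation.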
Glad you found it. Some solutions also use specific (such as dictionary) encoding to get better results and lower memory usage.
The old PR for the Rust solution is here:
h2oai/db-benchmark#182
This also includes some performance tweaks like changing the allocator, compile flags, prepartitioning and batch size.
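As a rough illustration of why dictionary encoding helps a group-by, here is a std-only sketch. It is not the Arrow or DataFusion implementation; `DictColumn`, `dict_encode`, and `sum_by_key` are hypothetical names, and the sample ids are made up. Each distinct string is stored once, rows hold a small integer key, and the aggregation indexes a vector by key instead of hashing strings.

```rust
use std::collections::HashMap;

/// A dictionary-encoded string column: distinct values are stored once,
/// and each row holds an integer key into `values`.
struct DictColumn {
    values: Vec<String>,
    keys: Vec<u32>,
}

/// Build the dictionary in order of first appearance.
fn dict_encode(rows: &[&str]) -> DictColumn {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys = Vec::with_capacity(rows.len());
    for &row in rows {
        let key = *index.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            (values.len() - 1) as u32
        });
        keys.push(key);
    }
    DictColumn { values, keys }
}

/// Group-by sum that uses the keys as vector indexes,
/// so no string is hashed during aggregation.
fn sum_by_key(col: &DictColumn, vals: &[i64]) -> Vec<(String, i64)> {
    let mut sums = vec![0i64; col.values.len()];
    for (&k, &v) in col.keys.iter().zip(vals) {
        sums[k as usize] += v;
    }
    col.values.iter().cloned().zip(sums).collect()
}

fn main() {
    let ids = ["id001", "id002", "id001", "id003", "id002"];
    let vals: [i64; 5] = [1, 5, 3, 7, 2];
    let col = dict_encode(&ids);
    println!("keys: {:?}", col.keys);
    for (id, sum) in sum_by_key(&col, &vals) {
        println!("{id} -> {sum}");
    }
}
```

With few distinct ids relative to the row count, the keys column is much smaller than the original strings, which is where the lower memory usage comes from.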
Reading from CSV
Reading from memory
@Dandandan This is ready for another review
Benchmark runs are scheduled for baseline = fd64e6f and contender = 6dc9dea. 6dc9dea is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #2879
Rationale for this change
I would like to make it easier to run the h2o benchmarks.
What changes are included in this PR?
Add h2o benchmark
Are there any user-facing changes?
No