Improved performance of utf8 validation of large strings via simdutf8 (-40%)
#426
Codecov Report
@@            Coverage Diff             @@
##             main     #426      +/-   ##
==========================================
- Coverage   80.80%   80.78%   -0.03%
==========================================
  Files         353      372      +19
  Lines       22649    22651       +2
==========================================
- Hits        18302    18299       -3
- Misses       4347     4352       +5
Looks great! I am tempted to define
My benchmarks indicate it's not an improvement on the benchmarks or the TPC-H files. I don't think the overhead is very high, but in that case it seems less useful except for larger documents.
Let me add some >= 64 bytes examples to the parquet benchmark first. I think it makes sense to have proof of being faster first.
This reverts commit 1fbe8e3.
@jorgecarleitao
Looks great. IMO we can make
I've updated the title accordingly. Could you paste a summary of the benches into the description so that anyone visiting this PR can see the results at a glance?
This is helpful for strings >= 8 bytes.
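For readers skimming the thread, here is a minimal sketch of length-based dispatch between two validators. It is not the code from this PR: `FAST_PATH_MIN_LEN`, `Validator`, `choose_validator`, `fast_is_utf8` and `is_utf8` are hypothetical names, the threshold is illustrative (the thread mentions both 8 and 64 bytes), and the fast path is a standard-library stand-in so the snippet compiles without dependencies. With the `simdutf8` crate added, the stand-in body could presumably be replaced by `simdutf8::basic::from_utf8(bytes).is_ok()`.

```rust
/// Length (in bytes) at or above which the fast validator is chosen.
/// Hypothetical value for illustration only.
const FAST_PATH_MIN_LEN: usize = 8;

/// Which validator handles a given input.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Validator {
    Std,
    Fast,
}

/// Picks a validator from the input length alone.
fn choose_validator(len: usize) -> Validator {
    if len >= FAST_PATH_MIN_LEN {
        Validator::Fast
    } else {
        Validator::Std
    }
}

/// Stand-in for a SIMD validator. Assumption: a real build would call
/// into `simdutf8` here; the standard library is used so that this
/// sketch has no external dependencies.
fn fast_is_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Returns true if `bytes` is valid UTF-8, dispatching on length.
fn is_utf8(bytes: &[u8]) -> bool {
    match choose_validator(bytes.len()) {
        Validator::Fast => fast_is_utf8(bytes),
        Validator::Std => std::str::from_utf8(bytes).is_ok(),
    }
}
```

Keeping the choice in `choose_validator` makes the threshold easy to tune against benchmarks for short and long strings.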