Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in string readuntil benchmark #50615

Open
gbaraldi opened this issue Jul 20, 2023 · 1 comment
Open

Regression in string readuntil benchmark #50615

gbaraldi opened this issue Jul 20, 2023 · 1 comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version

Comments

@gbaraldi
Copy link
Member

run(BaseBenchmarks.SUITE[["string", "readuntil", "target length 2"]]) regressed in #48273 @stevengj it gained some allocations and became 2x slower.

@gbaraldi gbaraldi added the regression Regression in behavior compared to a previous version label Jul 20, 2023
@brenhinkeller brenhinkeller added the performance Must go faster label Aug 6, 2023
@stevengj
Copy link
Member

stevengj commented Nov 28, 2023

This is apparently benchmarking:

buffer = IOBuffer("A" ^ 50000)

g = addgroup!(SUITE, "readuntil")
for len in (1, 2, 1000, 50000)
    g["target length $len"] = @benchmarkable readuntil(seekstart($buffer), $("A" ^ len))
end

Reading 2-byte strings from an IOBuffer probably got slower because of this diff, which traded off worse performance for reading extremely short lines for better performance reading lines with 20 characters or more. (See the comment "A single loop is 2x faster for nout=5.") Actually that portion of the PR sholdn't be relevant here because that is only for delim::UInt8, whereas this function uses a string delimiter, which is a completely different part of the codebase.

For string delimiters, one change in that PR is that it changed the initial allocated size to 70 bytes, to match the initial allocation in other parts of the code: https://github.com/JuliaLang/julia/pull/48273/files#diff-74f71402c994a78b21ded0ac040485d12bfc27e9ffaf538fa0f71f6d284e2991R1038 — maybe that is slowing us down for reading 2-byte strings? Someone could try changing that line back to StringVector(0) to see if it helps. (Of course, that might slow things down for reading longer strings.)

(In general, my feeling is that if you care about performance for reading lots of very short strings, you are better off using the new functionality with something like StringViews.jl to read the strings in-place into a re-used buffer.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster regression Regression in behavior compared to a previous version
Projects
None yet
Development

No branches or pull requests

3 participants