Make endof() robust to invalid UTF-8 (#17276)
When an invalid string contains only continuation bytes, endof() tried to index the underlying array at position 0. Instead of relying on bounds checking, explicitly check that the index is > 0. Returning 0 when only continuation bytes were encountered is consistent with the definition of endof(), which gives the last valid index. This also allows removing the i == 0 check. The new code appears to be slightly faster than the old one.
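The backward scan described above can be sketched in Python. This is an illustrative translation, not the Julia code from the commit: the names `is_continuation` and `endof` are hypothetical, the input is assumed to be a raw `bytes` object, and the returned index is 1-based to match Julia's convention (0 means "no valid index").

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def endof(data: bytes) -> int:
    """Return the 1-based index of the lead byte of the last character,
    or 0 if there is none (empty input, or only continuation bytes)."""
    i = len(data)
    # Test i > 0 before reading data[i - 1]; the loop then stops at 0
    # when every byte is a continuation byte. The empty input also falls
    # out naturally (the loop never runs), so no separate i == 0 check.
    while i > 0 and is_continuation(data[i - 1]):
        i -= 1
    return i
```

For example, `endof("aé".encode("utf-8"))` returns 2 (the lead byte of `é`), while `endof(b"\x80\x80")` returns 0. In Python the guard matters even more than in Julia: without it, `data[i - 1]` with `i == 0` would silently read `data[-1]` instead of raising a bounds error.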
fa5af23
Executing the daily benchmark build, I will reply here when finished:
@nanosoldier
runbenchmarks(ALL, isdaily = true)
EDIT by @jrevels: master broke JLD, which BenchmarkTools/Nanosoldier currently uses for (de)serialization of benchmark parameters/results. See JuliaCI/BenchmarkTools.jl#15.
I've deployed the recent JLD fix (JuliaIO/JLD.jl#78) to nanosoldier, let's see if things work now:
@nanosoldier
runbenchmarks(ALL, isdaily = true)
Your benchmark job has completed: possible performance regressions were detected. A full report can be found here. cc @jrevels
Nothing related to strings, at least.
The string benchmarks do not appear to be particularly thorough. The total "test" content of BaseBenchmarks/src/string/StringBenchmarks.jl is:

It would be great to have more complete benchmarks, especially given all the churn in the world of strings and some reports (which I can't find right now) of performance regressions.
Both strings and IO are woefully under-tested by our existing suite, especially compared to our indexing/linalg benchmarks.
I don't take this as proof of anything. Actually, I didn't expect any regressions, since that method has a very well-defined use, and in my tests all scenarios were faster. I just wanted to note that none of the "possible regressions" were string-related (which would have been a problem).