Fix inconsistencies of findnext and findprev functions #40120
Left some styling nits.
Co-authored-by: Shuhei Kadowaki <[email protected]>
This looks great! Adding a triage label since it technically is breaking.
This might be kind of dumb, but could we do the string versions of this by reinterpreting to |
Codecov Report
@@ Coverage Diff @@
## master #40120 +/- ##
==========================================
+ Coverage 87.33% 87.39% +0.05%
==========================================
Files 390 390
Lines 75796 76077 +281
==========================================
+ Hits 66197 66487 +290
+ Misses 9599 9590 -9
In fact, the original code does that by wrapping a string |
triage says ARGHHHHHHH
I think Lines 534 to 542 in 1897e08
That is, the third argument of |
bump
Need to actually discuss this on a triage call.
The examples, fixes & consistency improvements look good, so I'd be in favor, but I haven't looked at the code in detail. Would need a rebase & some squashing though. @oscardssmith do you happen to remember what there was to AAAAAAARGH about?
basically the problem is that there doesn't seem to be a good api that is fully consistent.
yeah, we added those tests here: and I remember we added them because some code relies on that behavior and an earlier version of that PR broke that code.
Triage was re-looking at this and the answer here is that for |
after reviewing this with fresh eyes (and having found more bugs in the current implementation) triage now believes that this is completely correct and should be merged (assuming pkgeval doesn't hate it). One thing not made clear in the rationale above is that we should have
proposed:
We also should make |
IIUC, in all cases, |
I wonder if people are really comfortable with currently, if user is handed a |
this proposal is to make the semantics of all vectors consistent with the semantics for strings.
but didn't you say
are you saying:
current
julia> findprev(==(UInt8(1)), UInt8[1,2,3], 4)
julia> findprev(==(1), UInt8[1,2,3], 4)
ERROR: BoundsError: attempt to access 3-element Vector{UInt8} at index [4]
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] findprev(testf::Base.Fix2{typeof(==), Int64}, A::Vector{UInt8}, start::Int64)
@ Base ./array.jl:2253
[3] top-level scope
@ REPL[23]:1
proposed?
julia> findprev(==(UInt8(1)), UInt8[1,2,3], 4)
1
julia> findprev(==(1), UInt8[1,2,3], 4)
1
yeah. sorry if that was unclear. I think @LilithHafner's description of triage intent is exactly correct. for all collection types, if there is a matching value before the index, findprev will return it, and will otherwise return nothing. (and likewise for findnext/after)
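A minimal sketch of the semantics described in this comment, written against released Julia. The helper name `lenient_findprev` is hypothetical and is not part of Base or of this PR; it only clamps an out-of-range start index for arrays before delegating to `findprev`.

```julia
# Hypothetical helper (not part of Base or this PR): emulate the described
# findprev semantics for arrays by clamping an out-of-range start index.
function lenient_findprev(pred, A::AbstractVector, start::Integer)
    isempty(A) && return nothing
    start < firstindex(A) && return nothing      # nothing lies before the start
    return findprev(pred, A, min(start, lastindex(A)))
end

A = UInt8[1, 2, 3]
r_past_end = lenient_findprev(==(1), A, 4)     # start just past the end
r_far_past = lenient_findprev(==(1), A, 100)   # start far past the end
r_no_match = lenient_findprev(==(9), A, 4)     # no matching value before the start
```

Under this sketch the first two calls find the match at index 1 and the third returns `nothing`, which is the behavior the comment asks for; whether the PR implements it this way is not shown here.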
This pull request fixes several inconsistent behaviors of the findnext and findprev functions. The proposed changes are breaking in some edge cases.
There are mainly three behavioral changes:
Related issues: findprev/findlast with empty search string #39940 and find(next|prev|last|first) for failed matches #36768.
Fixes #39940, #36768, #40006, and #40244.
Empty strings
The first change is best illustrated by the following example.
master:
proposed:
This also changes findlast("", b) if b is not an empty string.
Bounds check
The second change makes it possible to specify any position larger than the last index of the second string:
master:
proposed:
I'm not sure why the current behavior is so strict, but the proposed behavior is more consistent with the following generic method of findnext:
julia/base/array.jl
Lines 1855 to 1866 in 6913f9c
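The embedded snippet is not reproduced on this page. As a small illustration of the behavior it refers to, the generic predicate method of `findnext` on an array already accepts a start index past the last index and simply reports no match (the variable names are only for illustration):

```julia
A = [1, 2, 3]
past_end = findnext(isodd, A, 10)   # start beyond lastindex(A): no error, no match
in_range = findnext(isodd, A, 2)    # ordinary search starting at index 2
```

Here `past_end` is `nothing` and `in_range` is `3`.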
findprev is also changed in a symmetric way.
master:
proposed:
Index alignment
The third change allows starting a search in the middle of a character in multibyte strings.
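As background, and not as the PR's own example: in a UTF-8 `String` only the first byte of each character is a valid index, so on released Julia a caller who ends up in the middle of a character has to align the index before searching. A sketch of that caller-side alignment, with illustrative variable names:

```julia
s = "αβγ"                        # each character takes two bytes in UTF-8
valid = collect(eachindex(s))    # indices where a character starts
mid = 2                          # falls inside 'α'
# Caller-side alignment: move forward to the next character boundary.
aligned = isvalid(s, mid) ? mid : nextind(s, mid)
hit = findnext(==('γ'), s, aligned)
```

This gives `valid == [1, 3, 5]`, `aligned == 3`, and `hit == 5`. The proposed change would make this alignment step unnecessary for callers.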
master:
proposed:
This change simplifies some code that uses findnext or findprev. For example, findall can be implemented in an obvious way:
In my opinion, the index check is a kind of implementation detail and should be hidden from the user.
Performance
I also refactored the current code almost from scratch because it was difficult to fix the inconsistencies otherwise. I therefore ran a quick benchmark to check for regressions and found that string-string search actually became slightly faster.
This is a benchmark script.
master:
proposed:
Breaking changes
I found several questionable behaviors while working on this pull request; some of them seem to be intentional because they are explicitly tested. One is about splitting a string with an empty string, which is already filed at #40117. Another is the following two test cases:
julia/test/strings/search.jl
Lines 420 to 421 in 6913f9c
This suggests findprev should be more permissive in the right bound than findnext should be in the left bound (findnext("a", "abc", 0) throws BoundsError). This is also expected in other places such as
julia/test/strings/search.jl
Line 319 in 6913f9c
findprev("a", "abc", 4) (for example) throws a BoundsError exception, which is consistent with findnext and other methods of the findprev function. Technically speaking, this is a breaking change, and thus I'm not perfectly sure if this is acceptable.