-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve SubString nextind/prevind #24255
Conversation
I don't understand the changes you made for the 2-argument methods. Didn't you completely remove the generic |
I do not have to define Actually I had to do something opposite - in |
To see this you can look at dispatch of The problem is that using |
test/strings/basic.jl
Outdated
let strs = Any["∀α>β:α+1>β", GenericString("∀α>β:α+1>β")] | ||
for i in 1:2 | ||
let str = "∀α>β:α+1>β", strs = Any[str, GenericString(str), | ||
SubString(str, 1), SubString(str, 1, 4), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorrect indentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those multibyte characters - on my terminal it looked correct :(.
test/strings/basic.jl
Outdated
@test nextind(SubString("1234567", 3), 3, 2) == 5 | ||
@test nextind(SubString("1234567", 3), 4, 2) == 6 | ||
|
||
@test prevind(SubString(GenericString("1234567"), 1, 3), 10) == 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't these be merged with the previous tests using a loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I have done previously, but the discontinuity at 1
and endof(s)
made it problematic (the test code would be very messy).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must be missing something. These lines are identical to those above except for the GenericString
part, so what's the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have put into the loop all similar cases and left the differences below.
OK. Can you add commit messages explaining changes like you did in the PR? It would be useful to mention what you just said. Better keep two separate commits as it makes changes clearer. |
Maybe I will add comments in the source code? The changes are complex and I could not think of good brief commit messages. OK? |
base/strings/types.jl
Outdated
function nextind(s::SubString{String}, i::Integer) | ||
# handle the case when i+s.offset is greater than 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"smaller"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewritten, hopefully it is clearer now
base/strings/types.jl
Outdated
i < 1 && return 1 | ||
nextind(s.string, i+s.offset)-s.offset | ||
end | ||
|
||
function prevind(s::SubString{String}, i::Integer) | ||
e = endof(s) | ||
# handle the case when i+s.offset is greater than endof(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Than endof(s.string)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewritten, hopefully it is clearer now
base/strings/types.jl
Outdated
i > e && return e | ||
prevind(s.string, i+s.offset)-s.offset | ||
end | ||
|
||
function nextind(s::SubString{String}, i::Integer, nchar::Integer) | ||
# nextind for nchar=1 must return nonnegative value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why this would apply only to "nchar=1".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewritten, hopefully it is clearer now
Commit messages don't need to be brief. I'd even advocate the contrary. |
65eb76d
to
a38df2e
Compare
OK - I have reorganized the commits. |
base/strings/types.jl
Outdated
# make sure that value not greater than endof(s) is returned | ||
j = Int(i) | ||
j > e && return e | ||
# the transofrmation below is valid if j<=endof(s)+1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"transformation". Same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
test/strings/basic.jl
Outdated
end | ||
end | ||
|
||
@test prevind(SubString("1234567", 1, 3), 0) == -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return -1
in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because prevind("1234567", 0) == -1
. It is an inconsistency between prevind
for String
and for AbstractString
.
CI e87ec73 produces strange StackOverflow due to recursion in tests (when serialize is called - do not understand why in this case it is called at all).
produced:
I assume that the way |
base/strings/types.jl
Outdated
@@ -46,6 +46,8 @@ function SubString(s::SubString, i::Int, j::Int) | |||
j <= endof(s) || throw(BoundsError(s, j)) | |||
SubString(s.string, s.offset + i, s.offset + j) | |||
end | |||
SubString{T}(s::T, i::Int, j::Int) where {T<:SubString} = SubString(s, i, j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this method if it's not needed (since it didn't fix the problem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - I will move it to a separate PR
This looks good in terms of implementation, but can you tell me more about the logic behind the rules themselves? Is there a strong reason to fix incorrect indices? If the user passed an out of bounds index, we could decide that the only guaranty we make is that an out of bounds index will be returned, without specifying the exact rule? Overall I'd be tempted to choose the rule that is the simplest and the most efficient to implement: it would be too bad to lose a bit of performance because of our handling of invalid cases. There's also the slightly weird discrepancy between |
The key element of the logic in this PR is the following: Currently we allow out of range indices to
In both cases there is a jump to these two values no matter how low or high With String and GenericString discrepancy is due to the fact that they return different values for out-of-bounds indices (so as you say - if we do not specify what should be returned - different types can return different things and tests have to take this into account). Some time ago I tried to unify it but the only fix would be to make PS. I am still working on making tests pass because of |
I have rewritten the offending part of the tests to check the contract only (negative or greater than |
Thanks for the summary. You shouldn't be afraid of putting these details in commit messages, that can't hurt. It would be interesting indeed if you can come up with a PR unifying behaviors. The code which relies on these implementation details would better be fixed anyway. |
PR passing the tests finally. The key difficulty was that it is natural for |
905354c
to
5d9fa65
Compare
rebased |
I wouldn't care too much about what is "natural". An out of bounds index is invalid, we don't really care whether it's the first index after the end of the string or not, anyway you can't use it for anything. We just have to ensure any invalid index gives the same result e.g. when passed to |
The CI fail seems to be unrelated. Should it be merged (part of this PR is fixing a bug in nexting/prevind that is present in 0.6) |
I don't know, there might well be code out there that relies on Is that behavior changed by this PR? |
@stevengj The change in the area you are asking about is from:
to
It is needed because the old code could produce negative values of
In the new code
So the question is (especially with relation to #24613 - I do not know the details of that PR) - what is the desired functionality of |
Returning 1 seems like the desired behavior to me, thanks. (I think in general we want |
In general – and I think we should spell this out explicitly – it seems like strings should behave as if there is a 1-code-unit character before the string at index |
@StefanKarpinski And how about merging? :-) |
👍 Needs a conflict resolution that wasn't immediately obvious to me. |
@StefanKarpinski regarding codeunit arithmetics. +1 for the proposal to set Regarding:
How |
…and prevind returns an index not greater than endof(s)
…t for SubString{String}
5d9fa65
to
f114c04
Compare
@StefanKarpinski conflict resolved (nothing required changing - git got confused by merge of |
Another model we could use here is that there are an infinite number of single-code-unit characters before and after the end of the string. That way |
I guess another benefit of the infinite 1-code-unit chars model is that |
I am OK also with such an approach, but it also requires fixing In general now |
I plan on taking another pass through all of this stuff to make it more consistent and try to document and test things more thoroughly. Long overdue but now happening in the eleventh hour :) |
CI failure seems unrelated. |
Having thought about this for a while, I think the most conservative approach is probably best for now: allow returning and accepting |
This PR is outdated and should be closed, but I am not sure if current implementation is what was intended (I was unable to follow the whole discussion during @StefanKarpinski What I find inconsistent is the following behavior of
I guess it was not intended (but maybe it is). I would assume that
Also, at least for my version of Julia:
The docstring for
throws |
nextind(s::SubString, i::Integer) = nextind(s.string, i+s.offset)-s.offset | ||
prevind(s::SubString, i::Integer) = prevind(s.string, i+s.offset)-s.offset | ||
# need to define nextind and prevind only for SubString{String} | ||
# as other cases are handled by definitions for AbstractString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These generic definitions for SubString{T}
should be more efficient than the generic nextind
for AbstractString
if there is an efficient nextind
for T
. So, better to leave them in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh never mind, I guess we already discussed that above?:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is outdated (it was prepared before "string overhaul"), but the problem it solved is still present in the new code.
Therefore my thinking is to first decide on the desired behavior (I guess @StefanKarpinski will have an opinion here).
Next there is an issue of maximum efficiency of implementation. My experience is that at least SubString{String}
should have a specialized method - as this will be most common case in practice and probably there is a room for efficiency improvement by using custom code.
But first we should have a defined contract we want the implementation to follow.
Closed by #25531. |
This PR introduces the following changes:
nextind
andprevind
withnchar
forSubString{String}
(now it does not use genericprevind
/nextind
but a specialized method forString
)prevind
andnextind
work consistently with any type ofSubString
:string
field) isString
orDirectIndexString
thenprevind
/nextind
simply shifts the output byoffset
as in this caseprevind
can return negative values;String
then we have to handle the restriction thatprevind
in this case should not return negative values (i.e. the minimum value ofprevind
is0
);nextind
returned a non-positive value, e.g. in the expressionnextind(SubString("1234234", 3), -6)
; the general contract is thatnextind
must always return a positive index; now it is checked; a similar fix is implemented forprevind
returning something beyondendof
(e.g.prevind(SubString("12345678", 1, 1), 10)
).(the first change improves performance of new functionality in 0.7, the second and third change fix wrong behavior that was present in 0.6)
@nalimilan The only area I do not touch are behavior of
prevind
andnextind
forDirectIndexString
as I understand they are to be removed anyway (and they do not work consistently with all other functionality for strings now anyway).The test code for this is a bit complex - I can separate different cases (but the test code will be more verbose) if more clarity is required.