Add get(string, index, default) #22500

Merged: 1 commit into master from sf/stringget on Jun 26, 2017

staticfloat
Member

I think it would be nice to provide this functionality with a wider set of collections. With this change we now get:

julia> get("Julia", 1, ' ')
'J': ASCII/Unicode U+004a (category Lu: Letter, uppercase)

julia> get("Julia", 20, ' ')
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)

julia> get("Julia", -5, ' ')
' ': ASCII/Unicode U+0020 (category Zs: Separator, space)

@@ -39,6 +39,7 @@ getindex(s::AbstractString, v::AbstractVector{<:Integer}) =
getindex(s::AbstractString, v::AbstractVector{Bool}) =
    throw(ArgumentError("logical indexing not supported for strings"))

get(s::AbstractString, i::Integer, default) = start(s) <= i <= endof(s) ? s[i] : default
Member

I would make the index check isvalid(s, i)

Member Author

Nice, I tried checkbounds() but that throws an error. isvalid() works great.
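
For readers following along, here is a minimal sketch of what an `isvalid`-based check could look like. It is not the code merged in this PR: `safe_get` is a hypothetical name, and the sketch assumes Julia 1.x, where the `start`/`endof` functions used in the original diff no longer exist.

```julia
# Hypothetical helper illustrating the reviewer's suggestion; not the merged code.
# isvalid(s, i) is true only when i is in bounds AND is the first code unit of a
# character, so a single check covers both failure modes.
safe_get(s::AbstractString, i::Integer, default) = isvalid(s, i) ? s[i] : default

safe_get("Julia", 1, ' ')    # 'J'
safe_get("Julia", 20, ' ')   # ' ' (past the end)
safe_get("Julia", -5, ' ')   # ' ' (before the start)
safe_get("αβ", 2, ' ')       # ' ' (index 2 is inside the 2-byte character 'α')
```

Unlike `checkbounds(s, i)`, which throws on a bad index, `isvalid(s, i)` simply returns `false`, which is why it fits a "return the default" contract.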

@ararslan ararslan added the strings "Strings!" label Jun 23, 2017
@nalimilan
Member

Good idea. Can you add a test?

@nalimilan nalimilan added the needs tests Unit tests are required for this change label Jun 24, 2017
@tkelman tkelman removed the needs tests Unit tests are required for this change label Jun 24, 2017
# issue #22500 (using `get()` to safely index strings)
@test get("Julia", 1, ' ') == 'J'
@test get("Julia", -1, ' ') == ' '
@test get("Julia", 10, ' ') == ' '
Member

Probably worth testing that the default is returned when indexing at a position which is within the bounds of the string, but doesn't correspond to the beginning of a character?
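
A sketch of what such a test might look like, assuming Julia 1.x and its `Test` standard library. `get_or_default` is a hypothetical stand-in for the behavior under review at this point in the thread (return the default for any index that is not the start of a character); it is not the actual test added to the PR.

```julia
using Test

# Hypothetical stand-in for the behavior under review; not Base.get itself.
get_or_default(s::AbstractString, i::Integer, default) =
    isvalid(s, i) ? s[i] : default

@testset "in-bounds index inside a multi-byte character" begin
    s = "αβγ"                       # each character takes 2 code units in UTF-8
    @test ncodeunits(s) == 6
    @test !isvalid(s, 2)            # in bounds, but not the start of a character
    @test get_or_default(s, 2, ' ') == ' '
    @test get_or_default(s, 3, ' ') == 'β'
end
```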

Member Author

Excellent. This is exactly the kind of sideways thinking I want when I open a PR. Addressed in my new squashed commit.

Member

Cool! But there's some spacing issue which made the CI fail.

@KristofferC KristofferC merged commit 6e93321 into master Jun 26, 2017
@KristofferC KristofferC deleted the sf/stringget branch June 26, 2017 01:04
DrTodd13 pushed a commit to IntelLabs/julia that referenced this pull request Jun 26, 2017
@StefanKarpinski
Member

This feels a bit undermotivated to me. Was there an actual use case here?

@staticfloat
Member Author

I don't remember the original use case for this, but it seems a good tool for defensive programming. It allows me to do things like write verbs such as "split paragraph X into sentences, then take the 2nd character of the 3rd word" in a way that allows me to treat failure paths the same between characters in a word vs. words in a sentence.

@StefanKarpinski
Member

StefanKarpinski commented Dec 1, 2017

Eh, sure. If there's even a vague motivation for it, I guess it's ok. It treats indexing into the middle of a character the same as out-of-bounds indexing, however, which seems pretty sketchy to me.

@staticfloat
Member Author

While that's true, the purpose behind get(collection, index, default) always seemed to me to be "give me collection[index] and if anything goes wrong, return default". If we need ways to disambiguate different failure modes, we should probably come up with a different API than this simple get() method.

@StefanKarpinski
Member

> the purpose behind get(collection, index, default) always seemed to me to be "give me collection[index] and if anything goes wrong, return default".

I don't think that's quite true – there's a difference between asking about a nonsense index and asking about an index that makes sense but doesn't exist. Again, an actual motivating use case would help understand what to do here.

@staticfloat
Member Author

> there's a difference between asking about a nonsense index and asking about an index that makes sense but doesn't exist.

I agree with you; these are two different classes of errors with different meanings. What I'm saying is that if your API only takes in collection, index, default, then to me it makes sense that the contract the API provides is "no matter what error I encounter, I will only ever return collection[index] or default". If you need something finer grained than this, then you should be using a different API. I see it similarly to wrapping code in try ... end; the contract there is that no matter what goes wrong, no exception is thrown. If you want to handle the exception, use a different construct, e.g. try ... catch ... end.

I agree that indexing into the middle of wide characters is fundamentally a very different failure mode than indexing off the end of a string, but from the try ... end standpoint, it's just another error.

Seeing as this is a pretty simple case, I would suggest that if someone wants to differentiate between the different errors, they should just deal with those directly, and not use the get(collection, index, default) API.
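
As an illustration of dealing with those cases directly, here is a minimal sketch that tells the two failure modes apart. It assumes Julia 1.x; `classify_index` and the symbols it returns are hypothetical names chosen for this example, not part of Base.

```julia
# Hypothetical helper: report why an index can or cannot be used with s[i].
function classify_index(s::AbstractString, i::Integer)
    # checkbounds(Bool, ...) returns false instead of throwing.
    checkbounds(Bool, s, i) || return :out_of_bounds
    # In bounds, but not the first code unit of a character.
    isvalid(s, i) || return :mid_character
    return :valid
end

classify_index("αβ", 1)   # :valid
classify_index("αβ", 2)   # :mid_character
classify_index("αβ", 5)   # :out_of_bounds
```

A caller that cares about the distinction can branch on the result before indexing, and leave `get(collection, index, default)` for the cases where any failure should collapse to the default.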

@TotalVerb
Contributor

For what it's worth

julia> A = []
0-element Array{Any,1}

julia> get(A, "foo", 0)
ERROR: MethodError: no method matching get(::Array{Any,1}, ::String, ::Int64)
Closest candidates are:
  get(::ObjectIdDict, ::Any, ::Any) at associative.jl:541
  get(::Base.EnvDict, ::AbstractString, ::Any) at env.jl:79
  get(::AbstractArray, ::Integer, ::Any) at abstractarray.jl:1032
  ...

@nalimilan
Member

I think we should know the concrete use case before making a decision.

In general, I don't see in what case it would be legitimate to call get with an index which points in the middle of a code point and expect to get default. This kind of behavior is likely to hide bugs for people who don't take care of handling Unicode properly. We could return default only when the index is out of bounds, not when it is inside the bounds but invalid.

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Dec 3, 2017
@StefanKarpinski
Member

I changed this in #24999: it now returns the default if you use an out-of-bounds index and throws an error if you use an invalid, in-bounds index.
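
A minimal sketch of the behavior described here, assuming Julia 1.x. It is not the code from #24999; `bounded_get` is a hypothetical name used only to show the shape of the contract.

```julia
# Hypothetical helper mirroring the described contract; not the Base implementation.
function bounded_get(s::AbstractString, i::Integer, default)
    # Out-of-bounds index: quietly return the default.
    checkbounds(Bool, s, i) || return default
    # In-bounds index: ordinary indexing, which throws a StringIndexError
    # when i falls in the middle of a multi-byte character.
    return s[i]
end

bounded_get("αβ", 1, ' ')    # 'α'
bounded_get("αβ", 10, ' ')   # ' ' (out of bounds)

# Index 2 is in bounds but inside 'α', so this throws rather than returning ' '.
threw = try
    bounded_get("αβ", 2, ' ')
    false
catch err
    err isa StringIndexError
end
```

The design choice is that only "the index does not exist" maps to the default, while "the index is nonsense for this string" still surfaces as an error, so Unicode-handling bugs are not hidden.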

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Dec 14, 2017
Labels
strings "Strings!"
7 participants