-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Char * String and Char * Char #22532
Conversation
Perhaps you could also add some tests of the kind |
added extra tests
I've added a few more tests. I tried to squash the extra commit, but I just made a horrible mess. How should I do that? |
It's OK, it can be squashed on merge with the github ui. |
base/strings/basic.jl
Outdated
@@ -68,6 +68,9 @@ julia> "Hello " * "world" | |||
``` | |||
""" | |||
(*)(s1::AbstractString, ss::AbstractString...) = string(s1, ss...) | |||
(*)(c::Char, s::AbstractString) = string(c, s) | |||
(*)(s::AbstractString, c::Char) = string(s, c) | |||
(*)(c1::Char, c2::Char) = string(c1, c2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might subsume all these with
(*)(s::AbstractString, ss::Union{Char,AbstractString}...) = string(s, ss...)
(*)(c::Char, ss::Union{Char,AbstractString}...) = string(c, ss...)
and take advantage of the string(::Union{Char,String}...)
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reduced it further to one method (see next commit)
Sorry I misread :) |
PS |
docstring should be updated here - it isn't formatted correctly on master, signatures should be indented, not backtick fenced |
base/strings/basic.jl
Outdated
``` | ||
*(s::AbstractString, t::AbstractString) | ||
``` | ||
*(s::Union{Char, AbstractString}, t::Union{Char, AbstractString}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... on the second input
Does this need anything else? |
There was a timeout on linux 32-bit so I restated the build (gist backed-up here https://gist.github.com/pabloferz/4cb263e0967ba9a5c3256bc53d3619ee) |
base/strings/basic.jl
Outdated
|
||
Concatenate strings. The `*` operator is an alias to this function. | ||
Concatenate strings and characters. The `*` operator is an alias to this function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"[...] and characters to a [`String`](@ref)
." perhaps?
Also, this is the *
function, its an alias for itself!?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make it like this:
"""
*(s::Union{AbstractString, Char}, t::Union{AbstractString, Char})
Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
to calling the [`string`](@ref) function on the arguments.
"""
base/strings/basic.jl
Outdated
Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent | ||
to calling the [`string`](@ref) function on the arguments. | ||
|
||
# Examples | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the blank line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we had been putting lines between the headers and contents in docstring?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recently moving towards getting rid of them everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
K, thanks for the heads up
``` | ||
""" | ||
(*)(s1::AbstractString, ss::AbstractString...) = string(s1, ss...) | ||
(*)(s1::Union{Char, AbstractString}, ss::Union{Char, AbstractString}...) = string(s1, ss...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from some of the linalg code, typically there isn't a space after the comma in Union
. Probably best to keep consistency within this file and alike.
Unless there are objections, I plan to merge this in 24 hours. |
base/strings/basic.jl
Outdated
@@ -56,9 +56,12 @@ sizeof(s::AbstractString) = error("type $(typeof(s)) has no canonical binary rep | |||
eltype(::Type{<:AbstractString}) = Char | |||
|
|||
""" | |||
*(s::Union{Char, AbstractString}, t::Union{Char, AbstractString}...) | |||
*(s::Union{AbstractString, Char}, t::Union{AbstractString, Char}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why where the three dots removed?
Once the |
Just add it back (with collaborator access to the branch) and merge? |
[ci skip]
Good idea, @KristofferC. Done. |
Thanks for the contribution, @adamslc! Nice work here. |
Two problems:
|
Regarding |
@ararslan, no, that is not correct. e.g. One the other hand, it is true that |
We should definitely have a specialized |
The following specialized version of function repeat(s::Char, r::Integer)
r < 0 && throw(ArgumentError("can't repeat a char $r times"))
out = _string_n(r)
ccall(:memset, Ptr{Void}, (Ptr{UInt8}, Cint, Csize_t), out, s, r)
return out
end
There isn't much speed improvement over |
(That only works for |
Calling |
@musm, I understand that, but since |
To implement an efficient character repeating operator, it's sufficient to figure out what 1-4 byte pattern the character produces in UTF-8 and then copy that as many times as the character needs to be repeated. Not entirely straightforward, but not crazy to implement either. |
Is there something relevant already implemented in utf8proc? |
Stefan, I know its possible, but I don't think it is worth the trouble |
Sure, can always be done as an optimization in the future some time. |
@ScottPJones sent me the following version which does not allocate a while back. I don't think he has had the time to open a PR on his branch so I am posting this here in the hopes that someone opens a PR with the change function repeat(c::Char, r::Integer)
r < 0 && throw(ArgumentError("can't repeat a character $r times"))
r == 0 && return ""
ch = UInt(c)
if ch < 0x80
out = Base._string_n(r)
ccall(:memset, Ptr{Void}, (Ptr{UInt8}, Cint, Csize_t), out, c, r)
elseif ch < 0x800
out = _string_n(2r)
p16 = reinterpret(Ptr{UInt16}, pointer(out))
u16 = ((ch >> 0x6) | (ch & 0x3f) << 0x8) % UInt16 | 0x80c0
@inbounds for i = 1:r
unsafe_store!(p16, u16, i)
end
elseif ch < 0x10000
(0xd800 ≥ ch ≤ 0xdfff) || throw(ArgumentError("invalid character 0x$(hex(ch))"))
out = _string_n(3r)
p = pointer(out)
b1 = (ch >> 0xc) % UInt8 | 0xe0
b2 = ((ch >> 0x6) & 0x3f) % UInt8 | 0x80
b3 = (ch & 0x3f) % UInt8 | 0x80
@inbounds for i = 1:r
unsafe_store!(p, b1)
unsafe_store!(p, b2, 2)
unsafe_store!(p, b3, 3)
p += 3
end
elseif ch < 0x110000
out = _string_n(4r)
p32 = reinterpret(Ptr{UInt32}, pointer(out))
u32 = ((ch >> 0x12) | ((ch >> 0x4) & 0x03f00) |
((ch << 0xa) & 0x3f0000) | ((ch & 0x3f) << 0x18)) % UInt32 | 0x808080f0
@inbounds for i = 1:r
unsafe_store!(p32, u32)
p32 += 4
end
else
throw(ArgumentError("invalid character 0x$(hex(ch))"))
end
return out
end |
Thanks, PR: #23787. |
Solves #22512. I am definitely a Github novice, so let me know if I made a stupid mistake somewhere...