-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Chop optional number of characters #17457
Conversation
@@ -914,9 +914,9 @@ generate an array of such random numbers. | |||
randexp | |||
|
|||
""" | |||
chop(string) | |||
chop(string, i = 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signature needs to be updated in the rst, then run make docs
to populate the docstring content and commit that too
also i
should have single backticks around it in the text of the docstring below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since i
tends to refer to an index maybe n
or len
would be a better choice for the variable name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change if you really want but I like the consistency of using s for string and i for integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. with all these non traditional indexing changes, does it still make sense to annotate i as a integer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that i
should be changed, but len
sounds like it would be the desired length of the resulting string rather than the length of the section to remove. I think n
would be the more natural choice.
This is going to sound stupid, but how do I run make docs? |
@@ -333,7 +333,7 @@ | |||
|
|||
``strings`` can be any iterable over elements ``x`` which are convertible to strings via ``print(io::IOBuffer, x)``\ . | |||
|
|||
.. function:: chop(string) | |||
.. function:: chop(string, i = 1) | |||
|
|||
.. Docstring generated from Julia source |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you don't have a source build, you can just make the change below. though rst has different formatting than markdown so the i
will need double backticks in the rst file
Ok, I think I did it? |
@@ -36,7 +36,7 @@ startswith(a::Vector{UInt8}, b::Vector{UInt8}) = | |||
|
|||
# TODO: fast endswith | |||
|
|||
chop(s::AbstractString) = s[1:end-1] | |||
chop(s::AbstractString, i::Int = 1) = s[1:end-i] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably be Integer
rather than Int
I'm curious: what is your use case for this? I feel like you may want to be using I believe Perl is the origin of the |
I've included my current code below. chopn(s, n) = s[1:(end - n)]
function remove_suffix(f, suffix)
f_string = string(f)
if !(endswith(f_string, suffix))
error("f must end in $suffix")
end
symbol(chopn(f_string, length(suffix)))
end
nonstandard_1(f) =
quote
macro $f(args...)
esc($f(args...) )
end
end
function multiblock(f)
f_chop = remove_suffix(f, "_1")
quote
function $f_chop(fs...)
Expr(:block, map($f, fs)...)
end
end
end
function safe_1(f)
f_chop = remove_suffix(f, "!")
quote
$f_chop(x, args...; kwargs...) =
$f(copy(x), args...; kwargs...)
end
end
copykw(kw) = (kw[1], copy(kw[2]) )
function allsafe_1(f)
f_chop = remove_suffix(f, "!")
quote
$f_chop(args...; kwargs...) =
$f(map(copy, args)...;
map(copykw, kwargs)...)
end
end
eval(nonstandard_1(:nonstandard_1))
@nonstandard_1 multiblock
@multiblock nonstandard_1 safe_1 allsafe_1
@nonstandard_1 nonstandard
@nonstandard safe allsafe |
This approach does not work for removing multiple Unicode characters: julia> "🐵🐵🐵"[1:end-2]
"🐵🐵" |
Is there a way to fix that? Maybe there should be a special unicode indexing function where |
hm. it seems that existing function should have been defined as |
I think the existing function works because the |
PS I'm sure someone has already thought about this but negative indexing of strings would be handy. Like |
We wouldn't do negative indexing for strings and nowhere else. We're unlikely to do negative indexing for general arrays, at least not without custom range types that could be defined in packages. |
Ok, I've thought a bit more about unicode subsetting, and I don't see how what julia has makes sense. Who would want to subset part of a character? I think if you really wanted to do that, one syntax could be |
String indices are a partial function, for performance reasons; see that section of the manual. That is, the user is responsible for ensuring that only valid indices are passed. One option we've discussed is that a special |
But for now, you need to call |
I agree with @omus, however, that we should have a clear use case for this function. Do Python or Perl or Ruby have such a function? If not, why should we? |
Thanks for the explanation about the partial indexing. I added a unicode warning to the chop documentation, which seems like best I can do efficiently? I'm not too invested in this PR, and I don't have a use-case outside of the one above. |
@bramtayl, if we decide we want that functionality, giving a warning like that is not acceptable. A correct implementation would be something like: function chop(s::String, len::Integer=1)
len < 0 && throw(BoundsError())
i = endof(s)
while len > 0 && i > 0
i = prevind(s, i)
len -= 1
end
return s[1:i]
end This will chop up to |
This reverts commit 560b7e7.
Hi @bramtayl, looks like this needs a rebase. Are you still interested in working on this feature? |
No, I guess not? I think this feature won't really be useful until string subsetting and unicode characters start getting along better? |
Why not? With the fixes to work correctly with Unicode, this functionality is totally workable. |
Ok! Well I guess I'm not sure what the next steps are? |
Not urgent, just a handy feature