Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chop optional number of characters #17457

Closed
wants to merge 13 commits into from
Closed

Chop optional number of characters #17457

wants to merge 13 commits into from

Conversation

bramtayl
Copy link
Contributor

Not urgent, just a handy feature

@@ -914,9 +914,9 @@ generate an array of such random numbers.
randexp

"""
chop(string)
chop(string, i = 1)
Copy link
Contributor

@tkelman tkelman Jul 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signature needs to be updated in the rst, then run make docs to populate the docstring content and commit that too

also i should have single backticks around it in the text of the docstring below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since i tends to refer to an index maybe n or len would be a better choice for the variable name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change if you really want but I like the consistency of using s for string and i for integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. with all these non traditional indexing changes, does it still make sense to annotate i as a integer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that i should be changed, but len sounds like it would be the desired length of the resulting string rather than the length of the section to remove. I think n would be the more natural choice.

@bramtayl
Copy link
Contributor Author

This is going to sound stupid, but how do I run make docs?

@@ -333,7 +333,7 @@

``strings`` can be any iterable over elements ``x`` which are convertible to strings via ``print(io::IOBuffer, x)``\ .

.. function:: chop(string)
.. function:: chop(string, i = 1)

.. Docstring generated from Julia source
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you don't have a source build, you can just make the change below. though rst has different formatting than markdown so the i will need double backticks in the rst file

@bramtayl
Copy link
Contributor Author

Ok, I think I did it?

@@ -36,7 +36,7 @@ startswith(a::Vector{UInt8}, b::Vector{UInt8}) =

# TODO: fast endswith

chop(s::AbstractString) = s[1:end-1]
chop(s::AbstractString, i::Int = 1) = s[1:end-i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be Integer rather than Int

@omus
Copy link
Member

omus commented Jul 18, 2016

I'm curious: what is your use case for this? I feel like you may want to be using rstrip

I believe Perl is the origin of the chop function (Julia / Perl) which has the sister function chomp(Julia / Perl). Both of these functions both deal with removing a single character from the end of a string. I feel that this change isn't necessary and there already exist better functions to deal with removing multiple trailing characters.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jul 18, 2016

I've included my current code below.

chopn(s, n) = s[1:(end - n)]

function remove_suffix(f, suffix)
  f_string = string(f)
  if !(endswith(f_string, suffix))
    error("f must end in $suffix")
  end
  symbol(chopn(f_string, length(suffix)))
end

nonstandard_1(f) =
  quote
    macro $f(args...)
      esc($f(args...) )
    end
  end

function multiblock(f)
  f_chop = remove_suffix(f, "_1")
  quote
    function $f_chop(fs...)
      Expr(:block, map($f, fs)...)
    end
  end
end

function safe_1(f)
  f_chop = remove_suffix(f, "!")
  quote
    $f_chop(x, args...; kwargs...) =
      $f(copy(x), args...; kwargs...)
  end
end

copykw(kw) = (kw[1], copy(kw[2]) )

function allsafe_1(f)
  f_chop = remove_suffix(f, "!")
  quote
     $f_chop(args...; kwargs...) =
       $f(map(copy, args)...;
          map(copykw, kwargs)...)
  end
end

eval(nonstandard_1(:nonstandard_1))
@nonstandard_1 multiblock
@multiblock nonstandard_1 safe_1 allsafe_1
@nonstandard_1 nonstandard
@nonstandard safe allsafe

@simonster
Copy link
Member

This approach does not work for removing multiple Unicode characters:

julia> "🐵🐵🐵"[1:end-2]
"🐵🐵"

@bramtayl
Copy link
Contributor Author

bramtayl commented Jul 18, 2016

Is there a way to fix that? Maybe there should be a special unicode indexing function where "🐵🐵🐵"u[1:end-2] == "🐵"?

@vtjnash
Copy link
Member

vtjnash commented Jul 18, 2016

hm. it seems that existing function should have been defined as x[1:prevind(x, end)] (or equivalently endof(x) instead of end)

@simonster
Copy link
Member

I think the existing function works because the endof the string is the start of the last character.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jul 18, 2016

PS I'm sure someone has already thought about this but negative indexing of strings would be handy. Like s[1:-3] == chop(s, 2) where -3 means the third character from the end

@tkelman
Copy link
Contributor

tkelman commented Jul 18, 2016

We wouldn't do negative indexing for strings and nowhere else. We're unlikely to do negative indexing for general arrays, at least not without custom range types that could be defined in packages.

@bramtayl
Copy link
Contributor Author

bramtayl commented Jul 18, 2016

Ok, I've thought a bit more about unicode subsetting, and I don't see how what julia has makes sense. Who would want to subset part of a character? I think if you really wanted to do that, one syntax could be "🐵🐵🐵"[3][1] to return the first part of the third monkey (presumably some animal pictograph marker?)

@stevengj
Copy link
Member

String indices are a partial function, for performance reasons; see that section of the manual. That is, the user is responsible for ensuring that only valid indices are passed. One option we've discussed is that a special StringIndex type could be used for indexing, rather than an integer (although it would just be an opaque wrapper around an integer, or similar), to prevent the user from naively plugging in arbitrary numbers or doing index + 1 instead of nextindex etcetera (#9297).

@stevengj
Copy link
Member

But for now, you need to call nextind and prevind rather than + 1 and - 1 on string indices.

@stevengj
Copy link
Member

stevengj commented Jul 19, 2016

I agree with @omus, however, that we should have a clear use case for this function. Do Python or Perl or Ruby have such a function? If not, why should we?

@bramtayl
Copy link
Contributor Author

Thanks for the explanation about the partial indexing. I added a unicode warning to the chop documentation, which seems like best I can do efficiently? I'm not too invested in this PR, and I don't have a use-case outside of the one above.

@stevengj
Copy link
Member

stevengj commented Jul 19, 2016

@bramtayl, if we decide we want that functionality, giving a warning like that is not acceptable. A correct implementation would be something like:

function chop(s::String, len::Integer=1)
   len < 0 && throw(BoundsError())
   i = endof(s)
   while len > 0 && i > 0
       i = prevind(s, i)
       len -= 1
   end
   return s[1:i]
end

This will chop up to len Unicode characters off the end of s.

@bramtayl
Copy link
Contributor Author

Ok, I put in @stevengj 's code and used @ararslan 's suggestion of n as an argument name

@kshyatt kshyatt added the strings "Strings!" label Aug 9, 2016
@kshyatt
Copy link
Contributor

kshyatt commented Aug 9, 2016

Hi @bramtayl, looks like this needs a rebase. Are you still interested in working on this feature?

@bramtayl
Copy link
Contributor Author

bramtayl commented Aug 9, 2016

No, I guess not? I think this feature won't really be useful until string subsetting and unicode characters start getting along better?

@StefanKarpinski
Copy link
Member

Why not? With the fixes to work correctly with Unicode, this functionality is totally workable.

@bramtayl
Copy link
Contributor Author

bramtayl commented Aug 9, 2016

Ok! Well I guess I'm not sure what the next steps are?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants