Chop optional number of characters #17457

bramtayl · 2016-07-16T20:48:06Z

Not urgent, just a handy feature

tkelman · 2016-07-16T20:50:33Z

base/docs/helpdb/Base.jl

@@ -914,9 +914,9 @@ generate an array of such random numbers.
 randexp

 """
-    chop(string)
+    chop(string, i = 1)


The signature needs to be updated in the rst; then run make docs to populate the docstring content and commit that too.

Also, i should have single backticks around it in the text of the docstring below.

Since i tends to refer to an index, maybe n or len would be a better choice for the variable name?

I can change if you really want but I like the consistency of using s for string and i for integer.

P.S. With all these non-traditional indexing changes, does it still make sense to annotate i as an integer?

I agree that i should be changed, but len sounds like it would be the desired length of the resulting string rather than the length of the section to remove. I think n would be the more natural choice.

bramtayl · 2016-07-16T21:03:33Z

This is going to sound stupid, but how do I run make docs?

tkelman · 2016-07-16T21:05:25Z

doc/stdlib/strings.rst

@@ -333,7 +333,7 @@

   ``strings`` can be any iterable over elements ``x`` which are convertible to strings via ``print(io::IOBuffer, x)``\ .

-.. function:: chop(string)
+.. function:: chop(string, i = 1)

   .. Docstring generated from Julia source


If you don't have a source build, you can just make the change below. Note that rst has different formatting than markdown, so the i will need double backticks in the rst file.

bramtayl · 2016-07-16T21:11:24Z

Ok, I think I did it?

omus · 2016-07-18T03:23:02Z

base/strings/util.jl

@@ -36,7 +36,7 @@ startswith(a::Vector{UInt8}, b::Vector{UInt8}) =

 # TODO: fast endswith

-chop(s::AbstractString) = s[1:end-1]
+chop(s::AbstractString, i::Int = 1) = s[1:end-i]


Should probably be Integer rather than Int

omus · 2016-07-18T13:50:00Z

I'm curious: what is your use case for this? I feel like you may want to be using rstrip

I believe Perl is the origin of the chop function (Julia / Perl), which has the sister function chomp (Julia / Perl). Both of these functions deal with removing a single character from the end of a string. I feel that this change isn't necessary and that better functions already exist for removing multiple trailing characters.
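For readers comparing the functions mentioned here, the following is a small sketch of how chop, chomp and rstrip treat trailing characters. The input strings are made up for illustration and do not come from the PR.

```julia
# Illustrative inputs only; none of these strings appear in the PR.
a = chop("hello")          # drops exactly one trailing character
b = chomp("line\n")        # drops a single trailing newline, if present
c = rstrip("f!!", '!')     # strips every trailing '!', not just one
```

Because rstrip removes all matching trailing characters, it is not a drop-in way to remove a fixed number of characters or a single fixed suffix.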

bramtayl · 2016-07-18T17:16:42Z

I've included my current code below.

chopn(s, n) = s[1:(end - n)]

function remove_suffix(f, suffix)
  f_string = string(f)
  if !(endswith(f_string, suffix))
    error("f must end in $suffix")
  end
  symbol(chopn(f_string, length(suffix)))
end

nonstandard_1(f) =
  quote
    macro $f(args...)
      esc($f(args...) )
    end
  end

function multiblock(f)
  f_chop = remove_suffix(f, "_1")
  quote
    function $f_chop(fs...)
      Expr(:block, map($f, fs)...)
    end
  end
end

function safe_1(f)
  f_chop = remove_suffix(f, "!")
  quote
    $f_chop(x, args...; kwargs...) =
      $f(copy(x), args...; kwargs...)
  end
end

copykw(kw) = (kw[1], copy(kw[2]) )

function allsafe_1(f)
  f_chop = remove_suffix(f, "!")
  quote
     $f_chop(args...; kwargs...) =
       $f(map(copy, args)...;
          map(copykw, kwargs)...)
  end
end

eval(nonstandard_1(:nonstandard_1))
@nonstandard_1 multiblock
@multiblock nonstandard_1 safe_1 allsafe_1
@nonstandard_1 nonstandard
@nonstandard safe allsafe

simonster · 2016-07-18T17:38:58Z

This approach does not work for removing multiple Unicode characters:

julia> "🐵🐵🐵"[1:end-2]
"🐵🐵"

bramtayl · 2016-07-18T17:46:30Z

Is there a way to fix that? Maybe there should be a special unicode indexing function where "🐵🐵🐵"u[1:end-2] == "🐵"?

vtjnash · 2016-07-18T19:23:42Z

Hm. It seems that the existing function should have been defined as x[1:prevind(x, end)] (or equivalently with endof(x) instead of end).

simonster · 2016-07-18T19:32:36Z

I think the existing function works because the endof the string is the start of the last character.

bramtayl · 2016-07-18T20:02:19Z

PS I'm sure someone has already thought about this but negative indexing of strings would be handy. Like s[1:-3] == chop(s, 2) where -3 means the third character from the end

tkelman · 2016-07-18T20:11:59Z

We wouldn't do negative indexing for strings and nowhere else. We're unlikely to do negative indexing for general arrays, at least not without custom range types that could be defined in packages.

bramtayl · 2016-07-18T23:54:30Z

Ok, I've thought a bit more about unicode subsetting, and I don't see how what julia has makes sense. Who would want to subset part of a character? I think if you really wanted to do that, one syntax could be "🐵🐵🐵"[3][1] to return the first part of the third monkey (presumably some animal pictograph marker?)

stevengj · 2016-07-19T14:29:17Z

String indices are a partial function, for performance reasons; see that section of the manual. That is, the user is responsible for ensuring that only valid indices are passed. One option we've discussed is that a special StringIndex type could be used for indexing, rather than an integer (although it would just be an opaque wrapper around an integer, or similar), to prevent the user from naively plugging in arbitrary numbers or doing index + 1 instead of nextind, etc. (#9297).

stevengj · 2016-07-19T14:30:44Z

But for now, you need to call nextind and prevind rather than + 1 and - 1 on string indices.
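As a rough illustration of that advice, the sketch below walks a short mixed-width string with nextind and prevind. It assumes Julia 1.x, where lastindex is the current name for the endof used elsewhere in this thread; the example string is made up.

```julia
# Assumes Julia 1.x; `lastindex` replaces the `endof` used in this thread.
s = "a🐵b"                       # 'a' is 1 byte, '🐵' is 4 bytes, 'b' is 1 byte

i = firstindex(s)                # index of 'a'
j = nextind(s, i)                # index of '🐵'
k = nextind(s, j)                # index of 'b'; j + 1 would land inside '🐵'

# Which byte offsets are valid character starts?
valid = [isvalid(s, idx) for idx in 1:lastindex(s)]

back = prevind(s, lastindex(s))  # index of the character before 'b'
```

Here only indices 1, 2 and 6 start a character, which is why plain integer arithmetic on string indices is unsafe.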

stevengj · 2016-07-19T14:32:29Z

I agree with @omus, however, that we should have a clear use case for this function. Do Python or Perl or Ruby have such a function? If not, why should we?

bramtayl · 2016-07-19T14:59:40Z

Thanks for the explanation about the partial indexing. I added a unicode warning to the chop documentation, which seems like the best I can do efficiently? I'm not too invested in this PR, and I don't have a use case outside of the one above.

stevengj · 2016-07-19T15:34:12Z

@bramtayl, if we decide we want that functionality, giving a warning like that is not acceptable. A correct implementation would be something like:

function chop(s::String, len::Integer=1)
   len < 0 && throw(BoundsError())
   i = endof(s)
   while len > 0 && i > 0
       i = prevind(s, i)
       len -= 1
   end
   return s[1:i]
end

This will chop up to len Unicode characters off the end of s.
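A usage sketch of the same approach, adapted so it runs on Julia 1.x, might look like the following. The name chop_chars is hypothetical (chosen to avoid clashing with Base.chop), lastindex stands in for endof, and the ArgumentError is a substitution made here rather than something taken from the PR.

```julia
# Hypothetical helper name; not part of Base or of this PR.
# Assumes Julia 1.x, where `lastindex` replaces `endof`.
function chop_chars(s::AbstractString, n::Integer = 1)
    # Assumption: ArgumentError instead of the BoundsError used in the thread.
    n < 0 && throw(ArgumentError("n must be non-negative"))
    i = lastindex(s)
    while n > 0 && i > 0
        i = prevind(s, i)    # step back one whole character, not one byte
        n -= 1
    end
    return s[1:i]
end

r1 = chop_chars("🐵🐵🐵", 2)   # removes two whole characters
r2 = chop_chars("abc")          # default removes one character
r3 = chop_chars("abc", 10)      # asking for too many yields an empty string
```

Stepping with prevind is what lets the two-character case return a single monkey, where byte arithmetic such as end-2 does not.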


bramtayl · 2016-07-21T02:34:09Z

Ok, I put in @stevengj's code and used @ararslan's suggestion of n as an argument name.

kshyatt · 2016-08-09T18:09:12Z

Hi @bramtayl, looks like this needs a rebase. Are you still interested in working on this feature?

bramtayl · 2016-08-09T18:41:25Z

No, I guess not? I think this feature won't really be useful until string subsetting and unicode characters start getting along better?

StefanKarpinski · 2016-08-09T18:42:24Z

Why not? With the fixes to work correctly with Unicode, this functionality is totally workable.

bramtayl · 2016-08-09T18:53:53Z

Ok! Well I guess I'm not sure what the next steps are?

bramtayl added 3 commits July 16, 2016 16:40

Add optional number of characters to chop

fbc426e

Updated docs with chop optional argument

6a9a42e

Update util.jl

9bf0000

tkelman reviewed Jul 16, 2016

bramtayl added 3 commits July 16, 2016 16:58

Added backticks

972890d

Updated strings.rst

75a364b

Update strings.rst

9b40d23

tkelman reviewed Jul 16, 2016

Update strings.rst

bf5a83b

omus reviewed Jul 18, 2016

Changed Int to Integer

560b7e7

bramtayl added 2 commits July 19, 2016 10:55

Added unicode warning

fc8b15e

Added unicode warning

3bbe382

Update Base.jl

f11aede

bramtayl added 2 commits July 19, 2016 14:40

Revert "Changed Int to Integer"

6f71570

This reverts commit 560b7e7.

Requested revisions

5ce459f

kshyatt added the strings label Aug 9, 2016

StefanKarpinski mentioned this pull request Aug 9, 2016

chop: allow optional number of characters to chop #17922

Closed

StefanKarpinski closed this Aug 9, 2016
