Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent naming of String#chomp after adding String#lchomp #3729

Closed
pkorotkov opened this issue Dec 18, 2016 · 48 comments
Closed

Inconsistent naming of String#chomp after adding String#lchomp #3729

pkorotkov opened this issue Dec 18, 2016 · 48 comments

Comments

@pkorotkov
Copy link

Recent commit 60bf3cda has added two left chomp functions (lchomp) to String's instant methods as counterparts to existing String#chomp(Char), String#chomp(String). Having no issue with the commit implementation, I'd like to point out an inconsistency in method naming: String#lchomp corresponds to String#chomp while, analogously, String#lstrip corresponds to String#rstrip.

@asterite
Copy link
Member

I'd say chomp is more common than lchomp. I don't mind the inconsistency. There's also no argless lchomp, because removing a newline from the beginning of a newline is not very common. I really don't want to change chomp to rchomp and break all code out there. There's also the thing that there's strip, lstrip and rstrip, which all make sense, but having chomp act as lchomp + rchomp has no real usecase. So it's going to be chomp and lchomp.

@exts
Copy link

exts commented Dec 24, 2016

I personally see no reason why the inconsistency exists.

I really don't want to change chomp to rchomp and break all code out there.

@asterite Breaking changes happen, it's been happening for a while now, this really isn't a good reason for keeping an inconsistency this early on.

but having chomp act as lchomp + rchomp has no real usecase

I can find a few, csv file line starts and ends with a empty delimiter

,blah,blah,blah,

This is a simple use case, I've seen many where I'd need to chomp both from the beginning and end of a string.

@asterite
Copy link
Member

OK, what should be the steps to do this? I guess these:

  1. Rename chomp to rchomp
  2. Add chomp that removes from both ends
  3. Rename the optional argument chomp of IO#gets and other methods to rchomp?

Also, do we also get an argless chomp method that removes newlines from both ends of a string? Do we get an argless lchomp that removes a newline from the beginning of the string?

I'm not totally convinced with the above...

Another alternative is to just leave the argless chomp overload around. We also have a String#chop method right now, which:

Returns a new String with the last character removed. If the string ends with \r\n, both characters are removed. Applying chop to an empty string returns an empty string.

This is also present in Ruby. And it doesn't have any args. So maybe we could make chop remove both ends, then have rchop and lchop do what the current chomp and lchomp do.

What should be done here for the API to be perfect and consistent?

@RX14
Copy link
Contributor

RX14 commented Dec 25, 2016

This is what I think is best:

  • Chomp: removes x from string if it starts or ends with it:
    • No args: where x is \r\n or \n
    • 1 arg: specify x, no special case for any characters.
    • lchomp and rchomp variants which only chomp one end.
  • Chop: remove n characters from start and end of the string.
    • no args: n = 1
    • args: set n
    • Provide lchop and rchop which only chop from 1 side.

I think this is the most consistent API I can imagine. It might not be the easiest to implement, or mirror ruby, but I think that in the long run it's probably the best API.

@asterite
Copy link
Member

@RX14 What about IO#gets? Should we rename the optional argument chomp to rchomp?

@RX14
Copy link
Contributor

RX14 commented Dec 26, 2016

@asterite yes, and provide a lchomp option too (just set both for chomp)

@asterite
Copy link
Member

@RX14 What's the use case for IO#gets with lchomp? What is lchomped?

string = gets('d', lchomp: true) # ???

@RX14
Copy link
Contributor

RX14 commented Dec 26, 2016

Hmm, maybe lchomp doesn't make sense. You could specify what to chomp with a named arg, but maybe not.

I think that the proper long term solution would be to make string manipulation cheaper while remaining composable. Being able to compose string operations via a mutable intermediate.

@pkorotkov
Copy link
Author

Why not extend IO#gets definition like that

def gets(lchomp = false, rchomp = true) : String?

@asterite
Copy link
Member

@pkorotkov Because wanting to do lchomp there doesn't make any sense.

@pkorotkov
Copy link
Author

@asterite Call for reopening the issue to let others see this discussion.

@RX14
Copy link
Contributor

RX14 commented Feb 4, 2017

I've yet again come across a need for String#chomp to chomp from both ends while re-implementing https://github.com/atmos/camo in crystal.

I get a HTTP path either like http://example.org/<digest>?url=<image-url> or http://example.org/<digest>/<image-url>. The cleanest way to process this URL is to call request.path.chomp('/').split('/', 2) which cleans up the leading /, any possible trailing / characters and I can now test the length and contents of the array to see which URL matches.

@exts
Copy link

exts commented Feb 4, 2017

Why's @asterite refusing to reopen this?

@spalladino
Copy link
Contributor

We've been discussing this with @asterite. A simple way to fix this would be to rollback the lchomp commit, and just leave chomp with the original functionality, following Ruby's API as is; however, this implies losing some functionality.

On the other hand, having rchomp and lchomp feels like the most consistent way to design the API, as many have mentioned in the thread. What I personally don't like is having chomp to act as lchomp + rchomp, since it is method that goes back to Ruby and Perl, and most people would expect it to behave exactly like rchomp, thus potentially introducing bugs when using it.

That being said, we'd propose:

  • Rename chomp to rchomp, including in arguments to gets.
  • Period. There is no chomp method to avoid confusion, and if you need both, invoke both.

Thoughts?

@spalladino spalladino reopened this Feb 13, 2017
@ysbaddaden
Copy link
Contributor

I prefer to have chomp only: drop trailing LF or CRLF and period.

@bararchy
Copy link
Contributor

bararchy commented Feb 13, 2017

I'm with @ysbaddaden , "\nfoo\n".chomp => "\nfoo"

@RX14
Copy link
Contributor

RX14 commented Feb 13, 2017

@spalladino This is an acceptable solution. I hope that in the future that crystal will grow independent enough from ruby to re-add chomp and make it work from both ends.

@ysbaddaden @bararchy What's your reasoning behind this?

I think string method naming in general is confusing, inconsistent and obsure currently. I hope we can come up with a consistent, descriptive naming scheme for string methods.

@bararchy
Copy link
Contributor

bararchy commented Feb 13, 2017

@RX14 My reasoning is that gets.chomp is the most basic ruby operation, in almost every CLI program you have examples on it's usage and what it does, for me using the same name (chomp) but making it do something other then cutting the right side \n or \r\n is really confusing and unexpected (as a Ruby dev)

It's like we will decide that puts will behave the same way as print (without a \n), it's just messing with basic assumptions

@RX14
Copy link
Contributor

RX14 commented Feb 13, 2017

If the only defence of the current naming is "ruby does it", isn't that a bit telling?

@spalladino
Copy link
Contributor

It's not just that Ruby (and Perl) do it, I googled a bit and found other languages or libs that also have the same behaviour:

  • Java Apache commons: Removes one newline from end of a String if it's there, otherwise leave it alone. A newline is "\n", "\r", or "\r\n".
  • D: If str ends with delimiter, then str is returned without delimiter on its end. If it str does not end with delimiter, then it is returned unchanged.
  • Clojure contrib: Removes all trailing newline \n or return \r characters from string. Note: String.trim() is similar and faster.

Having a string method named exactly as in several other libs that behaves differently is a potential cause for bugs, so I'd either keep chomp with the original behaviour and remove lchomp altogether, or remove chomp and keep only lchomp and rchomp.

@Sija
Copy link
Contributor

Sija commented Feb 14, 2017

@spalladino Please, don't remove lchomp :)

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

Removing functionality to satisfy naming seems wrong, so again I vote for lchomp and rchomp. What about chop. That should be lchop and rchop too.

@bararchy
Copy link
Contributor

bararchy commented Feb 14, 2017

How about aliasing chomp to rchomp, and keep lchomp & rchomp ? this will pretty much satisfy everyone no ?

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

@bararchy Crystal's stdlib will never have method aliases. By doing that, it means that half the devs will pick one naming and half the devs will pick the other. Devs will have to learn both name to read other people's code, and there will doubtless be endless discussion. It's best to pick one solution and stick with it, than compromise and annoy everyone.

@bararchy
Copy link
Contributor

bararchy commented Feb 14, 2017

@RX14 If so then I think keeping chomp is a must, it's a standard both in Ruby and as @spalladino showed in other langs too.

This is also an argument for "Principle of least astonishment"

@exts
Copy link

exts commented Feb 14, 2017

Removing functionality to satisfy naming seems wrong, so again I vote for lchomp and rchomp. What about chop. That should be lchop and rchop too.

@RX14 Same point I brought up in my thread, strip currently has lstrip and rstrip. Make it consistent across similar methods. There's really no point to have the current api naming scheme inconsistent, makes the language look bad. It's actually one of the main reasons why people consistently shit on PHP because their early api functions were inconsistent. I'd rather have this sorted now when we can have serious breaking changes than hit 1.0 and look back and go "well we could have fixed that, but it's too late as it would break tons of production code"

Now I personally don't know what your plans are for backwards compatibility in the future, but fixing this later seems like it would be a bigger hassle than it currently is at the moment.

@asterite
Copy link
Member

Here's another idea. We want to preserve chomp, let's say because it's in Ruby and Perl, but also because we have it as a named argument for IO#gets, String#lines and others. But we want to have an equivalent functionality for the left side of the String, which is what lchomp is, but the name is inconsistent. So, we can:

  • Leave chomp as is (like now)
    • without arguments: removes trailing "\n" or "\r\n"
    • with an argument: removes the suffix if the string ends with it
  • Remove chop (because it currently acts on the right-hand side of a String, so maybe it should be named rchop)
  • Add rchop in two versions:
    • without arguments: remove the last character, unless the string is empty
    • with an argument: removes the suffix if the string ends with it (so it's exactly the same as chomp... yes, this is an alias, but it doesn't bother in my opinion, and it's better than having an inconsistency)
  • Add lchop in two versions:
    • without arguments: remove the first character, unless the string is empty
    • with an argument: removes the prefix if the string starts with it (so this is the current lchomp)
  • Remove lchomp (because the functionality will be provided by lchop)
  • Don't add rchomp (because the functionality will be provided by rchop)

In summary, we'll only have these methods:

  • chomp
  • rchop
  • lchop

There's no inconsistency, although rchop(arg) will work the same as chomp(arg). I think we can leave with that. The reason for preserving chomp(arg) is because one can do io.gets('a', chomp: true) and that would be equivalent of reading a line and calling .chomp('a') on it. We could remove chomp(arg), but then it will be kind of inconsistent.

As for the missing methods I didn't mention:

  • chop without arguments: to remove the first and last character of a string... I don't know what's a use case for that. For example if I want to unqutoe a string like "foo" I can simply call .strip('"').
  • chop with an argument: to remove both the first and last pieces of a string... I don't know what's a use case for that either. Like before strip(arg) seems to cover this. For example .strip('/') would remove all leading and trailing slashes.
  • argless chomp acting on both sides, that would remove a single newline from both the beginning and end of a String: there's no use case for this, because when you read a line from a file, or split a string into lines, it will never have a leading newline.

In any case we could introduce chop later if we really find use cases for it, but both-ways chomp seems to not have a use case.

@refi64
Copy link
Contributor

refi64 commented Feb 14, 2017

...why not just add a direction argument to chomp? e.g. x.chomp(front: true) to chomp from the front instead of the back.

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

@kirbyfan64 That's a very good idea... I think that's a direction we should explore. It simultaneously solves the .lchomp.rchomp problem without an intermediate string, and keeps the changes to the current API minimal. The only problem I see is naming of gets args after the change.

@asterite
Copy link
Member

@RX14 Why would we need to rename gets args?

The only issue I see with x.chomp(front: true) is that it would remove a leading newline... something that for me has no use case. So yes, it will be consistent, but we'll have dead functionality.

Another issue is that it will be kind of inconsistent: chomp API uses front to choose side, while for strip there will be rstrip and lstrip.

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

@asterite It's probably useless, but x.chomp('"', left: true) is not rare. The method to implement newline chomp and character-specific chomp will be separate (they are currently), so you don't have to implement left: true on newline chomp.

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

@asterite You don't have to change gets args, I was thinking we had lchomp on gets but that was wrong and it's useless.

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

Also, we'd probably change strip to use left and right named args too if we went this route.

@exts
Copy link

exts commented Feb 14, 2017

I just don't see a reason not to just add rchomp

  • lchomp
  • rchomp
  • chomp (change: returns string with the first str/char removed from the string)

(update all methods that rely on chomp's old functionality to use rchomp)

no changes (works how you'd expect)

  • lstrip
  • rstrip
  • strip

chop

  • lchop
  • rchop
  • chop (returns string with the first/last character removed)

Now the naming scheme is consistent.

Remove lchomp (because the functionality will be provided by lchop)
Don't add rchomp (because the functionality will be provided by rchop)

No point in doing this if those two methods work completely different from each other

PS:

something that for me has no use case

Just because you don't have a use case for it doesn't mean others do, which have been mentioned in this thread many times that there are use cases for using those methods on the left and both sides of a string.

@asterite
Copy link
Member

asterite commented Feb 14, 2017

@exts The problem is that chomp will now behave different from Ruby and Perl. I understand that we don't want to keep compatibility with Ruby, but since those methods come from those languages, having it behave different is not a good decision.

@exts
Copy link

exts commented Feb 14, 2017

Then what if we introduce a new method if the idea is to keep those methods as similar to ruby/perl as possible? Name it clip and combine chomp and chop's functionality. lclip, clip, rclip. You pass either a str/character to either of the methods it'll act like chomp, but if you don't pass anything to it it acts like chop where it tries to clip the first/last character off a string.

Best of both worlds and we don't have to have any breaking changes and those coming from ruby won't have to relearn how the chop/chomp methods function.

@asterite
Copy link
Member

@exts That's my proposal, except that I use chop instead of clip

@exts
Copy link

exts commented Feb 14, 2017

Alright, i think that works then. Keep chomp the same, change chop.

@bararchy
Copy link
Contributor

Makes sense to me too

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

I think removing the lfoo and rfoo variants and replacing with foo(left: true) and foo(right: true) is the cleanest way.

@asterite
Copy link
Member

So how would strip, lstrip and rstrip translate to this new naming?

@exts
Copy link

exts commented Feb 14, 2017

I think chomp should just be the exception, all future methods should stay rfoo and lfoo

@RX14
Copy link
Contributor

RX14 commented Feb 14, 2017

@asterite

  • lstrip -> strip(left: true)
  • rstrip -> strip(right: true)
  • strip -> strip(left: true, right: true)

But you could alter that to change the default args, for example def strip(*, left = false, right = true).

@zatherz
Copy link
Contributor

zatherz commented Feb 15, 2017

"Have a syntax similar to Ruby (but compatibility with it is not a goal)"

"It came from ruby" doesn't make much sense. Lots of things in Crystal came from other languages, it doesn't mean that it should keep inconsistencies just for the purpose of "it's like this in this language", especially in this stage where backwards compatibility breakage is to be expected. includes? "came from Ruby". File::exists? "came from Ruby".

My opinion is that chomp(left: true, right: true) is incredibly verbose and annoying to type. I'd prefer lchomp, rchomp and chomp.

@RX14
Copy link
Contributor

RX14 commented Feb 15, 2017

@zatherz if you consider that 99% of people who use chomp want to chomp the right side, then you use def chomp(*, left = false, right = true), there's no extra verbosity unless you need to chomp both sides, or only the left side.

@asterite
Copy link
Member

The problem is that to chomp the left side you'll have to write chomp(left: true, right: false) because right is true by default... The same issue arrises if we apply this to strip and others.

@refi64
Copy link
Contributor

refi64 commented Feb 16, 2017

Here's a thought: what about using either symbols or an enum? e.g. chomp(:left)

@RX14
Copy link
Contributor

RX14 commented Feb 18, 2017

I've thought about it some more and I think the accepted solution is probably the best one. It solves my previous complaint of chomp('\n') being a special case, which is a nice bonus.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests