Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add chomp option to gets, lines, each_line #3704

Merged
merged 1 commit into from
Dec 20, 2016
Merged

Add chomp option to gets, lines, each_line #3704

merged 1 commit into from
Dec 20, 2016

Conversation

asterite
Copy link
Member

This adds a chomp optional argument to IO#gets, IO#each_line, String#lines, String#each_line and generally to every method that reads lines, or by line.

This feature was added recently to Ruby 2.4, and honestly I always wanted to be able to do this, both in Ruby and Crystal, so keeping the way Ruby does it is good (I also imagined the same API before Ruby added it).

There's a slight difference with Ruby, though. The default chomp value in Ruby is always false. This is to preserve backwards compatibility, I guess. In our case breaking backwards compatibility is fine before 1.0. So, the rules are:

  • argless gets, each_line and lines assume chomp = true
  • other gets overloads assume chomp = false

The idea is that argless gets is semantically equivalent to reading a line. So in every case where we read a line, we chomp it if it ends with a newline (or "\r\n"). If we pass an argument to gets, for example gets('a'), we usually want to read up to, and including, that delimiter. Same goes with gets(3): we want to read a String of 3 bytes, please don't chomp it by default.

Note: I wouldn't mind having chomp = true in every case. I actually don't know what use cases exist for gets with a delimiter. If we make it always true by default it's maybe more consistent, and one can always pass chomp = false to disable this behaviour, so please share your thoughts and use cases about this.

In summary:

# No args, chomp is true by default
io = IO::Memory.new("one\ntwo\r\nthree")
io.gets # => "one"
io.gets # => "two"
io.gets # => "three"
io.gets # => nil

# No args, explicit chomp is false
io = IO::Memory.new("one\ntwo\r\nthree")
io.gets(chomp: false) # => "one\n"
io.gets(chomp: false) # => "two\r\n"
io.gets(chomp: false) # => "three"
io.gets(chomp: false) # => nil

# Passing a parameter will not implicitly chomp
io = IO::Memory.new("some long word")
io.gets('e') # => "some"
io.gets('g', chomp: true) # => "lon"

Of course one can always do gets.try &.chomp. The difference is that this will allocate two strings: one that is read, and one chomped. So chomp: true can also improve performance a bit if we don't care about the ending newline (and I'd say in most cases we don't).

As a side story, I was curious about what happens in these cases in Ruby 2.4:

"".lines
"\n".lines
"\r\n".lines

The first one returns an empty array. The second and third ones return an array with a single empty string. Well, almost, the second case actually triggers a segmentation fault, which I just reported. I just wanted to share this little story because it's a good change of air, from receiving segmentation faults and bug reports to Crystal, to be able to actually report one in another project (and help it become better, of course) 😸

if bytesize > 0 && buffer[bytesize - 1] == byte
back(1)

if byte === '\n' && bytesize > 0 && buffer[bytesize - 1] === '\r'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a hack to me, why not 2 methods with one being argless for \r\n?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least Ruby always removes \r when chomping \n or the argless version:

$ irb
irb(main):001:0> "foo\r\n".chomp("\n")
=> "foo"
irb(main):002:0> "foo\r\n".chomp
=> "foo"

So I think it's fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asterite I think that's confusing behaviour personally, and I can imagine cases where you would want to chomp \n and not \r\n. If you want to chomp both \n and \r\n why not just use chomp with no args? Copying ruby here seems ugly.

Copy link
Contributor

@chocolateboy chocolateboy Dec 19, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @RX14. Ruby's behaviour in that first example is a bug IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it's OK. I'll merge this, please send a PR to fix this if you find a way.

The problem is, when you do gets, this is equivalent to doing gets('\n'), so passing no delimiter is equivalent to passing \n as a delimiter. If \n is the delimiter, and the line ends with \r\n, then \r\n is also removed. There's currently no way to distinguish between "no delimiter is specified" and "\n was specified as a delimiter". To do this, we'd have to change the general gets(delimiter, limit, chomp) method to be something like gets(delimiter, limit, chomp, argless), or do a separate implementation for the argless case in several IO types.

I personally don't find a use case where you'd want to remove \n but keep the \r, but if you do find such case, send a PR with all the necessary changes.

# Moves the write pointer, and the resulting string bytesize,
# by the given amount
def back(amount : Int)
unless 0 <= amount < @bytesize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't amount == @bytesize be allowed, to reset the builder to the beginning (@bytesize == 0)?

@chocolateboy
Copy link
Contributor

chocolateboy commented Dec 19, 2016

Great to see this!

I'd implemented a version of this myself (as IO::Buffered#get_chomped_line and IO::Buffered#each_chomped_line) and was thinking of PR-ing it as an option when I noticed Ruby 2.4 had added it. Very grateful that I've been beaten to it :-)

Re: setting chomp: true everywhere by default: that would be a firm 👍 from me. I mainly use Crystal for command-line tools. I've never needed to preserve the exact line ending (in any language AFAICR), and the only situation where I can see chomp: false being useful is for a tool like dos2unix that might need to care about the difference.

@asterite
Copy link
Member Author

I'll merge this. Before 1.0 we can tweak the default chomp value for the different overloads if we find it necessary.

@asterite asterite added this to the 0.20.2 milestone Dec 20, 2016
@asterite asterite merged commit 1bac47c into master Dec 20, 2016
@asterite asterite deleted the feature/chomp branch December 20, 2016 11:07
@Sija
Copy link
Contributor

Sija commented Dec 20, 2016

@asterite this PR broke docs rendering :/ I've found it out by doing git bisect and this is the first commit which introduced buggy behaviour.

@asterite
Copy link
Member Author

@Sija I'm doing bin/crystal tool format --check and it works. That is with a compiled compiler.

@Sija
Copy link
Contributor

Sija commented Dec 20, 2016

I'm getting garbled doc comments after doing make clean && make && make doc, see below:

image

@asterite
Copy link
Member Author

@Sija Fixed! (I think ^_^)

@Sija
Copy link
Contributor

Sija commented Dec 20, 2016

@asterite yup, works for me :)

makenowjust added a commit to makenowjust/crystal that referenced this pull request Dec 26, 2016
makenowjust added a commit to makenowjust/crystal that referenced this pull request Dec 27, 2016
vonKingsley added a commit to vonKingsley/crass that referenced this pull request Jan 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants