Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle File.extname edge case #6234

Merged
merged 7 commits into from
Jun 25, 2018
Merged

Conversation

bcardiff
Copy link
Member

Fix #6215. Isn't it nice how many ways it can be written?

#6216 and #6217 were closed before merging.

This PR took the approach of reverse iterating the string until . or separator is found and making all the decisions from there. Similar to what @straight-shoota did in #5635 as Path#extension but with a more linear flow IMO.

@bcardiff bcardiff added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib labels Jun 21, 2018
@bcardiff bcardiff added this to the 0.25.1 milestone Jun 21, 2018
src/file.cr Outdated
return "" unless reader.has_previous?

# otherwise we are not at the beginning, and there is a previous char.
# if current is '/', then the patern is prefix/foo and has no extension
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pattern

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is a mistake in the last line.

src/file.cr Outdated
# we are not at the beginning,
# the previous char is not a '/',
# and we have an extension
return filename[reader.pos + 1, filename.size - 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to use filename.byte_slice because reader.pos is byte index. Also, it's faster than String#[].
By the way, the semantics of the current method call are wrong: it's String#[start, count], not String#[start, end].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess just calling filename.byte_slice(reader.pos + 1) would be enough, because the extension always ends at the end of the string.

@bcardiff
Copy link
Member Author

Thanks for the extra pair of 👀. PR updated.

@RX14
Copy link
Contributor

RX14 commented Jun 22, 2018

start_index = filename.rindex(SEPARATOR) || 0
dot_index = filename.rindex('.')

# If the dot wasn't found in `basename(filename)`, ignore it
dot_index = nil if start_index > dot_index

# If there was no separator, return an empty string
return "" unless dot_index

# If the dot was at the start or end of the basename, ignore it
# Examples: "foo.", ".gitignore"
return "" if dot_index == start_index + 1 || dot_index == filename.size - 1

filename[dot_index..-1]

This is the most readable way I could write it. Not the most performant, but hey. Isn't it fun to bikeshed?

Copy link
Contributor

@RX14 RX14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from those nitpicks, this is the best version for the stdlib (that keeps current semantics, I question the requirement to handle all these edge cases)

src/file.cr Outdated

# position the reader at the last . or SEPARATOR
while (current_char = reader.current_char) &&
(current_char != SEPARATOR && current_char != '.') &&
Copy link
Contributor

@RX14 RX14 Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use reader.current_char != SEPARATOR && reader.current_char != '.' here? This current code makes me think that reader.current_char returning nil is an exit condition for the loop - it's not and reader.current_char cannot return nil.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, Char::Reader is not needed at all for this. You can simply use string.to_slice and go backwards from there, byte per byte. All of '.', / and \ fit in a single byte in UTF-8 so they can't be mistaken for something else. And going byte per byte is both faster and simpler (no need to decode UTF-8).

This is also how it's done in Go.

src/file.cr Outdated
# we are not at the beginning,
# the previous char is not a '/',
# and we have an extension
return filename.byte_slice(reader.pos + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary return.

src/file.cr Outdated
@@ -282,35 +282,37 @@ class File < IO::FileDescriptor
def self.extname(filename) : String
filename.check_no_null_byte

reader = Char::Reader.new(at_end: filename)
bytes = filename.to_slice
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return "" if bytes.empty?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and add a test for File.extname(""), which should be raising now)

@RX14 RX14 merged commit 9d99700 into crystal-lang:master Jun 25, 2018
@bcardiff bcardiff deleted the fix/6215-file-extname branch June 26, 2018 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to get extension with File.extname of filenames with length=1
5 participants