-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle File.extname edge case #6234
Conversation
src/file.cr
Outdated
return "" unless reader.has_previous? | ||
|
||
# otherwise we are not at the beginning, and there is a previous char. | ||
# if current is '/', then the patern is prefix/foo and has no extension |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pattern
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there is a mistake in the last line.
src/file.cr
Outdated
# we are not at the beginning, | ||
# the previous char is not a '/', | ||
# and we have an extension | ||
return filename[reader.pos + 1, filename.size - 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to use filename.byte_slice
because reader.pos
is byte index. Also, it's faster than String#[]
.
By the way, the semantics of the current method call are wrong: it's String#[start, count]
, not String#[start, end]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess just calling filename.byte_slice(reader.pos + 1)
would be enough, because the extension always ends at the end of the string.
Thanks for the extra pair of 👀. PR updated. |
start_index = filename.rindex(SEPARATOR) || 0
dot_index = filename.rindex('.')
# If the dot wasn't found in `basename(filename)`, ignore it
dot_index = nil if start_index > dot_index
# If there was no separator, return an empty string
return "" unless dot_index
# If the dot was at the start or end of the basename, ignore it
# Examples: "foo.", ".gitignore"
return "" if dot_index == start_index + 1 || dot_index == filename.size - 1
filename[dot_index..-1] This is the most readable way I could write it. Not the most performant, but hey. Isn't it fun to bikeshed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from those nitpicks, this is the best version for the stdlib (that keeps current semantics, I question the requirement to handle all these edge cases)
src/file.cr
Outdated
|
||
# position the reader at the last . or SEPARATOR | ||
while (current_char = reader.current_char) && | ||
(current_char != SEPARATOR && current_char != '.') && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use reader.current_char != SEPARATOR && reader.current_char != '.'
here? This current code makes me think that reader.current_char
returning nil is an exit condition for the loop - it's not and reader.current_char
cannot return nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, Char::Reader
is not needed at all for this. You can simply use string.to_slice
and go backwards from there, byte per byte. All of '.', /
and \
fit in a single byte in UTF-8 so they can't be mistaken for something else. And going byte per byte is both faster and simpler (no need to decode UTF-8).
This is also how it's done in Go.
src/file.cr
Outdated
# we are not at the beginning, | ||
# the previous char is not a '/', | ||
# and we have an extension | ||
return filename.byte_slice(reader.pos + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unnecessary return
.
src/file.cr
Outdated
@@ -282,35 +282,37 @@ class File < IO::FileDescriptor | |||
def self.extname(filename) : String | |||
filename.check_no_null_byte | |||
|
|||
reader = Char::Reader.new(at_end: filename) | |||
bytes = filename.to_slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return "" if bytes.empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and add a test for File.extname("")
, which should be raising now)
Fix #6215. Isn't it nice how many ways it can be written?
#6216 and #6217 were closed before merging.
This PR took the approach of reverse iterating the string until . or separator is found and making all the decisions from there. Similar to what @straight-shoota did in #5635 as Path#extension but with a more linear flow IMO.