-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UUID.parse?
#11998
Add UUID.parse?
#11998
Conversation
Previously, there was only `UUID.new(String)` to parse a UUID from a string, but it raised an exception on failure. This commit adds another implementation that returns `nil` instead, requiring the caller to handle the failure because it returns more than just a `UUID`.
src/uuid.cr
Outdated
@@ -107,16 +107,59 @@ struct UUID | |||
new(bytes, variant, version) | |||
end | |||
|
|||
def self.parse?(value : String, variant = nil, version = nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to create an error yielding method and use that in .parse?
and .new
instead of duplicating the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Sija It was the first thing I tried and it was, in fact, worse. The block passed in from parse?
was nice because it was empty, but that wasn’t true for the block passed from .new
.
Every single failure mode (that is, every single exception that can be raised in .new
and .hex_pair_at
in the current version) had to be accounted for in the block passed in from .new
(and then passed into hex_pair_at
) as well as the method actually performing the parsing. So the block had to perform all the same checks a second time to figure out which failure mode had been tripped, which not only duplicates the code anyway, but also is slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow. I meant to create a parse method which would return the error instead of raising, so the caller could decide whether he want to raise or ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an example of the behavior you’re talking about? Because you said an “error yielding method” above, which I understood to be something like Hash#fetch
where the block is yielded on a miss. And what you’re saying here sounds like something else. A code example of what you mean might help clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. My first post was misguided, I meant a method returning error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, so it you're envisioning a method that returns kinda like how you'd do it in Go, but with a Crystal-style flavor?
private def parse_string(value : String, variant = nil, version = nil) : UUID | ArgumentError
def self.new(value : String, variant = nil, version = nil)
bytes = uninitialized UInt8[16]
case value.size
when 36 # Hyphenated
{8, 13, 18, 23}.each do |offset|
if value[offset] != '-'
return ArgumentError.new "Invalid UUID string format, expected hyphen at char #{offset}"
end
end
{0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
if hex = hex_pair_at? value, offset
bytes[i] = hex
else
return ArgumentError.new("Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'")
end
end
when 32 # Hexstring
16.times do |i|
bytes[i] = hex_pair_at value, i * 2
end
when 45 # URN
return ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\"" unless value.starts_with? "urn:uuid:"
{9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
if hex = hex_pair_at? value, offset
bytes[i] = hex
else
return ArgumentError.new("Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'")
end
end
else
return ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
end
new(bytes, variant, version)
end
end
And then the other methods would be something like this?
def self.new(value, variant = nil, version = nil) : UUID
case result = parse_string(value, variant, version)
when UUID
result
when Exception
raise result
end
end
def self.parse?(value, variant = nil, version = nil) : UUID?
if (uuid = parse_string(value, variant, version)).is_a? UUID
uuid
end
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, that was the idea :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return error strategy still allocates an exception instance and builds the error messages- which is just useless if it's going to be thrown out anyway.
I think a better option to avoid duplication would be an internal method with a parameter that determines whether to raise or not.
Not conviced that's really necessary here. A bit duplication isn't bad. And the duplicated code is quite simple. I'm happy to accept the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could be an option to separate input verification from the actual parsing. Then we'd only duplicate the former and share the latter.
Simplified example:
def self.parse?(value)
return nil if value.size == 45 && !value.starts_with? "urn:uuid:"
parse_internal(value) { return nil }
end
def self.new(value)
raise ArgumentError.new if value.size == 45 && !value.starts_with? "urn:uuid:"
parse_internal(value) { |pos| raise ArgumentError.new "Invalid hex character at #{pos}" }
end
private def self.parse_internal(value)
# ...
parse_hex_pair?(value, i) || yield i
end
If you like playing around and bit with finding a better way to implement this, go ahead. But you don't need to. I believe we can merge it as is and I'm not sure we would even be able to find some significantly better way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO throwing out one allocation doesn't sound that bad in light of having a cleaner impl. and giving the user precise errors in case of failure.
Co-authored-by: Johannes Müller <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't duplicate that much code
# Raises `ArgumentError` if string `value` at index `i` doesn't contain hex | ||
# digit followed by another hex digit. | ||
private def self.hex_pair_at(value : String, i) : UInt8 | ||
hex_pair_at?(value, i) || raise ArgumentError.new "Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unreadable now. You could change it at least to sth like below:
hex_pair_at?(value, i) || raise ArgumentError.new "Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'" | |
hex_pair_at?(value, i) || raise ArgumentError.new( | |
"Invalid hex character at position #{i * 2} or #{i * 2 + 1}, " \ | |
"expected '0' to '9', 'a' to 'f' or 'A' to 'F'" | |
) |
Feel free to use this patch to avoid duplicating code: diff --git a/src/uuid.cr b/src/uuid.cr
index ccc2ca560..177fbf92b 100644
--- a/src/uuid.cr
+++ b/src/uuid.cr
@@ -32,6 +32,11 @@ struct UUID
V5 = 5
end
+ private record ExpectedHyphen, offset : Int32
+ private record ExpectedUrnUUIDStart
+ private record InvalidStringLength
+ private record InvalidHexChar, offset : Int32
+
@bytes : StaticArray(UInt8, 16)
# Generates UUID from *bytes*, applying *version* and *variant* to the UUID if
@@ -79,44 +84,71 @@ struct UUID
# hexstring (ie. `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie. `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`)
# format.
def self.new(value : String, variant = nil, version = nil)
+ result = parse_internal(value, variant, version)
+ case result
+ in ExpectedHyphen
+ raise ArgumentError.new "Invalid UUID string format, expected hyphen at char #{result.offset}"
+ in ExpectedUrnUUIDStart
+ raise ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\""
+ in InvalidStringLength
+ raise ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
+ in InvalidHexChar
+ raise ArgumentError.new "Invalid hex character at position #{result.offset * 2} or #{result.offset * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'"
+ in UUID
+ return result
+ end
+ end
+
+ def self.parse?(value : String, variant = nil, version = nil) : UUID?
+ parse_internal(value, variant, version).as?(UUID)
+ end
+
+ private def self.parse_internal(value : String, variant, version)
bytes = uninitialized UInt8[16]
case value.size
when 36 # Hyphenated
{8, 13, 18, 23}.each do |offset|
if value[offset] != '-'
- raise ArgumentError.new "Invalid UUID string format, expected hyphen at char #{offset}"
+ return ExpectedHyphen.new(offset)
end
end
{0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
- bytes[i] = hex_pair_at value, offset
+ byte = hex_pair_at? value, offset
+ return InvalidHexChar.new(offset) unless byte
+
+ bytes[i] = byte
end
when 32 # Hexstring
16.times do |i|
- bytes[i] = hex_pair_at value, i * 2
+ byte = hex_pair_at? value, i * 2
+ return InvalidHexChar.new(i * 2) unless byte
+
+ bytes[i] = byte
end
when 45 # URN
- raise ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\"" unless value.starts_with? "urn:uuid:"
+ return ExpectedUrnUUIDStart.new unless value.starts_with? "urn:uuid:"
+
{9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
- bytes[i] = hex_pair_at value, offset
+ byte = hex_pair_at? value, offset
+ return InvalidHexChar.new(offset) unless byte
+
+ bytes[i] = byte
end
else
- raise ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
+ return InvalidStringLength.new
end
new(bytes, variant, version)
end
- # Raises `ArgumentError` if string `value` at index `i` doesn't contain hex
+ # Returns `nil` if string `value` at index `i` doesn't contain hex
# digit followed by another hex digit.
- private def self.hex_pair_at(value : String, i) : UInt8
+ private def self.hex_pair_at?(value : String, i) : UInt8?
if (ch1 = value[i].to_u8?(16)) && (ch2 = value[i + 1].to_u8?(16))
ch1 * 16 + ch2
else
- raise ArgumentError.new [
- "Invalid hex character at position #{i * 2} or #{i * 2 + 1}",
- "expected '0' to '9', 'a' to 'f' or 'A' to 'F'",
- ].join(", ")
+ nil
end
end
|
@asterite I appreciate you providing a patch, but it can't be applied on this branch. But also, does a bit of duplication in a struct with very little churn (only one meaningful update since 2019) warrant hard-blocking the PR? I like to refactor things, too, but I spent an entire evening experimenting with various ways to get it to its current state. The duplication is what I arrived at after that exploration. Asking with this forum thread in mind. |
My thoughts:
But, it's fine. I guess someone else can also add the return type restriction and the docs for this method, right? |
I completely forgot about these. I'll go ahead and add them. |
src/uuid.cr
Outdated
# Creates new UUID by decoding `value` string from hyphenated (ie. `ba714f86-cac6-42c7-8956-bcf5105e1b81`), | ||
# hexstring (ie. `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie. `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`) | ||
# format. | ||
# Creates new UUID by decoding `value` string from hyphenated (ie `ba714f86-cac6-42c7-8956-bcf5105e1b81`), | ||
# hexstring (ie `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`) | ||
# format, raising an `ArgumentError` if the string fors not match any of these formats. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking I could make some refactoring suggestion, but everything has been said and tried already. It is possible to deduplicate but it's not worth it. I really dislike the fact that non-raising alternatives are necessary, but seems that they are. But this is a great case study for a possible language feature that would under the hood automatically create a method body such as this begin; UUID.new(foo); rescue ArgumentError; nil; end For any method inside the guarded code block that raises I should create an RFC for this, but this is way too magical and unlikely to go anywhere, so just wanted to share my thought here anyway. All this said, I have no objections for this PR. |
UUID.parse?
Currently, there was only
UUID.new(String)
to parse a UUID from a string, but it raised an exception on failure. This PR adds another implementation that returnsnil
instead, letting the caller use the type system to require them to handle the failure because it returns more than just aUUID
.Fixes #11996