Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UUID.parse? #11998

Conversation

jgaskins
Copy link
Contributor

Currently, there was only UUID.new(String) to parse a UUID from a string, but it raised an exception on failure. This PR adds another implementation that returns nil instead, letting the caller use the type system to require them to handle the failure because it returns more than just a UUID.

Fixes #11996

Previously, there was only `UUID.new(String)` to parse a UUID from a
string, but it raised an exception on failure. This commit adds another
implementation that returns `nil` instead, requiring the caller to
handle the failure because it returns more than just a `UUID`.
src/uuid.cr Outdated
@@ -107,16 +107,59 @@ struct UUID
new(bytes, variant, version)
end

def self.parse?(value : String, variant = nil, version = nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to create an error yielding method and use that in .parse? and .new instead of duplicating the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sija It was the first thing I tried and it was, in fact, worse. The block passed in from parse? was nice because it was empty, but that wasn’t true for the block passed from .new.

Every single failure mode (that is, every single exception that can be raised in .new and .hex_pair_at in the current version) had to be accounted for in the block passed in from .new (and then passed into hex_pair_at) as well as the method actually performing the parsing. So the block had to perform all the same checks a second time to figure out which failure mode had been tripped, which not only duplicates the code anyway, but also is slower.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow. I meant to create a parse method which would return the error instead of raising, so the caller could decide whether he want to raise or ignore it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example of the behavior you’re talking about? Because you said an “error yielding method” above, which I understood to be something like Hash#fetch where the block is yielded on a miss. And what you’re saying here sounds like something else. A code example of what you mean might help clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. My first post was misguided, I meant a method returning error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, so it you're envisioning a method that returns kinda like how you'd do it in Go, but with a Crystal-style flavor?

private def parse_string(value : String, variant = nil, version = nil) : UUID | ArgumentError
  def self.new(value : String, variant = nil, version = nil)
    bytes = uninitialized UInt8[16]
    case value.size
    when 36 # Hyphenated
      {8, 13, 18, 23}.each do |offset|
        if value[offset] != '-'
          return ArgumentError.new "Invalid UUID string format, expected hyphen at char #{offset}"
        end
      end
      {0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
        if hex = hex_pair_at? value, offset
           bytes[i] = hex
         else
           return ArgumentError.new("Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'")
         end
      end
    when 32 # Hexstring
      16.times do |i|
        bytes[i] = hex_pair_at value, i * 2
      end
    when 45 # URN
      return ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\"" unless value.starts_with? "urn:uuid:"
      {9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
        if hex = hex_pair_at? value, offset
           bytes[i] = hex
         else
           return ArgumentError.new("Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'")
         end
      end
    else
      return ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
    end
    new(bytes, variant, version)
  end
end

And then the other methods would be something like this?

def self.new(value, variant = nil, version = nil) : UUID
  case result = parse_string(value, variant, version)
  when UUID
    result
  when Exception
    raise result
  end
end

def self.parse?(value, variant = nil, version = nil) : UUID?
  if (uuid = parse_string(value, variant, version)).is_a? UUID
    uuid
  end
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, that was the idea :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return error strategy still allocates an exception instance and builds the error messages- which is just useless if it's going to be thrown out anyway.
I think a better option to avoid duplication would be an internal method with a parameter that determines whether to raise or not.
Not conviced that's really necessary here. A bit duplication isn't bad. And the duplicated code is quite simple. I'm happy to accept the current implementation.

Copy link
Member

@straight-shoota straight-shoota Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could be an option to separate input verification from the actual parsing. Then we'd only duplicate the former and share the latter.

Simplified example:

def self.parse?(value)
  return nil if value.size == 45 && !value.starts_with? "urn:uuid:"

  parse_internal(value) { return nil }
end
def self.new(value)
  raise ArgumentError.new if value.size == 45 && !value.starts_with? "urn:uuid:"

  parse_internal(value) { |pos| raise ArgumentError.new "Invalid hex character at #{pos}" }
end
private def self.parse_internal(value)
  # ...
  parse_hex_pair?(value, i) || yield i
end

If you like playing around and bit with finding a better way to implement this, go ahead. But you don't need to. I believe we can merge it as is and I'm not sure we would even be able to find some significantly better way to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO throwing out one allocation doesn't sound that bad in light of having a cleaner impl. and giving the user precise errors in case of failure.

src/uuid.cr Outdated Show resolved Hide resolved
Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't duplicate that much code

# Raises `ArgumentError` if string `value` at index `i` doesn't contain hex
# digit followed by another hex digit.
private def self.hex_pair_at(value : String, i) : UInt8
hex_pair_at?(value, i) || raise ArgumentError.new "Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unreadable now. You could change it at least to sth like below:

Suggested change
hex_pair_at?(value, i) || raise ArgumentError.new "Invalid hex character at position #{i * 2} or #{i * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'"
hex_pair_at?(value, i) || raise ArgumentError.new(
"Invalid hex character at position #{i * 2} or #{i * 2 + 1}, " \
"expected '0' to '9', 'a' to 'f' or 'A' to 'F'"
)

@asterite
Copy link
Member

asterite commented Apr 15, 2022

Feel free to use this patch to avoid duplicating code:

diff --git a/src/uuid.cr b/src/uuid.cr
index ccc2ca560..177fbf92b 100644
--- a/src/uuid.cr
+++ b/src/uuid.cr
@@ -32,6 +32,11 @@ struct UUID
     V5 = 5
   end
 
+  private record ExpectedHyphen, offset : Int32
+  private record ExpectedUrnUUIDStart
+  private record InvalidStringLength
+  private record InvalidHexChar, offset : Int32
+
   @bytes : StaticArray(UInt8, 16)
 
   # Generates UUID from *bytes*, applying *version* and *variant* to the UUID if
@@ -79,44 +84,71 @@ struct UUID
   # hexstring (ie. `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie. `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`)
   # format.
   def self.new(value : String, variant = nil, version = nil)
+    result = parse_internal(value, variant, version)
+    case result
+    in ExpectedHyphen
+      raise ArgumentError.new "Invalid UUID string format, expected hyphen at char #{result.offset}"
+    in ExpectedUrnUUIDStart
+      raise ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\""
+    in InvalidStringLength
+      raise ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
+    in InvalidHexChar
+      raise ArgumentError.new "Invalid hex character at position #{result.offset * 2} or #{result.offset * 2 + 1}, expected '0' to '9', 'a' to 'f' or 'A' to 'F'"
+    in UUID
+      return result
+    end
+  end
+
+  def self.parse?(value : String, variant = nil, version = nil) : UUID?
+    parse_internal(value, variant, version).as?(UUID)
+  end
+
+  private def self.parse_internal(value : String, variant, version)
     bytes = uninitialized UInt8[16]
 
     case value.size
     when 36 # Hyphenated
       {8, 13, 18, 23}.each do |offset|
         if value[offset] != '-'
-          raise ArgumentError.new "Invalid UUID string format, expected hyphen at char #{offset}"
+          return ExpectedHyphen.new(offset)
         end
       end
       {0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
-        bytes[i] = hex_pair_at value, offset
+        byte = hex_pair_at? value, offset
+        return InvalidHexChar.new(offset) unless byte
+
+        bytes[i] = byte
       end
     when 32 # Hexstring
       16.times do |i|
-        bytes[i] = hex_pair_at value, i * 2
+        byte = hex_pair_at? value, i * 2
+        return InvalidHexChar.new(i * 2) unless byte
+
+        bytes[i] = byte
       end
     when 45 # URN
-      raise ArgumentError.new "Invalid URN UUID format, expected string starting with \"urn:uuid:\"" unless value.starts_with? "urn:uuid:"
+      return ExpectedUrnUUIDStart.new unless value.starts_with? "urn:uuid:"
+
       {9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
-        bytes[i] = hex_pair_at value, offset
+        byte = hex_pair_at? value, offset
+        return InvalidHexChar.new(offset) unless byte
+
+        bytes[i] = byte
       end
     else
-      raise ArgumentError.new "Invalid string length #{value.size} for UUID, expected 32 (hexstring), 36 (hyphenated) or 45 (urn)"
+      return InvalidStringLength.new
     end
 
     new(bytes, variant, version)
   end
 
-  # Raises `ArgumentError` if string `value` at index `i` doesn't contain hex
+  # Returns `nil` if string `value` at index `i` doesn't contain hex
   # digit followed by another hex digit.
-  private def self.hex_pair_at(value : String, i) : UInt8
+  private def self.hex_pair_at?(value : String, i) : UInt8?
     if (ch1 = value[i].to_u8?(16)) && (ch2 = value[i + 1].to_u8?(16))
       ch1 * 16 + ch2
     else
-      raise ArgumentError.new [
-        "Invalid hex character at position #{i * 2} or #{i * 2 + 1}",
-        "expected '0' to '9', 'a' to 'f' or 'A' to 'F'",
-      ].join(", ")
+      nil
     end
   end

@jgaskins
Copy link
Contributor Author

@asterite I appreciate you providing a patch, but it can't be applied on this branch.

But also, does a bit of duplication in a struct with very little churn (only one meaningful update since 2019) warrant hard-blocking the PR? I like to refactor things, too, but I spent an entire evening experimenting with various ways to get it to its current state. The duplication is what I arrived at after that exploration.

Asking with this forum thread in mind.

@asterite
Copy link
Member

My thoughts:

  • I didn't see a lot of change here in after that forum post. I think some PRs were pushed some extra commits, but that's it.
  • I'm fine with a little duplication, but here it's duplicating a very complex method
  • Looking at the code I already found a few things to improve (for example it would be simpler to assume the String is ASCII because UUIDs can only be ASCII) and with that in mind we would need to make those changes to two places now

But, it's fine. I guess someone else can also add the return type restriction and the docs for this method, right?

@asterite asterite dismissed their stale review April 16, 2022 09:42

This is not important

@jgaskins
Copy link
Contributor Author

the return type restriction and the docs for this method

I completely forgot about these. I'll go ahead and add them.

src/uuid.cr Outdated
Comment on lines 78 to 80
# Creates new UUID by decoding `value` string from hyphenated (ie. `ba714f86-cac6-42c7-8956-bcf5105e1b81`),
# hexstring (ie. `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie. `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`)
# format.
# Creates new UUID by decoding `value` string from hyphenated (ie `ba714f86-cac6-42c7-8956-bcf5105e1b81`),
# hexstring (ie `89370a4ab66440c8add39e06f2bb6af6`) or URN (ie `urn:uuid:3f9eaf9e-cdb0-45cc-8ecb-0e5b2bfb0c20`)
# format, raising an `ArgumentError` if the string fors not match any of these formats.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this distinction and removed the dots because they truncated the document string in the method index section in an unexpected spot:

Truncated document string for UUID.new says: Creates new UUID by decoding value string from hyphenated (ie.

@oprypin oprypin requested review from oprypin and removed request for oprypin April 18, 2022 08:48
@oprypin
Copy link
Member

oprypin commented Apr 20, 2022

I was thinking I could make some refactoring suggestion, but everything has been said and tried already. It is possible to deduplicate but it's not worth it.

I really dislike the fact that non-raising alternatives are necessary, but seems that they are.


But this is a great case study for a possible language feature that would under the hood automatically create a method body such as this parse? (for efficiency's sake) just from the fact that someone wrote

begin; UUID.new(foo); rescue ArgumentError; nil; end

For any method inside the guarded code block that raises ArgumentError, the language would create a shadow method that replaces all raise ArgumentError<...> with return Sentinel. And the call site would be rewritten to expect a Sentinel return value as the implementation of this rescue. Then all methods that this method calls which also can raise ArgumentError would need to be rewritten in the same way, recursively.

I should create an RFC for this, but this is way too magical and unlikely to go anywhere, so just wanted to share my thought here anyway.


All this said, I have no objections for this PR.

@straight-shoota straight-shoota added this to the 1.5.0 milestone Apr 27, 2022
@straight-shoota straight-shoota merged commit 453ed77 into crystal-lang:master Apr 28, 2022
@straight-shoota straight-shoota changed the title UUID: add non-raising method to parse from String Add UUID.parse? Apr 28, 2022
@jgaskins jgaskins deleted the jgaskins/uuid-parsing-returns-nil-on-failure branch June 24, 2022 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UUID: add non-raising parse method from String
6 participants