UUID: add non-raising parse method from String #11996

jgaskins · 2022-04-13T03:45:29Z

Feature Request

Is your feature request related to a problem? Please describe clearly and concisely what is it.

When parsing a UUID from a string, there appears to be no way to avoid raising an exception if that string is not a UUID. For example, if I make a request to GET /posts/f5b953f4-9811-4d5e-a674-e5c8b12f1bb8 I might do something like this to fetch the post:

if id_param = params["id"]?
  post = PostQuery.new.find_by_id(UUID.new(id_param))
end

But if that path segment is not in a UUID format (for example, if someone requested /posts/asdf), an exception is raised and I don't see a way to avoid that. In my web service, I either have to know to rescue the exception or I end up serving a 500 response and potentially causing cascading failures.

Describe the feature you would like, optionally illustrated by examples, and how it will solve the above problem.

It'd be awesome if there were a method that returned nil if the string is not in a UUID format that let me avoid a runtime exception by requiring that I handle the nil case. For example:

if (id_param = params["id"]?) && (id = UUID.parse(id_param))
  post = PostQuery.new.find_by_id(id)
end

The parse name is simply a suggestion. I can't think of any naming conventions for this off the top of my head.

Describe considered alternative solutions, and the reasons why you have not proposed them as a solution here.

I haven't considered any alternatives, tbh. The choice between runtime exception and compile-time error typically seems to be done this way in Crystal, such as with Hash#[] vs Hash#[]?.

Does it break backward compatibility, if yes then what's the migration path?

If the String overload for UUID.new is not changed, this would be purely additive.

The text was updated successfully, but these errors were encountered:

Blacksmoke16 · 2022-04-13T04:00:46Z

I either have to know to rescue the exception or I end up serving a 500 response and potentially causing cascading failures.

Validation of the request params would be another viable solution to this problem. I.e. assert the value is actually a UUID before trying to parse it, returning a helpful error to the user if it's not.

In regards to this feature suggestion, I don't have strong feelings either way. Tho reminder you can do inline rescues. E.g. uuid = UUID.new id_param rescue nil.

EDIT: I could possibly see a UUID.valid? method being useful as well.

jgaskins · 2022-04-13T04:37:36Z

@Blacksmoke16

Validation of the request params would be another viable solution to this problem. I.e. assert the value is actually a UUID before trying to parse it, returning a helpful error to the user if it's not.

Calling it a solution to the problem is overselling it. That is a workaround for the specific use case I gave, but web requests are not the only use case for parsing UUIDs from sources you don't control. I used it as an example only because it's the one most people are familiar with.

The point is that there is no way to do this in the standard library without raising an exception.

Tho reminder you can do inline rescues. E.g. uuid = UUID.new id_param rescue nil.

This also does not solve the problem. I still have to know to rescue it or serve a 500 and cannot lean on the type system to require me to handle it like I can with Hash#[]?.

Additionally, the inline rescue involves heap allocations that are not necessary and is, without exaggeration, 5000x slower than returning nil on a parse failure:

 exception 174.76  (  5.72ms) (± 1.83%)  469kB/op  5095.57× slower
return nil 890.48k (  1.12µs) (± 1.06%)   0.0B/op          fastest

Benchmark code — only difference from `UUID.new(String)` is I changed all the `raise` lines to return `nil` instead

require "uuid"
require "benchmark"

uuid = nil
Benchmark.ips do |x|
  iterations = 1_000 # Ensure the `parse` isn't too fast for `Benchmark` to measure precisely enough
  x.report "exception" { iterations.times { uuid = UUID.new("asdf") rescue nil } }
  x.report "return nil" { iterations.times { uuid = UUID.parse("adsf") } }
end

pp uuid

struct UUID
  def self.parse(value : String, variant = nil, version = nil)
    bytes = uninitialized UInt8[16]

    case value.size
    when 36 # Hyphenated
      {8, 13, 18, 23}.each do |offset|
        if value[offset] != '-'
          return nil
        end
      end
      {0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
        bytes[i] = hex_pair_at value, offset
      end
    when 32 # Hexstring
      16.times do |i|
        bytes[i] = hex_pair_at value, i * 2
      end
    when 45 # URN
      return nil unless value.starts_with? "urn:uuid:"
      {9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
        bytes[i] = hex_pair_at value, offset
      end
    else
      return nil
    end

    new(bytes, variant, version)
  end
end

EDIT: I could possibly see a UUID.valid? method being useful as well.

This would require validating the UUID twice. Once before parsing, and once during parsing. The validation during parsing cannot be eliminated because there are 3 different formats to account for.

And again, it does not solve the problem of forcing me to know that I have to make that additional call before actually parsing.

straight-shoota · 2022-04-13T06:12:13Z

Sounds good to me. Name should probably be UUID.parse? to highlight that it's non-raising.

asterite · 2022-04-16T21:57:25Z

I just tried to see what is slow about raising exceptions. It turns out that, unless I'm mistaken, calling libunwind to raise is extremely slow. Allocating the exception or getting the call stack is very cheap compared to calling that function. Which unfortunately means there's nothing we can do to optimize it :-(

jgaskins · 2022-04-18T00:03:41Z

Fascinating, TIL. I thought generating the call stack was the expensive part. 🤯

asterite · 2022-04-18T09:25:34Z

Actually, I was running out of disk space so I uninstalled XCode together with Instruments. I wanted to profile a program that just loops and raises and rescues to see where most of the time is spent. My guess is libunwind raise, but I could be wrong. If someone has a Mac here (or can profile that program in any other platform) it would really help to see the actual profile graph.

asterite · 2022-04-18T13:11:50Z

Here's a benchmark comparing raise/rescue vs. just getting the callstack (these are IP addresses, turning those into useful descriptions is a bit more expensive, but that is only done if you print an exception: it doesn't happen otherwise):

require "benchmark"

{% for i in 1..10 %}
  def raise{{i}}
    raise{{i - 1}}
  end

  def caller{{i}}
    caller{{i - 1}}
  end
{% end %}

def raise0
  raise "OH NO"
end

def caller0
  Exception::CallStack.new
end

Benchmark.ips do |x|
  x.report("raise") do
    raise10 rescue nil
  end

  x.report("caller") do
    caller10
  end
end

Results:

 raise 256.44k (  3.90µs) (± 5.20%)   576B/op   9.69× slower
caller   2.49M (402.34ns) (± 4.63%)  64.0B/op        fastest

So it's definitely not computing the trace that's slow. I think it's unwinding the stack that's slow, that is, going back in the call chain and processing handlers (there's a personality function involved which we define, but in my tests that wasn't the slow part either)

jgaskins · 2022-04-18T13:35:37Z

@asterite Looks like you're right. With "Top Functions" and "Invert Call Tree" set on Instruments, the hottest call stacks all have libunwind in them, just a couple frames down from the Benchmark module methods:

jgaskins added the kind:feature label Apr 13, 2022

Blacksmoke16 added the topic:stdlib:text label Apr 13, 2022

Blacksmoke16 added the status:discussion label Apr 13, 2022

jgaskins mentioned this issue Apr 14, 2022

Add UUID.parse? #11998

Merged

straight-shoota closed this as completed in #11998 Apr 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

UUID: add non-raising parse method from String #11996

UUID: add non-raising parse method from String #11996

jgaskins commented Apr 13, 2022

Blacksmoke16 commented Apr 13, 2022 •

edited

Loading

jgaskins commented Apr 13, 2022

straight-shoota commented Apr 13, 2022

asterite commented Apr 16, 2022

jgaskins commented Apr 18, 2022

asterite commented Apr 18, 2022

asterite commented Apr 18, 2022

jgaskins commented Apr 18, 2022

UUID: add non-raising parse method from String #11996

UUID: add non-raising parse method from String #11996

Comments

jgaskins commented Apr 13, 2022

Feature Request

Blacksmoke16 commented Apr 13, 2022 • edited Loading

jgaskins commented Apr 13, 2022

straight-shoota commented Apr 13, 2022

asterite commented Apr 16, 2022

jgaskins commented Apr 18, 2022

asterite commented Apr 18, 2022

asterite commented Apr 18, 2022

jgaskins commented Apr 18, 2022

Blacksmoke16 commented Apr 13, 2022 •

edited

Loading