Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UUID: add non-raising parse method from String #11996

Closed
jgaskins opened this issue Apr 13, 2022 · 8 comments · Fixed by #11998
Closed

UUID: add non-raising parse method from String #11996

jgaskins opened this issue Apr 13, 2022 · 8 comments · Fixed by #11998

Comments

@jgaskins
Copy link
Contributor

Feature Request

  • Is your feature request related to a problem? Please describe clearly and concisely what is it.

When parsing a UUID from a string, there appears to be no way to avoid raising an exception if that string is not a UUID. For example, if I make a request to GET /posts/f5b953f4-9811-4d5e-a674-e5c8b12f1bb8 I might do something like this to fetch the post:

if id_param = params["id"]?
  post = PostQuery.new.find_by_id(UUID.new(id_param))
end

But if that path segment is not in a UUID format (for example, if someone requested /posts/asdf), an exception is raised and I don't see a way to avoid that. In my web service, I either have to know to rescue the exception or I end up serving a 500 response and potentially causing cascading failures.

  • Describe the feature you would like, optionally illustrated by examples, and how it will solve the above problem.

It'd be awesome if there were a method that returned nil if the string is not in a UUID format that let me avoid a runtime exception by requiring that I handle the nil case. For example:

if (id_param = params["id"]?) && (id = UUID.parse(id_param))
  post = PostQuery.new.find_by_id(id)
end

The parse name is simply a suggestion. I can't think of any naming conventions for this off the top of my head.

  • Describe considered alternative solutions, and the reasons why you have not proposed them as a solution here.

I haven't considered any alternatives, tbh. The choice between runtime exception and compile-time error typically seems to be done this way in Crystal, such as with Hash#[] vs Hash#[]?.

  • Does it break backward compatibility, if yes then what's the migration path?

If the String overload for UUID.new is not changed, this would be purely additive.

@Blacksmoke16
Copy link
Member

Blacksmoke16 commented Apr 13, 2022

I either have to know to rescue the exception or I end up serving a 500 response and potentially causing cascading failures.

Validation of the request params would be another viable solution to this problem. I.e. assert the value is actually a UUID before trying to parse it, returning a helpful error to the user if it's not.

In regards to this feature suggestion, I don't have strong feelings either way. Tho reminder you can do inline rescues. E.g. uuid = UUID.new id_param rescue nil.

EDIT: I could possibly see a UUID.valid? method being useful as well.

@jgaskins
Copy link
Contributor Author

@Blacksmoke16

Validation of the request params would be another viable solution to this problem. I.e. assert the value is actually a UUID before trying to parse it, returning a helpful error to the user if it's not.

Calling it a solution to the problem is overselling it. That is a workaround for the specific use case I gave, but web requests are not the only use case for parsing UUIDs from sources you don't control. I used it as an example only because it's the one most people are familiar with.

The point is that there is no way to do this in the standard library without raising an exception.

Tho reminder you can do inline rescues. E.g. uuid = UUID.new id_param rescue nil.

This also does not solve the problem. I still have to know to rescue it or serve a 500 and cannot lean on the type system to require me to handle it like I can with Hash#[]?.

Additionally, the inline rescue involves heap allocations that are not necessary and is, without exaggeration, 5000x slower than returning nil on a parse failure:

 exception 174.76  (  5.72ms) (± 1.83%)  469kB/op  5095.57× slower
return nil 890.48k (  1.12µs) (± 1.06%)   0.0B/op          fastest
Benchmark code — only difference from `UUID.new(String)` is I changed all the `raise` lines to return `nil` instead
require "uuid"
require "benchmark"

uuid = nil
Benchmark.ips do |x|
  iterations = 1_000 # Ensure the `parse` isn't too fast for `Benchmark` to measure precisely enough
  x.report "exception" { iterations.times { uuid = UUID.new("asdf") rescue nil } }
  x.report "return nil" { iterations.times { uuid = UUID.parse("adsf") } }
end

pp uuid

struct UUID
  def self.parse(value : String, variant = nil, version = nil)
    bytes = uninitialized UInt8[16]

    case value.size
    when 36 # Hyphenated
      {8, 13, 18, 23}.each do |offset|
        if value[offset] != '-'
          return nil
        end
      end
      {0, 2, 4, 6, 9, 11, 14, 16, 19, 21, 24, 26, 28, 30, 32, 34}.each_with_index do |offset, i|
        bytes[i] = hex_pair_at value, offset
      end
    when 32 # Hexstring
      16.times do |i|
        bytes[i] = hex_pair_at value, i * 2
      end
    when 45 # URN
      return nil unless value.starts_with? "urn:uuid:"
      {9, 11, 13, 15, 18, 20, 23, 25, 28, 30, 33, 35, 37, 39, 41, 43}.each_with_index do |offset, i|
        bytes[i] = hex_pair_at value, offset
      end
    else
      return nil
    end

    new(bytes, variant, version)
  end
end

EDIT: I could possibly see a UUID.valid? method being useful as well.

This would require validating the UUID twice. Once before parsing, and once during parsing. The validation during parsing cannot be eliminated because there are 3 different formats to account for.

And again, it does not solve the problem of forcing me to know that I have to make that additional call before actually parsing.

@straight-shoota
Copy link
Member

Sounds good to me. Name should probably be UUID.parse? to highlight that it's non-raising.

@asterite
Copy link
Member

I just tried to see what is slow about raising exceptions. It turns out that, unless I'm mistaken, calling libunwind to raise is extremely slow. Allocating the exception or getting the call stack is very cheap compared to calling that function. Which unfortunately means there's nothing we can do to optimize it :-(

@jgaskins
Copy link
Contributor Author

Fascinating, TIL. I thought generating the call stack was the expensive part. 🤯

@asterite
Copy link
Member

Actually, I was running out of disk space so I uninstalled XCode together with Instruments. I wanted to profile a program that just loops and raises and rescues to see where most of the time is spent. My guess is libunwind raise, but I could be wrong. If someone has a Mac here (or can profile that program in any other platform) it would really help to see the actual profile graph.

@asterite
Copy link
Member

Here's a benchmark comparing raise/rescue vs. just getting the callstack (these are IP addresses, turning those into useful descriptions is a bit more expensive, but that is only done if you print an exception: it doesn't happen otherwise):

require "benchmark"

{% for i in 1..10 %}
  def raise{{i}}
    raise{{i - 1}}
  end

  def caller{{i}}
    caller{{i - 1}}
  end
{% end %}

def raise0
  raise "OH NO"
end

def caller0
  Exception::CallStack.new
end

Benchmark.ips do |x|
  x.report("raise") do
    raise10 rescue nil
  end

  x.report("caller") do
    caller10
  end
end

Results:

 raise 256.44k (  3.90µs) (± 5.20%)   576B/op   9.69× slower
caller   2.49M (402.34ns) (± 4.63%)  64.0B/op        fastest

So it's definitely not computing the trace that's slow. I think it's unwinding the stack that's slow, that is, going back in the call chain and processing handlers (there's a personality function involved which we define, but in my tests that wasn't the slow part either)

@jgaskins
Copy link
Contributor Author

@asterite Looks like you're right. With "Top Functions" and "Invert Call Tree" set on Instruments, the hottest call stacks all have libunwind in them, just a couple frames down from the Benchmark module methods:

Screenshot of macOS Instruments app showing a profile of the run for Ary's benchmark code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants