Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integer square root #10549

Merged
merged 10 commits into from
Jul 23, 2021
Merged

Add integer square root #10549

merged 10 commits into from
Jul 23, 2021

Conversation

kimburgess
Copy link
Contributor

Adds an integer square root operation to stdlib.

Resolves #10545

src/math/math.cr Outdated Show resolved Hide resolved
spec/std/math_spec.cr Outdated Show resolved Hide resolved
@jzakiya
Copy link

jzakiya commented Mar 26, 2021

To get this to run correctly for 64-bits ints I had to do this:

  def isqrtbits(value = self)
    raise ArgumentError.new "Input must be non-negative integer" if value < 0
    return value if value < 2
  
    return value if value < 2
    res = value.class.zero
    one = typeof(value).new(1)                   **********************
    bit = one << (res.leading_zeros_count - 2)   **********************
    bit >>= value.leading_zeros_count & ~0x3
  
    while (bit != 0)
      if value >= res + bit
        value -= res + bit
        res = (res >> 1) + bit
      else
        res >>= 1
      end
      bit >>= 2
    end
  
    res
  end

@jzakiya
Copy link

jzakiya commented Mar 26, 2021

My tests show Newton's method is faster.

def isqrt(value : Int)
  raise ArgumentError.new "Input must be non-negative integer" if value < 0
  return value if value < 2
  res = value.class.zero
  one = typeof(value).new(1)
  bit = one << (res.leading_zeros_count - 2)
  bit >>= value.leading_zeros_count & ~0x3

  while (bit != 0)
    if value >= res + bit
      value -= res + bit
      res = (res >> 1) + bit
    else
      res >>= 1
    end
    bit >>= 2
  end
  res
end

def isqrt1(num : Int)        # Newton's method version used in Ruby for Integer#sqrt
  raise ArgumentError.new "Input must be non-negative integer" if num < 0
  return num if num < 2
  b = num.bit_length
  one = typeof(num).new(1)  # value 1 of type self
  x = one << ((b - 1) >> 1) | num >> ((b >> 1) + 1)
  while (t = num // x) < x
    x = (x + t) >> 1 
  end
  x                          # ouput same Integer type as input
end

num  = 18446744073709551615u64
sqrt = isqrt(num)
puts "isqrt  of #{num} = #{sqrt}"
sqrt1 = isqrt1(num)
puts "isqrt1 of #{num} = #{sqrt1}"
puts
require "benchmark"

val = 18446744073709551615u64
Benchmark.ips do |x|
  x.report("isqrt")  { isqrt(val) }
  x.report("isqrt1") { isqrt1(val) }
end
-----------------------------------------------------
isqrt  of 18446744073709551615 = 4294967295
isqrt1 of 18446744073709551615 = 4294967295

 isqrt  24.19M ( 41.33ns) (± 3.57%)  0.0B/op   2.85× slower
isqrt1  68.92M ( 14.51ns) (± 2.69%)  0.0B/op        fastest

@kimburgess
Copy link
Contributor Author

Your tests are correct. My setup for for that was extremely broken, and I was also using a different impl of your Newton's method. Here's what I'm woking off currently: https://gist.github.com/kimburgess/6a57c220d431fdae1e43b4c2db636708

Results from a local machine:

UInt8
  sqrt (15) 615.55M (  1.62ns) (± 0.36%)  0.0B/op        fastest
newton (15) 140.04M (  7.14ns) (± 0.43%)  0.0B/op   4.40 slower
 isqrt (15) 139.61M (  7.16ns) (±10.99%)  0.0B/op   4.41 slower

UInt16
  sqrt (255) 615.52M (  1.62ns) (± 0.36%)  0.0B/op        fastest
newton (255) 139.84M (  7.15ns) (± 0.14%)  0.0B/op   4.40 slower
 isqrt (255)  63.63M ( 15.72ns) (±12.60%)  0.0B/op   9.67 slower

UInt32
  sqrt (65535) 615.43M (  1.62ns) (± 0.26%)  0.0B/op        fastest
newton (65535) 158.05M (  6.33ns) (± 6.51%)  0.0B/op   3.89 slower
 isqrt (65535)  42.74M ( 23.40ns) (± 0.17%)  0.0B/op  14.40 slower

UInt64
  sqrt (4294967296) 615.39M (  1.62ns) (± 0.29%)  0.0B/op        fastest
newton (4294967295)  64.97M ( 15.39ns) (± 0.16%)  0.0B/op   9.47 slower
 isqrt (4294967295)  27.92M ( 35.82ns) (± 8.34%)  0.0B/op  22.05 slower

Int8
  sqrt (11) 615.51M (  1.62ns) (± 0.27%)  0.0B/op        fastest
newton (11)  71.57M ( 13.97ns) (±12.25%)  0.0B/op   8.60 slower
 isqrt (11) 135.81M (  7.36ns) (±11.63%)  0.0B/op   4.53 slower

Int16
  sqrt (181) 615.49M (  1.62ns) (± 0.29%)  0.0B/op        fastest
newton (181)  46.50M ( 21.50ns) (± 0.24%)  0.0B/op  13.24 slower
 isqrt (181)  87.74M ( 11.40ns) (± 0.14%)  0.0B/op   7.02 slower

Int32
  sqrt (46340) 615.59M (  1.62ns) (± 0.26%)  0.0B/op        fastest
newton (46340)  27.51M ( 36.35ns) (± 6.13%)  0.0B/op  22.38 slower
 isqrt (46340)  53.73M ( 18.61ns) (±13.84%)  0.0B/op  11.46 slower

Int64
  sqrt (3037000499) 615.83M (  1.62ns) (± 0.58%)  0.0B/op        fastest
newton (3037000499)  12.02M ( 83.17ns) (± 7.92%)  0.0B/op  51.22 slower
 isqrt (3037000499)  24.32M ( 41.12ns) (±10.44%)  0.0B/op  25.32 slower

It feels almost dirty, but based on this and given that precision does not degrade until moving into the high ranges of a UInt64 a good option seems to be just taking advantage of the FPU and casting up all signed integers and unsigned up to 32 bit, switching to Newton's method for UInt64's (all, for simplicity), then bumping up to LibGMP for BigInts.

@jzakiya
Copy link

jzakiya commented Mar 26, 2021

Interesting! I get similar relative results running your benchmark code.
The newton method is ~2.7x faster than bit method for unsigned UInts while bit method is ~2x faster for signed Ints.

However, the newton method implementation also works with BigInts, but the bit method fails because of:

 6 | bit = one << (res.leading_zeros_count - 2)
                       ^------------------
Error: undefined method 'leading_zeros_count' for BigInt
def isqrt1(num : Int)        # Newton's method version used in Ruby for Integer#sqrt
  raise ArgumentError.new "Input must be non-negative integer" if num < 0
  return num if num < 2
  b = num.bit_length
  one = typeof(num).new(1)  # value 1 of type self
  x = one << ((b - 1) >> 1) | num >> ((b >> 1) + 1)
  while (t = num // x) < x
    x = (x + t) >> 1 
  end
  x                         # output same Integer type as input
end

require "big"

num  = "1234567890123456789012345678901234567890123456789012345678901".to_big_i
puts "isqrt1 of #{num} = #{isqrt1(num)}"
----------------------------------------------------------------
isqrt1 of 1234567890123456789012345678901234567890123456789012345678901 = 1111111106111111099361111058186

Also, Math.sqrt produces the wrong value for UInt64 (4294967296 vs 4294967295) and accuracy is THE highest priority.

Given that the newton method is more faster for Uints than bit method is for Ints, and can handle BigInts, I'd go with it.

If you want to feel dirty and use Math.sqrt (I wouldn't`) you'd have to account for the edge cases.
We can put in docs up to what Integer|Float values Math.sqrt returns accurate values. I think that is best.

Given that the newton method also handles BigInts I'd like to see benchmarks against BigInt.sqrt to see perf difference (need to benchmark against 100s - 1000s of digit numbers). I imagine for most use cases, you don't need ultimate speed, and just want accurate results when you need them, and being able to use just one method that works for ALL Int types will eliminate users having to know they need to do |userequire ''big'' , just for BigInts.

@jzakiya
Copy link

jzakiya commented Mar 27, 2021

Just thinking.

If you want to create a single hybrid method, then do the fastest for Ints, and also BigInts.sqrt when numbers become large enough to use it when it becomes best. That way the user gets the best of all worlds without worrying about all the gory details.

@kimburgess
Copy link
Contributor Author

kimburgess commented Mar 27, 2021

1bdf7a6 added in BigInt#sqrt via gmp. That one is definitely more efficient than attempting natively as BigInt is already just an interface over nums there.

I agree with the comments on the issue thread that bumping this out of Math and onto Int as an instance method seems a little neater as that would keen this aligned across all Int types. @straight-shoota your call on preferred approach there - both for the move, and naming if this happens (Int#sqrt or Int#isqrt).

For perf, here's a sample for random inputs across each type. Y-axis is iterations / second as provided by Benchmark (higher is better). Previous bench results were just a single input—{{type}}::MAX—which was skewing things.

@kimburgess
Copy link
Contributor Author

Also, obligatory "premature optimisation is the root of all evil"... yes maybe, but it's fun!

@straight-shoota
Copy link
Member

I'd probably prefer adding just a correct default implementation first. Performance optimizations can always be figured out afterwards.

The method should be next to sqrt, that means Math.isqrt.

@kimburgess
Copy link
Contributor Author

The complexity with this being Math.isqrt rather than Int#sqrt is that module currently works over primitives only. If there's a desire to have it there, then either:

  1. Math needs to require big, or
  2. Math handles cases for primitive types only, with a different API for BigInt (as per current state of PR)

Math is predominately an interface over LibM (which does not provide an integer square root). I originally placed this for proximity to #sqrt, but IMO (based on newfound std lib familiarity) this is not the best home for it. Positioning it as an instance method lets that module keep it's form as an interface and avoids tangling dependencies. There is also existing instance operations such as Int#gcd, Int#lcm, Int#fdiv and BigInt#factorial which give some weight to this.

Both Newton's method and the abacus algorithm this implements yield correct results for all primitive types. As above, this does appear to provide the better performance in the majority of use cases, and also consistent performance within each type. @jzakiya you likely have significantly more experience in this area though so very open to contradiction of that statement if there's something that I'm missing.

@asterite
Copy link
Member

You can define isqrt in Math. Define it for primitives by default, and add an overload for BigIn inside big_int.cr

@kimburgess
Copy link
Contributor Author

Not sure what went on with the linux_ci test run - looks to have triggered an unrelated compiler bug. Build is green in the associated action on the fork: https://github.com/kimburgess/crystal/actions/runs/694912741. Might need a re-run here if someone from core can git it a kick?

@straight-shoota
Copy link
Member

I suppose that segfault might warrant a separate investigation but yeah it's unlikely to be related to the changes here. It happens somewehere in a finalize handler. Could just be an LLVM bug or some never-reproducible coincidence 🤷

@jzakiya
Copy link

jzakiya commented Mar 31, 2021

Hey @kimburgess can you provide a link to your benchmark code that created the data for the different Int types.
I assume you ran those on Intel cpus. I want to try to see if those characteristics hold for other architects (ARM, RiscV, PowerPC, et al). I personally only have Intel based hardware, but will see if I can get people with other hardware to run it.

I find those results very interesting (surprising) and maybe I can get some academic types I know to investigate why they occur.

@kimburgess
Copy link
Contributor Author

kimburgess commented Apr 1, 2021

Sure thing. Here's what generated those outputs along with the raw results from my local runs: https://gist.github.com/kimburgess/f85038f4fb9ee9315d9d8b16b41b115f. Columns in the csv's are:

input mean iterations / s standard deviation variance

One extremely solid point from another recent PR is that Benchmark.ips does not do anything with the output so it is highly likely that there's some downstream optimisation happening that nullifies these results. If I have time this weekend I'll run another test with a similar fix in place.

@jzakiya
Copy link

jzakiya commented Apr 2, 2021

Thank you @kimburgess.

Copy link
Member

@beta-ziliani beta-ziliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@beta-ziliani beta-ziliani added this to the 1.2.0 milestone Jul 20, 2021
@straight-shoota straight-shoota merged commit 1433b44 into crystal-lang:master Jul 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add integer square root to stdlib
8 participants