Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

math/rand: Float32() and Float64() are not uniformly distributed #4965

Closed
gopherbot opened this issue Mar 2, 2013 · 6 comments
Closed

math/rand: Float32() and Float64() are not uniformly distributed #4965

gopherbot opened this issue Mar 2, 2013 · 6 comments
Milestone

Comments

@gopherbot
Copy link
Contributor

by arnehormann:

What steps will reproduce the problem?

See http://play.golang.org/p/uZw1CbxcMx
It uses a constant source (returns seed, sets seed before each call to Float64) to
inspect the distance between uint64 seeds and the generated float64 values.


What is the expected output?

The distance between all adjacent generated float64 values in range [0,1) should be the
same and two adjacent uint64 should not be mapped to the same float64.
The output of the referenced program should not print anything below the line
"======="


What do you see instead?

Intent: illustrate problems with http://golang.org/src/pkg/math/rand/rand.go?#L92
f0  (0) :=  0p-1074 (0)
f1  (1) :=  4503599627370496p-115   (1.0842021724855044e-19)
high    (9223372036854775295) :=    9007199254740991p-53    (0.9999999999999999)
higher  (9223372036854775296) :=    4503599627370496p-52    (1)
max (9223372036854775807) :=    4503599627370496p-52    (1)
=======
max == higher, but the source uint64 difference is 511
 - the mapping uint63 to float64 has 'holes' for adjacent numbers
distances between adjacent float64s are different
 - the largest difference is 8998403161718784p-106 (1.109138822452671e-16)
   which is more than f1 - f0: 4503599627370496p-115 (1.0842021724855044e-19)


Which compiler are you using (5g, 6g, 8g, gccgo)?

play.golang.org and locally 6g


Which operating system are you using?

OS X Mountain Lion


Which version are you using?  (run 'go version')

go version devel +b0c8d47acfc5 Sat Mar 02 10:41:53 2013 +0400 darwin/amd64


Please provide any additional information below.

The calculation used in rand.Float64 (and Float32) uses 63 (31) bits of the passed
integer although the mantissa is only 52 (23) bits long. The binary representation is
not able to contain all possible values and creates a "kind of" logarithmic
distribution with an increasing amount of ints mapped to the same float the nearer it
gets to 1.0. Not having an uniform distribution and this bias probably also influence
the creation of numbers in different (e.g. normal) distributions.

I propose using one of these ways:

either create a number in [1,2) (which are always in uniform distribution) and subtract
1:

func UniformFloat64(b uint64) float64 {
    const range_1_2Exp = 0x3ff0000000000000
    const mantissaMask = 0x000fffffffffffff
    return math.Float64frombits(range_1_2Exp|(mantissaMask&b)) - 1.0
}

or create the smallest legal "step" and multiply it with a number masked to be
in range [0, max mantissa]:

func UniformFloat64(b uint64) float64 {
    const mantissaMask = 0x000fffffffffffff
    step := math.Float64frombits(0x3cb0000000000000)
    return step * float64(mantissaMask&b)
}

Both create the same float for the same input, I didn't benchmark them yet.

I have some tests and code ready if you are interested, I only need to know if the
existing functions can safely be replaced or there may be some weird dependency on their
behavior.
@remyoudompheng
Copy link
Contributor

Comment 2:

The returned distribution is the projection of the (supposedly uniform) distribution
returned by the Source to the non-constant-step realm of floating-point numbers.
It means that may look different near zero and near one, but that doesn't make it
non-uniform. It is even a closer approximation to a mathematical uniform distribution
than your proposal, which makes output more regularly spaced but less uniform.

@gopherbot
Copy link
Contributor Author

Comment 3 by arnehormann:

First, let me say I'm not an expert and I'm glad for corrections.
So far, I'm getting that you are right and my title for the issue might have been wrong.
In laymans terms, the closer to 1.0 we get, the more numbers get "ceiled"and "floored",
but as the same number of ceils and floors take place, it all evens out.
I still think my proposed approach is better. Say you want normal distributed numbers.
Usually, you'd take uniform distributed numbers and convert them. The current approach
would skews the tails.
When you say the current approach is more uniform, it may well be - but not in a "fair"
way. AFAIK regular spacing beats higher resolution near zero - at least for simulations
it is less surprising and more fair, which is a good thing.

@gopherbot
Copy link
Contributor Author

Comment 4 by arnehormann:

I read up on it some more and stand my ground: it's not uniform distributed.
According to Wikipedia, each value in a uniform distribution has the same probability to
"get picked", the probability is 1 / (max - min). That's clearly not the case for the
current implementation, where picking the value closest to 1.0 is 512 times as probable
as picking 0.0. Please ignore my previous comment.
Sources:
http://en.wikipedia.org/wiki/Uniform_distribution_(continuous)#Probability_density_function
and http://en.wikipedia.org/wiki/Probability_density_function

@remyoudompheng
Copy link
Contributor

Comment 5:

You are confusing uniform distribution on discrete sets and the continuous uniform
distribution. The result of rand.Floatxx is certainly not uniformly distributed over a
discrete set of floating-point numbers, and there is no point into doing that unless you
have a particular reason.
However, it is certainly a good approximation of the continuous uniform distribution
over [0,1]. If you think it is not the case, please show a problem that has a good
reason to use rand.Floatxx, that is not correctly addressed by the current
implementation, and would be better solved by your more complicated proposal.
People who want samples of a discrete distribution should definitely use integers, not
floating-point numbers.

@gopherbot
Copy link
Contributor Author

Comment 6 by arnehormann:

I yield, I cannot think of a problematic scenario.
After running the attached benchmark with the current approach, another version
multiplying with a constant factor instead of dividing, an improved version of my second
proposal and my first proposal, the current implementation won (if only by 65 ns for
2000000000 ops vs 67 ns).
The only argument left to me is that maybe it could ease fixing issue #4254 if
source.Uint64() instead of Int63() were used because the conversion code wouldn't have
to be adapted - but I guess that's neither dramatic nor realistic.
I really love this language and had high hopes to contribute something, but apparently
I'll have to search elsewhere. Keep up the great work and thank you for the time!

Attachments:

  1. speed_test.go (852 bytes)

@rsc
Copy link
Contributor

rsc commented Mar 4, 2013

Comment 7:

Status changed to Retracted.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1maybe label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants