RANLUX++: Add compatibility engines #8383

hahnjo · 2021-06-09T11:54:20Z

These engines can be used to obtain the same sequences of numbers as RANLUX generators using recursive subtract-with-borrow steps, but with enhanced performance. Apart from the choice of parameters, the main difference between the various implementations is the way of seeding the initial state of the generator.

This commit includes engines for compatibility with:

* the original implementation by Fred James, with parameters for
  - luxury level 3 (p = 223), also matching gsl_rng_ranlux
  - luxury level 4 (p = 389), also matching gsl_rng_ranlux389

  producing floating point numbers from 24 bits of randomness;
* the family of generators using a second-generation version of the RANLUX algorithm as implemented in the GNU Scientific Library:
  - gsl_rng_ranlxs[012] using 24 bits per floating point number, and
  - gsl_rng_ranlxd[12] using 48 bits per floating point number;
* the implementation by Martin Lüscher written in C that uses four states per generator; similar to GSL, there are ranlxs[012] with 24 bits per number and ranlxd[12] with 48 bits per number; and
* the generators std::ranlux{24,48} defined by the C++ standard.

The values in the tests were extracted directly from the mentioned implementations, showing that the LCG implementation is equivalent to the RANLUX algorithm.

I am not adding compatibility engines for CLHEP because its semantics are very weird: While CLHEP::RanluxEngine::setSeed yields the same sequences as the original implementation by James, the seed is treated differently when passed as an argument to the constructor.
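
As a point of reference for the std::ranlux{24,48} item above: the C++ standard defines std::ranlux24 as a discard-block wrapper around the subtract-with-borrow base engine std::ranlux24_base, using 23 values out of every block of p = 223 and throwing the rest away. The sketch below reproduces that sequence by hand with standard library types only. The helper name manual_ranlux24 is made up for illustration; it is not part of ROOT or of this PR, and it shows the classic (slow) scheme, not the LCG formulation used by the new engines.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not ROOT API): rebuild the std::ranlux24 sequence on
// top of the subtract-with-borrow base engine std::ranlux24_base.
std::vector<std::uint_fast32_t> manual_ranlux24(std::uint_fast32_t seed, std::size_t n)
{
   constexpr std::size_t p = 223; // block size
   constexpr std::size_t r = 23;  // values actually used per block
   std::ranlux24_base base(seed);
   std::vector<std::uint_fast32_t> out;
   out.reserve(n);
   std::size_t used = 0;
   for (std::size_t i = 0; i < n; i++) {
      if (used == r) {
         // Throw away the remaining p - r values of the current block.
         base.discard(p - r);
         used = 0;
      }
      out.push_back(base());
      used++;
   }
   return out;
}
```

Comparing the returned values against a std::ranlux24 constructed with the same seed should give identical numbers; the cost of the discarded values is what the compatibility engines avoid.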


phsft-bot · 2021-06-09T11:54:32Z

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14

hahnjo · 2021-06-09T12:05:24Z

To elaborate a bit on CLHEP: RanluxppCompatEngineJamesP3 rng(314159265) yields the same sequence as

CLHEP::RanluxEngine r;
r.setSeed(314159265);

but directly passing the seed to the constructor à la CLHEP::RanluxEngine r(314159265) gives different numbers. The reason is that the constructor, after invoking setSeed which works as documented, also calls setSeeds with the given seed parameter as the only entry in the seed table. That procedure is subtly different and could be mimicked as follows:

diff --git a/math/mathcore/src/RanluxppEngineImpl.cxx b/math/mathcore/src/RanluxppEngineImpl.cxx
index 100f8d8638..bbf508a6a8 100644
--- a/math/mathcore/src/RanluxppEngineImpl.cxx
+++ b/math/mathcore/src/RanluxppEngineImpl.cxx
@@ -219,13 +219,14 @@ public:
       // Multiplicative Congruential generator using formula constants of L'Ecuyer
       // as described in "A review of pseudorandom number generators" (Fred James)
       // published in Computer Physics Communications 60 (1990) pages 329-344.
-      int64_t seed = s;
+      int64_t seed = s & 0xffffff;
       auto next = [&]() {
          const int a = 0xd1a4, b = 0x9c4e, c = 0x2fb3, d = 0x7fffffab;
+         int64_t oldSeed = seed;
          int64_t k = seed / a;
          seed = b * (seed - k * a) - k * c ;
          if (seed < 0) seed += d;
-         return seed & 0xffffff;
+         return oldSeed & 0xffffff;
       };
 
       // Iteration is reversed because the first number from the MCG goes to the

That would add compatibility for the constructor, but leave no way to call SetSeed on an existing object. Moreover this scheme only uses the lower 24 bits of the user's seed...
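
To make the difference between the two seeding procedures concrete, here is a standalone sketch built only from the constants and statements visible in the diff above. The function names (mcg_step, seed_words_set_seed, seed_words_constructor) are invented for this illustration, and the code is not taken from CLHEP or ROOT. It only produces the 24-bit seed words and ignores how they are placed into the generator state.

```cpp
#include <cstdint>
#include <vector>

// One step of the multiplicative congruential generator from the diff:
// seed -> 40014 * seed mod 2147483563, evaluated without overflow by
// splitting the seed into quotient and remainder with respect to 53668.
inline std::int64_t mcg_step(std::int64_t seed)
{
   const std::int64_t a = 0xd1a4, b = 0x9c4e, c = 0x2fb3, d = 0x7fffffab;
   const std::int64_t k = seed / a;
   seed = b * (seed - k * a) - k * c;
   if (seed < 0)
      seed += d;
   return seed;
}

// Unpatched behaviour: advance first, then return the NEW state (24 bits).
std::vector<std::int64_t> seed_words_set_seed(std::int64_t s, int n)
{
   std::vector<std::int64_t> words;
   std::int64_t seed = s;
   for (int i = 0; i < n; i++) {
      seed = mcg_step(seed);
      words.push_back(seed & 0xffffff);
   }
   return words;
}

// Patched behaviour: keep only the lower 24 bits of the user's seed and
// return the OLD state before each advance.
std::vector<std::int64_t> seed_words_constructor(std::int64_t s, int n)
{
   std::vector<std::int64_t> words;
   std::int64_t seed = s & 0xffffff;
   for (int i = 0; i < n; i++) {
      words.push_back(seed & 0xffffff);
      seed = mcg_step(seed);
   }
   return words;
}
```

In this sketch the first word of the constructor variant is simply the lower 24 bits of the user's seed, and for seeds below 2^24 the remaining words are the set-seed words shifted by one position.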

Axel-Naumann · 2021-06-09T12:48:01Z

Can you motivate why we should include those in ROOT's interface? I understand the motivation for testing! I'm sure you have a good reason to also expose them, I'd just like to see the reasons :-)

hahnjo · 2021-06-09T14:16:25Z

@Axel-Naumann yes, testing is one of the motivations, in particular continuous testing to prevent future regressions (now we can check against an external implementation, instead of just copying the current values and declaring them "known-good").

The other reason, and why I think this might benefit users, is performance: The original RANLUX implementation by James (at least as implemented in GSL) needs 40 seconds to sum 1 million numbers at luxury level 3; gsl_rng_ranlux389 (luxury level 4) takes a bit more than 1 minute. Each of these sequences takes less than 8 seconds with the corresponding RanluxppCompatEngineJamesP[34] (due to the LCG, you don't even pay for higher decorrelation!).
The difference is even larger for std::ranlux48 (used directly, not through std::uniform_real_distribution, which eats up more than one number per iteration): 2m55s compared to 12 seconds with RanluxppCompatEngineStdRanlux48. And because we can generate the same sequence, switching the generator won't change the output of a simulation / analysis / ... (only the interface is slightly different). On top of that, users gain the ability to skip ahead in the very same sequence without generating the intermediate numbers.

Now we could argue that all users should switch to RanluxppEngine2048, which on top of that provides better seeding and even higher decorrelation. On the other hand, the implementations above have been around for some time now and are so widely available (std::ranlux{24,48} comes with any C++ compiler) that they will remain in use...
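
For context on what "used directly" means for std::ranlux48, the following sketch turns each 48-bit output into one double in [0, 1) and sums them, so that exactly one engine output is consumed per number. It uses only the standard library; the helper names next_double48 and sum_direct are assumptions made for this example and say nothing about how the timings above were actually measured.

```cpp
#include <cmath>
#include <cstdint>
#include <random>

// Hypothetical helper: map one 48-bit output of std::ranlux48 to a double in
// [0, 1). Every 48-bit integer is exactly representable in a double.
inline double next_double48(std::ranlux48 &rng)
{
   return std::ldexp(static_cast<double>(rng()), -48);
}

// Sum n numbers while consuming exactly n engine outputs.
double sum_direct(std::ranlux48 &rng, std::uint64_t n)
{
   double sum = 0;
   for (std::uint64_t i = 0; i < n; i++)
      sum += next_double48(rng);
   return sum;
}
```

Note that the standard engines also offer discard(z) to advance the state, but the standard only bounds its cost by that of z ordinary calls, so it is not a substitute for the cheap skipping mentioned above.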

hahnjo · 2021-06-23T12:30:14Z

ping @lmoneta

lmoneta

Looks good to me.
I agree that it is good to expose the compatibility engines, which can generate the same sequences as the old implementations, only faster.
Very nice contribution!


hahnjo added the in:Math Libraries label Jun 9, 2021

hahnjo requested review from Axel-Naumann and lmoneta June 9, 2021 11:54

hahnjo self-assigned this Jun 9, 2021

Axel-Naumann removed their request for review June 9, 2021 12:48

lmoneta approved these changes Jun 23, 2021


hahnjo merged commit 86f17eb into root-project:master Jun 23, 2021

hahnjo deleted the RANLUX++-compat branch June 23, 2021 15:43
