# [m-dev.] Re: Random number generation package

Kenneth Almquist kalmquist2 at hotmail.com
Sat Sep 1 06:39:54 AEST 2001

Thomas Conway wrote:
> If you search through the mercury-users archive, you'll see the source
> for a robust, no holds barred, RNG that I translated from George Marsaglia's
> "Mother of all random number generators". It'll be slower than yours,
> but it is very robust.

I found it, and it looks interesting.  Here are my comments.

1)  The C code operates on 16 bit signed values whereas the Mercury code
operates on 15 bit unsigned values.  This change means that the
Mercury code produces different output than the C code.  Otherwise
the algorithms appear to be identical.

2)  The Mercury code takes 17.3 microseconds per generated number, which
is an order of magnitude slower than the package I sent.  I cleaned
up the C code,
timed the core algorithm, and got a time of 0.34 microseconds
to generate a 32 bit random number.  (Converting the result to
floating point takes another 0.06 microseconds using a floating
point multiply.)  This is not much slower than the 0.22 microseconds
for the algorithm I used, so the algorithm is not inherently slow.

My guess is that the efficient way to code this algorithm in Mercury
is to extract all the values of the state into individual variables,
compute the values which will go into the new state, and then create
the new state.  This would result in only one memory allocation per
call, other than the allocations required to box floats.  Three
words of the state are used only to hold intermediate values in the
random number generation process so this would also make the state
smaller.

3)  The interface includes a number of ways to use the basic random
number generation functionality.  You appear to have thought more
about the interface issues than I have.

4)  The irange predicate produces incorrect results if Upper = max_int,
because the expression Max + 1 overflows.  This could be fixed by
performing the addition in floating point.
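
The overflow and the floating-point fix can be sketched in C (irange itself is Mercury; `range_size` here is a hypothetical helper, not code from the package):

```c
#include <limits.h>

/* If Upper = INT_MAX, evaluating Upper - Lower + 1 in int arithmetic
   overflows (undefined behavior in C).  The same arithmetic in double
   is exact, since every range size up to 2^31 fits comfortably within
   a double's 53-bit mantissa. */
static double range_size(int lower, int upper) {
    return (double)upper - (double)lower + 1.0;
}
```

A uniform variate U in [0, 1) can then be scaled as lower + (int)(U * range_size(lower, upper)) without the intermediate integer overflow.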

5)  Only the low order 30 bits of the seed are used.  If these bits
are zero, then the pseudo-random sequence produced will consist of
nothing but zeros.  To avoid duplicate sequences, the seed should be
required to be in the range 0 .. pow(2, 30) - 1, and to avoid the
all-zero sequence it should also be nonzero.
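
The collapse of high seed bits can be shown in two lines of C (`low30` is illustrative, not the package's code):

```c
#include <stdint.h>

/* Only the low 30 bits of the seed reach the generator state, so any
   two seeds differing only in bits 30 and above yield the same
   pseudo-random sequence, and a seed whose low 30 bits are all zero
   yields the all-zero sequence. */
static uint32_t low30(uint32_t seed) {
    return seed & ((1u << 30) - 1u);
}
```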

6)  The state of the generator consists mostly of previous outputs.
Computing the remaining 32 bits takes no more work than computing
the next random number.  Therefore code can be written which
efficiently distinguishes between the output of the generator and
the output of a true source of random numbers.

7)  In an ideal random function, flipping one input bit should change a
given output bit about 50% of the time (depending on the values of
the remaining input bits).  In the random number generator I posted,
this is not the case for all bits, but for each bit in the A output
there are at least 64 input bits where this is true.  For the B
output, which appears in the generated random numbers only as the
low order bits of floating point numbers, the figure is 56 input bits.
For the C version of the "Mother" algorithm, the number is no more
than 12 for any output bit, and is zero for the low order output
bit.  For two rounds, the number varies from 7 to 23, with a median
of 10.  The two round statistics are going to be similar to the
statistics for nine sequential outputs (since the operation of
converting between a state and a sequence of nine outputs is
simple).  So a pattern detection algorithm might work as follows:

Let C1 = the last nine outputs,
C2 = another set of nine sequential outputs, and
F2 = the output that immediately followed the outputs in C2.
We want to predict the next output of the generator, which we will
call F1, and we attempt to do this by exploiting the relationship
between xor(F1, F2) and xor(C1, C2).  For each bit F2[i], we ask
whether the corresponding bit in F1 will be the same or different.
To do that, we look at the bits which differ between C1 and C2.
If every one of those differing bits biases bit i of the output in
a known direction (more or less likely than 50% to change it), then
we can make a prediction of F1[i].  The prediction will be quite
weak, but if we repeat the above operation with enough different
values of C2, we will eventually accumulate a large collection of
predictions for each output bit, which we can then combine.

It's hard to imagine the bit correlations discussed here actually
mattering to any real application, so this is not a real weakness.
It only matters if you are trying to judge how much of a margin
the algorithm provides beyond the minimum requirements for
pseudo-randomness.
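
The bit-flip statistics quoted above can be gathered with a short harness.  The sketch below uses Murmur3's 32-bit finalizer as a stand-in mixing function (it is NOT the Mother algorithm); for output bit `out` it counts how many input-bit flips change that bit close to 50% of the time:

```c
#include <stdint.h>

/* Murmur3's 32-bit finalizer, used purely as a stand-in mixing
   function with good avalanche behavior. */
static uint32_t mix(uint32_t x) {
    x ^= x >> 16;
    x *= 0x85EBCA6Bu;
    x ^= x >> 13;
    x *= 0xC2B2AE35u;
    x ^= x >> 16;
    return x;
}

/* For output bit `out`, count how many of the 32 input bits, when
   flipped, change that output bit close to 50% of the time. */
static int good_input_bits(int out, int trials) {
    int good = 0;
    for (int in = 0; in < 32; in++) {
        int flips = 0;
        uint32_t x = 0x9E3779B9u;             /* arbitrary start */
        for (int t = 0; t < trials; t++) {
            x = x * 1664525u + 1013904223u;   /* LCG varies the inputs */
            if (((mix(x) ^ mix(x ^ (1u << in))) >> out) & 1u)
                flips++;
        }
        double p = (double)flips / trials;
        if (p > 0.4 && p < 0.6)
            good++;
    }
    return good;
}
```

Note that for a mixing step that is linear over GF(2) (xors and shifts only), each flip probability is exactly 0 or 1, so this count would be zero; it is the multiplications that push the probabilities toward 50%.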

I notice that the function to compute half the output contains
8 multiplications by constants and 8 addition operations.  These
are all linear over the field of real numbers.  Then we perform
a shift and a mask operation (equivalent to integer division with
remainder), which are not linear.  Since we have only a couple of
nonlinear operations, a promising area for discovering regularities
is to look at the effect of numerical differences, rather than
at the effect of bit flipping.  If I were going to spend more time
on this, that is what I would look at next.  (In my package I have
10 exclusive or's and 4 right shifts to break up patterns based
on numerical differences.)
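
The structure just described (eight constant multiplications and additions, all linear over the reals, followed by a single shift/mask pair) can be sketched as a lag-8 multiply-with-carry step.  The multipliers below are placeholders, not Marsaglia's actual constants:

```c
#include <stdint.h>

/* Placeholder multipliers -- NOT the constants from the original
   "Mother" generator. */
static const uint32_t MULT[8] = {1111, 2222, 3333, 4444,
                                 5555, 6666, 7777, 8888};

typedef struct {
    uint16_t hist[8];   /* last eight 16-bit outputs */
    uint32_t carry;     /* carried high part of the previous sum */
} mwc_state;

static uint16_t mwc_next(mwc_state *s) {
    uint32_t sum = s->carry;
    for (int i = 0; i < 8; i++)
        sum += MULT[i] * (uint32_t)s->hist[i];   /* the linear part */
    for (int i = 7; i > 0; i--)                  /* shift history down */
        s->hist[i] = s->hist[i - 1];
    s->hist[0] = (uint16_t)(sum & 0xFFFF);       /* mask: the remainder */
    s->carry   = sum >> 16;                      /* shift: the quotient */
    return s->hist[0];
}
```

The mask and shift are together one integer division with remainder by 2^16, the only nonlinear step in the round.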
Kenneth Almquist
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au