[m-dev.] Random number generation package

Kenneth Almquist kalmquist2 at hotmail.com
Thu Aug 30 05:59:26 AEST 2001


The random number generator in Mercury library produces output that is
obviously nonrandom in some respects.  The low order bit alternates
between zero and one.  If you initialize the random number generator
with a seed of 1 and request a series of random permutations of the
list [1, 2, 3, 4], the last element in every permutation generated
will be 3.  These regularities in the output won't matter for some
applications, but for others they will.  So the algorithm used in the
Mercury library is not useless, but is in my opinion not sufficiently
general to be included in a general purpose library.  I have therefore
written a replacement which appears at the end of this message.

The interface includes routines to generate a random integer in a
specific range and to generate a random floating point number, since
clients are likely to want to do one or the other.  I've named these
rand_int and rand_float, but you might want to change the name of the
former to "random" for backwards compatibility.

I've included an initialization routine which attempts to produce a
unique supply of random numbers (based on the current time).  This is
a peculiar beast because we want to tell the compiler that it can
reorder calls to this routine, but cannot eliminate calls and reuse
the output value.  I *think* that making the output mostly unique
does this (or is the compiler allowed to reuse a unique value by
copying it?).  Since I'm not sure about that, I've made this routine
take io__state arguments.

I'm not sure the ability to accept 64 bits of seed is of practical
value.  I included it because with a 32 bit seed you can distinguish
between the output of the pseudo-random number generator and a true
source of random numbers by trying all possible 32 bit seeds.  With
a 64 bit 

I did a few performance measurements on a Pentium II running Linux
with the current (0.10.1-1) release:

        rand_int/3              0.92 microseconds
        rand_int/4              0.97 to 1.02 (depending on range)
        rand_float              1.38

        Mercury library         0.31
        Ditto, using tuples     0.95

My code runs at about 1/3 of the speed of the existing library
routine.  The difference is almost all due to the fact that in
my code the state of the random number generator is represented
by a tuple of four integers, rather than by a single integer.
Therefore every call to the random number generator calls the
memory allocator.  The last entry shows what happens to the time
of the Mercury library random number generator if we change the
supply type to a four element tuple.  Presumably the Mercury
compiler will one day learn to reuse the memory from destructive
inputs.

I did code a random number generator in Mercury which runs in
about 0.08 microseconds:

:- type supply = int.
:- pred rand_int(int, supply, supply).
:- mode rand_int(out, in, out).

rand_int(Result, RS0, RSout) :-
    RSout = RS0 * 729 + 3,
    A = xor(RSout, logical_right_shift(RSout, 16)),
    B = xor(A, logical_right_shift(A, 8)),
    C = B * 5,
    D = xor(C, logical_right_shift(C, 5)),
    E = D * (2 * D + 1),
    Result = xor(logical_right_shift(E, 1), logical_right_shift(E, 21)).

Since this uses only 32 bits of state, I don't suggest using this,
but it could be considered if you get feedback that random number
generation is too slow.  (Also, the multiplier in the first line should
probably be changed to one whose low order binary digits are binary 101.)

My random number package follows.  I grant the University of Melborne
permission to use this code as it sees fit.
				Kenneth Almquist


%---------------------------------------------------------------------------%
% Copyright (C) 2001 The University of Melbourne.
% This file may only be copied under the terms of the GNU Library General
% Public License - see the file COPYING.LIB in the Mercury distribution.
%---------------------------------------------------------------------------%
%
% file: alt_random.m
% main author: Kenneth Almquist
% stability: low
%
% This module defines a pseudo-random number generator.  Since random
% number generation is stateful, the supply of random numbers is threaded
% through the random number producing routines (similar to the way the
% state of the world is threaded through the I/O routines).  The routines
% use mdi and muo modes so that you don't accidentally create a copy of
% the supply and get two identical sequences of random numbers.  (If you
% do want two copies of a sequence, you can use the copy library routine.)
%
% This random number generator is intended to provide good randomness
% rather than to achieve the absolute maximum of speed, so you should be
% able to use this generator for any purpose other than cryptography
% without worrying about the quality of the random numbers produced.
%
%---------------------------------------------------------------------------%


:- module alt_random.

:- interface.

	% The type supply represents a sequence of pseudo_random
	% numbers.
:- type supply.

:- import_module io.

	% Create a supply of random numbers.  The second argument
	% is a seed value which selects a particular pseudo-random
	% sequence.
:- pred init(supply, int).
:- mode init(muo, in) is det.

	% Like init/2, but takes two seed arguments.  This allows
	% 64 bits of seed to be supplied on machines with 32 bit
	% integers.  It is equivalent to the following on machines
	% where the following function can be computed without integer
	% overflow:
	%
	% init2(RS, X, Y) :-
	%	P32 = pow(2, 32),
	%	init(RS, (X mod P32) * P32 + Y mod P32.
:- pred init2(supply, int, int).
:- mode init2(muo, in, in) is det.

	% Create a unique supply of random numbers.  The supply will
	% be different from any of the pseudo-random sequences
	% returned by init/2 or init2/3.  Also, no two calls to
	% init/3 made within the same run of a program will return
	% the same sequence.  Two calls to init/3 made from different
	% programs, or different runs of the same program, will
	% return different sequences as long as the system clock
	% reports that the two calls were made at different times.
:- pred init(supply, io__state, io__state).
:- mode init(muo, di, uo) is det.


	% Return a pseudorandom integer in the range 0 .. pow(2, 31) - 1.
	% This covers the full range of nonnegative integers on a
	% 32 bit machine.  The same range is used for 64 bit machines
	% in order to make the results portable.
:- pred rand_int(int, supply, supply).
:- mode rand_int(out, mdi, muo) is det.

	% rand_int(Result, Range, Supply_in, Supply_out)
	% Return a pseudo-random integer in the range 0 .. Range - 1.
	% Range should not exceed pow(2, 31).  (Values large than this
	% are not portable since they cannot be represented on a machine
	% with 32 bit ints.)
:- pred rand_int(int, int, supply, supply).
:- mode rand_int(out, in, mdi, muo) is det.

	% Return a uniformly distributed pseudo-random floating point
	% number in closed interval [0.0 .. 1.0].
	%
	% We considered making this routine refrain from returning 1.0,
	% in order to be compatible with other pseudo-random number
	% generators such drand48, but doing this biases the average
	% of the output very slightly away from 0.5, and has no
	% particular advantage in general.  If you are adapting code
	% which requires a random number generator which returns
	% values strictly less than 1.0, you can call rand_float/3
	% and simply discard the value 1.0 if it appears.
:- pred rand_float(float, supply, supply).
:- mode rand_float(out, mdi, muo) is det.


:- implementation.

	% Note:  This random number generator was designed using operations
	% on 32 bit unsigned integers.  Since Mercury doesn't have unsigned
	% integers, we use the int type.  This requires the host to used
	% two's complement representation and to ignore arithmetic overflow,
	% so that signed and unsigned arithmetic are the same.  The one
	% operation we use which differs for signed and unsigned two's
	% complement values is shift right, so we define a predicate in
	% C named logical_right_shift to perform this operation.  An int
	% may contain more than 32 bits.  For most operations, we can
	% ignore any extra high order bits, but logical_right_shift must
	% clear them before shifting.
	%
	% The state of the random number generator is declared to be a
	% mostly unique value.  This makes sense semanticly (since the
	% reuse of a state is likely to be a programming bug), and may
	% also allow future versions of the Mercury compiler to generate
	% better code.  With Mercury version 0.10.1-1, the majority of
	% the cost of generating a random number is allocating the new
	% tuple to contain the updated random number generator state.
	% The Mercury compiler currently does not fully implement unique
	% values, so we make the assumption that the tuple constructor
	% creates a unique tuple, and add unsafe_promise_unique calls
	% based on this assumption.


:- pragma c_header_code("
#ifdef HAVE_SYS_TIME
#include <sys/time.h>	/* for gettimeofday() */
#else
#include <time.h>	/* for time() */
#endif
").

:- import_module int, float.

	% The state of a random number generator is represented by
	% four integers.  We name these A, B, C, and K when we have
	% to assign names to them in the code, but in defining the
	% type we use a tuple.
:- type supply == {int, int, int, int}.


:- func logical_right_shift(int, int) = int.
:- func scale(int, int, int) = int.


	% In the code below, we sign extend Seed to 64 bits and then
	% split it into two 32 bit quantities.  The test of max_int
	% is used to determine whether we are running on a 32 bit or
	% a 64 bit machine.  For a 32 bit machine, we sign extend
	% the seed with either 0 or -1.  For a 64 bit machine, we
	% use a right shift to split the seed into two 32 bit values.
init(Result, Seed) :-
    (if max_int = 2147483647 then
	(if Seed >= 0 then
	    init2(Result, 0, Seed)
	 else
	    init2(Result, -1, Seed))
     else
	init2(Result, Seed >> 32, Seed)).


init2(Result, Seed0, Seed1) :-
    basic_init(Result, 0x1C3BD26E, Seed0, Seed1).


	% init/3 uses get_unique to get a distinct seed for each
	% call.  Get_unique is declared impure so that the compiler
	% is prohibited from reusing the result, but init/3 does
	% not have to be impure because the io__state arguments
	% have the same effect.
:- pragma promise_pure(init/3).

init(Result, IOin, IOout) :-
    impure get_unique(B, C),
    basic_init(Result, 0x650A7354, B, C),
    IOout = IOin.

	% On the first call to get_unique, the current time is
	% used as the result.  This is a pretty good source of
	% data to initialize a PRNG; it means that if you run a
	% program twice on the same machine you will get different
	% random sequences as long as the time between executions
	% of the program is larger than the granularity of the
	% system clock.  For subsequent calls, we add one to the
	% previous time.  This assures that we will never return
	% the same result twice within a single program execution.
	% (The value will wrap around after pow(2, 64) calls, but
	% with current hardware speeds that is not a problem.)
:- impure pred get_unique(int, int).
:- mode get_unique(out, out) is det.

:- pragma foreign_proc("C", get_unique(B::out, C::out),
    [will_not_call_mercury, not_thread_safe], "
	static int cached = 0;
	static MR_Integer save_B;
	static MR_Integer save_C;

	if (! cached) {
#ifdef HAVE_SYS_TIME
	    struct timeval tv;
	    gettimeofday(&tv, (struct timezone *) 0);
	    B = tv.tv_sec;
	    C = tv.tv_usec << 12;
#else
	    B = time((time_t) 0);
	    C = 0;
#endif
	    save_B = B;
	    save_C = C;
	    cached = 1;
	} else {
	    B = save_B;
	    C = ++save_C;
	    if (C == 0)
		B = ++save_B;
	}
").


	% The following predicate is the common initialization code
	% used to create a new supply.  See the comments before the
	% rand/2 predicate.
:- pred basic_init(supply, int, int, int).
:- mode basic_init(muo, in, in, in) is det.

basic_init(Result, Seed_A, Seed_B, Seed_C) :-
    unsafe_promise_unique({Seed_A, Seed_B, Seed_C, 1525878906}, RS1),
    rand(RS1, RS2),
    unsafe_promise_unique(RS2, Result).


	% rand/2 is the core of the random number generator.  It
	% takes maps one state of the random number generator to
	% the next.
	%
	% The basic design is an extended Feistal structure involving
	% the first three tuple elements.  We use three elements
	% because that is the smallest number of elements which
	% will allow us to compute two rounds in parallel.  Since
	% most new hardware will execute at least two instructions
	% in parallel, we want to compute two rounds in parallel in
	% order to make good use of the hardware.
	%
	% The final element of the state, called K, is updated using
	% the relation K[i+i] = K[i] * 5 + 1.  The values form a cycle
	% with length pow(2, 32).  This ensures that the generator as
	% a whole cannot have a cycle length of less that that.  (Since
	% short cycles by definition contain only a few elements, the
	% chances of finding one are miniscule, but we want to be able
	% to prove reasonable minimum cycle length.  Since we always
	% initialize K to the same value, this also means that if two
	% different seeds pick different points in the same cycle,
	% the distance between the two points in the cycle will be
	% at least pow(2, 32).
	%
	% After a call to rand, each bit of A depends on at least 68
	% bits of {A, B, C}, Each bit of B depends on at least 58 bits
	% of {A, B, C}.  Calling rand twice results in avalanche,
	% meaning that the probability that changing one input bit will
	% change a given output bit is approximately 0.5.  (This assumes
	% that the input, other than the bit being changed, is chosen
	% randomly.)  Therefore, when initializing a random number
	% generator, we call rand once.  This ensures that every bit
	% of the first random number generated will depend on every
	% bit of the seed.

:- pred rand(supply, supply).
:- mode rand(mdi, out) is det.

rand({A1, B1, C1, K1}, {Aout, Bout, Cout, Kout}) :-
    K2 = K1 * 5,
    K3 = K2 + 1,

    C2 = xor(C1, B1 * 729),
    B2 = xor(B1, A1 * 729),
    A2 = A1 + xor(C2 * 4, logical_right_shift(C2, 15)),
    C3 = C2 + xor(B2 * 4, logical_right_shift(B2, 15)),
    B3 = xor(B2, xor(A2, K2) * 729),
    A3 = xor(A2, xor(C3, K3) * 729),
    Temp1 = B3 * 5,
    Temp2 = A3 * 5,
    C4 = C3 + xor(Temp1, logical_right_shift(Temp1, 11)),
    B4 = B3 + xor(Temp2, logical_right_shift(Temp2, 11)),

    Aout = B4,
    Bout = C4,
    Cout = A3,
    Kout = K3.


rand_int(Result, S, Sout) :-
    rand(S, S1),
    {A, _, _, _} = S1,
    unsafe_promise_unique(S1, Sout),
    Result = logical_right_shift(A, 1).

rand_int(Result, Num_Values, S, Sout) :-
    rand(S, S1),
    {A, B, _, _} = S1,
    unsafe_promise_unique(S1, Sout),
    Result = scale(A, B, Num_Values).

	% To produce a floating point result, we convert two integers
	% to floats, and then add these floats with appropriate scaling.
	% We shift the values right by 1 before converting to float so
	% that the result will be positive.  Each converted integer
	% contains 31 random bits, so we have 62 bits total.  On most
	% machines type float has a 53 bit mantissa, so the low order
	% bits will be dropped unless the random number happens to be
	% very close to zero.
	%
	% The mathematical sum of the two floating point numbers will
	% always be less than 1.0, so a result of 1.0 can only be
	% produced by rounding upwards.  This means that the value
	% 1.0 is slightly less likely to occur than it should be, but
	% this cannot be detected by statistical tests involving fewer
	% than pow(2, 62) inputs, so there is no real problem here.
	% Also, the actual probability of getting a value of exactly
	% 1.0 depends on the rounding mode.  If we always round up,
	% then the bias introduced by the rounding compensates for the
	% bias introduced by the lack of an integer input exactly
	% corresponding to 1.0.
	%
	% The default rounding mode for IEEE floating point is round
	% towards even.  We would prefer to use the round towards
	% infinity mode, but there is no portable way to access that
	% mode.  Therefore, floating point numbers with the low order
	% bit set will be less common than floating point numbers with
	% this bit cleared.  This effect could easily be detected by
	% performing a statistical test on the low order bit, but not
	% in the normal use of floating point random numbers.  Most
	% floating point code is not terribly sensitive to the values
	% of the low order bits, because these bits will differ from
	% the mathematically correct result due to rounding normally
	% performed as part of computations.
	%
	% The simplest way to get 62 bits would be to call rand_int/3
	% twice.  However, as discussed in the preceding paragraph,
	% the low order bits of a floating point random number are
	% not all that critical.  Therefore, we take two integers of
	% output from a single call to rand.
	%
	% This routine could be written entirely in Mercury, and the
	% commented out lines show how to do this.  However, coding
	% these lines in C reduced execution time by 39% when using
	% verion 0.10.1-1 of the Mercury compiler.  I assume that this
	% is because Mercury stores all floating point values in memory.

rand_float(Result, S, Sout) :-
    rand(S, S1),
    {A, B, _, _} = S1,
    unsafe_promise_unique(S1, Sout),
    float_result(A, B, Result).
/*
    X = float(logical_right_shift(A, 1)),
    Y = float(logical_right_shift(B, 1)),
    Result = X / 2147483648.0 + Y / (2147483648.0 * 2147483648.0).
*/

:- pred float_result(int, int, float).
:- mode float_result(in, in, out) is det.

:- pragma foreign_proc("C", float_result(A::in, B::in, Result::out),
    [will_not_call_mercury, thread_safe], "
	double X = (double)(MR_Integer)(((MR_Unsigned)A & 0xFFFFFFFF) >> 1);
	double Y = (double)(MR_Integer)(((MR_Unsigned)B & 0xFFFFFFFF) >> 1);
	Result = X / 2147483648.0 + Y / (2147483648.0 * 2147483648.0);
").


:- pragma foreign_proc("C", logical_right_shift(Left::in, Right::in) = (Result::out),
    [will_not_call_mercury, thread_safe], "
	Result = ((MR_Unsigned)Left & 0xFFFFFFFF) >> Right;
").


	% scale/3 is used by rand_int/4 to reduce the random bits to
	% the specified range.  One common way to do this is the
	% modulo operation, but that is expensive on most hardware.
	% Therefore, we treat the input as a 64 bit fixed point
	% number with the binary point on the left.  We multiply
	% this by Range and discard the fractional part.  Multiplication
	% is an expensive operation, but most modern machines have
	% hardware that makes multiplication much faster than division
	% or modulo.  Also, unless the range is a power of 2, all results
	% cannot be equally probable.  The use of a multiply distributes
	% the low probability numbers throughout the range, rather than
	% clumping them all at the high end.
	%
	% In general, multiplying a 64 bit fraction by a 32 bit number
	% requires two 32x32 multiplication operations.  However, we
	% discard the low order bits, so the only possible effect of
	% the second multiply is to cause a carry into the bits we do
	% not discard.  Assuming that Range is much less than pow(2, 32),
	% this carry is unlikely.  Therefore, we check the low order bits
	% of the first multiplication to see if a carry is possible,
	% before performing the second multiplication operation.

:- pragma foreign_proc("C", scale(Hi::in, Low::in, Range::in) = (Result::out),
    [will_not_call_mercury, thread_safe], "
	typedef MR_INT_LEAST64_TYPE I64;
	MR_Unsigned	H = (MR_Unsigned)Hi & 0xFFFFFFFF;
	MR_Unsigned	L = (MR_Unsigned)Low & 0xFFFFFFFF;
	MR_Unsigned	R = (MR_Unsigned)Range;
	I64		approx = (I64)H * (I64)R;
	MR_Unsigned	approx_low = (MR_Unsigned)approx & 0xFFFFFFFF;
	if (approx_low >= 0xFFFFFFFF - R) {
	    I64 approx_2 = (I64)L * (I64)R;
	    approx += approx_2 >> 32;
	}
	Result = approx >> 32;
").
--------------------------------------------------------------------------
mercury-developers mailing list
Post messages to:       mercury-developers at cs.mu.oz.au
Administrative Queries: owner-mercury-developers at cs.mu.oz.au
Subscriptions:          mercury-developers-request at cs.mu.oz.au
--------------------------------------------------------------------------



More information about the developers mailing list