[m-rev.] RNGs

Julien Fischer jfischer at opturion.com
Mon Aug 19 01:45:37 AEST 2019


Hi Mark,

On Tue, 13 Aug 2019, Mark Brown wrote:

> I've just made a pull request to add some RNGs to the standard library, as
> well as a typeclass interface to them. Pull request is here:
>
> https://github.com/Mercury-Language/mercury/pull/70
>
> The diff is also attached, for anyone who would prefer to comment on
> it via email.

Thanks for looking into this, it's part of the stdlib that has long needed
fixing.  Here's a partial review that mainly addresses aspects of the interface
design.  I haven't reviewed the actual RNG implementations as yet.

> commit f102ddaf4c128137b27ac6e85ac881faaf435c9b
> Author: Mark Brown <mark at mercurylang.org>
> Date:   Tue Aug 13 16:42:38 2019 +1000
>
>     Some RNGs for the standard library, plus typeclass interfaces.
>
>     library/rng.m:
>         Top-level module containing typeclasses and generic routines,
>         as well as overall documentation.

The top-level module for random number generator facilities should really be
random.  I realise that's taken, however I propose that the new stuff be added
to it and the existing contents of that file deprecated.  (They ought to be
able to co-exist in the interim until the old stuff is removed.)

>     library/rng.binfile.m:
>         A generator that reads from a file.
>
>     library/rng.marsaglia.m:
>         Fast, simple generator. Diehard results for default seed:
>             PASSED: 109
>             WEAK:   2
>             FAILED: 3
>
>     library/rng.tausworthe.m:
>         Combined Tausworthe generators. Diehard results for default seed:
>
>         generator T3
>             PASSED: 113
>             WEAK:   1
>             FAILED: 0
>
>         generator T4
>             PASSED: 112
>             WEAK:   2
>             FAILED: 0

There's a bit of question as to what RNGs should be included in the
stdlib by default.

>     library/library.m:
>     library/MODULES_DOC:
>         Add the new modules to the public interface.
>
>     library/float.m:
>         Add functions to convert from (u)int32/64 to float.
>
>     library/uint32.m:
>         Add function to convert from uint64 to uint32.
>
>     tests/hard_coded/Mmakefile:
>     tests/hard_coded/rng1.{m,exp}:
>     tests/hard_coded/rng2.{m,exp}:
>         Test the generators. The expected outputs are derived from
>         the output of the C reference implementations.

...

> diff --git a/library/float.m b/library/float.m
> index e19981c..4e2abfe 100644
> --- a/library/float.m
> +++ b/library/float.m
> @@ -133,13 +133,25 @@
>  :- func from_int8(int8) = float.
>
>      % Convert a signed 16-bit integer into a float.
> -    % Always succeeds as all unsigned 8-bit integers have an exact
> +    % Always succeeds as all signed 16-bit integers have an exact
>      % floating-point representation.
>      %
>  :- func from_int16(int16) = float.
> 
> +    % Convert a signed 32-bit integer into a float.
> +    % Always succeeds as all signed 32-bit integers have an exact
> +    % floating-point representation.

Not true in .spf grades.

> +    %
> +:- func from_int32(int32) = float.
> +
> +    % Convert a signed 64-bit integer into a float.
> +    % The behaviour when the integer exceeds the range of what can be
> +    % exactly represented by a float is undefined.
> +    %
> +:- func from_int64(int64) = float.
> +
>      % Convert an unsigned 8-bit integer into a float.
> -    % Always succeeds as all signed 16-bit integers have an exact
> +    % Always succeeds as all unsigned 8-bit integers have an exact
>      % floating-point representation.
>      %
>  :- func from_uint8(uint8) =  float.
> @@ -150,6 +162,18 @@
>      %
>  :- func from_uint16(uint16) = float.
> 
> +    % Convert an unsigned 32-bit integer into a float.
> +    % Always succeeds as all unsigned 32-bit integers have an exact
> +    % floating-point representation.

Again, not true in .spf grades.

> +    %
> +:- func from_uint32(uint32) = float.
> +
> +    % Convert an unsigned 64-bit integer into a float.
> +    % The behaviour when the integer exceeds the range of what can be
> +    % exactly represented by a float is undefined.
> +    %
> +:- func from_uint64(uint64) = float.

In the integer modules I added, the from_* names are typically used for checked
versions of operations; I think this and the above should be named
cast_from_uint64/1 etc.

...

> diff --git a/library/rng.binfile.m b/library/rng.binfile.m
> new file mode 100644
> index 0000000..36c1966
> --- /dev/null
> +++ b/library/rng.binfile.m
> @@ -0,0 +1,96 @@
> +%---------------------------------------------------------------------------%
> +% vim: ft=mercury ts=4 sts=4 sw=4 et
> +%---------------------------------------------------------------------------%
> +% Copyright (C) 2019 The Mercury team.
> +% This file is distributed under the terms specified in COPYING.LIB.
> +%---------------------------------------------------------------------------%
> +%
> +% File: rng.binfile.m
> +% Main author: Mark Brown
> +%
> +% "Random" number generator that reads numbers from a binary file.
> +%
> +%---------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> +
> +:- module rng.binfile.
> +:- interface.
> +
> +:- import_module io.
> +
> +%---------------------------------------------------------------------------%
> +
> +:- type binfile.
> +:- instance urng(binfile, io).
> +
> +    % Open a binfile generator from a filename. This should be closed
> +    % when no longer needed.
> +    %
> +:- pred open(string, io.res(binfile), io, io).
> +:- mode open(in, out, di, uo) is det.

Is there a reason not to use predmode declarations here?

> +
> +    % Close a binfile generator.
> +    %
> +:- pred close(binfile, io, io).
> +:- mode close(in, di, uo) is det.
> +
> +%---------------------------------------------------------------------------%
> +
> +    % Generate a number between 0 and max_uint64. This reads 8 bytes
> +    % at a time from the binfile and interprets them as an unsigned,
> +    % big-endian integer.
> +    %
> +    % Throws an exception if the end-of-file is reached.
> +    %
> +:- pred rand(binfile, uint64, io, io).
> +:- mode rand(in, out, di, uo) is det.
> +
> +%---------------------------------------------------------------------------%
> +
> +:- implementation.
> +
> +:- import_module require.
> +:- import_module uint64.
> +
> +%---------------------------------------------------------------------------%
> +
> +:- type binfile
> +    --->    binfile(binary_input_stream).
> +
> +:- instance urng(binfile, io) where [
> +    pred(urandom/4) is rand,
> +    ( urandom_max(_) = uint64.max_uint64 )
> +].

It would be preferable if each generator also exported the predicates that
implement these methods, so they can be used without going via the type class
interface.

Ditto for the other generators.

...

> diff --git a/library/rng.m b/library/rng.m
> new file mode 100644
> index 0000000..24e8086
> --- /dev/null
> +++ b/library/rng.m
> @@ -0,0 +1,345 @@
> +%---------------------------------------------------------------------------%
> +% vim: ft=mercury ts=4 sts=4 sw=4 et
> +%---------------------------------------------------------------------------%
> +% Copyright (C) 2019 The Mercury team.
> +% This file is distributed under the terms specified in COPYING.LIB.
> +%---------------------------------------------------------------------------%
> +%
> +% File: rng.m
> +% Main author: Mark Brown
> +%
> +% This module provides an interface to several random number generators,
> +% which can be found in the submodules.

It also provides one instance of a distribution function (random_gauss);
such distributions would be better off placed in their own sub-modules.
(If you look at say the random library in Boost or in more recent versions of
C++, for example, there a bunch of others that we may well end up adding here.)

> +% Two styles of the interface are provided, a ground style and a
> +% unique style. Each has its own advantages and disadvantages:
> +%
> +%   - Ground RNGs are easier to use; for example they can be easily
> +%     stored in larger data structures.
> +%   - Ground RNGs are easier to implement instances for.
> +%   - Unique RNGs are able to use destructive update, and therefore
> +%     are often able to operate more efficiently.
> +%   - Unique RNGs need to be explicitly duplicated (i.e., to produce
> +%     a new generator that will generate the same sequence of numbers).
> +%     This may be regarded as an advantage or a disadvantage.
> +%   - Some RNGs, for example the binfile generator that reads data from
> +%     a file, use the IO state and therefore must use the unique interface.

I/O for consistency with other library documentation.

> +% Each RNG defined in the submodules is natively one of these two styles.
> +% Conversion between the two styles can be done with make_urng/3 and
> +% make_shared_rng/2, below, although this incurs additional overhead.
> +%
> +%
> +% Example, ground style:
> +%
> +%   main(!IO) :-
> +%       RNG0 = rng.marsaglia.init,
> +%       roll(RNG0, RNG1, !IO),
> +%       roll(RNG1, _, !IO).
> +%
> +%   :- pred roll(RNG, RNG, io, io) <= rng(RNG).
> +%   :- mode roll(in, out, di, uo) is det.
> +%
> +%   roll(!RNG, !IO) :-
> +%       random_int(1, 6, N, !RNG),
> +%       io.format("You rolled a %d\n", [i(N)], !IO).
> +%
> +%
> +% Example, unique style:
> +%
> +%   main(!IO) :-
> +%       rng.tausworthe.init_t3(RP, RS0),
> +%       roll(RP, RS0, RS1, !IO),
> +%       roll(RP, RS1, _, !IO).
> +%
> +%   :- pred roll(RP, RS, RS, io, io) <= urng(RP, RS).
> +%   :- mode roll(in, di, uo, di, uo) is det.            % note unique modes
> +%
> +%   roll(RP, !RS, !IO) :-
> +%       urandom_int(RP, 1, 6, N, !RS),
> +%       io.format("You rolled a %d\n", [i(N)], !IO).
> +%
> +%
> +% Example, converting style:
> +%
> +%   main(!IO) :-
> +%       rng.tausworthe.init_t3(RP, RS),
> +%       RNG0 = make_shared_rng(RP, RS),
> +%       random_int(1, 6, N, RNG0, RNG1),
> +%       ...
> +%
> +%   main(!IO) :-
> +%       RNG = rng.marsaglia.init,
> +%       make_urng(RNG, RP, RS0),
> +%       urandom_int(RP, 1, 6, N, RS0, RS1),
> +%       ...
> +%
> +%---------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> +
> +:- module rng.
> +:- interface.
> +
> +:- include_module binfile.
> +:- include_module marsaglia.
> +:- include_module tausworthe.
`> +
> +%---------------------------------------------------------------------------%
> +
> +    % random_int(Start, Range, N, !RNG)
> +    %
> +    % Generate a random integer between Start and Start+Range-1 inclusive.
> +    % Throws an exception if Range < 1 or Range > random_max.
> +    %
> +:- pred random_int(int, int, int, RNG, RNG) <= rng(RNG).
> +:- mode random_int(in, in, out, in, out) is det.

I would hope the intention here is that each call to random_int/5 returns a
random int uniformly distributed in the given range.  If so, the documentation
should say that, if not it should say what the user can assume (or not) about
the distribution of the ints returned.

(I feel a reference to the XKCD random number generator is order here:
<https://xkcd.com/221/>.)

> +
> +    % Generate a random float between 0.0 and 1.0, inclusive.

I believe that [0, 1) is more usual.

> +    %
> +:- pred random_float(float, RNG, RNG) <= rng(RNG).
> +:- mode random_float(out, in, out) is det.

I suggest the following predicates here:

     uniform_int
     uniform_int_range
     uniform_uint
     uniform_uint_range
     ditto for all other integer types

     uniform_float_range
     uniform_float_01     i.e. in [0, 1).

All of the above generate random values uniformly distributed in the
given range.  The following predicate ...

> +    % Generate two random floats from a normal distribution with
> +    % mean 0 and standard deviation 1, using the Box-Muller method.
> +    %
> +    % We generate two at a time for efficiency; they are independent of
> +    % each other.
> +    %
> +:- pred random_gauss(float, float, RNG, RNG) <= rng(RNG).
> +:- mode random_gauss(out, out, in, out) is det.

... should be moved to a separate submodule.  Ditto for the unique variants.

> +%---------------------------------------------------------------------------%
> +
> +    % Interface to random number generators.
> +    %
> +:- typeclass rng(RNG) where [
> +
> +        % Generate a random integer between 0 and random_max, inclusive.
> +        %
> +    pred random(uint64, RNG, RNG),
> +    mode random(out, in, out) is det,

I think the basic unit returned by an RNG ought to be a uint32 rather than a
uint64 -- I note most of the implementations you have here actually do that
anyway and just cast the result to a uint64.

I suggest we impose a stronger requirement on RNGs here, namely that each
call to random/3 returns a random 64-bit (or 32-bit) word.  It is the
responsbility of the RNG instance to fulfil that requirement.   Most modern
pseudo-random number generators should be fine with this requirement.

Also, I prefer the name 'next' to 'random'.

> +
> +        % Return the largest integer that can be generated.
> +        %
> +    func random_max(RNG) = uint64
> +].

I would place the typeclass definitions first and then the predicates
that use them after.

> +%---------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> +
> +    % urandom_int(RP, Start, Range, N, !RS)
> +    %
> +    % Generate a random integer between Start and Start+Range-1 inclusive.
> +    % Throws an exception if Range < 1 or Range > urandom_max.
> +    %
> +:- pred urandom_int(RP, int, int, int, RS, RS) <= urng(RP, RS).
> +:- mode urandom_int(in, in, in, out, di, uo) is det.
> +
> +    % Generate a random float between 0.0 and 1.0, inclusive.
> +    %
> +:- pred urandom_float(RP, float, RS, RS) <= urng(RP, RS).
> +:- mode urandom_float(in, out, di, uo) is det.
> +
> +    % Generate two random floats from a normal distribution with
> +    % mean 0 and standard deviation 1, using the Box-Muller method.
> +    %
> +    % We generate two at a time for efficiency; they are independent of
> +    % each other.
> +    %
> +:- pred urandom_gauss(RP, float, float, RS, RS) <= urng(RP, RS).
> +:- mode urandom_gauss(in, out, out, di, uo) is det.

Everything I said above applies here with the obvious modifications.

...

> +%---------------------------------------------------------------------------%
> +%---------------------------------------------------------------------------%
> +
> +:- implementation.
> +
> +:- import_module int.
> +:- import_module float.
> +:- import_module math.
> +:- import_module uint64.
> +
> +%---------------------------------------------------------------------------%
> +
> +random_int(Start, Range0, N, !RNG) :-
> +    Range = uint64.det_from_int(Range0),
> +    random(N0, !RNG),
> +    Max = random_max(!.RNG),
> +    N1 = N0 // (Max // Range),
> +    ( if N1 < Range then
> +        N = Start + uint64.cast_to_int(N1)
> +    else
> +        random_int(Start, Range0, N, !RNG)
> +    ).

Not that it needs to be addressed now, but it should be possible to do better
here, see:

    Daniel Lemire
    Fast Random Integer Generation in an Interval
    ACM Transactions on Modeling and Computer Simulation 29 (1), 2019

Julien.


More information about the reviews mailing list