[m-rev.] for review: address problems with random.m

Mark Brown dougl at cs.mu.OZ.AU
Thu Sep 26 14:35:33 AEST 2002


Hi,

When I wrote the stable_sort test case recently, I noticed that the
random numbers it generated were really not very random at all.  I
committed that test case anyway, since the numbers were good enough
for the purposes of that test case, but I also decided to investigate
the problem a bit further to see why the numbers were so bad.

It turned out that the problem was not so much with the RNG itself, but
with the way it was being used.  There are some pitfalls that need to be
avoided when using this kind of RNG; these are mentioned in "Numerical
Recipes ..." which is where the coefficients for our algorithm come
from.  The stable_sort test cases (and some other test cases) got bad
numbers because they were doing precisely what it was advised not to do.

The purpose of this change is to avoid the pitfalls in our test cases,
and to help users avoid the pitfalls by:
	- documenting them
	- providing a predicate that does things in the prescribed way.

This is for review by anyone.

Cheers,
Mark.

Estimated hours taken: 1.5
Branches: main

library/random.m:
	Provide better documentation on how (not) to use the random number
	generator in the standard library.  Generators of this kind are
	very sensitive to the way in which they are used, and if used
	in a way not anticipated they can give very poor results.

	XXX The standard library really should provide a better quality RNG.
	There is still a place for quick and dirty RNGs like this one,
	though, provided users are given adequate information about the
	pitfalls.

	One of the pitfalls to avoid is that you shouldn't use mod (or rem)
	to reduce a number down to some desired range, you should use
	something like div.  We therefore provide a predicate that does
	this the prescribed way.  We also provide a predicate that is
	similar to randmax, but returns the number of distinct random
	numbers that can be generated instead of the maximum.

	Fix random__permutation to avoid the above pitfall.

	Replace some occurrences of is/2 with =/2.

NEWS:
	Mention the new predicate.

tests/hard_coded/Mmakefile:
tests/hard_coded/random_permutation.exp:
tests/hard_coded/random_permutation.m:
	A test case for random__permutation.  We don't test that the
	result is "random", but we at least test that it is a permutation.

tests/hard_coded/random_simple.exp:
tests/hard_coded/random_simple.m:
	A simple test that checks for off-by-one errors in random__random/5.

tests/hard_coded/stable_sort.m:
tests/hard_coded/test_bitset.exp:
tests/hard_coded/test_bitset.m:
	Fix these tests to avoid the above pitfall.  The bitset tester also
	needed to be modified so that it doesn't output anything that may
	depend on the particular random sequence generated.

Index: NEWS
===================================================================
RCS file: /home/mercury1/repository/mercury/NEWS,v
retrieving revision 1.267
diff -u -r1.267 NEWS
--- NEWS	16 Sep 2002 06:07:39 -0000	1.267
+++ NEWS	26 Sep 2002 04:12:17 -0000
@@ -275,6 +275,10 @@
   `io__write_anything', and `io__print_anything', which were long ago
   renamed as `io__read', `io__write', and `io__print' respectively.
 
+* We've added random__random/5, which produces a random integer in a
+  given range, and random__randcount/3, which returns the number of
+  distinct random numbers that can be generated.
+
 Changes to the extras distribution:
 
 * The interface to Moose has been changed in a non-backwards compatible
Index: library/random.m
===================================================================
RCS file: /home/mercury1/repository/mercury/library/random.m,v
retrieving revision 1.20
diff -u -r1.20 random.m
--- library/random.m	28 Feb 2001 14:53:41 -0000	1.20
+++ library/random.m	26 Sep 2002 04:07:42 -0000
@@ -1,19 +1,47 @@
 %---------------------------------------------------------------------------%
-% Copyright (C) 1994-1998,2001 The University of Melbourne.
+% Copyright (C) 1994-1998,2001-2002 The University of Melbourne.
 % This file may only be copied under the terms of the GNU Library General
 % Public License - see the file COPYING.LIB in the Mercury distribution.
 %---------------------------------------------------------------------------%
 %
-% file: rand.m
+% file: random.m
 % main author: conway
 % stability: low
 %
 % Define a set of random number generator predicates. This implementation
-% uses a threaded random-number supply. It could be made non-unique, but
-% since each thread returns the same list of random numbers, in the interests
-% of safety, it is declared with (backtrackable) unique modes.
+% uses a threaded random-number supply.   The supply can be used in a
+% non-unique way, which means that each thread returns the same list of
+% random numbers.  However, this may not be desired so in the interests
+% of safety it is also declared with (backtrackable) unique modes.
+%
 % The coefficients used in the implementation were taken from Numerical
-% Recipes in C (Press et al), and are originally due to Knuth.
+% Recipes in C (Press et al), and are originally due to Knuth.  These
+% coefficients are described as producing a "Quick and Dirty" random number
+% generator, which generates the numbers very quickly but not necessarily
+% with a high degree of quality.  As with all random number generators,
+% the user is advised to consider carefully whether this generator meets
+% their requirements in terms of "randomness".  For applications which have
+% special needs (e.g. cryptographic key generation), a generator such as
+% this is unlikely to be suitable.
+%
+% Note that random number generators of this type have several known
+% pitfalls which the user may need to avoid:
+%
+%	1) The high bits tend to be more random than the low bits.  If
+%	you wish to generate a random integer within a given range, you
+%	should something like 'div' to reduce the random numbers to the
+%	required range rather than something like 'mod' (or just use
+%	random__random/5).
+%
+%	2) Similarly, you should not try to break a random number up into
+%	components.  Instead, you should generate each number with a
+%	separate call to this module.
+%
+%	3) There can be sequential correlation between successive calls,
+%	so you shouldn't try to generate tuples of random numbers, for
+%	example, by generating each component of the tuple in sequential
+%	order.  If you do, it is likely that the resulting sequence will
+%	not cover the full range of possible tuples.
 %
 %---------------------------------------------------------------------------%
 
@@ -38,13 +66,30 @@
 :- mode random__random(out, mdi, muo) is det.
 :- mode random__random(out, in, out) is det.
 
-	% random__randmax(RandMax, RS0, RS): binds Randax to the maximum
+	% random__random(Low, Range, Num, RS0, RS): extracts a number Num
+	% in the range Low .. (Low + Range - 1) from the random number
+	% supply RS0, and binds RS to the new state of the random number
+	% supply.  For best results, the value of Range should be no greater
+	% than about 100.
+:- pred random__random(int, int, int, random__supply, random__supply).
+:- mode random__random(in, in, out, mdi, muo) is det.
+:- mode random__random(in, in, out, in, out) is det.
+
+	% random__randmax(RandMax, RS0, RS): binds RandMax to the maximum
 	% random number that can be returned from the random number
 	% supply RS0, and returns RS = RS0.
 :- pred random__randmax(int, random__supply, random__supply).
 :- mode random__randmax(out, mdi, muo) is det.
 :- mode random__randmax(out, in, out) is det.
 
+	% random__randcount(RandCount, RS0, RS): binds RandCount to the
+	% number of distinct random numbers that can be returned from the
+	% random number supply RS0, and returns RS = RS0.  This will be one
+	% more than the number returned by randmax/3.
+:- pred random__randcount(int, random__supply, random__supply).
+:- mode random__randcount(out, mdi, muo) is det.
+:- mode random__randcount(out, in, out) is det.
+
 	% random__permutation(List0, List, RS0, RS):
 	% binds List to a random permutation of List0,
 	% and binds RS to the new state of the random number supply.
@@ -83,12 +128,27 @@
 random__random(I, RS0, RS) :-
 	RS0 = I0,
 	random__params(A, C, M),
-	I is ((I0 * A) + C) mod M,
+	I = ((I0 * A) + C) mod M,
 	copy(I, RS).
 
-random__randmax(M1, Rs, Rs) :-
+	% We could make this more robust by checking whether the range is
+	% less than a certain threshold, and using a more sophisticated
+	% algorithm if the threshold is exceeded.  But that would defeat
+	% the purpose of having a "quick and dirty" random number generator,
+	% so we don't do that.
+random__random(Low, Range, Num) -->
+	random__random(R),
+	random__randcount(M),
+	% With our current set of parameters and a reasonable choice of Range,
+	% the following should never overflow.
+	{ Num = Low + (Range * R) // M }.
+
+random__randmax(M1, RS, RS) :-
 	random__params(_A, _C, M),
-	M1 is M - 1.
+	M1 = M - 1.
+
+random__randcount(M, RS, RS) :-
+	random__params(_A, _C, M).
 
 %---------------------------------------------------------------------------%
 
@@ -121,10 +181,7 @@
 		RS = RS0
 	;
 		I1 = I - 1,
-		random__random(Random, RS0, RS1),
-		% we know that Random >= 0, so it's safe to use rem
-		% rather than mod, and rem is typically more efficient
-		Index = Random rem I,
+		random__random(0, I, Index, RS0, RS1),
 		array__lookup(Record0, Index, Next),
 		array__lookup(Record0, I1, MaxImage),
 		Order1 = [Next | Order0],
@@ -148,7 +205,7 @@
 	(
 		N > 0
 	->
-		N1 is N - 1,
+		N1 = N - 1,
 		random__random(I, RS0, RS1),
 		random__test_2(N1, Is0, RS1, RS),
 		Is = [I|Is0]
Index: tests/hard_coded/Mmakefile
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/Mmakefile,v
retrieving revision 1.166
diff -u -r1.166 Mmakefile
--- tests/hard_coded/Mmakefile	16 Sep 2002 06:07:47 -0000	1.166
+++ tests/hard_coded/Mmakefile	26 Sep 2002 04:03:08 -0000
@@ -114,6 +114,8 @@
 	quantifier \
 	quantifier2 \
 	quoting_bug_test \
+	random_permutation \
+	random_simple \
 	rational_test \
 	redoip_clobber \
 	relation_test \
Index: tests/hard_coded/random_permutation.exp
===================================================================
RCS file: tests/hard_coded/random_permutation.exp
diff -N tests/hard_coded/random_permutation.exp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/hard_coded/random_permutation.exp	24 Sep 2002 10:34:35 -0000
@@ -0,0 +1,9 @@
+Test passed.
+Test passed.
+Test passed.
+Test passed.
+Test passed.
+Test passed.
+Test passed.
+Test passed.
+Test passed.
Index: tests/hard_coded/random_permutation.m
===================================================================
RCS file: tests/hard_coded/random_permutation.m
diff -N tests/hard_coded/random_permutation.m
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/hard_coded/random_permutation.m	24 Sep 2002 10:34:11 -0000
@@ -0,0 +1,42 @@
+:- module random_permutation.
+:- interface.
+:- import_module io.
+:- pred main(io__state::di, io__state::uo) is det.
+:- implementation.
+:- import_module int, list, random.
+
+main -->
+	{ List = gen_sorted_list(1, 100) },
+	{ random__init(1, RS) },
+	do_tests(List, 10, RS).
+
+:- pred do_tests(list(int), int, random__supply, io__state, io__state).
+:- mode do_tests(in, in, mdi, di, uo) is det.
+
+do_tests(List, Count, RS0) -->
+	(
+		{ Count > 1 }
+	->
+		{ random__permutation(List, Perm, RS0, RS1) },
+		{ list__sort_and_remove_dups(Perm, SortedList) },
+		(
+			{ SortedList = List }
+		->
+			io__write_string("Test passed.\n")
+		;
+			io__write_string("Test failed!\n")
+		),
+		do_tests(List, Count - 1, RS1)
+	;
+		[]
+	).
+
+:- func gen_sorted_list(int, int) = list(int).
+
+gen_sorted_list(M, N)
+	= ( M > N ->
+		[]
+	;
+		[M | gen_sorted_list(M + 1, N)]
+	).
+
Index: tests/hard_coded/random_simple.exp
===================================================================
RCS file: tests/hard_coded/random_simple.exp
diff -N tests/hard_coded/random_simple.exp
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/hard_coded/random_simple.exp	26 Sep 2002 04:04:42 -0000
@@ -0,0 +1,2 @@
+Test passed.
+Test passed.
Index: tests/hard_coded/random_simple.m
===================================================================
RCS file: tests/hard_coded/random_simple.m
diff -N tests/hard_coded/random_simple.m
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ tests/hard_coded/random_simple.m	26 Sep 2002 04:04:28 -0000
@@ -0,0 +1,34 @@
+:- module random_simple.
+:- interface.
+:- import_module io.
+:- pred main(io__state::di, io__state::uo) is det.
+:- implementation.
+:- import_module random, int.
+
+main -->
+	{ Seed = 3 },
+	{ random__init(Seed, RS0) },
+	test(1, 20, RS0, RS1),
+	test(-1, 20, RS1, _).
+
+:- pred test(int::in, int::in, random__supply::mdi, random__supply::muo,
+	io__state::di, io__state::uo) is det.
+
+test(Range, Count, RS0, RS) -->
+	(
+		{ Count > 0 }
+	->
+		{ random__random(0, Range, N, RS0, RS1) },
+		(
+			{ N = 0 }
+		->
+			test(Range, Count - 1, RS1, RS)
+		;
+			io__write_string("Test failed.\n"),
+			{ RS = RS1 }
+		)
+	;
+		io__write_string("Test passed.\n"),
+		{ RS = RS0 }
+	).
+
Index: tests/hard_coded/stable_sort.m
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/stable_sort.m,v
retrieving revision 1.1
diff -u -r1.1 stable_sort.m
--- tests/hard_coded/stable_sort.m	16 Sep 2002 06:07:49 -0000	1.1
+++ tests/hard_coded/stable_sort.m	24 Sep 2002 08:40:09 -0000
@@ -11,6 +11,8 @@
 	{ generate_random_list(42, List) },
 	% io__write(List),
 	% io__nl,
+	% io__write(sort(List) `with_type` list({int, int})),
+	% io__nl,
 	{ sort_is_stable(List, Stable) },
 	(
 		{ Stable = yes },
@@ -41,19 +43,12 @@
 	(
 		{ Count > 0 }
 	->
-		rnd_mod_10(R1),
-		rnd_mod_10(R2),
+		random(0, 10, R1),
+		random(0, 10, R2),
 		generate_random_list_2(Count - 1, [{R1, R2} | List0], List)
 	;
 		{ List = List0 }
 	).
-
-:- pred rnd_mod_10(int, random__supply, random__supply).
-:- mode rnd_mod_10(out, mdi, muo) is det.
-
-rnd_mod_10(N) -->
-	random__random(R),
-	{ N = R mod 10 }.
 
 :- pred sort_is_stable(list({int, int}), bool).
 :- mode sort_is_stable(in, out) is det.
Index: tests/hard_coded/test_bitset.exp
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/test_bitset.exp,v
retrieving revision 1.3
diff -u -r1.3 test_bitset.exp
--- tests/hard_coded/test_bitset.exp	15 Feb 2001 11:23:27 -0000	1.3
+++ tests/hard_coded/test_bitset.exp	26 Sep 2002 03:42:12 -0000
@@ -2,6 +2,7 @@
 
 List2: -64, -61, -58, -56, -51, -49, -48, -46, -44, -40, -38, -37, -31, -30, -23, -18, -13, -4, -2, 4, 9, 11, 12, 14, 17, 21, 23, 37, 39, 42, 43, 50, 52, 53, 54, 55, 56, 57, 61, 63
 
+testing count
 count: 22 40
 testing foldl
 Sum of List1 = 471
@@ -17,70 +18,70 @@
 [-34, -19, -15, 2, 7, 19, 22, 25, 28, 29, 31, 32, 36, 38, 39, 40, 42, 44, 47, 58, 59]
 testing delete_list
 [-59, -34, -19, -15, 2, 7, 19, 22, 25, 28, 29, 31, 32, 36, 38, 40, 44, 47, 58, 59]
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
 testing difference
 testing remove_least_element
 testing delete_list
-count: 20 40
+testing count
 testing foldl
 testing union
 testing intersection
Index: tests/hard_coded/test_bitset.m
===================================================================
RCS file: /home/mercury1/repository/tests/hard_coded/test_bitset.m,v
retrieving revision 1.3
diff -u -r1.3 test_bitset.m
--- tests/hard_coded/test_bitset.m	15 Feb 2001 11:23:27 -0000	1.3
+++ tests/hard_coded/test_bitset.m	24 Sep 2002 14:57:43 -0000
@@ -60,8 +60,8 @@
 		List = List0,
 		Supply = Supply0
 	;
-		random__random(N, Supply0, Supply1),
-		RN = N rem 128 - 64,	% test negative numbers
+		% Test negative as well as positive numbers.
+		random__random(-64, 128, RN, Supply0, Supply1),
 		get_random_numbers(Num - 1, [RN | List0], List,
 			Supply1, Supply)
 	).
@@ -83,11 +83,18 @@
 	{ Set1 = bitset_tester__list_to_set(List1) },
 	{ Set2 = bitset_tester__list_to_set(List2) },
 
-	io__write_string("count: "),
-	io__write_int(count(Set1)),
-	io__write_string(" "),
-	io__write_int(count(Set2)),
-	io__nl,
+	io__write_string("testing count\n"),
+	{ Count1 = count(Set1) },
+	{ Count2 = count(Set2) },
+	( { Write = yes } ->
+		io__write_string("count: "),
+		io__write_int(Count1),
+		io__write_string(" "),
+		io__write_int(Count2),
+		io__nl
+	;
+		[]
+	),
 
 	io__write_string("testing foldl\n"),
 	{ Sum = (func(Elem, Acc) = Elem + Acc) },
--------------------------------------------------------------------------
mercury-reviews mailing list
post:  mercury-reviews at cs.mu.oz.au
administrative address: owner-mercury-reviews at cs.mu.oz.au
unsubscribe: Address: mercury-reviews-request at cs.mu.oz.au Message: unsubscribe
subscribe:   Address: mercury-reviews-request at cs.mu.oz.au Message: subscribe
--------------------------------------------------------------------------



More information about the reviews mailing list