%% /u/sy/beebe/java/random/README, Wed Mar 24 07:09:10 2004
%% Edit by Nelson H. F. Beebe <beebe@math.utah.edu>

============
Introduction
============

After a talk that I gave yesterday about generation and testing of
pseudorandom numbers at the local IBM AIX User Group meeting, I was
asked about the quality of the Java random number class.

After the talk, I did some digging, and found the Java library source
code, and the source of the algorithm used.  However, I could not find
a definitive set of reported tests for the generator, so this morning,
I wrote Java and C code to make simple tests, C code to generate data
for the Marsaglia Diehard test suite, and C code to test the generator
with the Marsaglia/Tsang Tuftest suite.

The good news is that Java's random number class passes almost all of
the tests.  There are failures in the OQSO (Overlapping Quadruples
Sparse Occupancy) and DNA tests in the Diehard Battery suite, and in
the Gorilla tests of the Tuftest suite; the failures there suggest
poorer quality of 15 low-order bits (NB: both test suites number bits
from high-order (leftmost) (0) to lowest (right-most) (31)).

The bad news is that its documentation, packaging, and efficiency
needs improvement.


=================================
Comments on the Java Random class
=================================

The description of the Java java.util Random class that I found is on
pp. 1399--1405 of this book:

@String{pub-AW                  = "Ad{\-d}i{\-s}on-Wes{\-l}ey"}
@String{pub-AW:adr              = "Reading, MA, USA"}

@Book{Chan:1999:JCLb,
  author =       "Patrick Chan and Rosanna Lee and Doug Kramer",
  title =        "The {Java} Class Libraries: {\tt java.io}, {\tt
                 java.lang}, {\tt java.math}, {\tt java.net}, {\tt
                 java.text}, {\tt java.util}",
  volume =       "1",
  publisher =    pub-AW,
  address =      pub-AW:adr,
  edition =      "Second",
  pages =        "xxvi + 2050",
  year =         "1999",
  ISBN =         "0-201-31002-3",
  LCCN =         "QA76.73.J38 C47 1998",
  bibdate =      "Mon Jun 10 13:32:26 2002",
  price =        "US\$59.99",
  acknowledgement = ack-nhfb,
  annote =       "See \cite{Chan:1998:JCLb} for vol. 2.",
}

A recent online version of the description is available at

	http://java.sun.com/j2se/1.5.0/docs/api/java/util/Random.html

The book and Web page describes these methods:

	protected int 	next(int bits)
			Generates the next pseudorandom number.

	boolean 	nextBoolean()
			Returns the next pseudorandom, uniformly distributed
			boolean value from this random number generator's
			sequence.

	void	 	nextBytes(byte[] bytes)
			Generates random bytes and places them into a
			user-supplied byte array.

	double 		nextDouble()
			Returns the next pseudorandom, uniformly distributed
			double value between 0.0 and 1.0 from this random
			number generator's sequence.

	float 		nextFloat()
			Returns the next pseudorandom, uniformly distributed
			float value between 0.0 and 1.0 from this random
			number generator's sequence.

	double 		nextGaussian()
			Returns the next pseudorandom, Gaussian ("normally")
			distributed double value with mean 0.0 and standard
			deviation 1.0 from this random number generator's
			sequence.

	int 		nextInt()
			Returns the next pseudorandom, uniformly distributed
			int value from this random number generator's
			sequence.

	int	 	nextInt(int n)
			Returns a pseudorandom, uniformly distributed int
			value between 0 (inclusive) and the specified value
			(exclusive), drawn from this random number generator's
			sequence.

	long 		nextLong()
			Returns the next pseudorandom, uniformly distributed
			long value from this random number generator's
			sequence.

	void 		setSeed(long seed)
			Sets the seed of this random number generator using a
			single long seed.

Notably absent from this list is a getSeed() method: well-designed
packages for pseudorandom number generation should always have such a
function or method.  Without it, it is impossible to save the state of
a generator so that it can be started reproducibly.

The documentation says

	The class uses a 48-bit seed, which is modified using a linear
	congruential formula. ... each invocation can supply up to 32
	pseudorandomly generated bits.

	... nextInt() ... All 2^32 possible int values are produced
	with (approximately) equal probability.

One infers that the period is 2^48, but there ought to be a clear
statement of that.

Because Java lacks unsigned integers, the integer methods produce
signed values.  C/C++ packages invariably use unsigned types, since
the values merely represent bundles of pseudorandom bits.  Signed
integer representations must be handled carefully: for example,
right-shifting such a value will propagate the sign bit, wiping out
the randomness of the high-order bits of the result.

Descriptions of the range of the floating-point pseudorandom routines
are careless:

	public float nextFloat()

		Returns the next pseudorandom, uniformly distributed float
		value between 0.0 and 1.0 from this random number generator's
		sequence.

		The general contract of nextFloat is that one float value,
		chosen (approximately) uniformly from the range 0.0f
		(inclusive) to 1.0f (exclusive), is pseudorandomly generated
		and returned.

	...

	Returns:
		The next pseudorandom, uniformly distributed float value
		between 0.0 and 1.0 from this random number generator's
		sequence.

The range should be precisely stated as [0.0, 1.0) or as "a value x
such that (0.0 <= x) && (x < 1.0)".  Whether the endpoint are included
or not is critical information when the value is used to select an
integer from a range [m, n].

If efficiency is of concern, one must criticize the Random class as
being slower than necessary by the generalization of wrapper methods
that hide the relatively simple multiply-add-modulus operation of a
linear congruential generator inside several layers of method calls.

A family of methods that returns vectors of pseudorandom numbers could
produce even faster generation, since it could completely eliminate
the method call overhead.


==========================
Files in this distribution
==========================

Here is a description of the files in this directory tree:

Makefile		Unix Makefile: type "make" to build and run
			all programs.

README			This file.

TestRandom.java		Simple Java test program to generate 25
			pseudorandom numbers from nextInt() with
			a specified starting seed.

logs/testrandom-diehard/Typescript.sunset.math.utah.edu.2004.03.24.06.06.27
			Diehard Battery test suite results for a 10MB
			test file produced by testrandom-diehard.c.

logs/testrandom-tuftests/Typescript.ibapah.math.utah.edu.2004.03.24.06.16.04
			Tuftest test suite results.

testrandom-diehard.c	C/C++ program to generate the Diehard Battery
			test file.

testrandom.c		C/C++ program to generate 25 pseudorandom
			numbers that match the output of TestRandom.java.

The Diehard and Tuftest code is not included here.  I have stored it
locally in /usr/local/src/diehard/die.c/ and
/u/sy/beebe/misc/prng/prng-beta-0.02/.


========================================
The origins of the Java Random algorithm
========================================

Here is a verbatim copy of electronic mail that I sent to one of the
attendees that documents what I found out about the origins of the
Java Random algorithm:

Date: Tue, 23 Mar 2004 19:36:45 -0700 (MST)
From: "Nelson H. F. Beebe" <beebe@math.utah.edu>
Subject: Java random-number generator

Two of the attendees asked about the quality of the Java random-number
generator.  I looked up in my Java Class Libraries, 2nd edition,
volume 1, package java.math, method Random().  It generates values in
[0.0, 1.0), that is, x such that (0.0 <= x) && (x < 1.0).  However,
its documentation says nothing of its period, or of the underlying
algorithm.

I therefore had to resort to peeking inside the source, found on Sun
Solaris 9 in /usr/j2se/src.zip, member java/util/Random.java.  It uses
a 48-bit linear congruential generator recommended by Donald Knuth (a
good sign), but returns only a 32-bit subset of the computed bits.

The default seed is the current time-of-day, returned by
System.currentTimeMillis().

The formula is

	seed = ( 0x5DEECE66DL * seed + 0xBL ) & ((1L << 48) - 1);

This is a 31-bit value times a 32-bit value, producing a 63-bit value
that is then ANDed with a 48-bit value (a cheap, but less desirable,
way of taking the modulus).  The value returned is (seed >>> 16), that
is, the upper 32 bits of the 48-bit result (another good sign: in
linear congruential generators, the low bits are less random than high
bits).  The period of the returned values should be 2^48, which is
better than the 2^32 of most 32-bit generators.

The critical question is: who recommended the multiplier 0x5DEECE66DL
(25214903917 in decimal) and the addend 0xBL (11)?  Neither the
multiplier nor the modulus are prime

	hoc128> load("primes")

	hoc128> isprime(25214903917)
	0

	isprime(2^48 - 1)
	0

and we can easily find their factorizations:

	hoc128> load("factorize")

	hoc128> factorize(25214903917)
	25214903917 = 7 * 443 * 739 * 11003

	hoc128> factorize(2^48 - 1)
	281474976710655 = 3 * 3 * 5 * 7 * 13 * 17 * 97 * 241 * 257 * 673

A google search for 0x5DEECE66DL turns up lots of references to the
Java class library, but searching for "+0x5DEECE66DL -Java" finds
nothing else: thus, it appears that this magic number is restricted to
the Java implementation.

I then tried a search for the decimal value, excluding Java, and found
the answer in some class notes:

	http://nut.bu.edu/~youssef/py502/monte_carlo_supplement.ps
	http://www.inf.ethz.ch/personal/gaertner/texts/own_work/random_matrices.pdf

and in some computer documentation:

	http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/_rand48.3.html

The Youssef notes say:

    ... I can only say that 25214903917_LONG and 11_LONG have
    apparently been chosen by passing a battery of such [meaning
    Marsaglia's DIEHARD] tests.

    ... Even in the case of the 48-bit generators we are discussing
    today, cas26 will generate them all in a month or two of CPU time
    and then start to repeat.

The magic numbers come from the Unix C library's nrand48() function:

>> ...
>> Description
>>
>> The rand48() family of functions generates pseudo-random numbers using
>> a linear congruential algorithm working on integers 48 bits in
>> size. The particular formula employed is r(n+1) = (a * r(n) + c) mod m
>> where the default values are for the multiplicand a = 0xfdeece66d =
>> 25214903917 and the addend c = 0xb = 11. The modulo is always fixed at
>> m = 2 ** 48. r(n) is called the seed of the random number generator.
>> ...

To see if there was something better than anecdotal evidence of the
tested quality of this generator, I searched the ACM Digital Library,
which has full text searching of several million pages of ACM
journals.

I found three references that cite the number 25214903917:

	1 Advanced tutorials: Software for uniform random number generation: distinguishing the good and the bad
	Pierre L'Ecuyer
	December 2001
	Proceedings of the 33nd conference on Winter simulation
	Full text available: 	pdf(175.96 KB)

	Additional Information: full citation, abstract, references, index
	terms The requirements, design principles, and statistical testing
	approaches of uniform random number generators for simulation are
	briefly surveyed. An object-oriented random number package where
	random number streams can be created at will, and with convenient
	tools for manipulating the streams, is presented. A version of this
	package is now implemented in the Arena and AutoMod simulation
	tools. We also test some random number generators available in popular
	software environments such ...


	2 Pitfalls in computing with pseudorandom determinants
	Bernd Grtner
	May 2000
	Proceedings of the sixteenth annual symposium on Computational geometry
	Full text available: 	pdf(704.12 KB)

	Additional Information:	full citation, references, citings, index terms

	3 Bad subsequences of well-known linear congruential pseudorandom number generators
	Karl Entacher
	January 1998
	ACM Transactions on Modeling and Computer Simulation (TOMACS),  Volume 8 Issue 1
	Full text available: 	pdf(336.64 KB)

	Additional Information:	full citation, abstract, references, citings, index terms

	We present a spectral test analysis of full-period subsequences with
	small step sizes generated by well-known linear congruential
	pseudorandom number generators. Subsequences may occur in certain
	simulation problems or as a method to get parallel streams of
	pseudorandom numbers. Applying the spectral test, it is possible to
	find bad subsequences with small step sizes for almost all linear
	pseudorandom number generators currently in use.

	Keywords: lattice structure, linear congruential generator, parallel
	pseudorandom number generator, random number generation, spectral
	test, stochastic simulation

The first is by a very reputable researcher (the one at the University
of Montreal that I mentioned in my talk today).  I downloaded the
three papers and looked at them:

L'Ecuyer says:

	We consider here the following widely-used generators.

	Java. This is the generator used to implement the method nextDouble in
	the class java.util.Random of the Java standard library (see
	http://java.sun.com/j2se/1.3/docs/ api/java/util/Random.html) It is
	based on a linear recurrence with period length 248, but each output
	value is constructed by taking two successive values from the linear
	recurrence, as follows:

		x_{i+1} = (25214903917 x_i + 11) mod 2^48
		u_i = (2^27 floor(x_{2i}/2^22) + floor(x_{2i}+1/2^21) )/2^53.

	Note that the generator rand48 in the Unix standard library uses
	exactly the same recurrence, but produces its output simply via

		u_i = x_i/2^48.

	...

	4.2 Results of the Collision Test

	 "For the generators not given in the table, namely Java,
	 MRG32k3a, and MT19937, none of the results were suspicious."

So far, so good.  For the next test in his test suite, he says:

	4.3 Results of the Birthday Spacings Test

	The Java, VB, Excel, and LCG16807 generators start failing
	decisively with n = 2^18, 2^10, 2^14, and 2^14,
	respectively. These are quite small numbers of points.

Not so good.

Grtner says:

	... drand48 has full period (2^48).

	...

	For example, to get a sequence of pseudorandom float values,
	we might generate a sequence of double values using drand48,
	and take the 24 least significant bits of each value. In this
	case, m is a multiple of k, and the following theorem shows
	that this is a bad choice for generating pseudorandom points
	in d-dimensional space.

It looks to me like the Java Random() function is reasonably good, but
tomorrow, I'm going to run it through the Diehard Battery and Tuftest
suites to see what they report.
