%% /u/sy/beebe/java/random/README, Wed Mar 24 07:09:10 2004
%% Edit by Nelson H. F. Beebe
============
Introduction
============
After a talk that I gave yesterday about generation and testing of
pseudorandom numbers at the local IBM AIX User Group meeting, I was
asked about the quality of the Java random number class.
After the talk, I did some digging, and found the Java library source
code, and the source of the algorithm used. However, I could not find
a definitive set of reported tests for the generator, so this morning,
I wrote Java and C code to make simple tests, C code to generate data
for the Marsaglia Diehard test suite, and C code to test the generator
with the Marsaglia/Tsang Tuftest suite.
The good news is that Java's random number class passes almost all of
the tests. There are failures in the OQSO (Overlapping Quadruples
Sparse Occupancy) and DNA tests in the Diehard Battery suite, and in
the Gorilla tests of the Tuftest suite; the failures there suggest
poorer quality of 15 low-order bits (NB: both test suites number bits
from high-order (leftmost) (0) to lowest (right-most) (31)).
The bad news is that its documentation, packaging, and efficiency
needs improvement.
=================================
Comments on the Java Random class
=================================
The description of the Java java.util Random class that I found is on
pp. 1399--1405 of this book:
@String{pub-AW = "Ad{\-d}i{\-s}on-Wes{\-l}ey"}
@String{pub-AW:adr = "Reading, MA, USA"}
@Book{Chan:1999:JCLb,
author = "Patrick Chan and Rosanna Lee and Doug Kramer",
title = "The {Java} Class Libraries: {\tt java.io}, {\tt
java.lang}, {\tt java.math}, {\tt java.net}, {\tt
java.text}, {\tt java.util}",
volume = "1",
publisher = pub-AW,
address = pub-AW:adr,
edition = "Second",
pages = "xxvi + 2050",
year = "1999",
ISBN = "0-201-31002-3",
LCCN = "QA76.73.J38 C47 1998",
bibdate = "Mon Jun 10 13:32:26 2002",
price = "US\$59.99",
acknowledgement = ack-nhfb,
annote = "See \cite{Chan:1998:JCLb} for vol. 2.",
}
A recent online version of the description is available at
http://java.sun.com/j2se/1.5.0/docs/api/java/util/Random.html
The book and Web page describes these methods:
protected int next(int bits)
Generates the next pseudorandom number.
boolean nextBoolean()
Returns the next pseudorandom, uniformly distributed
boolean value from this random number generator's
sequence.
void nextBytes(byte[] bytes)
Generates random bytes and places them into a
user-supplied byte array.
double nextDouble()
Returns the next pseudorandom, uniformly distributed
double value between 0.0 and 1.0 from this random
number generator's sequence.
float nextFloat()
Returns the next pseudorandom, uniformly distributed
float value between 0.0 and 1.0 from this random
number generator's sequence.
double nextGaussian()
Returns the next pseudorandom, Gaussian ("normally")
distributed double value with mean 0.0 and standard
deviation 1.0 from this random number generator's
sequence.
int nextInt()
Returns the next pseudorandom, uniformly distributed
int value from this random number generator's
sequence.
int nextInt(int n)
Returns a pseudorandom, uniformly distributed int
value between 0 (inclusive) and the specified value
(exclusive), drawn from this random number generator's
sequence.
long nextLong()
Returns the next pseudorandom, uniformly distributed
long value from this random number generator's
sequence.
void setSeed(long seed)
Sets the seed of this random number generator using a
single long seed.
Notably absent from this list is a getSeed() method: well-designed
packages for pseudorandom number generation should always have such a
function or method. Without it, it is impossible to save the state of
a generator so that it can be started reproducibly.
The documentation says
The class uses a 48-bit seed, which is modified using a linear
congruential formula. ... each invocation can supply up to 32
pseudorandomly generated bits.
... nextInt() ... All 2^32 possible int values are produced
with (approximately) equal probability.
One infers that the period is 2^48, but there ought to be a clear
statement of that.
Because Java lacks unsigned integers, the integer methods produce
signed values. C/C++ packages invariably use unsigned types, since
the values merely represent bundles of pseudorandom bits. Signed
integer representations must be handled carefully: for example,
right-shifting such a value will propagate the sign bit, wiping out
the randomness of the high-order bits of the result.
Descriptions of the range of the floating-point pseudorandom routines
are careless:
public float nextFloat()
Returns the next pseudorandom, uniformly distributed float
value between 0.0 and 1.0 from this random number generator's
sequence.
The general contract of nextFloat is that one float value,
chosen (approximately) uniformly from the range 0.0f
(inclusive) to 1.0f (exclusive), is pseudorandomly generated
and returned.
...
Returns:
The next pseudorandom, uniformly distributed float value
between 0.0 and 1.0 from this random number generator's
sequence.
The range should be precisely stated as [0.0, 1.0) or as "a value x
such that (0.0 <= x) && (x < 1.0)". Whether the endpoint are included
or not is critical information when the value is used to select an
integer from a range [m, n].
If efficiency is of concern, one must criticize the Random class as
being slower than necessary by the generalization of wrapper methods
that hide the relatively simple multiply-add-modulus operation of a
linear congruential generator inside several layers of method calls.
A family of methods that returns vectors of pseudorandom numbers could
produce even faster generation, since it could completely eliminate
the method call overhead.
==========================
Files in this distribution
==========================
Here is a description of the files in this directory tree:
Makefile Unix Makefile: type "make" to build and run
all programs.
README This file.
TestRandom.java Simple Java test program to generate 25
pseudorandom numbers from nextInt() with
a specified starting seed.
logs/testrandom-diehard/Typescript.sunset.math.utah.edu.2004.03.24.06.06.27
Diehard Battery test suite results for a 10MB
test file produced by testrandom-diehard.c.
logs/testrandom-tuftests/Typescript.ibapah.math.utah.edu.2004.03.24.06.16.04
Tuftest test suite results.
testrandom-diehard.c C/C++ program to generate the Diehard Battery
test file.
testrandom.c C/C++ program to generate 25 pseudorandom
numbers that match the output of TestRandom.java.
The Diehard and Tuftest code is not included here. I have stored it
locally in /usr/local/src/diehard/die.c/ and
/u/sy/beebe/misc/prng/prng-beta-0.02/.
========================================
The origins of the Java Random algorithm
========================================
Here is a verbatim copy of electronic mail that I sent to one of the
attendees that documents what I found out about the origins of the
Java Random algorithm:
Date: Tue, 23 Mar 2004 19:36:45 -0700 (MST)
From: "Nelson H. F. Beebe"
Subject: Java random-number generator
Two of the attendees asked about the quality of the Java random-number
generator. I looked up in my Java Class Libraries, 2nd edition,
volume 1, package java.math, method Random(). It generates values in
[0.0, 1.0), that is, x such that (0.0 <= x) && (x < 1.0). However,
its documentation says nothing of its period, or of the underlying
algorithm.
I therefore had to resort to peeking inside the source, found on Sun
Solaris 9 in /usr/j2se/src.zip, member java/util/Random.java. It uses
a 48-bit linear congruential generator recommended by Donald Knuth (a
good sign), but returns only a 32-bit subset of the computed bits.
The default seed is the current time-of-day, returned by
System.currentTimeMillis().
The formula is
seed = ( 0x5DEECE66DL * seed + 0xBL ) & ((1L << 48) - 1);
This is a 31-bit value times a 32-bit value, producing a 63-bit value
that is then ANDed with a 48-bit value (a cheap, but less desirable,
way of taking the modulus). The value returned is (seed >>> 16), that
is, the upper 32 bits of the 48-bit result (another good sign: in
linear congruential generators, the low bits are less random than high
bits). The period of the returned values should be 2^48, which is
better than the 2^32 of most 32-bit generators.
The critical question is: who recommended the multiplier 0x5DEECE66DL
(25214903917 in decimal) and the addend 0xBL (11)? Neither the
multiplier nor the modulus are prime
hoc128> load("primes")
hoc128> isprime(25214903917)
0
isprime(2^48 - 1)
0
and we can easily find their factorizations:
hoc128> load("factorize")
hoc128> factorize(25214903917)
25214903917 = 7 * 443 * 739 * 11003
hoc128> factorize(2^48 - 1)
281474976710655 = 3 * 3 * 5 * 7 * 13 * 17 * 97 * 241 * 257 * 673
A google search for 0x5DEECE66DL turns up lots of references to the
Java class library, but searching for "+0x5DEECE66DL -Java" finds
nothing else: thus, it appears that this magic number is restricted to
the Java implementation.
I then tried a search for the decimal value, excluding Java, and found
the answer in some class notes:
http://nut.bu.edu/~youssef/py502/monte_carlo_supplement.ps
http://www.inf.ethz.ch/personal/gaertner/texts/own_work/random_matrices.pdf
and in some computer documentation:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/_rand48.3.html
The Youssef notes say:
... I can only say that 25214903917_LONG and 11_LONG have
apparently been chosen by passing a battery of such [meaning
Marsaglia's DIEHARD] tests.
... Even in the case of the 48-bit generators we are discussing
today, cas26 will generate them all in a month or two of CPU time
and then start to repeat.
The magic numbers come from the Unix C library's nrand48() function:
>> ...
>> Description
>>
>> The rand48() family of functions generates pseudo-random numbers using
>> a linear congruential algorithm working on integers 48 bits in
>> size. The particular formula employed is r(n+1) = (a * r(n) + c) mod m
>> where the default values are for the multiplicand a = 0xfdeece66d =
>> 25214903917 and the addend c = 0xb = 11. The modulo is always fixed at
>> m = 2 ** 48. r(n) is called the seed of the random number generator.
>> ...
To see if there was something better than anecdotal evidence of the
tested quality of this generator, I searched the ACM Digital Library,
which has full text searching of several million pages of ACM
journals.
I found three references that cite the number 25214903917:
1 Advanced tutorials: Software for uniform random number generation: distinguishing the good and the bad
Pierre L'Ecuyer
December 2001
Proceedings of the 33nd conference on Winter simulation
Full text available: pdf(175.96 KB)
Additional Information: full citation, abstract, references, index
terms The requirements, design principles, and statistical testing
approaches of uniform random number generators for simulation are
briefly surveyed. An object-oriented random number package where
random number streams can be created at will, and with convenient
tools for manipulating the streams, is presented. A version of this
package is now implemented in the Arena and AutoMod simulation
tools. We also test some random number generators available in popular
software environments such ...
2 Pitfalls in computing with pseudorandom determinants
Bernd Gärtner
May 2000
Proceedings of the sixteenth annual symposium on Computational geometry
Full text available: pdf(704.12 KB)
Additional Information: full citation, references, citings, index terms
3 Bad subsequences of well-known linear congruential pseudorandom number generators
Karl Entacher
January 1998
ACM Transactions on Modeling and Computer Simulation (TOMACS), Volume 8 Issue 1
Full text available: pdf(336.64 KB)
Additional Information: full citation, abstract, references, citings, index terms
We present a spectral test analysis of full-period subsequences with
small step sizes generated by well-known linear congruential
pseudorandom number generators. Subsequences may occur in certain
simulation problems or as a method to get parallel streams of
pseudorandom numbers. Applying the spectral test, it is possible to
find bad subsequences with small step sizes for almost all linear
pseudorandom number generators currently in use.
Keywords: lattice structure, linear congruential generator, parallel
pseudorandom number generator, random number generation, spectral
test, stochastic simulation
The first is by a very reputable researcher (the one at the University
of Montreal that I mentioned in my talk today). I downloaded the
three papers and looked at them:
L'Ecuyer says:
We consider here the following widely-used generators.
Java. This is the generator used to implement the method nextDouble in
the class java.util.Random of the Java standard library (see
http://java.sun.com/j2se/1.3/docs/ api/java/util/Random.html) It is
based on a linear recurrence with period length 248, but each output
value is constructed by taking two successive values from the linear
recurrence, as follows:
x_{i+1} = (25214903917 x_i + 11) mod 2^48
u_i = (2^27 floor(x_{2i}/2^22) + floor(x_{2i}+1/2^21) )/2^53.
Note that the generator rand48 in the Unix standard library uses
exactly the same recurrence, but produces its output simply via
u_i = x_i/2^48.
...
4.2 Results of the Collision Test
"For the generators not given in the table, namely Java,
MRG32k3a, and MT19937, none of the results were suspicious."
So far, so good. For the next test in his test suite, he says:
4.3 Results of the Birthday Spacings Test
The Java, VB, Excel, and LCG16807 generators start failing
decisively with n = 2^18, 2^10, 2^14, and 2^14,
respectively. These are quite small numbers of points.
Not so good.
Gärtner says:
... drand48 has full period (2^48).
...
For example, to get a sequence of pseudorandom float values,
we might generate a sequence of double values using drand48,
and take the 24 least significant bits of each value. In this
case, m is a multiple of k, and the following theorem shows
that this is a bad choice for generating pseudorandom points
in d-dimensional space.
It looks to me like the Java Random() function is reasonably good, but
tomorrow, I'm going to run it through the Diehard Battery and Tuftest
suites to see what they report.