%% /u/sy/beebe/java/random/README, Wed Mar 24 07:09:10 2004 %% Edit by Nelson H. F. Beebe ============ Introduction ============ After a talk that I gave yesterday about generation and testing of pseudorandom numbers at the local IBM AIX User Group meeting, I was asked about the quality of the Java random number class. After the talk, I did some digging, and found the Java library source code, and the source of the algorithm used. However, I could not find a definitive set of reported tests for the generator, so this morning, I wrote Java and C code to make simple tests, C code to generate data for the Marsaglia Diehard test suite, and C code to test the generator with the Marsaglia/Tsang Tuftest suite. The good news is that Java's random number class passes almost all of the tests. There are failures in the OQSO (Overlapping Quadruples Sparse Occupancy) and DNA tests in the Diehard Battery suite, and in the Gorilla tests of the Tuftest suite; the failures there suggest poorer quality of 15 low-order bits (NB: both test suites number bits from high-order (leftmost) (0) to lowest (right-most) (31)). The bad news is that its documentation, packaging, and efficiency needs improvement. ================================= Comments on the Java Random class ================================= The description of the Java java.util Random class that I found is on pp. 1399--1405 of this book: @String{pub-AW = "Ad{\-d}i{\-s}on-Wes{\-l}ey"} @String{pub-AW:adr = "Reading, MA, USA"} @Book{Chan:1999:JCLb, author = "Patrick Chan and Rosanna Lee and Doug Kramer", title = "The {Java} Class Libraries: {\tt java.io}, {\tt java.lang}, {\tt java.math}, {\tt java.net}, {\tt java.text}, {\tt java.util}", volume = "1", publisher = pub-AW, address = pub-AW:adr, edition = "Second", pages = "xxvi + 2050", year = "1999", ISBN = "0-201-31002-3", LCCN = "QA76.73.J38 C47 1998", bibdate = "Mon Jun 10 13:32:26 2002", price = "US\$59.99", acknowledgement = ack-nhfb, annote = "See \cite{Chan:1998:JCLb} for vol. 2.", } A recent online version of the description is available at http://java.sun.com/j2se/1.5.0/docs/api/java/util/Random.html The book and Web page describes these methods: protected int next(int bits) Generates the next pseudorandom number. boolean nextBoolean() Returns the next pseudorandom, uniformly distributed boolean value from this random number generator's sequence. void nextBytes(byte[] bytes) Generates random bytes and places them into a user-supplied byte array. double nextDouble() Returns the next pseudorandom, uniformly distributed double value between 0.0 and 1.0 from this random number generator's sequence. float nextFloat() Returns the next pseudorandom, uniformly distributed float value between 0.0 and 1.0 from this random number generator's sequence. double nextGaussian() Returns the next pseudorandom, Gaussian ("normally") distributed double value with mean 0.0 and standard deviation 1.0 from this random number generator's sequence. int nextInt() Returns the next pseudorandom, uniformly distributed int value from this random number generator's sequence. int nextInt(int n) Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive), drawn from this random number generator's sequence. long nextLong() Returns the next pseudorandom, uniformly distributed long value from this random number generator's sequence. void setSeed(long seed) Sets the seed of this random number generator using a single long seed. Notably absent from this list is a getSeed() method: well-designed packages for pseudorandom number generation should always have such a function or method. Without it, it is impossible to save the state of a generator so that it can be started reproducibly. The documentation says The class uses a 48-bit seed, which is modified using a linear congruential formula. ... each invocation can supply up to 32 pseudorandomly generated bits. ... nextInt() ... All 2^32 possible int values are produced with (approximately) equal probability. One infers that the period is 2^48, but there ought to be a clear statement of that. Because Java lacks unsigned integers, the integer methods produce signed values. C/C++ packages invariably use unsigned types, since the values merely represent bundles of pseudorandom bits. Signed integer representations must be handled carefully: for example, right-shifting such a value will propagate the sign bit, wiping out the randomness of the high-order bits of the result. Descriptions of the range of the floating-point pseudorandom routines are careless: public float nextFloat() Returns the next pseudorandom, uniformly distributed float value between 0.0 and 1.0 from this random number generator's sequence. The general contract of nextFloat is that one float value, chosen (approximately) uniformly from the range 0.0f (inclusive) to 1.0f (exclusive), is pseudorandomly generated and returned. ... Returns: The next pseudorandom, uniformly distributed float value between 0.0 and 1.0 from this random number generator's sequence. The range should be precisely stated as [0.0, 1.0) or as "a value x such that (0.0 <= x) && (x < 1.0)". Whether the endpoint are included or not is critical information when the value is used to select an integer from a range [m, n]. If efficiency is of concern, one must criticize the Random class as being slower than necessary by the generalization of wrapper methods that hide the relatively simple multiply-add-modulus operation of a linear congruential generator inside several layers of method calls. A family of methods that returns vectors of pseudorandom numbers could produce even faster generation, since it could completely eliminate the method call overhead. ========================== Files in this distribution ========================== Here is a description of the files in this directory tree: Makefile Unix Makefile: type "make" to build and run all programs. README This file. TestRandom.java Simple Java test program to generate 25 pseudorandom numbers from nextInt() with a specified starting seed. logs/testrandom-diehard/Typescript.sunset.math.utah.edu.2004.03.24.06.06.27 Diehard Battery test suite results for a 10MB test file produced by testrandom-diehard.c. logs/testrandom-tuftests/Typescript.ibapah.math.utah.edu.2004.03.24.06.16.04 Tuftest test suite results. testrandom-diehard.c C/C++ program to generate the Diehard Battery test file. testrandom.c C/C++ program to generate 25 pseudorandom numbers that match the output of TestRandom.java. The Diehard and Tuftest code is not included here. I have stored it locally in /usr/local/src/diehard/die.c/ and /u/sy/beebe/misc/prng/prng-beta-0.02/. ======================================== The origins of the Java Random algorithm ======================================== Here is a verbatim copy of electronic mail that I sent to one of the attendees that documents what I found out about the origins of the Java Random algorithm: Date: Tue, 23 Mar 2004 19:36:45 -0700 (MST) From: "Nelson H. F. Beebe" Subject: Java random-number generator Two of the attendees asked about the quality of the Java random-number generator. I looked up in my Java Class Libraries, 2nd edition, volume 1, package java.math, method Random(). It generates values in [0.0, 1.0), that is, x such that (0.0 <= x) && (x < 1.0). However, its documentation says nothing of its period, or of the underlying algorithm. I therefore had to resort to peeking inside the source, found on Sun Solaris 9 in /usr/j2se/src.zip, member java/util/Random.java. It uses a 48-bit linear congruential generator recommended by Donald Knuth (a good sign), but returns only a 32-bit subset of the computed bits. The default seed is the current time-of-day, returned by System.currentTimeMillis(). The formula is seed = ( 0x5DEECE66DL * seed + 0xBL ) & ((1L << 48) - 1); This is a 31-bit value times a 32-bit value, producing a 63-bit value that is then ANDed with a 48-bit value (a cheap, but less desirable, way of taking the modulus). The value returned is (seed >>> 16), that is, the upper 32 bits of the 48-bit result (another good sign: in linear congruential generators, the low bits are less random than high bits). The period of the returned values should be 2^48, which is better than the 2^32 of most 32-bit generators. The critical question is: who recommended the multiplier 0x5DEECE66DL (25214903917 in decimal) and the addend 0xBL (11)? Neither the multiplier nor the modulus are prime hoc128> load("primes") hoc128> isprime(25214903917) 0 isprime(2^48 - 1) 0 and we can easily find their factorizations: hoc128> load("factorize") hoc128> factorize(25214903917) 25214903917 = 7 * 443 * 739 * 11003 hoc128> factorize(2^48 - 1) 281474976710655 = 3 * 3 * 5 * 7 * 13 * 17 * 97 * 241 * 257 * 673 A google search for 0x5DEECE66DL turns up lots of references to the Java class library, but searching for "+0x5DEECE66DL -Java" finds nothing else: thus, it appears that this magic number is restricted to the Java implementation. I then tried a search for the decimal value, excluding Java, and found the answer in some class notes: http://nut.bu.edu/~youssef/py502/monte_carlo_supplement.ps http://www.inf.ethz.ch/personal/gaertner/texts/own_work/random_matrices.pdf and in some computer documentation: http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/_rand48.3.html The Youssef notes say: ... I can only say that 25214903917_LONG and 11_LONG have apparently been chosen by passing a battery of such [meaning Marsaglia's DIEHARD] tests. ... Even in the case of the 48-bit generators we are discussing today, cas26 will generate them all in a month or two of CPU time and then start to repeat. The magic numbers come from the Unix C library's nrand48() function: >> ... >> Description >> >> The rand48() family of functions generates pseudo-random numbers using >> a linear congruential algorithm working on integers 48 bits in >> size. The particular formula employed is r(n+1) = (a * r(n) + c) mod m >> where the default values are for the multiplicand a = 0xfdeece66d = >> 25214903917 and the addend c = 0xb = 11. The modulo is always fixed at >> m = 2 ** 48. r(n) is called the seed of the random number generator. >> ... To see if there was something better than anecdotal evidence of the tested quality of this generator, I searched the ACM Digital Library, which has full text searching of several million pages of ACM journals. I found three references that cite the number 25214903917: 1 Advanced tutorials: Software for uniform random number generation: distinguishing the good and the bad Pierre L'Ecuyer December 2001 Proceedings of the 33nd conference on Winter simulation Full text available: pdf(175.96 KB) Additional Information: full citation, abstract, references, index terms The requirements, design principles, and statistical testing approaches of uniform random number generators for simulation are briefly surveyed. An object-oriented random number package where random number streams can be created at will, and with convenient tools for manipulating the streams, is presented. A version of this package is now implemented in the Arena and AutoMod simulation tools. We also test some random number generators available in popular software environments such ... 2 Pitfalls in computing with pseudorandom determinants Bernd Gärtner May 2000 Proceedings of the sixteenth annual symposium on Computational geometry Full text available: pdf(704.12 KB) Additional Information: full citation, references, citings, index terms 3 Bad subsequences of well-known linear congruential pseudorandom number generators Karl Entacher January 1998 ACM Transactions on Modeling and Computer Simulation (TOMACS), Volume 8 Issue 1 Full text available: pdf(336.64 KB) Additional Information: full citation, abstract, references, citings, index terms We present a spectral test analysis of full-period subsequences with small step sizes generated by well-known linear congruential pseudorandom number generators. Subsequences may occur in certain simulation problems or as a method to get parallel streams of pseudorandom numbers. Applying the spectral test, it is possible to find bad subsequences with small step sizes for almost all linear pseudorandom number generators currently in use. Keywords: lattice structure, linear congruential generator, parallel pseudorandom number generator, random number generation, spectral test, stochastic simulation The first is by a very reputable researcher (the one at the University of Montreal that I mentioned in my talk today). I downloaded the three papers and looked at them: L'Ecuyer says: We consider here the following widely-used generators. Java. This is the generator used to implement the method nextDouble in the class java.util.Random of the Java standard library (see http://java.sun.com/j2se/1.3/docs/ api/java/util/Random.html) It is based on a linear recurrence with period length 248, but each output value is constructed by taking two successive values from the linear recurrence, as follows: x_{i+1} = (25214903917 x_i + 11) mod 2^48 u_i = (2^27 floor(x_{2i}/2^22) + floor(x_{2i}+1/2^21) )/2^53. Note that the generator rand48 in the Unix standard library uses exactly the same recurrence, but produces its output simply via u_i = x_i/2^48. ... 4.2 Results of the Collision Test "For the generators not given in the table, namely Java, MRG32k3a, and MT19937, none of the results were suspicious." So far, so good. For the next test in his test suite, he says: 4.3 Results of the Birthday Spacings Test The Java, VB, Excel, and LCG16807 generators start failing decisively with n = 2^18, 2^10, 2^14, and 2^14, respectively. These are quite small numbers of points. Not so good. Gärtner says: ... drand48 has full period (2^48). ... For example, to get a sequence of pseudorandom float values, we might generate a sequence of double values using drand48, and take the 24 least significant bits of each value. In this case, m is a multiple of k, and the following theorem shows that this is a bad choice for generating pseudorandom points in d-dimensional space. It looks to me like the Java Random() function is reasonably good, but tomorrow, I'm going to run it through the Diehard Battery and Tuftest suites to see what they report.