Random numbers

Introduction
Problem set 1
Statistical tests: the expection
The chi-square test
Problem set 2
References

Introduction

Our goal here is to learn something about sequences of random numbers and how they are generated using a computer. One should in fact view the last part of the preceding statement with suspicion. How can a computer, which operates deterministically, produce a sequence of random numbers? The answer is that it cannot. What it can produce is sequences of numbers whose properties are similar to those of genuine random number sequences. These sequences are called "pseudo-random".

One of the simplest random processes is coin tossing. Here is the record of one experiment, in which a coin was tossed ten times, with H representing heads and T representing tails. Here is one such sequence:

  H T T T T H T H T H

In ten tosses of the coin there were six tails, four heads, and there was a run of four tails. The "expected" number of heads is five. However, one must be careful about what this means. It does not mean that in 2N tosses the number of heads will be N. However, the probability that the number of heads will be close to N is much higher than the probability that it will be far from N, particularly if N is large. Note also that "low probability events" occur frequently. For example, the probability of a run of four tails in four tosses occurs with probability 1/16.

Various algorithms habe been proposed for simulationg coin tossing and other random processes. One of these is the linear congruential method. To explain it, consider the function

  f(x) = ax + b (mod m)

To compute a number k mod m, we divide k by m and take the remainder. For example, 23 mod 7 = 2. More precisely, we say that two numbers a and b are congruent modulo m if a - b is divisible by m. Thus -1 is congruent to 6 mod 7. In particular, every number mod 7 is congruent to a number in the range 0..6. The theory of conguences is due to Gauss.

We can use the function f to generate a sequence of numbers by choosing a "seed" x_0, then setting

  x_{n+1} = f(x_n)

Thus the sequence { x_n } is generated by a dynamical system.

Let us see how this works when f(x) = 5x + 1 (mod 7), with x_0 = 1. We find

   { x_n } = { 0, 1, 6, 3, 2, 4, 0 }

We see that x_6 = x_0, so the sequence is actually a cycle of length 6. The number 5 mod 7 is missing; it fits in a cycle of length one, since

  f(5) = 5

In other words, 5 is a fixed point of f.

We can try to simulate coin tossing by reducing the sequence { x_n } mod 2, where 0 represents heads and 1 represents tails. Calling this sequence { y_n } and using only the 6 first elements we have

  { y_n } = { 0, 1, 0, 1, 0, 0 }

Of course the seed x_0 = 5 would not lead to a very interesting sequence.

Problem Set 1

Devise a program which implements the linear congruential method. It should produce output like this:

  f(x) = 21x + 1 (mod 101)
  Seed: 11
  N: 10

        x_1:             30
        x_2:             25
        x_3:             21
        x_4:             38
        x_5:             92
        x_6:             14
        x_7:             93
        x_8:             35
        x_9:             29
        x_10:             4

Your program should automatically print out the current values of a, b, and m.

Determine the cycle structure for f(x) = 21x + 1 (mod 101). What is the longest cycle?
Use your program to simulate coin tossing and evaluate the results. Does it do a good job? Answer this last question in as detailed and quantitative way as possible.
By experimentation or other means find what you think is the best random number generator f(x) = ax + b (mod m) with m < 101. Explain your conclusions.

Statistical tests: the expected value

Let us consider in more detail the question of whether the linear congruential method gives good "random" sequences for a given triple (a,b,m). Consider, for example, the following simulation of coin tossing:

  f(x) = 32x + 1 (mod 103)
  Seed: 5
  N: 50

  0  1  1  1  1  0  1  0  1  1
  0  0  1  0  1  0  0  0  1  1
  1  1  0  0  1  0  0  1  1  0
  0  0  0  0  0  0  0  1  0  0
  0  1  0  1  0  1  0  0  1  1

There are 21 tails (1's) in a sequence of 50 tosses. (You can quickly compute this by adding up the ones in each of the ten columns). This seems a bit low compared to the expected value of 25 tails. However, we know that there is a great deal of variability in random process. Thus we expect the results to be close to the expected value more often than not. It would be a remarkable coincidence if the expected value was observed in each trial of a random process. The question, therefore is whether 21 tails is close enough to the expected value of 25 to be considered a relatively likely occurence.

To answer this question we need to know a little bit about random variables. Consider first the variable

  X = number of tails in one coin toss

The value of this variable for the experiment "toss a coin once" is either 0 or 1, with each alternative occuring with probability 1/2. The expected value of X is

  E(X) = (1/2)(0) + (1/2)(1) = 1/2

We multiply each possible value of X by the probability of the occurence of that value, then add the results. Consider next the random variable

  Y = number of tails in N coin tosses

The possible values of Y are 0, ..., N, and one can show that

  E(Y) = N E(X)

The essential point is that Y totals up the result of N independent coin tosses, i.e., N independent measurements of X.

We can think about the previous result this way. Let X_1, X_2, etc. represent the outcome of the first, second, etc., toss, then

  Y = X_1 + X_2 + ... + X_N

If X_1, X_2, are independent, then

  E(Y) = E(X_1) + E(X_2) + ... + E(X_N) = N E(X).

For the last equality we use that fact that X_i and X have the same statistical properties.

The next notion we need is that of the standard deviation of a random variable. This is a measure of the extent to which values of the random variable are concentrated near the expected value. If X takes values x_1, ... x_k with probabilities p_1, ... p_k, and if e is the expected value, then the standard deviation is

  SD(X) = square root of  p_1( x_1 - e )^2 + .. + p_k( x_k - e )^2

Let us calculate the standard deviation of X = number of tails in one coin toss:

  SD(X) = square root of (1/2)( 0 - 1/2 )^2 + (1/2)( 1 - 1/2 )^2
        = square root of (1/2)(1/4) + (1/2)(1/4)
        = 1/2

When Y = X_1 + ... X_N is a sum of N independent random variables with the same probability distribution, we have

  SD(Y) = sqrt(N) SD(X)

Thus, for

  Y = number of tails in 50 coin tosses

we have

  SD(Y) = sqrt(50)(1/2) ~ 3.5

Whenever Y = X_1 + ... + X_N, where the X_i are independent and identically distributed, we have the following facts, provided that N is large enough:

  The probability that  E - SD < X < E + SD
  is about 68%

  The probability that  E - 2 SD < X < E + 2 SD
  is about 95%

These facts are consequences of the central limit theorem and of the normal distribution, both discovered by Gauss . They assert that most occurences of X lie within one or two standared deviations of the expected value. In the case at hand the one-standard deviation interval is

   [ 25 - 2.5, 25 + 3.5 ] = [ 21.5, 28.5 ]

The value of Y we observed is just outside this interval but well within the two-standard deviation interval. It is thus what we might expect of a random number generator. We shall therefore give it a provisional pass.

Problem A: Use the methods discussed above to test the generator f(x) = 21x + 1 (mod 101) with other seed values. Does it still get a provisional pass?

The Chi-square test

A pseudo-random sequence should imitate random sequences in other respects besides behaving well with respect to the expected value. Consider, for example, the sequence

  0 1 0 1 0 1 0 1 0 1

The value of Y is 5, which is the expected value. However, the regular alternation of 0's and 1's is remarkable and we do not see it often in real random sequences.

We can use this remark as the basis for another test using the same data, namely,

  0  1  1  1  1  0  1  0  1  1
  0  0  1  0  1  0  0  0  1  1
  1  1  0  0  1  0  0  1  1  0
  0  0  0  0  0  0  0  1  0  0
  0  1  0  1  0  1  0  0  1  1

Reading across in groups of two we find the following statistics:

   Pattern    Frequency
  ----------------------
      00         8
      01         6
      10         6
      11         5
  ----------------------

The question is, once again, are these statistics that we would expect to find from a random sequence of zeros and ones, or do they indicate a bias, say, towards the pattern 00?

To answer this question we shall try to measure the deviation of our experiment from the expected result. In 50 trials there are 25 patterns, each of which is equally likely. Thus the "expected" number of occurences of each pattern is 25/4 = 6.25. A measure of the deviation from the ideal is

    ( 8 - 6.25 )^2 + ( 6 - 6.25 )^2 + ( 6 - 6.25 )^2 + ( 5 - 6.25 )^2
    --------------   --------------   --------------   --------------
         6.25            6.25              6.25            6.25

This is the chi-square statistic K, and we compute

   K = 0.76

This seems fairly small, and it is. To understand why, we need to understand that K is a random variable with its own probability distribution. In the case at hand it is the " chi-square distribution with 3 degrees of freedom". The number of degrees of freedom is one less than the number of possible outcomes, in this case, the patterns. Thus we can make tables that give information such as

   Probability( K <= 1.42 ) = 0.7

In fact, this is a correct statement for the chi-square distribution with three degrees of freedom. We read it from the table below:

                               Chi-square table
  --------------------------------------------------------------------
  Degrees
    of
  Freedom   99%      95%    90%    70%   50%   30%   10%   5%     1%
  --------------------------------------------------------------------
     1     0.00016  0.0039  0.016  0.15  0.46  1.07  2.71  3.84   6.64
     2     0.020    0.10    0.21   0.71  1.39  2.41  4.60  5.99   9.21
     3     0.12     0.35    0.58   1.42  2.37  3.67  6.25  7.82  11.34
     4     0.30     0.71    1.06   2.20  3.36  4.88  7.78  9.49  15.09
  --------------------------------------------------------------------

In fact, since our value of K is closer to the 90% value of 0.58, we know that the probability that K <= 0.76 is closer to 90%. The high probability associated with our value of K suggests that we give provisional approval to our random number generator.

Another way of gaining some understanding of what we have just done is to think about the chi-square statistic in an atypical case, say, when the data is 01010101..., so that only the pattern 01 occurs in fifty tosses of a coin. The total number of heads is 25, as expected, so our sequence passes the test of the previous section. Its chi-square statistic is

   ( 0 - 6.25 )^2 + ( 25 - 6.25 )^2 + ( 0 - 6.25 )^2 + ( 0 - 6.25 )^2
   --------------   --------------   --------------   --------------
        6.25            6.25              6.25            6.25

which give

  K = 56.25

The probability of such a K occuring as the result of a random process is, according to our table, less than 1%.

Problem B: Study the random number generator with f(x) = 21x + 1 (mod 101) using the chi-square statistic. Does it pass or fail?

Problem set 2

Do problems A and B above.
Use your best random numbr generator so far to produce random integers 0, 1, 2, e.g., sequences like
```
    0 2 1 0 1 1 2
```
This simulates the experiment "spin a pointer, see where it stops on a dial with three equal divisions ". Test the resulting output using the expected value and the chi-square test. Include the output and a copy of your program in you write-up. (The latter should be an appendix).
Modify your best random number generator to that it produces random real numbers in the range 0 to 1. Test it by dividing the interval [0,1] into four equal subintervals and finding the relevant statistics: How many numbers in your "experiment" lie in each subinterval. Use the chi-square test to analyze the results.
Use your best random number generator to produce a sequence of 0's and 1's, e.g.,
```
  0 0   1 0   1 1   1 0   1 1   01
```
Notice how we have grouped the data into pairs. Consider next the sequence
```
  0 1 2 1 2 1
```
obtained by adding the members of each pair together. What are the statistical properties of this sequence? What is the probability of a 0, a 1, or a 2? What are the expected value and standard deviation for the random variable X with these outcomes (0, 1, 2)? When you create a long sequence of 0's, 1's, and 2's in this way, does it pass the chi-square test for randomness?
Note that X is a sum of two random variables: X = X_1 + X_2, where X_i is the outcome of a single coin toss: 0 for heads and 1 for tails.

References

David Freedman, Robert Pisani, and Roger Purves, Statistics, W. W. Norton (1978).
The authors strive to develop statistical intution with as little mathematics as possible.
L. Garding and T. Tambour, Algebra for Computer Science, Springer-Verlag (1988).
A good treatment the number theory and modern algebra needed to understand random number geneators, public codes, and other topics.
David Kahaner, Cleve Moler, and Stephen Nash, Numerical Methods and Software (1977)
Donald Knuth, Seminumerical Algorithms (the Art of Computer Programming, volume 2), Addison-Wesley (1969).
Chapter 3 gives an extensive account of random number generators and their numerous pitfalls.
Sheldon Ross, A First Course in Probability, Macmillan (1976).