Mathematics 1040-1: An Introduction to Statistical Thinking
University of Utah, Fall 2003
Lectures' Case Studies (Oct 22, 2003)

This lecture is the starting point for regression. We will study regression more deeply than the text does, so you may wish to keep a copy of these notes and study them carefully. Throughout,

r denotes the correlation between the variables x (the explanatory variable) and y (the response variable).

The Regression Line

What is the best straight-line predictor of the variable y, using only the variable x? This was answered by C. F. Gauss using his method of least squares. The "formula" is:

y in standard units = r times x in standard units.
Alternatively, the regression line:

  1. goes through the point of averages; i.e., the point whose x-coordinate is the average of x, and whose y-coordinate is the average of y; and
  2. has slope = r times SD(y) divided by SD(x).
My advice, if you want it, is to develop a good understanding of one of the two formulations. By a good understanding, I mean one which allows you to predict y from x in various settings (e.g., the assigned homework exercises, and the examples that are worked out during the lectures).
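
To see that the two formulations describe the same line, here is a minimal Python sketch. The function names are my own, chosen for illustration, and the numbers are the summary statistics quoted in the case study below. It is only a check under those assumptions, not part of the course software.

```python
def predict_standard_units(x, avg_x, sd_x, avg_y, sd_y, r):
    """First formulation: y in standard units = r times x in standard units."""
    z_x = (x - avg_x) / sd_x      # convert x to standard units
    z_y = r * z_x                 # regression estimate, in standard units
    return avg_y + z_y * sd_y     # convert back to the original units of y


def predict_point_slope(x, avg_x, sd_x, avg_y, sd_y, r):
    """Second formulation: line through the point of averages with slope r*SD(y)/SD(x)."""
    slope = r * sd_y / sd_x
    intercept = avg_y - slope * avg_x   # forces the line through (avg_x, avg_y)
    return intercept + slope * x


# Summary statistics from the femur vs. humerus case study.
stats = dict(avg_x=58.2, sd_x=13.20, avg_y=66.0, sd_y=15.89, r=0.994)

first = predict_standard_units(80, **stats)
second = predict_point_slope(80, **stats)
print(first, second)   # the two values agree up to floating-point rounding
```

Either function can be used for the homework; pick the one whose reasoning you find easier to remember.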

A Quick Case Study

In the femur vs. humerus example of yore, we had:
   x = femur length (in cm);
   y = humerus length (in cm).
Check that the following hold:
   average(x) = 58.2 cm
   average(y) = 66 cm
        SD(x) = 13.20 cm
        SD(y) = 15.89 cm.
We have not yet discussed how the correlation r is computed (nor what it really is). So let me just tell you that r is 0.994. This is a very strong correlation (what does this mean?), so regression predictions should be good.

Let us use the second description of regression to find the equation of the regression line: We have y = a + bx, and we know that b is the slope. I.e.,

   b = r SD(y)/SD(x) 
     = 0.994 times 15.89/13.20
     =  1.197,    approximately.
So we know that y = a + 1.197 x. What is a? We know that when x = average(x)=58.2, then y=average(y)=66. So
   66 = a + ( 1.197 times 58.2).
Solve for a (Math 1030) to get a = -3.66, approximately. That is, the equation of the regression line is
   y = -3.66 + 1.197 x.
So, for example, if we find a femur of this species that is 80 cm long, then our regression estimate for the length of its humerus is
   y = -3.66 + 1.197 times 80 = 92 cm,   approximately.
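
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the variable and function names are mine, and the inputs are the summary statistics quoted above.

```python
# Summary statistics quoted in the case study.
avg_x, sd_x = 58.2, 13.20   # femur length (cm)
avg_y, sd_y = 66.0, 15.89   # humerus length (cm)
r = 0.994                   # correlation between x and y

# Slope and intercept of the regression line y = a + b x.
slope = r * sd_y / sd_x                # b
intercept = avg_y - slope * avg_x      # a, from the point of averages


def predict_humerus(femur_cm):
    """Regression estimate of humerus length (cm) from femur length (cm)."""
    return intercept + slope * femur_cm


prediction = predict_humerus(80)
print(f"slope     = {slope:.3f}")
print(f"intercept = {intercept:.2f}")
print(f"predicted humerus length = {prediction:.1f} cm")
```

Because the code does not round the slope before computing the intercept, its intercept comes out near -3.64 rather than the hand-computed value. The prediction for an 80 cm femur is still about 92 cm.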

Warning. You cannot simply solve this equation for x in order to predict x from y (that is, to regress x on y). The regression equation is not an algebraic equation. It is a statistical estimate! To see if you understand this notion, try the following. If you do not succeed, do not be discouraged; it is a subtle notion. Seek help until you understand it.

Question: What is the regression estimate for the femur length of one such fossil whose humerus length turned out to be 42 cm long?
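
The warning can be illustrated numerically without working the question itself. The sketch below, with helper names of my own invention, starts from the 80 cm femur of the worked example. It predicts the humerus length, and then goes back from that humerus length to a femur length in two ways: by solving the regression equation algebraically, and by regressing femur on humerus (the roles of x and y are swapped, and r is unchanged).

```python
# Summary statistics quoted in the case study.
avg_x, sd_x = 58.2, 13.20   # femur length (cm)
avg_y, sd_y = 66.0, 15.89   # humerus length (cm)
r = 0.994


def regress(value, avg_from, sd_from, avg_to, sd_to, r):
    """Regression estimate: (estimate in standard units) = r * (value in standard units)."""
    z = (value - avg_from) / sd_from
    return avg_to + r * z * sd_to


femur = 80.0
humerus_hat = regress(femur, avg_x, sd_x, avg_y, sd_y, r)   # predict y from x

# Going back by algebra: solve y = a + b x for x.
slope = r * sd_y / sd_x
intercept = avg_y - slope * avg_x
femur_by_algebra = (humerus_hat - intercept) / slope

# Going back by regression: predict x from y, with the roles swapped.
femur_hat = regress(humerus_hat, avg_y, sd_y, avg_x, sd_x, r)

print(f"algebraic inversion:   {femur_by_algebra:.2f} cm")
print(f"regression of x on y:  {femur_hat:.2f} cm")
```

Algebra returns the original 80 cm, but the regression estimate is slightly closer to the average femur length. The gap is small here only because r is so close to 1; with a weaker correlation the two answers would differ much more.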

Disclaimer
© 2003 by the Dept of Math. University of Utah