Distribution of the Regression Slope

  To determine whether particular values for regression slopes are surprising or not, we need to know the distribution of those slopes. Below we derive various characteristics of the distribution of the regression slope.

Formula for the Slope

Earlier we derived an equation for the slope as the weighted average of each individual point's slope "vote." That is,
$$b = \sum_{i=1}^{n}\left[\frac{(X_i-\bar{X})^2}{\sum_{j=1}^{n}(X_j-\bar{X})^2}\right]\left[\frac{Y_i-\bar{Y}}{X_i-\bar{X}}\right]$$
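As a quick numerical check of this weighted-average form, the sketch below computes each point's slope vote (the slope of the line from the mean point to that point), weights it by its share of the squared x deviations, and compares the result with the usual least-squares slope. The data values and the function names slope_from_votes and slope_ols are illustrative assumptions. No x value may equal the mean here, because that point's vote would require dividing by zero.

```python
from statistics import fmean

def slope_from_votes(xs, ys):
    """Weighted average of each point's slope vote about the mean point."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for x, y in zip(xs, ys):
        weight = (x - x_bar) ** 2 / sxx      # first bracketed term
        vote = (y - y_bar) / (x - x_bar)     # second bracketed term: the point's slope vote
        total += weight * vote
    return total

def slope_ols(xs, ys):
    """Usual least-squares slope: cross-deviations over squared x deviations."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data; the x mean is 3, and no x value equals it
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.1, 3.9, 8.2, 9.8]
print(slope_from_votes(xs, ys), slope_ols(xs, ys))
```

Both calls agree, which is the algebraic identity that the cancellation step below makes explicit.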
For deriving the variance, and for other purposes, it is convenient to reexpress this conceptual formula in simpler mathematical terms. Cancelling the common factor $(X_i-\bar{X})$ from the numerator of the first bracketed term and the denominator of the second yields:
$$b = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$
To simplify this expression further, consider the numerator
$$\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) = \sum_{i=1}^{n}(X_i-\bar{X})Y_i - \bar{Y}\sum_{i=1}^{n}(X_i-\bar{X})$$
The last term in this expression equals zero; that is,
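$$\bar{Y}\sum_{i=1}^{n}(X_i-\bar{X}) = 0, \qquad\text{so}\qquad \sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y}) = \sum_{i=1}^{n}(X_i-\bar{X})Y_i$$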
because the definition of the mean requires
$$\sum_{i=1}^{n}(X_i-\bar{X}) = \sum_{i=1}^{n}X_i - n\bar{X} = 0$$
Substituting the simpler form of the numerator in the formula for the slope gives:
$$b = \frac{\sum_{i=1}^{n}(X_i-\bar{X})Y_i}{\sum_{j=1}^{n}(X_j-\bar{X})^2} = \sum_{i=1}^{n} w_i Y_i$$
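The simplification can be checked numerically: because the x deviations sum to zero, dropping $\bar{Y}$ from the numerator leaves the slope unchanged. The data values below are hypothetical and chosen only for illustration.

```python
from statistics import fmean

# Hypothetical data (x mean = 4)
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.1, 3.9, 8.2, 9.8, 15.5]
x_bar, y_bar = fmean(xs), fmean(ys)
dx = [x - x_bar for x in xs]                  # x deviations

sum_dx = sum(dx)                                               # zero, by the definition of the mean
full_numerator = sum(d * (y - y_bar) for d, y in zip(dx, ys))  # original numerator
simple_numerator = sum(d * y for d, y in zip(dx, ys))          # y_bar term dropped
sxx = sum(d * d for d in dx)

b_full = full_numerator / sxx
b_simple = simple_numerator / sxx
print(sum_dx, b_full, b_simple)
```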
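In other words, the slope is simply a weighted sum of the original data values $Y_i$ (the weights sum to zero, so it is a linear combination rather than an average in the usual sense), where the weight is given by: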
$$w_i = \frac{X_i-\bar{X}}{\sum_{j=1}^{n}(X_j-\bar{X})^2}$$
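A minimal sketch of these weights, using hypothetical data: it computes each $w_i$, confirms that $b = \sum w_i Y_i$, and shows two properties that matter later, namely that the weights sum to zero and that $\sum w_i X_i = 1$. The function name slope_weights is an illustrative assumption.

```python
from statistics import fmean

def slope_weights(xs):
    """Weights w_i = (x_i - x_bar) / sum_j (x_j - x_bar)^2."""
    x_bar = fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [(x - x_bar) / sxx for x in xs]

# Hypothetical data
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.1, 3.9, 8.2, 9.8, 15.5]

w = slope_weights(xs)
b = sum(wi * yi for wi, yi in zip(w, ys))            # slope as a weighted sum of the y values
sum_w = sum(w)                                       # zero
sum_wx = sum(wi * xi for wi, xi in zip(w, xs))       # one
print(w)
print(sum_w, sum_wx, b)
```

Points at the mean of X get zero weight, and points far from it get the largest weights (positive above the mean, negative below).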
The important point is that the slope is a weighted sum, that is, a linear combination, of the data values. Thus, according to the Central Limit Theorem (in its version for weighted sums of independent variables), it will tend to have a normal distribution itself. Because the weights sum to zero and $\sum_{i=1}^{n} w_i X_i = 1$, the mean of that normal distribution is the true value of the regression slope. Next we turn to deriving the variance of that normal distribution.
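To see the Central Limit Theorem at work, the following Monte Carlo sketch repeatedly draws samples at fixed x values, using deliberately skewed (shifted exponential) errors, and refits the slope each time. All settings (true intercept 1, true slope 2, x = 1, ..., 30, 5000 replications, the seed) are arbitrary assumptions for illustration. If the argument holds, the simulated slopes should average close to 2, and roughly 95% of them should lie within 1.96 standard deviations of their mean, as for a normal distribution.

```python
import random
from statistics import fmean, stdev

def ols_slope(xs, ys):
    """Least-squares slope: cross-deviations over squared x deviations."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(n_reps=5000, beta0=1.0, beta1=2.0, seed=1):
    """Refit the slope on n_reps samples drawn at the same fixed x values."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, 31)]        # fixed design, n = 30
    slopes = []
    for _ in range(n_reps):
        # expovariate(1.0) has mean 1 and is right-skewed; subtracting 1 gives mean-zero errors
        ys = [beta0 + beta1 * x + rng.expovariate(1.0) - 1.0 for x in xs]
        slopes.append(ols_slope(xs, ys))
    return slopes

slopes = simulate_slopes()
m, s = fmean(slopes), stdev(slopes)
# For a normal distribution, about 95% of values fall within 1.96 SDs of the mean
frac = sum(abs(b - m) <= 1.96 * s for b in slopes) / len(slopes)
print(m, s, frac)
```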

Variance of an Observation

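According to the regression model, each observation is
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i,$$
where the errors $\varepsilon_i$ are independent random variables with mean zero and a common variance.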

Variance of the Slope

Two properties of the variance are needed to derive the variance of the slope. First, the variance of the sum of independent or uncorrelated random variables equals the sum of their variances; that is,
$$\operatorname{Var}(U + V) = \operatorname{Var}(U) + \operatorname{Var}(V)$$
Second, the variance of a constant times a random variable equals the constant squared times the variance; that is,
$$\operatorname{Var}(cU) = c^2\,\operatorname{Var}(U)$$
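Both rules can be checked by simulation. The sketch below assumes independent normal variables $U \sim N(0, 2^2)$ and $V \sim N(0, 1)$ and a constant $c = 3$, all chosen only for illustration; the sample variances should come out near $4 + 1 = 5$ and $9 \times 4 = 36$.

```python
import random
from statistics import variance

rng = random.Random(7)
n = 100_000
u = [rng.gauss(0.0, 2.0) for _ in range(n)]   # Var(U) = 4
v = [rng.gauss(0.0, 1.0) for _ in range(n)]   # Var(V) = 1, independent of U
c = 3.0

var_sum = variance([a + b for a, b in zip(u, v)])   # rule 1: near Var(U) + Var(V) = 5
var_scaled = variance([c * a for a in u])           # rule 2: near c^2 * Var(U) = 36
print(var_sum, var_scaled)
```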
For each value or level of X, each individual observation $Y_i$ is a random variable with variance
$$\operatorname{Var}(Y_i \mid X_i) = \sigma^2_{Y.X}$$
The Y.X subscript indicates that it is the variance of Y when we know X. Applying these two rules to the slope, written as a weighted sum of the independent observations $Y_i$, yields:
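$$\sigma_b^2 = \operatorname{Var}\left(\sum_{i=1}^{n} w_i Y_i\right) = \sum_{i=1}^{n} w_i^2\,\operatorname{Var}(Y_i) = \sigma^2_{Y.X}\sum_{i=1}^{n} w_i^2$$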
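The sketch below compares $\sigma^2_{Y.X}\sum w_i^2$ with the empirical variance of slopes computed from many samples drawn at the same x values. The x values, $\sigma_{Y.X} = 0.5$, the true line $1 + 2X$, normal errors, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import fmean, variance

# Hypothetical fixed design and error standard deviation
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
sigma = 0.5
x_bar = fmean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)
w = [(x - x_bar) / sxx for x in xs]               # slope weights

predicted_var = sigma ** 2 * sum(wi ** 2 for wi in w)

rng = random.Random(3)
slopes = []
for _ in range(20_000):
    ys = [1.0 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
    slopes.append(sum(wi * yi for wi, yi in zip(w, ys)))   # b = sum of w_i * y_i
empirical_var = variance(slopes)
print(predicted_var, empirical_var)
```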
The sum of the squared weights can be reexpressed as
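$$\sum_{i=1}^{n} w_i^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\left[\sum_{j=1}^{n}(X_j-\bar{X})^2\right]^2} = \frac{1}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$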
Further note that the denominator is the key term in computing the variance of X; that is,
$$s_X^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}$$
This lets us express the sum of the squared weights as
$$\sum_{i=1}^{n} w_i^2 = \frac{1}{(n-1)\,s_X^2}$$
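A short numerical check of this identity, assuming $s_X^2$ is the sample variance with divisor $n - 1$ (which is what Python's statistics.variance computes). The x values are hypothetical.

```python
from statistics import fmean, variance

xs = [1.0, 2.0, 4.0, 5.0, 8.0]     # hypothetical x values
n = len(xs)
x_bar = fmean(xs)
sxx = sum((x - x_bar) ** 2 for x in xs)

sum_w_sq = sum(((x - x_bar) / sxx) ** 2 for x in xs)   # sum of squared slope weights
s2_x = variance(xs)                                     # sample variance, divisor n - 1
via_variance = 1.0 / ((n - 1) * s2_x)
print(sum_w_sq, 1.0 / sxx, via_variance)
```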
Substituting into the equation for the variance of the slope yields a simple expression for that variance in terms of the variance of Y given X, the variance of the predictor variable, and the number of observations:
$$\sigma_b^2 = \frac{\sigma^2_{Y.X}}{(n-1)\,s_X^2}$$
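The final formula can be wrapped in a small function to see how the standard deviation of the slope responds to the quantities it depends on. This is a sketch that assumes $\sigma_{Y.X}$ is known and that $s_X^2$ uses the $n - 1$ divisor; the function name slope_sd and the example x values are hypothetical.

```python
import math
from statistics import variance

def slope_sd(sigma_yx, xs):
    """Standard deviation of the slope: sqrt(sigma_yx^2 / ((n - 1) * s_X^2))."""
    n = len(xs)
    return math.sqrt(sigma_yx ** 2 / ((n - 1) * variance(xs)))

xs = [1.0, 2.0, 4.0, 5.0, 8.0]               # hypothetical design
print(slope_sd(0.5, xs))                     # baseline
print(slope_sd(0.5, xs * 2))                 # each x value observed twice
print(slope_sd(0.5, [2 * x for x in xs]))    # x values spread twice as far apart
```

Observing each x value twice doubles $(n-1)s_X^2$ and so shrinks the standard deviation of the slope by a factor of $\sqrt{2}$, while stretching the x values by a factor of 2 quadruples $s_X^2$ and halves it.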
 
 
© 1999, Duxbury Press.