Class 5. Stat701 Fall 1997

Multiple regression, transformation and prediction.

Last time we reviewed leverage, influence and residuals.


Residuals are useful, but may have two problems:
They retain the scale of the data
A highly leveraged point may drag the regression line toward it, giving a misleadingly small residual.

Fix this up as follows:
Divide through by the square root of the estimated variance of the residual. This standardizes the residual so that it has a standard deviation of 1.
Use jackknifed residuals. The jackknifed residual is the distance from the point to the line when the line is estimated without that particular point.

Each point gets its own jackknifed residual, based on a regression that leaves that point out. Conceptually, with 100 points you would have to run 100 separate regressions to get the jackknifed residuals. Fortunately there is a simple formula that goes from the plain residual to the jackknifed residual: jackknifed = plain/(1 - hii), where hii is the leverage of point i.
The standardized jackknifed residuals are what we will work with; they are termed studentized residuals.
The plot of leverage against studentized residual is useful for identifying points that dominate the regression - they have high leverage and a large studentized residual.
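
The shortcut from plain to jackknifed residuals can be checked numerically. The sketch below is not part of the class materials: it assumes a simple regression with one predictor, uses made-up data with one deliberately high-leverage point, and the helper names fit_line and leverages are hypothetical. It computes the jackknifed residuals twice, once with plain/(1 - hii) and once by actually refitting without each point, and compares the two.

```python
# Simple linear regression in pure Python. The data are invented for
# illustration; the last point sits far from the others in x (high leverage).

def fit_line(x, y):
    """Return (intercept, slope) of the least squares line."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def leverages(x):
    """Leverage hii of each point in a one-predictor regression."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

# Plain residuals from the fit that uses every point.
a, b = fit_line(x, y)
plain = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Shortcut: jackknifed = plain / (1 - hii).
h = leverages(x)
shortcut = [e / (1.0 - hi) for e, hi in zip(plain, h)]

# Brute force: refit without point i, then measure point i against that line.
brute = []
for i in range(len(x)):
    a_i, b_i = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    brute.append(y[i] - (a_i + b_i * x[i]))

max_gap = max(abs(s - t) for s, t in zip(shortcut, brute))
for hi, e, j in zip(h, plain, shortcut):
    print(f"leverage={hi:.3f}  plain={e:+.3f}  jackknifed={j:+.3f}")
print("largest disagreement between shortcut and refit:", max_gap)
```

The two versions should agree up to rounding error, and the high-leverage point should show the largest gap between its plain and jackknifed residual, which is the "misleadingly small residual" problem described above.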

Today's material

Assignment 1 handouts and discussion.
Guidelines for exhibits.
How to read in the Berndt data.
Multiple regression - illustrated with production functions.
Link to 608 notes.

Transformation discussion.

Know your transformations - handout.
Understand interpretations on the log scale, and why log transforms result in percentage change interpretations.
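
A small numerical illustration of the percentage change interpretation may help. This is only a sketch: the slope values 0.05 and 0.7 are invented, and the function names are hypothetical. In the model log(y) = a + b*x, raising x by one unit multiplies y by exp(b), which is close to a 100*b percent change when b is small. In the model log(y) = a + b*log(x), a 1 percent rise in x changes y by roughly b percent.

```python
import math

def pct_change_log_y(b, dx=1.0):
    """Exact % change in y when x rises by dx, for log(y) = a + b*x."""
    return 100.0 * (math.exp(b * dx) - 1.0)

def pct_change_log_log(b, pct_x=1.0):
    """Exact % change in y when x rises by pct_x percent,
    for log(y) = a + b*log(x)."""
    return 100.0 * ((1.0 + pct_x / 100.0) ** b - 1.0)

# Hypothetical slopes, chosen only to show the approximation.
print(pct_change_log_y(0.05))    # close to 100 * 0.05 = 5 percent
print(pct_change_log_log(0.7))   # close to 0.7 percent
print(pct_change_log_y(1.0))     # far from 100 percent: approximation fails
```

The approximation is good only for small slopes or small changes; for large values the exact multiplier exp(b) should be reported instead.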


Prediction discussion.

Two types corresponding to the "data = signal + noise" paradigm.
Prediction of just the signal or prediction that also includes the noise.
Prediction in the range of the data (interpolation) is pretty safe.
Prediction out of the range of the data (extrapolation) is pretty dangerous.
Prediction for a new observation has three sources of uncertainty:
The fit is not quite right - uncertainty in the true regression line.
There's variability about the regression line - noise.
There is uncertainty because this may not be the correct model - model misspecification.
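
The first two sources of uncertainty can be put into numbers for a one-predictor regression. The sketch below rests on assumptions that are not in the notes: the data are made up, the function name prediction_standard_errors is hypothetical, and the usual textbook standard error formulas are used. It returns the standard error for predicting the signal (the line itself) and for predicting a new observation (line plus noise), once inside the range of the data and once well outside it. Neither number accounts for the third source, model misspecification.

```python
import math

def prediction_standard_errors(x, y, x0):
    """Return (prediction, se_signal, se_new, s2) at x0 for a simple regression.

    se_signal: uncertainty in the fitted line at x0 (signal only).
    se_new:    uncertainty for a new observation at x0 (signal plus noise).
    s2:        estimated noise variance about the line.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    # Variance of the fitted line at x0 grows as x0 moves away from xbar.
    line_var = s2 * (1.0 / n + (x0 - xbar) ** 2 / sxx)
    se_signal = math.sqrt(line_var)
    se_new = math.sqrt(s2 + line_var)
    return intercept + slope * x0, se_signal, se_new, s2

# Invented data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

inside = prediction_standard_errors(x, y, 3.5)    # interpolation
outside = prediction_standard_errors(x, y, 12.0)  # extrapolation

print("x0=3.5 : se_signal=%.3f  se_new=%.3f" % (inside[1], inside[2]))
print("x0=12.0: se_signal=%.3f  se_new=%.3f" % (outside[1], outside[2]))
```

The standard error for a new observation is always the larger of the two because it adds the noise variance, and both grow as the prediction point moves away from the center of the data.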

Richard Waterman
Wed Sep 17 22:17:18 EDT 1997