Class 5. Stat701 Fall 1997


Multiple regression, transformation and prediction.

Last time we reviewed leverage, influence and residuals.

Recap.

*
Residuals are useful, but may have two problems:
*
They retain the scale of the data
*
A highly leveraged point may drag the regression line toward it, giving a misleadingly small residual

*
Fix this up as follows:
*
Divide through by the square root of the estimated variance - that is, standardize so that the standardized residual has a standard deviation of 1
*
Use jackknifed residuals. The jackknifed residual is the distance from the point to the line, BUT with the line estimated without that particular point.

*
Each point gets its own jackknifed residual based on a regression that left that point out. Conceptually, if there were 100 points you would have to run 100 separate regressions to get the jackknifed residuals, but miraculously there is a simple formula to go from the plain residual to the jackknifed residual: jackknifed = plain/(1 - h_ii), where h_ii is the point's leverage.
*
The standardized jackknifed residuals are what we will work with; they are termed studentized residuals.
*
The plot of leverage against studentized residual is useful for identifying points that dominate the regression - they have high leverage and a large studentized residual. (A numerical sketch of these diagnostics follows this list.)
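
A minimal numerical sketch of these diagnostics, using invented data and plain NumPy (simple regression with an intercept; the variable names are illustrative):

    import numpy as np

    # Invented data; the last point sits far out in x, so it is highly leveraged.
    x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
    y = np.array([1.1, 1.9, 3.2, 3.9, 12.0])
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape

    # Ordinary least squares fit and plain residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Leverages h_ii: the diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

    # Standardized residuals: divide each residual by its estimated
    # standard deviation (Var(e_i) = sigma^2 (1 - h_ii)).
    s2 = resid @ resid / (n - p)
    standardized = resid / np.sqrt(s2 * (1 - h))

    # Jackknifed residuals via the shortcut formula: no need to rerun
    # n separate regressions, since e_i / (1 - h_ii) equals the distance
    # from point i to the line fit without point i.
    jackknifed = resid / (1 - h)

    print(np.column_stack([h, resid, standardized, jackknifed]))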

Today's material

*
Assignment 1 handouts and discussion.
*
Guidelines for exhibits.
*
How to read in the Berndt data.
*
Multiple regression - illustrated with production functions (see the sketch after this list).
*
Link to 608 notes.
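
A minimal sketch of the production-function illustration, with invented numbers standing in for the Berndt data: fit a Cobb-Douglas production function Q = A L^b1 K^b2 by regressing log output on log labor and log capital, so the slopes are elasticities.

    import numpy as np

    # Invented labor (L), capital (K), and output (Q) for illustration only.
    L = np.array([10., 20., 15., 30., 25., 40.])
    K = np.array([5., 4., 9., 6., 12., 8.])
    Q = np.array([11., 16., 18., 24., 28., 33.])

    # Cobb-Douglas is linear after taking logs:
    # log Q = log A + b1 log L + b2 log K
    X = np.column_stack([np.ones_like(L), np.log(L), np.log(K)])
    beta, *_ = np.linalg.lstsq(X, np.log(Q), rcond=None)
    print("log A, labor elasticity b1, capital elasticity b2:", beta)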

Transformation discussion.

*
Know your transformations (handout).
*
Understand interpretations on the log scale - why log transforms lead to percentage-change interpretations (see the derivation after this list).
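
To make the percentage-change interpretation concrete, a short derivation (ordinary calculus, not tied to any particular dataset). In the log-log model,

    \log y = \beta_0 + \beta_1 \log x
    \quad\Longrightarrow\quad
    \frac{dy}{y} = \beta_1 \, \frac{dx}{x}

so \beta_1 is an elasticity: a 1% change in x goes with roughly a \beta_1% change in y. In the semi-log model \log y = \beta_0 + \beta_1 x, a one-unit change in x multiplies y by e^{\beta_1} \approx 1 + \beta_1 for small \beta_1, i.e., about a 100\beta_1% change in y.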


Prediction discussion.

*
Two types, corresponding to the "data = signal + noise" paradigm.
*
Prediction of just the signal, or prediction that also includes the noise (see the sketch after this list).
*
Prediction in the range of the data (interpolation) is pretty safe.
*
Prediction out of the range of the data (extrapolation) is pretty dangerous.
*
Prediction for a new observation has three sources of uncertainty:
*
The fit is not quite right - uncertainty about where the true regression line lies.
*
There's variability about the regression line - noise.
*
There is uncertainty because this may not be the correct model - model misspecification.
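
A minimal sketch contrasting the two kinds of prediction, using invented data (assumes NumPy and SciPy; simple regression with an intercept). The interval for the signal uses only the uncertainty in the fitted line; the interval for a new observation adds the noise variance on top.

    import numpy as np
    from scipy import stats

    # Invented data for illustration.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
    n = len(x)
    X = np.column_stack([np.ones_like(x), x])

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 2)            # estimated noise variance
    XtX_inv = np.linalg.inv(X.T @ X)

    x0 = np.array([1.0, 3.5])               # new point, inside the data range
    fit = x0 @ beta
    var_signal = s2 * (x0 @ XtX_inv @ x0)   # uncertainty in the fitted line
    var_new = var_signal + s2               # plus the noise about the line

    t = stats.t.ppf(0.975, df=n - 2)
    print("95% CI for the signal:        ",
          fit - t * np.sqrt(var_signal), fit + t * np.sqrt(var_signal))
    print("95% PI for a new observation: ",
          fit - t * np.sqrt(var_new), fit + t * np.sqrt(var_new))

Note that these formulas capture only the first two sources of uncertainty; the third, model misspecification, is not reflected in either interval.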



Richard Waterman
Wed Sep 17 22:17:18 EDT 1997