Class 5. Stat701 Fall 1997
Multiple regression, transformation and prediction.
Last time we reviewed leverage, influence and residuals.
Recap.
- Residuals are useful, but may have two problems:
  - They retain the scale of the data.
  - A highly leveraged point may drag the regression line toward itself,
    giving a misleadingly small residual.
- Fix these up as follows:
  - Divide through by the square root of the estimated variance - that is,
    standardize to get a standard deviation of 1 for the standardized
    residual.
  - Use jackknifed residuals. The jackknifed residual is the distance from
    the point to the line, BUT with the line estimated without that
    particular point. Each point gets its own jackknifed residual based on
    a regression that left that point out. Conceptually, if there were 100
    points then you would have to run 100 separate regressions to get the
    jackknifed residuals, but miraculously there is a simple formula to go
    from the plain residual to the jackknifed residual:
    jackknifed_i = plain_i / (1 - h_ii).
- The standardized jackknifed residuals are what we will work with; they are
  termed studentized residuals.
- The plot of leverage against studentized residual is useful for
  identifying points that dominate the regression - they have high leverage
  and a large studentized residual. (Both quantities are computed in the
  sketch below.)
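The diagnostics above are easy to compute directly. Below is a minimal
sketch in Python with numpy (not necessarily the software used in class);
the data are simulated and all names are illustrative. It computes
leverages, plain, standardized, jackknifed, and studentized residuals from
the formulas above.

    import numpy as np

    # Simulated data: one predictor, with a deliberately leveraged point.
    rng = np.random.default_rng(0)
    n = 30
    x = rng.normal(0.0, 1.0, n)
    x[0] = 5.0                                  # high-leverage point
    y = 2 + 3 * x + rng.normal(0.0, 1.0, n)

    X = np.column_stack([np.ones(n), x])        # design matrix with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix; h_ii = leverage
    h = np.diag(H)

    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta                            # plain residuals
    p = X.shape[1]
    s2 = e @ e / (n - p)                        # estimated error variance

    standardized = e / np.sqrt(s2 * (1 - h))    # standardized residuals
    jackknifed = e / (1 - h)                    # jackknifed (deleted) residuals

    # Studentized residuals: standardize each residual by the error variance
    # estimated WITHOUT that point (a closed-form shortcut, no refitting).
    s2_i = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    studentized = e / np.sqrt(s2_i * (1 - h))

    # Points with both large leverage and a large |studentized residual|
    # dominate the regression.
    for i in np.argsort(-h)[:3]:
        print(f"i={i:2d}  leverage={h[i]:.3f}  studentized={studentized[i]:+.2f}")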
Today's material
- Assignment 1 handouts and discussion.
- Guidelines for exhibits.
- How to read in the Berndt data.
- Multiple regression - illustrated with production functions (a sketch
  follows the next item).
- Link to 608 notes.
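As a preview of the production-function illustration, here is a minimal
multiple regression sketch, again Python with numpy. It fits a Cobb-Douglas
form, log Q = b0 + b1 log L + b2 log K, to simulated data; the names and
numbers are assumptions for illustration, not the Berndt data itself.

    import numpy as np

    # Cobb-Douglas production function: Q = A * L^b1 * K^b2, so taking logs
    # gives log Q = log A + b1*log L + b2*log K, which is linear in the logs.
    rng = np.random.default_rng(1)
    n = 50
    logL = rng.normal(4.0, 0.5, n)    # log labor input (simulated)
    logK = rng.normal(5.0, 0.5, n)    # log capital input (simulated)
    logQ = 0.5 + 0.7 * logL + 0.3 * logK + rng.normal(0.0, 0.1, n)

    X = np.column_stack([np.ones(n), logL, logK])
    beta, *_ = np.linalg.lstsq(X, logQ, rcond=None)
    print("intercept and elasticities (labor, capital):", beta.round(3))
    # b1 + b2 near 1 suggests roughly constant returns to scale.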
Transformation discussion.
- Know your transformations (handout).
- Understand interpretations on the log scale - why log transforms result in
  percentage-change interpretations (a numeric check follows this list).
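The reason: if log(y) = b0 + b1*x, then raising x by one unit multiplies y
by exp(b1), and for small b1, exp(b1) is approximately 1 + b1 - roughly a
100*b1 percent change in y. A quick numeric check:

    import numpy as np

    # Compare the exact multiplicative change exp(b1) - 1 with the
    # percentage-change approximation b1 itself.
    for b1 in (0.01, 0.05, 0.10, 0.30):
        print(f"b1={b1:.2f}: exact change = {100 * (np.exp(b1) - 1):6.2f}%, "
              f"approximation = {100 * b1:6.2f}%")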
Prediction.
- Two types, corresponding to the ``data = signal + noise'' paradigm:
  prediction of just the signal, or prediction that also includes the noise.
- Prediction in the range of the data (interpolation) is pretty safe.
- Prediction out of the range of the data (extrapolation) is pretty dangerous.
- Prediction for a new observation has 3 sources of uncertainty:
  - The fit is not quite right - uncertainty in the true regression line.
  - There's variability about the regression line - noise.
  - There is uncertainty because this may not be the correct model - model
    misspecification.
  (The first two sources are compared in the sketch after this list.)
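The signal-versus-noise distinction shows up directly in interval widths: a
confidence interval for the signal (the mean response) uses only the
uncertainty in the fitted line, while a prediction interval for a new
observation adds the noise variance. A sketch under a simple linear model,
again Python (numpy and scipy); the data are simulated. Note that neither
interval accounts for the third source, model misspecification.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n = 40
    x = rng.uniform(0.0, 10.0, n)
    y = 1 + 2 * x + rng.normal(0.0, 1.5, n)

    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - 2)                       # estimated noise variance
    XtX_inv = np.linalg.inv(X.T @ X)
    t = stats.t.ppf(0.975, n - 2)              # 95% two-sided t cutoff

    for x0 in (5.0, 15.0):                     # 15.0 extrapolates past the data
        x0v = np.array([1.0, x0])
        var_signal = s2 * (x0v @ XtX_inv @ x0v)  # uncertainty in the line
        var_new = var_signal + s2                # plus noise about the line
        print(f"x0={x0:4.1f}  signal CI half-width = {t * np.sqrt(var_signal):.2f}  "
              f"new-obs PI half-width = {t * np.sqrt(var_new):.2f}")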
Richard Waterman
Wed Sep 17 22:17:18 EDT 1997