Class 5. Stat701 Fall 1997
Multiple regression, transformation and prediction.
Last time we reviewed
leverage, influence and residuals.
Recap.

- Residuals are useful, but may have two problems:

- They retain the scale of the data

- A highly leveraged point may drag the regression line toward
it, giving a misleadingly small residual.

- Fix these up as follows:

- Divide through by the square root of the estimated variance -
that is, standardize so that the standardized residual has a
standard deviation of 1.

- Use jackknifed residuals. The jackknifed residual is the
distance from the point to the line, BUT with the line
estimated without that particular point.

- Each point gets its own jackknifed residual, based on a regression that
leaves that point out. Conceptually, if there were 100 points then you would
have to run 100 separate regressions to get the jackknifed residuals, but
miraculously there is a simple formula to go from the plain residual to the
jackknifed residual: jackknifed = plain/(1 - h_ii), where h_ii is the point's
leverage. (A numeric check appears after this list.)

- The standardized jackknifed residuals are what we will work with; they are termed studentized residuals.

- The plot of leverage against studentized residual is useful for
identifying points that dominate the regression - they have high leverage
and a large studentized residual.
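
A minimal numeric sketch of the recap above (my illustration, with made-up
data, not from the class): it computes leverage from the hat matrix, forms
standardized residuals, applies the jackknifed = plain/(1 - h_ii) shortcut,
and checks it against the brute-force leave-one-out definition.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    x = rng.uniform(0, 10, n)
    y = 2 + 0.5 * x + rng.normal(0, 1, n)

    X = np.column_stack([np.ones(n), x])      # design matrix with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T      # hat matrix; diagonal = leverages h_ii
    h = np.diag(H)
    e = y - H @ y                             # plain residuals

    # Standardize: divide by the estimated standard deviation of each residual
    s = np.sqrt(e @ e / (n - 2))
    standardized = e / (s * np.sqrt(1 - h))

    # Shortcut formula: jackknifed (deleted) residual = plain / (1 - h_ii)
    d = e / (1 - h)

    # Brute force: refit without point i, then measure the distance from
    # y_i to that line -- it matches the shortcut.
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        assert np.isclose(y[i] - X[i] @ b, d[i])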
Today's material

- Assignment 1 handouts and discussion.

- Guidelines for exhibits.

- How to read in the Berndt data.

- Multiple regression - illustrated with production functions (see the sketch after this list).

- Link to 608 notes.
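
As a taste of the multiple regression example above, here is a minimal
sketch of fitting a Cobb-Douglas production function by least squares on
the log scale. The data and the variable names (output, labor, capital)
are made-up stand-ins, not the actual Berndt columns.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    labor = rng.uniform(1, 100, n)
    capital = rng.uniform(1, 100, n)
    # Simulated Cobb-Douglas output: Q = A * L^b1 * K^b2 * noise
    output = 3.0 * labor**0.6 * capital**0.3 * rng.lognormal(0.0, 0.1, n)

    # Taking logs makes the model linear in the coefficients:
    # log Q = log A + b1 log L + b2 log K + noise
    X = np.column_stack([np.ones(n), np.log(labor), np.log(capital)])
    b, *_ = np.linalg.lstsq(X, np.log(output), rcond=None)
    print("log A, b1, b2 =", b)   # b1 and b2 are output elasticities

On the log-log scale the slopes read as elasticities: b1 is the approximate
percentage change in output for a 1% change in labor.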

Transformation discussion.

- Know your transformations - handout.

- Understand interpretations on the log scale: why log transforms result
in percentage-change interpretations.
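
A small numeric check (my illustration, not the handout) of the
percentage-change reading: if log(y) = a + b*x, then a one-unit change in x
multiplies y by e^b, and e^b - 1 is approximately b when b is small, which
is where the "100*b percent" interpretation comes from.

    import numpy as np

    for b in [0.01, 0.05, 0.10, 0.25]:
        exact_pct = (np.exp(b) - 1) * 100    # exact percentage change in y
        approx_pct = b * 100                 # the "100*b percent" reading
        print(f"b = {b:.2f}: exact {exact_pct:5.2f}%  vs  approx {approx_pct:5.2f}%")

The approximation is excellent for small b and drifts as b grows (at
b = 0.25 the exact change is about 28.4%, not 25%).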

Prediction discussion.

- Two types of prediction, corresponding to the "data = signal + noise" paradigm.

- Prediction of just the signal or prediction that also includes the
noise.

- Prediction in the range of the data (interpolation) is pretty safe.

- Prediction out of the range of the data (extrapolation) is pretty dangerous.

- Prediction for a new observation has 3 sources of uncertainty.

- The fit is not quite right - uncertainty about the true regression
line.

- There's variability about the regression line - noise.

- There is uncertainty because this may not be the correct model -
model misspecification.
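
A minimal sketch (made-up simple-regression data, rough 2-standard-error
multipliers) contrasting the first two sources: an interval for the signal
alone versus a wider interval for a new observation, which adds the noise
variance. Neither interval captures the third source, model
misspecification.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    x = rng.uniform(0, 10, n)
    y = 1 + 2 * x + rng.normal(0, 1.5, n)

    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (n - 2)                     # estimated noise variance

    x0 = np.array([1.0, 5.0])                # predict at x = 5, inside the data range
    yhat = x0 @ b

    # Source 1: uncertainty in the fitted line at x0
    var_fit = s2 * x0 @ np.linalg.inv(X.T @ X) @ x0
    # Source 2: add the noise variance for a brand-new observation
    var_new = var_fit + s2

    z = 2.0                                  # rough 95% multiplier (strictly a t quantile)
    print(f"signal only: {yhat:.2f} +/- {z * np.sqrt(var_fit):.2f}")
    print(f"new obs    : {yhat:.2f} +/- {z * np.sqrt(var_new):.2f}")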
Richard Waterman
Wed Sep 17 22:17:18 EDT 1997