- Read Myers chapter 6
- New homework problem:

- Plot the data, run a simple regression and create prediction
bounds. Does the data appear homoskedastic?
- Transform the data to do a weighted least squares. Try using
both a standard deviation proportional to X and proportional to
the square root of X. Plot both. Which appears to be more
homoskedastic?
- Use the White estimator (the sandwich estimator) to generate
standard errors for the slope and intercept.
- Discussion question: (Please type up a one page answer to the
following.) Compare the confidence intervals for the slope in
each of the methods above. Which ones do you believe? Are the
ones that are theoretically wrong qualitatively wrong?
Our theory suggests that the intercept should be zero. Which
is the correct test to run? Do we fail to reject the null? Do
any of the other tests incorrectly reject the null?
- Pick the weighted least squares model that appears to be the
most homoskedastic. Now use the White estimator on that
model. Does it change the SE's very much?
- The envelope please: Add up all the rooms cleaned. Add up
all the crews. Divide these two to come up with an average
number of rooms cleaned per crew. This should match one of the
slopes you computed above. (You could also compute a standard
error by hand to see which confidence intervals above are the
closest to describing the right answer. But you don't have to
do this.)
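The regression and White-SE steps above can be sketched in plain Python. The numbers below are made-up stand-ins, not the rooms/crews data from the assignment:

```python
# Sketch: OLS plus White (HC0) sandwich standard errors for y = a + b*x + e.
# The data are hypothetical, purely for illustration.
import math

x = [2, 2, 3, 4, 4, 5, 6, 7, 7, 8]          # hypothetical crew sizes
y = [5, 7, 9, 12, 11, 16, 17, 21, 23, 26]   # hypothetical rooms cleaned

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# OLS via the closed-form 2x2 normal equations
det = n * sxx - sx * sx
intercept = (sxx * sy - sx * sxy) / det
slope = (n * sxy - sx * sy) / det
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# White/HC0 sandwich: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}
# For X = [1, x], the "meat" X' diag(e^2) X is a 2x2 of weighted sums.
m00 = sum(e * e for e in resid)
m01 = sum(e * e * xi for e, xi in zip(resid, x))
m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
# (X'X)^{-1} = (1/det) * [[sxx, -sx], [-sx, n]]
a00, a01, a11 = sxx / det, -sx / det, n / det
var_intercept = a00 * (a00 * m00 + a01 * m01) + a01 * (a00 * m01 + a01 * m11)
var_slope = a01 * (a01 * m00 + a11 * m01) + a11 * (a01 * m01 + a11 * m11)

print("slope", round(slope, 3), "SE", round(math.sqrt(var_slope), 3))
print("intercept", round(intercept, 3), "SE", round(math.sqrt(var_intercept), 3))
```

For the WLS part, divide both y and the columns of X by the assumed standard deviation (x or sqrt(x)) and run the same OLS on the transformed data.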
## Leverage

- How much do betas depend on a single point? (this leads to p x n matrix of leverages)
- How much do predictions depend on a single point? (this leads to a vector of leverages)
- How much does the prediction for i depend on the value of y_i?
  - prediction is X_i beta-hat
  - beta-hat = (X'X)^{-1} X'Y
  - so prediction is X_i (X'X)^{-1} X'Y
  - to convert Y into predictions: (X (X'X)^{-1} X') Y
  - hat matrix: H = X (X'X)^{-1} X'
  - called the projection matrix, or "hat matrix" since it puts a hat on y
  - h_{ii} is d(y-hat_i)/d(y_i) -- called the leverage
- Nice property: depends ONLY on X's. So if you decide to toss a point out based on its leverage, you aren't biasing your results.
- Use Mahalanobis distance to "see" leverage
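A minimal sketch of leverage in the simple-regression case, where the hat diagonal has the closed form h_ii = 1/n + (x_i - xbar)^2/Sxx. The x values are made up; note the leverages use only the X's, as the bullet above says:

```python
# Leverage for simple regression depends only on the x's:
# h_ii = 1/n + (x_i - xbar)^2 / Sxx  (the diagonal of H = X(X'X)^{-1}X').
x = [1, 2, 3, 4, 5, 20]   # hypothetical x's; 20 is a high-leverage point
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Two checks from the theory: the leverages sum to p (= 2 here, intercept
# plus slope), and each h_ii lies between 1/n and 1.
print([round(hi, 3) for hi in h])
print("sum of leverages:", round(sum(h), 6))
```

The second term is (up to a constant) the squared Mahalanobis distance of x_i from the mean, which is the "see leverage" connection.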

## Influence = leverage x outlier

- A point that isn't leveraged doesn't affect the outcome very much
- A point that isn't an outlier doesn't affect the outcome very much
- need both to be influential
- Draw pictures of various outliers (the MBA examples: cottages, direct mail, and crime)

## Various definitions of influence

- DFFITS = change in forecast i if you leave out observation i
  - DFFITS_i = (y-hat_i - y-hat_{i,-i}) / SE(y-hat_i)
  - equals R-student x sqrt(h/(1-h))
  - looking at squared values, and noting that h is approximately zero for most points, leads to the influence = leverage x outlier concept
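The DFFITS shortcut can be checked by brute force: refit without each observation and compare the direct leave-one-out change with the R-student formula (data made up for illustration):

```python
# Brute-force check of the DFFITS identity: the leave-one-out change in the
# forecast, (yhat_i - yhat_{i,-i}) / (s_{(i)} sqrt(h_ii)), equals
# R-student * sqrt(h_ii / (1 - h_ii)).  Data are hypothetical.
import math

def ols(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

x = [1, 2, 3, 4, 5, 6, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 25.0]   # last point is an outlier
n, p = len(x), 2

a, b = ols(x, y)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
e = [yi - a - b * xi for xi, yi in zip(x, y)]

dffits = []
for i in range(n):
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    ai, bi = ols(xs, ys)
    sse_i = sum((yj - ai - bi * xj) ** 2 for xj, yj in zip(xs, ys))
    s_i = math.sqrt(sse_i / (n - 1 - p))       # s_{(i)}
    yhat = a + b * x[i]                        # full-data prediction
    yhat_loo = ai + bi * x[i]                  # leave-one-out prediction
    direct = (yhat - yhat_loo) / (s_i * math.sqrt(h[i]))
    t_i = e[i] / (s_i * math.sqrt(1 - h[i]))   # R-student
    shortcut = t_i * math.sqrt(h[i] / (1 - h[i]))
    assert abs(direct - shortcut) < 1e-9       # the two agree exactly
    dffits.append(direct)

print([round(d, 2) for d in dffits])
```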

- DFBETAS = change in slope j if you leave out observation i
  - no easy formula
  - no longer related to leverage (h_{ii})
  - depends on the scale of beta -- changing the scale of X changes the value of DFBETAS
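Since there is no easy formula, a brute-force sketch: refit without each point and record the raw change in the slope. The second half illustrates the scale point above (rescaling X rescales the unstandardized values). Data are made up:

```python
# Brute-force DFBETAS for the slope: refit without each observation and
# record the change in the fitted slope.  Hypothetical data.
def ols_slope(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx

x = [1, 2, 3, 4, 5, 6, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 25.0]   # last point is an outlier

b_full = ols_slope(x, y)
dfbeta = [b_full - ols_slope(x[:i] + x[i+1:], y[:i] + y[i+1:])
          for i in range(len(x))]
print([round(d, 3) for d in dfbeta])

# Rescale X (say, feet -> inches): every raw DFBETA shrinks by the factor 12,
# which is the scale-dependence noted above.
x12 = [12 * xi for xi in x]
b12 = ols_slope(x12, y)
dfbeta12 = [b12 - ols_slope(x12[:i] + x12[i+1:], y[:i] + y[i+1:])
            for i in range(len(x))]
assert all(abs(d12 - d / 12) < 1e-9 for d, d12 in zip(dfbeta, dfbeta12))
```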

- DFALL = Cook's D = change in all betas = change in all
predictions
  - uses the "natural" parameter space / prediction loss
  - Cook's D is the squared-distance change from adding point i
  - tells how much the parameters change in the natural basis
  - tells how much all the predictions change on average
  - D_i = r^2 h / (p (1 - h)), where r is the standardized residual
  - again justifies influence = leverage x outlier
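Cook's D both ways, as a check on the shortcut formula: r^2 h/(p(1-h)) against the definition as the scaled total squared change in all n predictions when point i is dropped (data made up for illustration):

```python
# Cook's D two ways: the shortcut r_i^2 h_ii / (p (1 - h_ii)), and the
# definition -- total squared change in all n predictions when observation
# i is dropped, scaled by p * s^2.  Hypothetical data.
def ols(xs, ys):
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx
    return ybar - b * xbar, b

x = [1, 2, 3, 4, 5, 6, 10]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 25.0]   # last point is an outlier
n, p = len(x), 2

a, b = ols(x, y)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
e = [yi - a - b * xi for xi, yi in zip(x, y)]
s2 = sum(ei ** 2 for ei in e) / (n - p)      # full-data residual variance

cooks = []
for i in range(n):
    # shortcut: squared standardized residual times leverage ratio
    r2 = e[i] ** 2 / (s2 * (1 - h[i]))
    shortcut = r2 * h[i] / (p * (1 - h[i]))
    # definition: change in every prediction when i is left out
    ai, bi = ols(x[:i] + x[i+1:], y[:i] + y[i+1:])
    direct = sum((a + b * xj - ai - bi * xj) ** 2 for xj in x) / (p * s2)
    assert abs(shortcut - direct) < 1e-9     # the two agree exactly
    cooks.append(shortcut)

print([round(c, 3) for c in cooks])
```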

Last modified: Thu Feb 22 08:41:05 2001