- Homework:
- I typed in Table 7.7 on pages 330-332 from Myers. Load it into your favorite software for analysis. (You might scan it for errors!)
- In spite of the fact that we can't actually look at any good plots, we still need to check assumptions. We don't have to worry about heteroskedasticity since we KNOW the variance. But we do have to worry about curvature. So add variables corresponding to vol^2 and rate^2. Are the quadratic terms significantly better?
- Maybe some other transformation of the X's makes sense? Instead of using a linear term for volume, use a 4th-degree polynomial. Does it fit significantly better than the linear fit? (In other words, you want to have (vol, rate, vol^2, vol^3, vol^4) as X's in your regression. Then test whether vol^2, vol^3, and vol^4 are collectively all zero.)
- Suppose we are interested in predicting the point vol=5, rate=3. First describe a measure of how much of an extrapolation this is (Mahalanobis distance would be one possible measure for this). Now make prediction intervals using your linear model and your quadratic model for this point. Which prediction would you prefer to champion?
- Consider the point (ave(vol),ave(rate)), in other words, the center of the data. What is the Mahalanobis measure for this point? Make predictions using both the linear and the quadratic model. Does it matter which interval you use?
- Type up a paragraph saying what model you think is best for fitting this data. You don't have to restrict yourself to the models listed above--you can try other transformations. For example, you might consider the interaction (vol times rate) in the presence of the quadratic terms (namely vol, rate, vol^2, rate^2, vol*rate). This is called a quadratic surface.
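The mechanics of the homework can be sketched as follows. Since Table 7.7 from Myers is not reproduced here, the vol/rate/y values below are made-up stand-ins purely to show the computations: the nested F-test for whether the quadratic terms are collectively zero, and the Mahalanobis distance as an extrapolation measure.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (Table 7.7 is not reproduced here; these are made up).
rng = np.random.default_rng(0)
n = 30
vol = rng.uniform(1, 10, n)
rate = rng.uniform(1, 6, n)
y = 2 + 0.5 * vol + 0.3 * rate + rng.normal(0, 1, n)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
X_lin = np.column_stack([ones, vol, rate])
X_quad = np.column_stack([ones, vol, rate, vol**2, rate**2])

# Nested F-test: are the quadratic terms collectively zero?
rss_lin, rss_quad = rss(X_lin, y), rss(X_quad, y)
df_extra = X_quad.shape[1] - X_lin.shape[1]   # 2 extra terms
df_resid = n - X_quad.shape[1]
F = ((rss_lin - rss_quad) / df_extra) / (rss_quad / df_resid)
p = stats.f.sf(F, df_extra, df_resid)
print(f"F = {F:.3f}, p = {p:.3f}")

# Mahalanobis distance of (vol=5, rate=3) from the center of the X data,
# as a measure of how much of an extrapolation the prediction is.
Z = np.column_stack([vol, rate])
center = Z.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Z, rowvar=False))
d = np.array([5.0, 3.0]) - center
print("Mahalanobis distance:", float(np.sqrt(d @ S_inv @ d)))
# At the point (ave(vol), ave(rate)) itself, the distance is exactly 0.
```

The same F-test applies to the 4th-degree polynomial question: put (vol, rate, vol^2, vol^3, vol^4) in `X_quad`'s place and count three extra terms instead of two.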

- Logistic gives one particular form.
- Adding polynomials will possibly fix it.
- But it has strange modeling assumptions for large values of X.
- Either goes to 1, goes to zero or goes to "Unknown."

- The regression doesn't know which X to extrapolate with
- So it will give wide intervals that match the "last good part of the data."
- If we now do trimmed polynomials--we avoid extrapolation AND can fit any function
- Unfortunately, most software will break due to collinearity problems. Oh well.

We have two approaches. We can simply use the weights given by the last round of the IRLS, or we can use a likelihood based method.
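A minimal sketch of the two approaches, on made-up binary data (the data and coefficients below are assumptions for the demo). The logistic fit is done by IRLS written out by hand, so the last round's weights are available directly: the Wald test uses the covariance (X'WX)^{-1} from those weights, while the likelihood-based approach compares log likelihoods of the full and null fits.

```python
import numpy as np
from scipy import stats

# Made-up binary data; true slope 0.8 is an assumption for the demo.
rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1 / (1 + np.exp(-(0.2 + 0.8 * x)))
y = rng.binomial(1, p_true)

def fit_logistic(X, y, iters=25):
    """Logistic regression via IRLS; returns beta and the final weights."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))
        W = mu * (1 - mu)                 # IRLS weights
        z = eta + (y - mu) / W            # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta, W

def loglik(X, y, beta):
    eta = X @ beta
    return np.sum(y * eta - np.log1p(np.exp(eta)))

beta_full, W = fit_logistic(X, y)
beta_null, _ = fit_logistic(X[:, :1], y)   # intercept-only null model

# Approach 1: Wald test from the last round of IRLS, cov = (X' W X)^{-1}.
cov = np.linalg.inv(X.T @ (W[:, None] * X))
z_wald = beta_full[1] / np.sqrt(cov[1, 1])
p_wald = 2 * stats.norm.sf(abs(z_wald))

# Approach 2: likelihood ratio test, -2 log lambda ~ chi-squared(1).
lr = 2 * (loglik(X, y, beta_full) - loglik(X[:, :1], y, beta_null))
p_lr = stats.chi2.sf(lr, df=1)
print(f"Wald p = {p_wald:.4f}, LR p = {p_lr:.4f}")
```

With a decent sample the two p-values are usually close; they can disagree noticeably in small samples or near-separated data.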

Likelihood method for standard regression:

- Consider X-bar = Normal(mu, sigma^2/n)
- We compute t = X-bar*sqrt(n)/sigma = Normal(mu*sqrt(n)/sigma, 1)
- Likelihood under the null takes t = Normal(0, 1)
- Likelihood under mu-hat takes t = Normal(x-bar*sqrt(n)/sigma, 1)
- So the likelihood ratio is: exp(-t^2/2)
- So the log likelihood ratio is: -t^2/2
- So a p-value of .05 implies t = 2, which implies log(lambda) = -2
- So the intuition is that a likelihood difference of 2 is about a p-value of .05
- In general, (-2 * log likelihood ratio) has approximately a chi-squared distribution

- Compute the likelihood ratio statistic: -2 * log likelihood ratio
- Reject at .05 if it is bigger than 4
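The identities above can be checked numerically. This is a sketch with a made-up sample (true mu = 0.5 is an assumption for the demo): the log likelihood ratio for t comes out to exactly -t^2/2, so -2 log lambda = t^2, and the chi-squared(1) tail beyond 4 is about .05.

```python
import numpy as np
from scipy import stats

# Known sigma; test H0: mu = 0 from a sample mean.
rng = np.random.default_rng(2)
n, sigma = 25, 1.0
x = rng.normal(0.5, sigma, n)          # true mu = 0.5, made up for the demo
t = x.mean() * np.sqrt(n) / sigma

# Log likelihoods of t under the null N(0, 1) and under mu-hat N(t, 1).
ll_null = stats.norm.logpdf(t, 0, 1)
ll_alt = stats.norm.logpdf(t, t, 1)
lam = ll_null - ll_alt                 # log likelihood ratio = -t^2/2
stat = -2 * lam                        # = t^2, ~ chi-squared(1) under H0
print(f"t = {t:.3f}, -2 log lambda = {stat:.3f}")

# Rejecting at .05 when the statistic exceeds 4 matches the chi-squared tail:
print(stats.chi2.sf(4.0, df=1))        # about 0.0455
```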

- E.g.: Y regressed on X, where X takes on values A/a and Y takes on values B/b.
- What is the model?
- P(B|A)/P(b|A) = exp(alpha)
- P(B|a)/P(b|a) = exp(alpha + beta)
- Independence iff P(B|A)/P(b|A) = P(B|a)/P(b|a)
- I.e. independence iff beta = 0

- The typical test is the chi-squared test. It gives almost the same answer--but the chi-square is also an approximation.
- Permutation tests also available
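The 2x2 case can be worked end to end. The counts below are made up for the demo; the code recovers alpha and beta from the model P(B|A)/P(b|A) = exp(alpha), P(B|a)/P(b|a) = exp(alpha + beta), then runs the likelihood ratio (G) test of beta = 0 next to the Pearson chi-squared test, with Fisher's exact test standing in for the permutation approach.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 counts for X in {A, a} and Y in {B, b} (made up):
#            Y=B   Y=b
#   X=A       30    20
#   X=a       15    35
table = np.array([[30, 20], [15, 35]])

# Model parameters: P(B|A)/P(b|A) = exp(alpha), P(B|a)/P(b|a) = exp(alpha+beta).
alpha = np.log(table[0, 0] / table[0, 1])
beta = np.log(table[1, 0] / table[1, 1]) - alpha   # log odds ratio; 0 iff independent
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")

# Likelihood ratio (G) test of independence, i.e. of beta = 0.
n = table.sum()
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
G = 2 * np.sum(table * np.log(table / expected))   # -2 log likelihood ratio
p_lr = stats.chi2.sf(G, df=1)

# The usual Pearson chi-squared test gives almost the same answer.
chi2_stat, p_chi2, _dof, _exp = stats.chi2_contingency(table, correction=False)
print(f"G = {G:.3f} (p = {p_lr:.4f}); chi2 = {chi2_stat:.3f} (p = {p_chi2:.4f})")

# Fisher's exact test as the permutation-style alternative.
print("exact p =", stats.fisher_exact(table)[1])
```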

Last modified: Tue Mar 27 08:57:19 2001
