Last modified: Tue Sep 27 12:16:40 EDT 2005
by Dean Foster
Statistical Data Mining: Bonferroni
Administrivia
- Please do the reading
- (we have ways to know what you read!)
- When DRM comes around, this will actually be true.
Can't compute the best fit in general, so assume orthogonal predictors
- Avoids collinearity
- Optimization problem is now separable
- Optimize each coordinate individually (see the sketch below).
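- A minimal numerical sketch of the separability claim (not from the original notes; it assumes numpy and a hand-built orthonormal design): with X'X = I, the joint least-squares fit equals the p one-variable fits X_j'y.

      # Sketch: orthonormal columns make least squares separable.
      import numpy as np

      rng = np.random.default_rng(0)
      n, p = 100, 5

      X = np.linalg.qr(rng.normal(size=(n, p)))[0]        # orthonormal design via QR
      beta = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
      y = X @ beta + rng.normal(size=n)

      beta_joint = np.linalg.lstsq(X, y, rcond=None)[0]   # full regression
      beta_coord = X.T @ y                                # p separate one-variable fits
      print(np.allclose(beta_joint, beta_coord))          # True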
Which variables should be included?
- Adding any variable increases in-sample fit (illustrated below)
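- A quick illustration of this point (not from the original notes; numpy only, pure-noise predictors): R^2 never drops as columns are added, however useless they are.

      # Sketch: in-sample R^2 is monotone in the number of predictors.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 50
      y = rng.normal(size=n)
      X = np.ones((n, 1))                                  # intercept only
      tss = (y - y.mean()) @ (y - y.mean())

      for _ in range(10):
          X = np.hstack([X, rng.normal(size=(n, 1))])      # add a pure-noise column
          resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
          print(round(1 - resid @ resid / tss, 3))         # never decreases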
So: Add them all
- Adding any variable increases in-sample fit
- But in-sample fit is not an unbiased estimate of out-of-sample fit.
- Adjusted-R-sqd is "unbiased" if no selection is done
- Adding a variable with |t| < 1 actually decreases the adjusted-R-sqd (derivation sketched below).
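- A short derivation of the adjusted-R-sqd fact above (my reconstruction, using the standard identity relating the new variable's t statistic to the drop in RSS; d = n - p - 1 is the residual degrees of freedom before the variable is added):

      \begin{align*}
      \mathrm{RSS}_{\mathrm{old}} &= \mathrm{RSS}_{\mathrm{new}}\Bigl(1 + \tfrac{t^2}{d-1}\Bigr) \\
      \bar R^2 \text{ increases}
        &\iff \hat\sigma^2_{\mathrm{new}} = \frac{\mathrm{RSS}_{\mathrm{new}}}{d-1}
              < \frac{\mathrm{RSS}_{\mathrm{old}}}{d} = \hat\sigma^2_{\mathrm{old}} \\
        &\iff \frac{1}{d-1} < \frac{1 + t^2/(d-1)}{d}
         \iff d < d - 1 + t^2
         \iff |t| > 1 .
      \end{align*}

  Since adjusted-R-sqd is 1 - sigma-hat^2 / s_y^2, it rises exactly when |t| > 1 and falls when |t| < 1.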
So: Add all of them with a |t| > 1
- If |beta| < SE, fitting by zero is better than fitting by betahat (smaller mean squared error).
- E(t^2) = beta^2/SE^2 + 1
- So t^2 - 1 is an unbiased estimate of beta^2/SE^2 (derivation below)
- Related to Stein's unbiased estimate of risk (more on this later)
- IDEA: if this estimate, t^2 - 1, is less than 1, leave it out; otherwise put it in
- This is the idea behind AIC and Mallows' Cp
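- A sketch of the calculation behind the last few bullets (my reconstruction; it treats SE as known, so that t = betahat/SE is normal with unit variance):

      \begin{align*}
      t = \frac{\hat\beta}{SE} \sim N\!\Bigl(\frac{\beta}{SE},\,1\Bigr)
        \;\Longrightarrow\;
        E(t^2) = \operatorname{Var}(t) + (E\,t)^2 = 1 + \frac{\beta^2}{SE^2}
        \;\Longrightarrow\;
        E(t^2 - 1) = \frac{\beta^2}{SE^2}.
      \end{align*}

  Dropping the variable costs bias^2 = beta^2; keeping it costs variance = SE^2. So keep it when the estimate of beta^2/SE^2, namely t^2 - 1, exceeds 1, i.e. when |t| > sqrt(2).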
So: Add all of them with a |t| > sqrt(2)
- But the assumptions behind AIC are violated if you do variable selection.
- So let's focus on selection
- |t| > sqrt(2 log p) rarely occurs under the null (simulation below)
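- A small simulation of this claim (not from the original notes; numpy only, treating the null t statistics as independent standard normals):

      # Sketch: with p null z-scores, crossings of sqrt(2 log p) are rare.
      import numpy as np

      rng = np.random.default_rng(0)
      p, n_sims = 1000, 2000
      threshold = np.sqrt(2 * np.log(p))            # ~3.72 for p = 1000

      z = rng.normal(size=(n_sims, p))              # null t statistics
      false_hits = (np.abs(z) > threshold).sum(axis=1)
      print(false_hits.mean())                      # expected false inclusions, about 0.2
      print((false_hits > 0).mean())                # share of fits with any false inclusion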
So: Add all of them with a |t| > sqrt(2 log p)
- Called Bonferroni
- Prove the tail bound for the null: Phi(x) < phi(x) for x < -1 (Phi = standard normal CDF, phi = its density)
- Prove the union bound: P(U_i A_i) <= sum_i P(A_i) (the two are combined below)
- First suggested by Foster and George (for regression) / Donoho
and Johnstone (for wavelets)
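- Putting the two lemmas together (my reconstruction of the Bonferroni calculation; Z_1, ..., Z_p are the null t statistics, treated as standard normal):

      \begin{align*}
      P\Bigl(\max_j |Z_j| > \sqrt{2\log p}\Bigr)
        &\le \sum_{j=1}^{p} P\bigl(|Z_j| > \sqrt{2\log p}\bigr)
          && \text{(union bound)} \\
        &\le 2p\,\phi\bigl(\sqrt{2\log p}\bigr)
          && \text{(tail bound, valid since } \sqrt{2\log p} > 1\text{)} \\
        &= \frac{2p\,e^{-\log p}}{\sqrt{2\pi}}
         = \sqrt{\frac{2}{\pi}} \approx 0.8 .
      \end{align*}

  The sharper tail bound P(Z > x) <= phi(x)/x divides this by sqrt(2 log p), so the bound tends to 0 as p grows.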
dean@foster.net