Last modified: Tue Sep 27 12:16:40 EDT 2005
by Dean Foster
Statistical Data Mining: Bonferroni
Administrivia
- Please do the reading
- (we have ways to know what you read!)
- When DRM comes around, this will actually be true.
Can't compute the best fit in general, so assume orthogonal predictors
- Avoids collinearity
- Optimization problem is now separable
- Optimize each coordinate individually (see the sketch below).
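- A minimal numerical sketch of the separability claim (not from the original notes; it assumes numpy and a hand-built orthonormal design): with X'X = I, the joint least-squares fit equals the p one-variable fits X_j'y.

      # Sketch: orthonormal columns make least squares separable.
      import numpy as np

      rng = np.random.default_rng(0)
      n, p = 100, 5

      X = np.linalg.qr(rng.normal(size=(n, p)))[0]        # orthonormal design via QR
      beta = np.array([3.0, 0.0, -2.0, 0.0, 1.0])
      y = X @ beta + rng.normal(size=n)

      beta_joint = np.linalg.lstsq(X, y, rcond=None)[0]   # full regression
      beta_coord = X.T @ y                                # p separate one-variable fits
      print(np.allclose(beta_joint, beta_coord))          # True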
Which variables should be included?
- Adding any variable increases in-sample fit (illustrated below)
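- A quick illustration of this point (not from the original notes; numpy only, pure-noise predictors): R^2 never drops as columns are added, however useless they are.

      # Sketch: in-sample R^2 is monotone in the number of predictors.
      import numpy as np

      rng = np.random.default_rng(0)
      n = 50
      y = rng.normal(size=n)
      X = np.ones((n, 1))                                  # intercept only
      tss = (y - y.mean()) @ (y - y.mean())

      for _ in range(10):
          X = np.hstack([X, rng.normal(size=(n, 1))])      # add a pure-noise column
          resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
          print(round(1 - resid @ resid / tss, 3))         # never decreases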
So: Add them all
- Adding any variable increases in-sample fit
- But in-sample fit is not an unbiased estimate of out-of-sample fit.
- Adjusted-R-sqd is "unbiased" if no selection is done
- Adding a variable with |t| < 1 actually decreases the adjusted-R-sqd (derivation sketched below).
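- A short derivation of the adjusted-R-sqd fact above (my reconstruction, using the standard identity relating the new variable's t statistic to the drop in RSS; d = n - p - 1 is the residual degrees of freedom before the variable is added):

      \begin{align*}
      \mathrm{RSS}_{\mathrm{old}} &= \mathrm{RSS}_{\mathrm{new}}\Bigl(1 + \tfrac{t^2}{d-1}\Bigr) \\
      \bar R^2 \text{ increases}
        &\iff \hat\sigma^2_{\mathrm{new}} = \frac{\mathrm{RSS}_{\mathrm{new}}}{d-1}
              < \frac{\mathrm{RSS}_{\mathrm{old}}}{d} = \hat\sigma^2_{\mathrm{old}} \\
        &\iff \frac{1}{d-1} < \frac{1 + t^2/(d-1)}{d}
         \iff d < d - 1 + t^2
         \iff |t| > 1 .
      \end{align*}

  Since adjusted-R-sqd is 1 - sigma-hat^2 / s_y^2, it rises exactly when |t| > 1 and falls when |t| < 1.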
So: Add all of them with a |t| > 1
- If |beta| < SE, fitting by zero is better than fitting by betahat (smaller mean squared error).
- E(t^2) = beta^2/SE^2 + 1
- So t^2 - 1 is an unbiased estimate of beta^2/SE^2 (derivation below)
- Related to Stein's unbiased estimate of risk (more on this later)
- IDEA: if this estimate, t^2 - 1, is less than 1, leave it out; otherwise put it in
- This is the idea behind AIC and Mallows' Cp
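- A sketch of the calculation behind the last few bullets (my reconstruction; it treats SE as known, so that t = betahat/SE is normal with unit variance):

      \begin{align*}
      t = \frac{\hat\beta}{SE} \sim N\!\Bigl(\frac{\beta}{SE},\,1\Bigr)
        \;\Longrightarrow\;
        E(t^2) = \operatorname{Var}(t) + (E\,t)^2 = 1 + \frac{\beta^2}{SE^2}
        \;\Longrightarrow\;
        E(t^2 - 1) = \frac{\beta^2}{SE^2}.
      \end{align*}

  Dropping the variable costs bias^2 = beta^2; keeping it costs variance = SE^2. So keep it when the estimate of beta^2/SE^2, namely t^2 - 1, exceeds 1, i.e. when |t| > sqrt(2).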
So: Add all of them with a |t| > sqrt(2)
- But the assumptions behind AIC are violated if you do variable selection.
- So let's focus on selection
- |t| > sqrt(2 log p) rarely occurs under the null (simulation below)
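- A small simulation of this claim (not from the original notes; numpy only, treating the null t statistics as independent standard normals):

      # Sketch: with p null z-scores, crossings of sqrt(2 log p) are rare.
      import numpy as np

      rng = np.random.default_rng(0)
      p, n_sims = 1000, 2000
      threshold = np.sqrt(2 * np.log(p))            # ~3.72 for p = 1000

      z = rng.normal(size=(n_sims, p))              # null t statistics
      false_hits = (np.abs(z) > threshold).sum(axis=1)
      print(false_hits.mean())                      # expected false inclusions, about 0.2
      print((false_hits > 0).mean())                # share of fits with any false inclusion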
So: Add all of them with a |t| > sqrt(2 log p)
- Called Bonferroni
- Prove the tail bound for the null: Phi(x) < phi(x) for x < -1 (Phi = standard normal CDF, phi = its density)
- Prove the union bound: P(U_i A_i) <= sum_i P(A_i) (the two are combined below)
- First suggested by Foster and George (for regression) / Donoho
and Johnstone (for wavelets)
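- Putting the two lemmas together (my reconstruction of the Bonferroni calculation; Z_1, ..., Z_p are the null t statistics, treated as standard normal):

      \begin{align*}
      P\Bigl(\max_j |Z_j| > \sqrt{2\log p}\Bigr)
        &\le \sum_{j=1}^{p} P\bigl(|Z_j| > \sqrt{2\log p}\bigr)
          && \text{(union bound)} \\
        &\le 2p\,\phi\bigl(\sqrt{2\log p}\bigr)
          && \text{(tail bound, valid since } \sqrt{2\log p} > 1\text{)} \\
        &= \frac{2p\,e^{-\log p}}{\sqrt{2\pi}}
         = \sqrt{\frac{2}{\pi}} \approx 0.8 .
      \end{align*}

  The sharper tail bound P(Z > x) <= phi(x)/x divides this by sqrt(2 log p), so the bound tends to 0 as p grows.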
dean@foster.net