Last modified: Tue Sep 27 12:16:40 EDT 2005 by Dean Foster

# Statistical Data mining: Bonferroni

• (we have ways to know what you read!)
• When DRM comes around, this will actually be true.

## Can't compute best fit in general, so assume orthogonal

• Avoids collinearity
• The optimization problem is now separable
• Optimize each coordinate individually.
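A minimal sketch of the separability claim (illustrative, not from the notes): when the design matrix has orthonormal columns, the joint least-squares fit coincides with p independent one-dimensional fits, each coefficient being just xⱼᵀy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
beta = np.arange(1.0, p + 1)
y = X @ beta + 0.1 * rng.standard_normal(n)

joint = np.linalg.lstsq(X, y, rcond=None)[0]  # full multivariate fit
coordinatewise = X.T @ y                      # p separate 1-D projections

# Since X'X = I, (X'X)^{-1} X'y reduces to X'y: the coordinates decouple.
assert np.allclose(joint, coordinatewise)
```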

## Which variables should be included?

• Adding any variable increases in-sample fit
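A quick numerical check of this bullet (a sketch, not from the notes): because the larger model nests the smaller one, the residual sum of squares can only go down when a column is added, even a column of pure noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
X1 = rng.standard_normal((n, 2))
x_new = rng.standard_normal((n, 1))  # pure noise still helps in-sample
y = X1 @ np.array([1.0, -2.0]) + rng.standard_normal(n)

rss_small = np.linalg.lstsq(X1, y, rcond=None)[1][0]
rss_big = np.linalg.lstsq(np.hstack([X1, x_new]), y, rcond=None)[1][0]

# Nested least squares: RSS is monotone nonincreasing in the column set.
assert rss_big <= rss_small + 1e-9
```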

### So: Add them all

• Adding any variable increases in-sample fit
• But in-sample fit is not an unbiased estimate of out-of-sample fit.
• Adjusted R² is "unbiased" if no selection is done
• Adding a variable with |t| ≤ 1 actually decreases the adjusted R².
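The |t| = 1 breakeven can be checked directly (an illustrative sketch with an assumed toy dataset, not from the notes): adding a column raises adjusted R² exactly when its |t| in the larger model exceeds 1.

```python
import numpy as np

def adj_r2(X, y):
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (rss / (n - p)) / (tss / (n - 1))

def t_stat(X, y, j):
    # t = betahat_j / SE_j, with SE from the usual OLS variance estimate.
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.sum((y - X @ beta) ** 2) / (n - p)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[j, j])
    return beta[j] / se

rng = np.random.default_rng(2)
n = 40
X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, 2))])
y = X[:, 1] + rng.standard_normal(n)  # last column is pure noise

t = t_stat(X, y, 2)                        # t of the candidate column
gain = adj_r2(X, y) - adj_r2(X[:, :2], y)  # change in adjusted R²

# Adjusted R² goes up if and only if t² > 1.
assert (abs(t) > 1) == (gain > 0)
```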

### So: Add all of them with a t > 1

• If |β/SE| < 1, fitting β by zero is better than fitting by betahat.
• E(t²) = β²/SE² + 1
• So t² − 1 is an unbiased estimate of β²/SE²
• Related to Stein's unbiased estimate of risk (more on this later)
• IDEA: if less than 1 leave it out, otherwise put it in
• This is the idea of: AIC, Cp
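A Monte Carlo check of the unbiasedness claim above (a sketch with an assumed value of β/SE): if t ~ N(β/SE, 1), then E(t²) = (β/SE)² + 1, so t² − 1 is unbiased for the squared standardized coefficient.

```python
import numpy as np

rng = np.random.default_rng(3)
standardized_beta = 1.5  # assumed value of beta/SE for the demo
t = standardized_beta + rng.standard_normal(200_000)

# Mean of t^2 - 1 should recover (beta/SE)^2 = 2.25 up to Monte Carlo error.
assert abs(np.mean(t**2 - 1) - standardized_beta**2) < 0.05
```

This is the same calculation that motivates the t² > 2 cutoff of AIC/Cp: keep a variable when its estimated contribution t² − 1 exceeds the cost 1 of estimating it.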

### So: Add all of them with a |t| > sqrt(2)

• But the assumptions behind AIC are violated if you do variable selection.
• So let's focus on selection
• |t| > sqrt(2 log p) rarely occurs under the null

### So: Add all of them with a |t| > sqrt(2 log p)

• Called Bonferroni
• Prove the tail bound under the null: Φ(x) < φ(x) for x < −1.
• Prove the union bound: P(∪ Aᵢ) ≤ Σ P(Aᵢ)
• First suggested by Foster and George (for regression) / Donoho and Johnstone (for wavelets)
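A simulation sketch of both ingredients (illustrative parameters, not from the notes): under the global null, the Bonferroni cutoff sqrt(2 log p) is rarely crossed by any of the p t-statistics, and the union bound P(∪ Aᵢ) ≤ Σ P(Aᵢ) holds empirically.

```python
import numpy as np

rng = np.random.default_rng(4)
p, trials = 200, 2000
cutoff = np.sqrt(2 * np.log(p))               # Bonferroni threshold

Z = rng.standard_normal((trials, p))          # p null t-statistics per trial

any_cross = np.mean(np.abs(Z).max(axis=1) > cutoff)  # P(some |t_j| > cutoff)
sum_bound = p * np.mean(np.abs(Z) > cutoff)          # sum of marginal tails

# Crossing the cutoff is rare under the null, and the union bound holds
# up to simulation noise.
assert any_cross < 0.5
assert any_cross <= sum_bound + 0.05
```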
dean@foster.net