Last modified: Tue Sep 27 12:16:40 EDT 2005
by Dean Foster

# Statistical Data mining: Bonferroni

## Administrivia

- Please do the reading
- (we have ways to know what you read!)
- When DRM comes around, this will actually be true.

## Can't compute best fit in general, so assume orthogonal

- Avoid collinearity
- The optimization problem is now separable
- Optimize each coordinate individually.

## Which variables should be included?

- Adding any variable increases in-sample fit

### So: Add them all

- Adding any variable increases in-sample fit
- But in-sample fit is not an unbiased estimate of out-of-sample fit.
- Adjusted R^2 is "unbiased" if no selection is done
- Adding a variable with |t| ≤ 1 actually decreases the adjusted R^2.

### So: Add all of them with a t > 1

- If |beta/SE| < 1, fitting by zero is better than fitting by betahat.
- E(t^2) = beta^2/SE^2 + 1
- So t^2 - 1 is an unbiased estimate of beta^2/SE^2
- Related to Stein's unbiased estimate of risk (more on this later)
- IDEA: if less than 1 leave it out, otherwise put it in
- This is the idea of: AIC, C_p

### So: Add all of them with a |t| > sqrt(2)

- But the assumptions behind AIC are violated if you do variable selection.
- So let's focus on selection
- Values of |t| > sqrt(2 log p) rarely occur under the null

### So: Add all of them with a |t| > sqrt(2 log p)

- Called Bonferroni
- Prove a tail bound under the null: Phi(x) < phi(x) for x < -1.
- Prove the union bound: P(∪ A_i) ≤ sum P(A_i)

- First suggested by Foster and George (for regression) / Donoho
and Johnstone (for wavelets)

dean@foster.net