Last modified: Tue Nov 1 14:48:15 EST 2005
by Dean Foster
Statistical Data mining: Streaming searching
Administrivia
- Collect HW 3
- HW 4 due next Thursday.
Biostat: alpha spending rules
- Stopping in clinical trials.
- multiple endpoints / tests.
Alpha spending in variable selection
- Sequentially look at each variable
- Spend some alpha on it to see if it should enter
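A minimal sketch of the sequential idea above. The geometric schedule alpha * 2^-(j+1) is an assumption chosen only because it sums to less than alpha; the notes do not fix a schedule, and the function name is hypothetical.

```python
def alpha_spending(p_values, alpha=0.05):
    """Look at variables in order; variable j gets its own slice of alpha.

    Assumed schedule (not from the notes): alpha_j = alpha * 2**-(j+1),
    so the slices sum to less than alpha no matter how many variables arrive.
    """
    entered = []
    for j, p in enumerate(p_values):
        alpha_j = alpha * 0.5 ** (j + 1)  # amount of alpha spent on variable j
        if p <= alpha_j:                  # variable enters the model
            entered.append(j)
    return entered


# Example: p-values of four candidate variables, in the order they stream in.
print(alpha_spending([0.001, 0.5, 0.004, 0.2]))
```

Alpha that is spent is gone whether or not the variable enters, so late variables face very small thresholds.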
FWER
- Probability of a union is at most the sum of the probabilities (Bonferroni)
- Union is the chance of making even one mistake
- FWER = family-wise error rate = worst-case chance of making any error
- Theorem: alpha spending controls FWER at level alpha.
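The theorem follows from the union bound in one line. In the sketch below, H_0 (the set of true nulls), p_j (the p-value of test j) and alpha_j (the alpha spent on test j, with the alpha_j summing to at most alpha) are notation assumed here, not taken from the notes.

```latex
\[
\mathrm{FWER}
  = P\Bigl(\bigcup_{j \in H_0} \{p_j \le \alpha_j\}\Bigr)
  \le \sum_{j \in H_0} P(p_j \le \alpha_j)
  \le \sum_{j \in H_0} \alpha_j
  \le \sum_{j} \alpha_j
  \le \alpha .
\]
```

No independence between the tests is needed for this argument.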
But do we want FWER for prediction?
- FWER guarantees not overfitting
- Draw typical out-of-sample MSE graph
- Out-of-sample graph version of FWER: it guarantees that, except with
probability alpha, the curve never goes up through bad luck
- We want the minimum, not a conservative left point
- Note: Some people argue we don't even want FWER for multiple testing
- Makes more sense to trade off type I error against type II error
Better scheme
- For each rejection, give out new alpha to spend
- Called alpha investing rule
- amount given out controls tradeoff between type I and type II error
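A minimal sketch of the scheme above. The bidding rule (bid half of the current alpha on each test) and the bookkeeping (a failed test costs the bid, a rejection earns `payout`) are illustrative assumptions, not the exact rule from the lecture; the names are hypothetical.

```python
def alpha_investing(p_values, alpha=0.05, payout=0.05):
    """Sequential testing where each rejection earns new alpha to spend.

    Assumptions (not from the notes): bid half of the current wealth on
    each test; a failed test costs the bid, a rejection adds `payout`.
    """
    wealth = alpha                 # alpha currently available to spend
    rejected = []
    for j, p in enumerate(p_values):
        bid = wealth / 2           # alpha spent on test j
        if p <= bid:
            rejected.append(j)
            wealth += payout       # rejection: new alpha is given out
        else:
            wealth -= bid          # no rejection: the bid is lost
    return rejected, wealth


p = [0.001, 0.5, 0.02, 0.9]
print(alpha_investing(p, payout=0.05))  # rejections refill the budget
print(alpha_investing(p, payout=0.0))   # no payout: budget only shrinks
```

With `payout=0.0` the third test fails because the budget has shrunk too far by then; with a positive payout the early rejection pays for it. A larger payout buys more power (fewer type II errors) at the cost of more false rejections.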
Better analysis
- 2 x 2 table (see handout) with cells U/V/T/S
- V + S = R = number of rejections
- U + V = m0 = number of true nulls
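The cell meanings are pinned down by the two identities: V and S are the rejected cells, U and V are the true-null cells. A small sketch that fills the table from known truth and test decisions (the function name and inputs are hypothetical, for illustration only):

```python
def table_counts(is_null, rejected):
    """Fill the 2 x 2 table of (null true / null false) x (rejected or not)."""
    counts = {"U": 0, "V": 0, "T": 0, "S": 0}
    for null, rej in zip(is_null, rejected):
        if null and not rej:
            counts["U"] += 1   # true null, correctly kept
        elif null and rej:
            counts["V"] += 1   # true null, falsely rejected (type I error)
        elif not rej:
            counts["T"] += 1   # false null, missed (type II error)
        else:
            counts["S"] += 1   # false null, correctly rejected
    return counts


c = table_counts(is_null=[True, True, True, False, False],
                 rejected=[False, True, False, True, False])
print(c)
print("R =", c["V"] + c["S"], " m0 =", c["U"] + c["V"])
```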
FDR: False Discovery Rate
- FDR = E(V/R)
- FDR < alpha is target
- Simes procedure will control FDR
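A sketch of the step-up rule built on the Simes thresholds k * alpha / m, usually attributed to Benjamini and Hochberg. That this is the procedure meant in the bullet above is an assumption, and the function name is hypothetical.

```python
def simes_step_up(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank
    with p_(k) <= k * alpha / m (Simes thresholds, step-up rule)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by p-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank           # keep the largest rank that passes
    return sorted(order[:k])   # indices of rejected hypotheses


print(simes_step_up([0.001, 0.008, 0.039, 0.041, 0.6]))
print(simes_step_up([0.001, 0.008, 0.039, 0.035, 0.6]))
```

In the second call the third-smallest p-value (0.035) misses its own threshold (0.03), but the fourth (0.039) passes its threshold (0.04), so all four are rejected. That is the step-up part.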
EDC: Excess discovery count
- EDC = E(S - gamma R) + alpha
- EDC > 0 is target
- alpha controls FWER(0)
- gamma controls FDR at rate about 1-gamma
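Why the last bullet holds, as a short sketch using only S = R - V from the 2 x 2 table:

```latex
\begin{align*}
\mathrm{EDC} \ge 0
  &\iff E(S) - \gamma E(R) + \alpha \ge 0 \\
  &\iff E(R) - E(V) - \gamma E(R) + \alpha \ge 0
     && \text{since } S = R - V \\
  &\iff E(V) \le (1 - \gamma)\, E(R) + \alpha .
\end{align*}
```

So E(V)/E(R) is at most (1 - gamma) + alpha/E(R). This is a ratio of expectations rather than E(V/R), hence "about". Reading FWER(0) as the error rate when every null is true (an assumption about the notation): then S = 0, so the target gives gamma E(R) <= alpha, and P(V >= 1) <= E(V) = E(R) <= alpha/gamma.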
Theorem: alpha investing controls EDC
dean@foster.net