Last modified: Thu Nov 3 14:11:17 EST 2005
by Dean Foster

# Statistical Data mining: Streaming searching

## Administrivia

## Recall definitions of V/R/S/T/etc

## FDR: False Discovery Rate

- FDR = E(V/R), the expected fraction of rejections that are false
- Target: FDR < alpha
- Simes procedure will control FDR
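
As a small illustration of the quantity inside the expectation, the sketch below computes V/R for a single experiment. The function name, the example data, and the convention that V/R is taken to be 0 when nothing is rejected are assumptions made for this example, not part of the notes.

```python
def false_discovery_proportion(is_null, rejected):
    """Return V/R for one experiment (assumed to be 0 when R == 0)."""
    # R: total number of rejections
    R = sum(1 for rej in rejected if rej)
    # V: rejections of hypotheses that were actually null (false discoveries)
    V = sum(1 for null, rej in zip(is_null, rejected) if null and rej)
    return V / R if R > 0 else 0.0


# Hypothetical data: five tests, three rejections, one of them false.
is_null = [True, True, False, False, True]
rejected = [True, False, True, True, False]
fdp = false_discovery_proportion(is_null, rejected)
```

The FDR is the average of this ratio over repeated experiments, so a single value of `fdp` can be above alpha even when the FDR is controlled.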

## EDC: Excess discovery count

- EDC = E(S - gamma R) + alpha
- EDC > 0 is target
- alpha controls FWER(0)
- gamma controls FDR at rate about 1-gamma
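
One way to read the last bullet, as a sketch: assuming R = S + V (V false rejections, S true rejections, as in the usual definitions), the target EDC > 0 rearranges into a bound on the expected number of false rejections.

```latex
\begin{align*}
\mathrm{EDC} = E(S - \gamma R) + \alpha > 0
  &\iff E(S) > \gamma\, E(R) - \alpha \\
  &\iff E(V) = E(R) - E(S) < (1 - \gamma)\, E(R) + \alpha .
\end{align*}
```

So E(V)/E(R) is held below roughly 1 - gamma once E(R) is large relative to alpha. This bounds a ratio of expectations, not E(V/R) itself.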

## Theorem: alpha investing controls EDC

## Simes procedure for FDR

- Does alpha investing control FDR? Unknown.
- What does control FDR? Simes does.
- Order the p-values from smallest to largest
- Test the smallest at alpha/m, the second smallest at 2 alpha/m, etc.
- Once you fail to reject, stop.
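
The steps above can be sketched in a few lines of Python. This follows the list literally (stop at the first failure); the function name `simes_stepdown` and the example p-values are made up for illustration. The more common step-up form of the procedure instead looks for the largest k with the k-th smallest p-value at most k alpha/m.

```python
def simes_stepdown(p_values, alpha):
    """Return indices of rejected hypotheses, in order of increasing p-value.

    Literal reading of the notes: compare the k-th smallest p-value with
    k * alpha / m and stop at the first one that fails.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by p-value (smallest first)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for k, i in enumerate(order, start=1):
        if p_values[i] <= k * alpha / m:
            rejected.append(i)
        else:
            break  # first failure to reject: stop
    return rejected


# Hypothetical example: m = 5 tests at alpha = 0.05,
# so the thresholds are 0.01, 0.02, 0.03, 0.04, 0.05.
p = [0.001, 0.30, 0.012, 0.045, 0.02]
result = simes_stepdown(p, alpha=0.05)
```

Here the three smallest p-values (0.001, 0.012, 0.02) pass their thresholds and 0.045 fails at 0.04, so the search stops with three rejections.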

- If the tests are independent, it is "easy" to prove that this controls the FDR.
- In the worst case, it is "easy" to show that you lose a log(m) factor.

