Last modified: Tue Sep 27 12:16:53 EDT 2005 by Dean Foster

Statistical Data mining: High dimensions

First intuition: Always think p > n

Classical statistics has p fixed and n going to infinity.

Short and fat data has p bigger than n. The natural limit is either n fixed with p going to infinity, or both going to infinity.

Mike Steele's example of the square and the circle.
Theorem: All random vectors are approximately orthogonal.
- consider X_i a d-dimensional normal
- if we have d of them, they span the d-dimensional space
- But the (d+1)st vector we add is still almost orthogonal to the first d.
- Holds true for up to exponentially many vectors, exponential in d (Bonus homework: prove this!)
The above theorem is used in a quick proof of shrinkage. (Draw picture to confuse students. Use Pythagoras's proof: Behold.)
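
A quick simulation makes the near-orthogonality claim concrete. This is only a sketch, not part of the original notes: the helper names (random_unit_vector, abs_cosines) and the choices of d, m, and seed are made up for illustration. It draws m standard normal vectors in d dimensions, scales them to unit length, and looks at the absolute cosine between every pair. The cosine between two such vectors has standard deviation 1/sqrt(d), so the values should shrink as d grows.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a d-dimensional standard normal vector and scale it to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def abs_cosines(d, m, seed=0):
    """Return |cos(angle)| for every pair among m random unit vectors in R^d."""
    rng = random.Random(seed)
    vecs = [random_unit_vector(d, rng) for _ in range(m)]
    cosines = []
    for i in range(m):
        for j in range(i + 1, m):
            dot = sum(a * b for a, b in zip(vecs[i], vecs[j]))
            cosines.append(abs(dot))
    return cosines


# Illustrative settings only: m = 40 vectors in each dimension d.
for d in (10, 100, 1000):
    c = abs_cosines(d, m=40)
    mean_c = sum(c) / len(c)
    print(f"d={d:5d}  mean |cos|={mean_c:.3f}  max |cos|={max(c):.3f}  "
          f"1/sqrt(d)={1 / math.sqrt(d):.3f}")
```

With d = 10 the 40 vectors cannot be anywhere near mutually orthogonal, and the printed cosines show it. As d grows, even the largest pairwise cosine falls toward zero, which is the sense in which "all random vectors are approximately orthogonal."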

dean@foster.net