Last modified: Tue Sep 27 12:16:53 EDT 2005
by Dean Foster

# Statistical Data mining: High dimensions

## First intuition: Always think p > n

Classical statistics treats p (the number of variables) as fixed and
lets n (the number of observations) go to infinity. Short and fat data
has p bigger than n. The natural limit is either n fixed with p going
to infinity, or both going to infinity.

## How bad is our intuition about large dimensions?

- Mike Steele's example of the square and the circle.
- Theorem: All random vectors are approximately orthogonal.
  - Consider X_i, each a d-dimensional normal.
  - If we have d of them, they span the d-dimensional space.
  - But the (d+1)st variable we add is still almost orthogonal
    to the d variables.
  - This holds true for up to exponentially many (in d) variables.
    (Bonus homework: prove this!)

- The above theorem is used in a quick proof of shrinkage. (Draw a
picture to confuse students. Use Pythagoras's proof: Behold.)
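
The near-orthogonality claim can be checked numerically. The sketch below is not from the lecture: it assumes NumPy, and the function name `max_abs_cosine`, the seed, and the dimensions tried are illustrative choices. It draws m independent standard normal vectors in d dimensions and reports the largest absolute cosine between any pair. For two such vectors the cosine has mean 0 and variance 1/d, so even the worst pair should drift toward orthogonality as d grows, including when m is larger than d.

```python
import numpy as np


def max_abs_cosine(d, m, seed=0):
    """Largest |cosine| between any two of m independent N(0, I_d) vectors.

    Illustrative helper (hypothetical name, not from the lecture notes).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((m, d))
    # Scale each row to unit length so inner products are cosines.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    G = X @ X.T
    # Ignore each vector's cosine with itself (always 1).
    np.fill_diagonal(G, 0.0)
    return float(np.abs(G).max())


if __name__ == "__main__":
    # Use 2*d vectors: more than enough to span the space,
    # yet every pair should still be nearly orthogonal for large d.
    for d in (10, 100, 1000):
        print(d, max_abs_cosine(d, 2 * d))
```

A single random draw proves nothing, but rerunning with different seeds and larger m gives a feel for how slowly the worst-case cosine grows with the number of vectors.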

dean@foster.net