Last modified: Tue Sep 27 12:17:21 EDT 2005
by Dean Foster
Statistical Data Mining: Stepwise Regression
Administrivia
- Lots of readings for today (including a new one on the JL
  (Johnson-Lindenstrauss) theorem).
Regression vs nearest neighbor (yet again)
- Nearest neighbor picks up rich structure from a log(n)-sized space
- Regression picks up linear structure from an o(n)-sized space
- Nearest neighbor degenerates to noise if the space is too large
- Regression degenerates to "divide by zero" (a singular X'X) if the
  space is too large (see the sketch below)
- What to do?
- Pick a smaller space!
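- A minimal sketch of the "divide by zero" failure (assuming numpy; the
  values of n and p are made up): with more variables than observations,
  X'X cannot be full rank, so least squares has no unique solution.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50                     # observations
    for p in (5, 100):         # p = 100 > n: too many variables
        X = rng.standard_normal((n, p))
        rank = np.linalg.matrix_rank(X.T @ X)
        # rank(X'X) <= min(n, p); with p > n it is singular, and
        # inverting it is regression's "divide by zero".
        print(p, rank, "singular" if rank < p else "invertible")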
Best basis problem (best subsets regression)
- Goal: find a subspace that generates a good fit (brute force
  sketched below)
- Same thing is needed for both regression and Nearest neighbor
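- A brute-force sketch of best subsets (assuming numpy; r_squared and
  best_subset are made-up names, and the intercept is omitted for
  brevity): fit every size-q subset, keep the one with the highest R^2.
  That costs one regression per subset, which is what the complexity
  discussion below is about.

    import itertools
    import numpy as np

    def r_squared(X, y):
        # One least-squares fit (lstsq tolerates rank deficiency), then R^2.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

    def best_subset(X, y, q):
        # Exhaustive search: one regression per size-q subset of the
        # p columns, i.e. "p choose q" regressions in all.
        p = X.shape[1]
        return max(itertools.combinations(range(p), q),
                   key=lambda S: r_squared(X[:, list(S)], y))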
NP complete problems
- Review complexity via sorting algorithms: O(n^2) vs. O(2^n). (Use
  cards; toy operation counts after this list.)
- Can exponential problems be interesting?
- No, since we can never know whether the answer is correct.
- Yes, since sometimes checking is easier than deriving.
- This class of problems is called "NP".
- No known way to actually solve them quickly.
- Even quantum computers might not solve them quickly. (Open conjecture.)
- All good machine learning problems fall in this class.
- Aside: NP complete
- The whole class is either all easy, or all hard
- If one problem (say TSP, 3SAT, or stepwise regression) turns out
  to be easy, they all are.
- Sudoku is a good example of an NP-complete problem
- Millennium Prize (P vs. NP)
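- A toy illustration of the gap (operation counts only, not timings):
  sorting is roughly n^2 comparisons, while exhaustive subset search
  is 2^n fits.

    def insertion_sort(cards):
        # Sorting: at most n^2 comparisons; polynomial, hence "easy".
        for i in range(1, len(cards)):
            j = i
            while j > 0 and cards[j - 1] > cards[j]:
                cards[j - 1], cards[j] = cards[j], cards[j - 1]
                j -= 1
        return cards

    print(insertion_sort([3, 1, 4, 1, 5]))
    # Exhaustive subset search is 2^n; exponential, hence "hard".
    for n in (10, 20, 40, 80):
        print(n, n**2, 2**n)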
Best subsets regression is NP complete
- Must phrase it carefully
- Do there exist q < p variables such that the R^2 is bigger than x?
- Certainly this is polynomial-time checkable (do the regression; see
  the sketch after this list)
- Can it be found quickly?
- Try the best variable first? (Fails since it might lead down the
  wrong path)
- Add them all and eliminate useless ones (fails since p might be
  bigger than n)
- Proof of hardness: reduce a problem previously shown to be hard
  (e.g., 3SAT) to best subsets regression
- So if someone says, "I've got a fast way to solve best subsets
  regression," they shouldn't be telling statisticians; they should
  be telling the Clay Mathematics Institute.
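- The checker referred to above, as a sketch (verify_certificate is a
  made-up name; it reuses the r_squared helper from the earlier
  sketch): checking a proposed answer is one cheap regression.

    def verify_certificate(X, y, subset, x):
        # The "NP" part: verifying a claimed solution is one regression.
        # Finding a subset that passes, among all 2^p, is the hard part.
        return r_squared(X[:, list(subset)], y) > x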
"All interesting problems are impossible."
Greedy search
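- A minimal sketch of where this is headed (forward_stepwise is a
  made-up name; it assumes numpy and the r_squared helper above):
  greedily add, one at a time, the variable that most improves the
  fit. Roughly p*q regressions instead of 2^p, at the price of
  possibly heading down the wrong path.

    def forward_stepwise(X, y, q):
        # Greedy: repeatedly add the variable that raises R^2 most.
        chosen = []
        for _ in range(q):
            rest = (j for j in range(X.shape[1]) if j not in chosen)
            best = max(rest, key=lambda j: r_squared(X[:, chosen + [j]], y))
            chosen.append(best)
        return chosen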
dean@foster.net