Last modified: Tue Dec 6 14:22:01 EST 2005
by Dean Foster
Administrivia
- I'll write a homework 5 by Thursday, if you all promise
  to turn in HW 3 and 4 by Thursday.  Deal?
- If so, then no project, and no final.
Worst case modeling
Alternative models for data
- Standard model
- data = signal + noise
- noise is random
- goal: recover true signal
- Function recovery
- data = signal + noise
- sum(noise^2) is bounded (see the sketch after this list)
- goal: recover true signal
- PAC (Probably Approximately Correct) learning
- data = f(X)
- X is noisy
- goal: recover f
- Worst case / individual sequence
- data comes in a sequence
- alternative models
- goal: fit future as well as any possible model
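To make the first two frameworks concrete, here is a tiny simulation (the signal, noise level, and sample size are my own illustrative choices, not from the lecture):

    import numpy as np

    # Standard model: data = signal + noise, with the noise random.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 100)
    signal = np.sin(2 * np.pi * x)           # the "true" signal we want to recover
    noise = 0.3 * rng.standard_normal(100)   # random noise
    data = signal + noise

    # Function recovery keeps data = signal + noise but drops the randomness
    # assumption; it only requires that sum(noise^2) stays bounded.
    print("sum(noise^2) =", np.sum(noise ** 2))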
Comments on models
- PAC motivates Boosting, but not much else
- The function recovery framework and the high-dimensional normal model are identical
- Doesn't worst sound cool?
- The worst-case view is the motivation for the Lempel-Ziv (LZ) compression algorithm (a small parsing sketch is below)
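The notes only name LZ; for concreteness, here is a minimal LZ78-style incremental parse (my own sketch, not code from the lecture).  Each new phrase extends a previously seen phrase by one symbol, which is the dictionary-building step the worst-case compression guarantees rest on:

    def lz78_phrases(s):
        # Parse s into LZ78 phrases: each new phrase extends a previously
        # seen phrase by exactly one symbol.
        seen = {""}
        phrases = []
        current = ""
        for ch in s:
            current += ch
            if current not in seen:
                seen.add(current)
                phrases.append(current)
                current = ""
        if current:                 # leftover prefix already in the dictionary
            phrases.append(current)
        return phrases

    print(lz78_phrases("abababababa"))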
Attraction for data mining
- Given our search through millions of variables, we don't
believe we have the true model.
- So why not get rid of the idea altogether?
- If it worked for all data, wouldn't that be cool?
General theorem
- Consider a known set of forecasts F.
- goal: sum_t loss(Y_t - yhat_t) = min over f in F of sum_t loss(Y_t - f_t)
- Problem: Oops, we might do too well, so equality is the wrong target.
- better goal: sum_t loss(Y_t - yhat_t) <= min over f in F of sum_t loss(Y_t - f_t)
  (see the sketch after the theorem)
- Theorem: if loss is bounded and F is compact, then such a yhat
exists.
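The theorem is only an existence statement; it doesn't say how to build yhat.  One standard construction in this literature (my choice here, not something the notes commit to) is the exponentially weighted average forecaster.  A minimal sketch for squared loss with a finite comparison class F:

    import numpy as np

    def exp_weights(y, forecasts, eta=0.5):
        # Exponentially weighted average forecaster.
        #   y         : outcomes, shape (T,), assumed to lie in [0, 1]
        #   forecasts : expert forecasts, shape (T, N) -- the comparison class F
        #   eta       : learning rate
        # Returns the combined forecasts yhat, shape (T,).
        T, N = forecasts.shape
        cum_loss = np.zeros(N)                 # cumulative squared loss of each expert
        yhat = np.zeros(T)
        for t in range(T):
            w = np.exp(-eta * (cum_loss - cum_loss.min()))  # weights from past losses
            w /= w.sum()
            yhat[t] = w @ forecasts[t]         # predict before seeing y[t]
            cum_loss += (y[t] - forecasts[t]) ** 2
        return yhat

    # toy check: two constant experts; the combined forecaster's total squared
    # loss should come close to the better expert's total squared loss
    rng = np.random.default_rng(0)
    y = np.clip(0.8 + 0.1 * rng.standard_normal(200), 0, 1)
    F = np.column_stack([np.full(200, 0.2), np.full(200, 0.8)])
    yhat = exp_weights(y, F)
    print(np.sum((y - yhat) ** 2), np.min(np.sum((y - F.T) ** 2, axis=1)))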
Examples
- Least squares
- L1 (used in classification)
- log loss (used in information theory)
- guaranteed calibration (used in game theory)
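For reference, the first three are pointwise losses and easy to write down (calibration is a property of the whole forecast sequence rather than a per-observation loss, so it is not shown):

    import numpy as np

    def squared_loss(y, yhat):     # least squares
        return (y - yhat) ** 2

    def l1_loss(y, yhat):          # L1 / absolute loss
        return np.abs(y - yhat)

    def log_loss(y, p):            # log loss: p is a probability forecast for a 0/1 outcome y
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))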
What's to like
- Dispense with all those nasty independence assumptions
- No distributional assumptions
- Nada!
What's to dislike
- Doesn't do robust statistics
- I.e., one outlier kills both the comparison class and yhat, so
  what's the biggie?
- Doesn't guarantee a smart fit
- I.e., suppose all the comparison-class experts are linear; you
  can still have curvature in the residuals (toy demo below)
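A toy demo of that last complaint (the setup is my own, not from the lecture): fit the best linear predictor to curved data and look at the residuals.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(-1, 1, 200)
    y = x ** 2 + 0.05 * rng.standard_normal(200)   # curved truth

    # the best any linear "expert" could do
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # the residuals still carry the curvature: they correlate strongly with x^2
    print(np.corrcoef(residuals, x ** 2)[0, 1])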
dean@foster.net