Administrivia
   -  I'll write a homework 5 by Thursday, if you all promise
to turn in HW 3 and 4 by Thursday.  Deal?
   -  If so, then no project, and no final.
Worst case modeling
Alternative models for data
   -  Standard model (see the data-generation sketch after this list)
      -  data = signal + noise
      -  noise is random
      -  goal: recover true signal
   -  Function recovery
      -  data = signal + noise
      -  sum(noise^2) is bounded
      -  goal: recover true signal
   -  PAC (Probably Approximately Correct) learning
      -  data = f(X)
      -  X is noisy
      -  goal: recover f
   -  Worst case / individual sequence
      -  data comes in a sequence
      -  alternative models
      -  goal: fit future as well as any possible model
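A minimal sketch, in Python, contrasting the first two frameworks.
The Gaussian noise, the sigma, the energy bound of 1, and the
function names are illustrative assumptions, not anything from the
notes:

   import random

   def standard_model(signal, sigma=0.1):
       """Standard model: data = signal + noise, noise random
       (iid Gaussian here as an illustrative choice)."""
       return [s + random.gauss(0.0, sigma) for s in signal]

   def function_recovery_model(signal, noise, bound=1.0):
       """Function recovery: data = signal + noise, with no
       distribution on the noise, only the energy constraint
       sum(noise^2) <= bound."""
       assert sum(e * e for e in noise) <= bound
       return [s + e for s, e in zip(signal, noise)]

Either way the goal is the same (recover the true signal); only the
assumption on the noise changes.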
Comments on models
   -  PAC motivates Boosting, but not much else.
   -  The function recovery framework and the high-dimensional
normal (means) problem are identical.
   -  Doesn't "worst case" sound cool?
   -  Worst case is the motivation for the LZ (Lempel-Ziv)
compression algorithm.
Attraction for data mining
   -  Given our search through millions of variables, we don't
believe we have the true model.
   -  So why not get rid of the idea of a true model altogether?
   -  If it worked for all data, wouldn't that be cool?
General theorem
   -  Consider a known set of forecasts F.
   -  Goal: sum_t loss(Y_t - yhat_t) = min_{f in F} sum_t loss(Y_t - f_t)
   -  Problem: oops, we might do too well, so equality is the wrong
target; yhat can beat the best f in F.
   -  Better goal: sum_t loss(Y_t - yhat_t) <= min_{f in F} sum_t loss(Y_t - f_t)
   -  Theorem: if the loss is bounded and F is compact, then such a
yhat exists.  (One classical construction is sketched below.)
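A minimal sketch of one such construction for a finite F: the
exponential-weights (Hedge) aggregation.  This is a standard recipe
from the individual-sequence literature, not something the theorem
statement itself pins down; the learning rate eta, the [0, 1]
scaling, and the squared loss below are all illustrative
assumptions.  For a bounded convex loss it tracks the best f in F
up to a small regret term.

   import math

   def hedge_forecast(expert_preds, outcomes, eta=0.5):
       """Combine a finite comparison class F by exponential weights.

       expert_preds[i][t] is expert i's forecast for round t, and
       outcomes[t] is Y_t; both are assumed to lie in [0, 1] so the
       squared loss is bounded by 1.  Returns the yhat sequence."""
       n = len(expert_preds)
       w = [1.0] * n                  # uniform starting weights on F
       yhats = []
       for t, y in enumerate(outcomes):
           total = sum(w)
           # yhat is the weight-averaged forecast; convexity of the
           # loss is what lets the average inherit the guarantee
           yhat = sum(w[i] * expert_preds[i][t] for i in range(n)) / total
           yhats.append(yhat)
           # shrink each expert's weight by its loss this round;
           # no probability model for the data is needed here
           for i in range(n):
               w[i] *= math.exp(-eta * (y - expert_preds[i][t]) ** 2)
       return yhats

   # e.g., two constant experts; the combined forecast drifts toward
   # whichever one is doing better on this particular sequence
   ys = [0.9, 0.8, 1.0, 0.9]
   print(hedge_forecast([[0.0] * 4, [1.0] * 4], ys))

Averaging the experts (rather than picking one) is where convexity
of the loss gets used; the multiplicative update is what makes the
bound hold for any individual sequence.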
Examples
   -  Least squares (this and the next two are spelled out in code below)
   -  L1 (used in classification)
   -  log loss (used in information theory)
   -  guaranteed calibration (used in game theory)
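A minimal sketch of the first three as plain loss functions.  The
binary encoding for log loss is an assumption for illustration;
guaranteed calibration is a sequential property of the forecasts
rather than a pointwise loss, so it has no one-line formula here.

   import math

   def squared_loss(y, yhat):
       """Least squares."""
       return (y - yhat) ** 2

   def l1_loss(y, yhat):
       """Absolute error, the loss used in classification."""
       return abs(y - yhat)

   def log_loss(y, p):
       """Log loss: outcome y in {0, 1}, forecast probability p in
       (0, 1).  This is code length in nats, hence its use in
       information theory."""
       return -math.log(p) if y == 1 else -math.log(1.0 - p)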
What's to like
   -  Dispense with all those nasty independence assumptions
   -  No distributional assumptions
   -  Nada!
What's to dislike
   -  Doesn't do robust statistics
   -  I.e., one outlier kills the comparison class and yhat alike,
so what's the big deal???  (see the small demo after this list)
   -  Doesn't guarantee a smart fit
   -  I.e., if all the comparison-class experts are linear, you can
still have curvature in the residuals
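The small demo referenced above, with made-up numbers: one wild
outlier dominates everyone's cumulative squared loss, so beating
the best expert in the class stops being informative.

   def cum_sq_loss(ys, preds):
       return sum((y - p) ** 2 for y, p in zip(ys, preds))

   ys     = [0.1, 0.2, 0.1, 1000.0, 0.2]   # one wild outlier
   expert = [0.1, 0.2, 0.1,    0.2, 0.2]   # near-perfect otherwise
   naive  = [0.0, 0.0, 0.0,    0.0, 0.0]   # ignores the data
   print(cum_sq_loss(ys, expert))   # about 999600: all outlier
   print(cum_sq_loss(ys, naive))    # about 1000000: barely different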