Administrivia
   -  I'll write a homework 5 by Thursday, if you all promise
to turn in HW 3 and 4 by Thursday.  Deal?
   -  If so, then no project, and no final.
Worst case modeling
Alternative models for data
   -  Standard model (see the data-generation sketch after this list)
      -  data = signal + noise
      -  noise is random
      -  goal: recover true signal
   -  Function recovery
      -  data = signal + noise
      -  sum(noise^2) is bounded
      -  goal: recover true signal
   -  PAC (Probably Approximately Correct) learning
      -  data = f(X)
      -  X is noisy
      -  goal: recover f
   -  Worst case / individual sequence
      -  data comes in a sequence
      -  alternative models
      -  goal: fit future as well as any possible model
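A minimal sketch, in Python, contrasting the first two frameworks.
The Gaussian noise, the sigma, the energy bound of 1, and the
function names are illustrative assumptions, not anything from the
notes:

   import random

   def standard_model(signal, sigma=0.1):
       """Standard model: data = signal + noise, noise random
       (iid Gaussian here as an illustrative choice)."""
       return [s + random.gauss(0.0, sigma) for s in signal]

   def function_recovery_model(signal, noise, bound=1.0):
       """Function recovery: data = signal + noise, with no
       distribution on the noise, only the energy constraint
       sum(noise^2) <= bound."""
       assert sum(e * e for e in noise) <= bound
       return [s + e for s, e in zip(signal, noise)]

Either way the goal is the same (recover the true signal); only the
assumption on the noise changes.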
Comments on models
   -  PAC motivates Boosting, but not much else.
   -  The function recovery framework and the high-dimensional
normal (means) problem are identical.
   -  Doesn't "worst case" sound cool?
   -  Worst case is the motivation for the LZ (Lempel-Ziv)
compression algorithm.
Attraction for data mining
   -  Given our search through millions of variables, we don't
believe we have the true model.
   -  So why not get rid of the idea of a true model altogether?
   -  If it worked for all data, wouldn't that be cool?
General theorem
   -  Consider a known set of forecasts F.
   -  Goal: sum_t loss(Y_t - yhat_t) = min_{f in F} sum_t loss(Y_t - f_t)
   -  Problem: oops, we might do too well, so equality is the wrong
target; yhat can beat the best f in F.
   -  Better goal: sum_t loss(Y_t - yhat_t) <= min_{f in F} sum_t loss(Y_t - f_t)
   -  Theorem: if the loss is bounded and F is compact, then such a
yhat exists.  (One classical construction is sketched below.)
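A minimal sketch of one such construction for a finite F: the
exponential-weights (Hedge) aggregation.  This is a standard recipe
from the individual-sequence literature, not something the theorem
statement itself pins down; the learning rate eta, the [0, 1]
scaling, and the squared loss below are all illustrative
assumptions.  For a bounded convex loss it tracks the best f in F
up to a small regret term.

   import math

   def hedge_forecast(expert_preds, outcomes, eta=0.5):
       """Combine a finite comparison class F by exponential weights.

       expert_preds[i][t] is expert i's forecast for round t, and
       outcomes[t] is Y_t; both are assumed to lie in [0, 1] so the
       squared loss is bounded by 1.  Returns the yhat sequence."""
       n = len(expert_preds)
       w = [1.0] * n                  # uniform starting weights on F
       yhats = []
       for t, y in enumerate(outcomes):
           total = sum(w)
           # yhat is the weight-averaged forecast; convexity of the
           # loss is what lets the average inherit the guarantee
           yhat = sum(w[i] * expert_preds[i][t] for i in range(n)) / total
           yhats.append(yhat)
           # shrink each expert's weight by its loss this round;
           # no probability model for the data is needed here
           for i in range(n):
               w[i] *= math.exp(-eta * (y - expert_preds[i][t]) ** 2)
       return yhats

   # e.g., two constant experts; the combined forecast drifts toward
   # whichever one is doing better on this particular sequence
   ys = [0.9, 0.8, 1.0, 0.9]
   print(hedge_forecast([[0.0] * 4, [1.0] * 4], ys))

Averaging the experts (rather than picking one) is where convexity
of the loss gets used; the multiplicative update is what makes the
bound hold for any individual sequence.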
Examples
   -  Least squares (this and the next two are spelled out in code below)
   -  L1 (used in classification)
   -  log loss (used in information theory)
   -  guaranteed calibration (used in game theory)
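A minimal sketch of the first three as plain loss functions.  The
binary encoding for log loss is an assumption for illustration;
guaranteed calibration is a sequential property of the forecasts
rather than a pointwise loss, so it has no one-line formula here.

   import math

   def squared_loss(y, yhat):
       """Least squares."""
       return (y - yhat) ** 2

   def l1_loss(y, yhat):
       """Absolute error, the loss used in classification."""
       return abs(y - yhat)

   def log_loss(y, p):
       """Log loss: outcome y in {0, 1}, forecast probability p in
       (0, 1).  This is code length in nats, hence its use in
       information theory."""
       return -math.log(p) if y == 1 else -math.log(1.0 - p)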
What's to like
   -  Dispense with all those nasty independence assumptions
   -  No distributional assumptions
   -  Nada!
What's to dislike
   -  Doesn't do robust statistics
   -  I.e., one outlier kills the comparison class and yhat alike,
so what's the big deal???  (see the small demo after this list)
   -  Doesn't guarantee a smart fit
   -  I.e., if all the comparison-class experts are linear, you can
still have curvature in the residuals
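The small demo referenced above, with made-up numbers: one wild
outlier dominates everyone's cumulative squared loss, so beating
the best expert in the class stops being informative.

   def cum_sq_loss(ys, preds):
       return sum((y - p) ** 2 for y, p in zip(ys, preds))

   ys     = [0.1, 0.2, 0.1, 1000.0, 0.2]   # one wild outlier
   expert = [0.1, 0.2, 0.1,    0.2, 0.2]   # near-perfect otherwise
   naive  = [0.0, 0.0, 0.0,    0.0, 0.0]   # ignores the data
   print(cum_sq_loss(ys, expert))   # about 999600: all outlier
   print(cum_sq_loss(ys, naive))    # about 1000000: barely different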