Last modified: Mon Aug 22 15:53:00 EDT 2005
by Dean Foster
Statistical Data Mining: OUTLINE
 Outline: (26 days of class)
  
-  Introduction: The modeling spectrum (1 class)

       -  linear regression: the paradigm low-dimensional model

       -  nearest neighbor: the paradigm infinite-dimensional model

       -  course goals: find a happy middle ground
      
 
-  Introduction to high-dimensional data (intuition isn't a good
guide anymore)
  
-  Variable Selection: (4 classes)
     
       -  Bonferroni/Risk inflation

       -  FDR/Simes

       -  alpha spending, alpha investing

       -  Information theory
     
 
-  Wavelets: Such pretty pictures!
  
-  Loss functions (1 class)
     
       -  Classification loss

       -  proper scoring rules: mixtures of classification loss

       -  KL divergence, quadratic losses
     
 
-  Lasso (1 class): l1 priors

-  Regularization (1 class): l2 priors
  
-  Computing p-values (3 classes)
     
       -  White estimator / GEEs

       -  Bennett's bound and other "tight" probabilistic bounds
     
 
-  Variable creation (4 classes)

       -  interactions

       -  missing data

       -  RKHS

       -  PCA

       -  tree stubs
     
 
-  Searching for natural kinds (2 classes)
     
  
-  Text data (2 classes)
     
       -  The Wikipedia

       -  bag of words model

       -  Using other people's parses
     
 
-  Inductive Logic Programming (2 classes)
     
       -  citation graphs (i.e. links in Wikipedia / WWW)

       -  expanding activation
     
 
-  Non-regression methods (4 classes)
     
       -  SVM

       -  Trees

       -  boosting

       -  comparison to regression
     
 
(current total: 27 classes)
Last modified: Thu Oct  6 12:02:12 EDT 2005