Last modified: Mon Aug 22 15:53:00 EDT 2005
by Dean Foster
Statistical Data mining: OUTLINE
Outline: (26 days of class)
- Introduction: The modeling spectrum (1 class)
- linear regression: the paradigm low-dimensional model
- nearest neighbor: the paradigm infinite-dimensional model
- course goals: find a happy middle ground
- Introduction to high dimensional data (intuition isn't a good
guide anymore)
- Variable Selection: (4 classes)
- Bonferroni/Risk inflation
- FDR/Simes
- alpha spending, alpha investing
- Information theory
- Wavelets: Such pretty pictures!
- Loss functions (1 class)
- Classification loss
- proper scoring rules: mixtures of classification loss
- KL divergence, quadratic losses
- Lasso (1 class) l1 priors
- regularization (1 class) l2 priors
- Computing p-values (3 classes)
- White estimator / GEEs
- Bennett's bound and other "tight" probabilistic bounds
- variable creation (4 classes)
- interactions
- missing data
- RKHS
- PCA
- tree stubs
- Searching for natural kinds (2 classes)
- Text data (2 classes)
- Wikipedia
- bag of words model
- Using other people's parses
- Inductive Logic Programming (2 classes)
- citation graphs (i.e. links in Wikipedia / WWW)
- expanding activation
- Non-regression methods (4 classes)
- SVM
- Trees
- boosting
- comparison to regression
(current total: 27 classes)
Last modified: Thu Oct 6 12:02:12 EDT 2005