Last modified: Thu Oct 27 14:46:59 EDT 2005
by Dean Foster
Statistical Data mining: Streaming searching
Administrivia
ID1 stepwise regression
- Nearest neighbor 1
- As we discussed previously, need to do subspace selection
- Modelling after forward stepwise
- Test a bunch of features
- Evaluate each one to see how it performs
- Add the best feature
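The loop above (test a bunch of features, evaluate each, add the best) can be sketched as follows. This is only a rough illustration under assumptions the notes do not state: each candidate subspace is scored by the leave-one-out accuracy of a 1-nearest-neighbor classifier, and the search stops when no remaining feature improves that score. The names loo_1nn_accuracy and forward_stepwise are made up for this sketch.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbor restricted to `features`.

    X is a list of rows (lists of numbers), y the matching labels, and
    `features` the column indices that define the subspace.
    The scoring rule is an assumption of this sketch, not taken from the notes.
    """
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the query point out
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_dist is None or d < best_dist:
                best_dist, best_label = d, y[j]
        correct += (best_label == y[i])
    return correct / len(X)


def forward_stepwise(X, y, max_features=None):
    """Greedy forward selection of a feature subspace for 1-NN."""
    n_features = len(X[0])
    limit = n_features if max_features is None else max_features
    selected, best_score = [], float("-inf")
    while len(selected) < limit:
        # Test every feature not yet in the subspace.
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(loo_1nn_accuracy(X, y, selected + [f]), f) for f in candidates]
        score, feature = max(scored, key=lambda t: t[0])
        # Assumed stopping rule: quit when the best candidate does not help.
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score
```

On data where one column separates the classes and another is noise, this should keep the informative column and reject the noisy one, since adding noise to the distance can only hurt the nearest-neighbor match.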
Classification evaluation
- Calinski-Harabasz
- T. Calinski and J. Harabasz. A dendrite method for cluster
analysis. Communications in Statistics, 3(1):1-27, 1974.
- "C&H use the Variance Ratio Criterion which is analogous to
F-Statistics to estimate the number of clusters a given data naturally
falls into. They minimize Within Cluster/Group Sum of Squares (WGSS)
and maximize Between Cluster/Group Sum of Squares (BGSS)" cpan
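For reference, the Variance Ratio Criterion in the quote is usually written as VRC = [BGSS / (k - 1)] / [WGSS / (n - k)] for n points in k clusters, so larger values mean tighter, better separated groups. A minimal sketch in plain Python follows; the function name and the list-of-lists input format are choices made for this illustration.

```python
def calinski_harabasz(points, labels):
    """Variance Ratio Criterion: [BGSS / (k - 1)] / [WGSS / (n - k)].

    points is a list of equal-length numeric rows, labels the cluster
    assignment of each row.
    """
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= number of clusters < number of points")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(points)
    bgss = 0.0  # between-group sum of squares
    wgss = 0.0  # within-group sum of squares
    for members in clusters.values():
        centroid = mean(members)
        bgss += len(members) * sq_dist(centroid, grand_mean)
        wgss += sum(sq_dist(p, centroid) for p in members)

    if wgss == 0.0:
        return float("inf")  # every point sits exactly on its centroid
    return (bgss / (k - 1)) / (wgss / (n - k))
```

For the four points 0, 2, 10, 12 on a line, grouping {0, 2} and {10, 12} gives BGSS = 100 and WGSS = 4, so VRC = 50; the interleaved grouping {0, 10} and {2, 12} scores far lower.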
Beam search and streaming
- Beam as a compromise between breadth-first and depth-first
- Beams of width one: Same as depth first search
- Beams of unbounded width: same as stepwise in this context
- Not addressed in paper: order of variables
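One possible reading of the bullets above, sketched under assumptions: the beam is a window of `width` pending candidates taken from the feature stream, and the best one in the window is added if it improves the score. This is not the textbook beam search that keeps the top-k partial solutions; it is a windowed greedy search chosen because it reproduces the two limits in the notes (width one is a single streaming pass, unbounded width is forward stepwise). The name windowed_selection and the `score` callable are hypothetical.

```python
def windowed_selection(features, score, width):
    """Greedy selection that compares `width` pending candidates at a time.

    features: candidate features, in stream order.
    score:    hypothetical callable mapping a list of features to a number
              (higher is better).
    width:    number of pending candidates compared before committing.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    pending = list(features)
    selected = []
    best = score(selected)
    while pending:
        window = pending[:width]
        scored = [(score(selected + [f]), f) for f in window]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score > best:
            # Commit to the best candidate in the window.
            selected.append(top)
            best = top_score
            pending.remove(top)
        elif len(window) == len(pending):
            # The window covered everything left and nothing helped: stop.
            break
        else:
            # Nothing in the window helps: drop the oldest and slide on.
            pending.pop(0)
    return selected, best
```

With width one the result depends on the order in which the variables arrive, which is the open issue noted in the last bullet. With a width at least as large as the number of candidates, every step compares all remaining features, so the order no longer matters apart from ties.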
dean@foster.net