Last modified: Thu Oct 27 14:46:59 EDT 2005
by Dean Foster

# Statistical Data mining: Streaming searching

## Administrivia

## ID1 stepwise regression

- Nearest neighbor 1
- As we discussed previously, need to do subspace selection
- Modelling after forward stepwise
- Test a bunch of features
- Evaluate each one to see how it performs
- Add the best feature
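
The loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not say how a candidate feature is evaluated, so the sketch assumes leave-one-out accuracy of a 1-nearest-neighbor classifier with squared Euclidean distance, and it stops as soon as no candidate improves the score. The names `loo_1nn_accuracy` and `forward_stepwise` are hypothetical.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN restricted to the given feature columns.

    Assumption: squared Euclidean distance on the selected subspace.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        best_dist, best_j = None, None
        for j in range(n):
            if j == i:
                continue  # leave point i out of its own neighbor search
            dist = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_j = dist, j
        if y[best_j] == y[i]:
            correct += 1
    return correct / n


def forward_stepwise(X, y, max_features=None):
    """Greedy subspace selection: test every unused feature, add the best one.

    Assumption: stop when the best candidate no longer improves the score.
    """
    p = len(X[0])
    limit = p if max_features is None else min(max_features, p)
    selected, best_score = [], 0.0
    while len(selected) < limit:
        candidates = [f for f in range(p) if f not in selected]
        # Evaluate each candidate when added to the current subspace.
        scored = [(loo_1nn_accuracy(X, y, selected + [f]), f) for f in candidates]
        # Ties are broken in favor of the lowest feature index.
        score, feature = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score
```

Called as `forward_stepwise(X, y)` with `X` a list of rows and `y` a list of class labels, it returns the chosen column indices and the score they reach. Each step costs one full evaluation per remaining feature, which is the expense that motivates the streaming ideas later in these notes.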

## Classification evaluation

- Calinski-Harabasz
- T. Calinski and J. Harabasz. A dendrite method for cluster
analysis. Communications in Statistics, 3(1):1-27, 1974.
- "C&H use the Variance Ratio Criterion which is analogous to
F-Statistics to estimate the number of clusters a given data naturally
falls into. They minimize Within Cluster/Group Sum of Squares (WGSS)
and maximize Between Cluster/Group Sum of Squares (BGSS)" (CPAN)
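
As a concrete illustration of the Variance Ratio Criterion quoted above, here is a minimal sketch that computes it as (BGSS / (k - 1)) / (WGSS / (n - k)) for n points in k clusters, with larger values indicating a better clustering. The function name `calinski_harabasz` and the list-of-lists data layout are assumptions made for this example.

```python
def calinski_harabasz(points, labels):
    """Variance Ratio Criterion: (BGSS / (k - 1)) / (WGSS / (n - k))."""
    n = len(points)
    dim = len(points[0])

    # Group the points by cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need 2 <= k < n clusters")

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    overall = mean(points)
    bgss = 0.0  # between-group sum of squares
    wgss = 0.0  # within-group sum of squares
    for members in clusters.values():
        centroid = mean(members)
        bgss += len(members) * sqdist(centroid, overall)
        wgss += sum(sqdist(p, centroid) for p in members)

    if wgss == 0.0:
        return float("inf")  # every point sits exactly on its centroid
    return (bgss / (k - 1)) / (wgss / (n - k))
```

To estimate the number of clusters, one would compute this value for each candidate k and prefer the k with the largest ratio.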

## Beam search and streaming

- Beam as a compromise between breadth first and depth first
- Beams of width one: same as depth first search
- Beams of unbounded width: same as stepwise in this context
- Not addressed in the paper: order of variables
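
One possible reading of these bullets, sketched in Python: candidate features arrive as a stream, the beam is a window holding up to `width` of them, and when the window fills, the best candidate in it is added if it improves the score. This interpretation, the names `beam_stream_select` and `toy_score`, and the toy gains are all assumptions made for illustration; they are not taken from the paper.

```python
def _commit_best(window, selected, current, score):
    """Add the best candidate in the window if it improves the current score."""
    best = max(window, key=lambda f: score(selected + [f]))
    best_score = score(selected + [best])
    if best_score > current:
        selected.append(best)
        current = best_score
    window.clear()  # unchosen candidates are dropped in this sketch
    return current


def beam_stream_select(stream, score, width=None):
    """Single pass over a feature stream with a window (beam) of `width`.

    width=1    : each feature is accepted or rejected on arrival.
    width=None : all features are compared first, i.e. one stepwise step.
    """
    selected = []
    current = score(selected)
    window = []
    for feature in stream:
        window.append(feature)
        if width is not None and len(window) >= width:
            current = _commit_best(window, selected, current, score)
    if window:  # leftover candidates at the end of the stream
        current = _commit_best(window, selected, current, score)
    return selected, current


# Hypothetical scoring rule: additive gains minus a fixed cost per feature.
GAINS = {"a": 1.0, "b": 3.0, "c": 2.0}

def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 1.5 * len(subset)
```

With `width=1` the result depends on the order in which features arrive, which is the issue raised in the last bullet; with `width=None` the order does not matter, but every feature must be scored before anything is added.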

dean@foster.net