Administrivia
- What to do with all those clusters?
- If clusters are "natural kinds" then additive models of
clusters make sense.
- Can add them to a regression (as indicator variables; see the
  sketch after this list)
- Not all that interesting if only one set of clusters
- But, if there are many sets of clusters, additive models could
generate interesting models.
- Alternatively, can use clusters in a tree.
- Suppose there are cluster types A, B, ..., C
- Draw tree built out of these.
- Trees of natural kinds are the motivation for:
- CART
- ID3
- C4.5 / C5.0
- MARS
- Context trees
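- A minimal sketch of the indicator-variable idea (assuming numpy
  arrays of cluster labels and responses; the function name
  cluster_regression and the use of least squares are my choices, not
  from the notes):

      import numpy as np

      def cluster_regression(clusters, y):
          """Turn cluster labels into indicator (dummy) variables and
          regress y on them; each coefficient is just a cluster mean."""
          clusters = np.asarray(clusters)
          labels = np.unique(clusters)
          X = (clusters[:, None] == labels[None, :]).astype(float)  # n-by-k indicators
          coef, *_ = np.linalg.lstsq(X, y, rcond=None)
          return dict(zip(labels, coef))

  With several different sets of clusters, stack their indicator
  blocks side by side to get the additive model described above.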
Trees
- Too expensive to run clustering at every step?
- If the natural kinds are clean enough, maybe splitting on a single
  variable at a time is enough.
- Motivation for CART
- Build the tree out of splits on one X variable at a time
Forecasting with a tree
- The fitted value is the average of the responses in the node
- Draw picture: note the fit is piecewise constant, so it looks kind
  of bumpy
- In leaf nodes, use the average (this is the rule for CART regression)
- In leaf nodes, use the most popular class for classification (this
  is the idea behind C4.5)
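- A minimal sketch of the two leaf rules (function names are mine),
  given the responses that land in a leaf:

      import numpy as np
      from collections import Counter

      def leaf_forecast_regression(y_in_leaf):
          # CART-style regression: forecast with the leaf average
          return np.mean(y_in_leaf)

      def leaf_forecast_classification(y_in_leaf):
          # C4.5-style classification: forecast the most popular class
          return Counter(y_in_leaf).most_common(1)[0][0]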
How many trees are there?
- p choices of split variable at the top; about n split points each.
  So about np possible top nodes.
- Each child node (there are 2) has basically the same picture.
- If the depth is d, there are 2^d nodes, each with pn choices.
- If d = log(p), this is about the same size as the space of all
  regression models
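- A rough count (my own back-of-the-envelope, not from the notes):
  each internal node has about np choices, and a complete binary tree
  of depth d has 2^d - 1 internal nodes, so

      \#\{\text{trees of depth } d\} \approx (np)^{2^d - 1}

  With d = \log_2 p this is (np)^{p-1}, which, like the 2^p models of
  all-subsets regression, grows exponentially in p, hence the need for
  a heuristic search.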
Need heuristic search
- Heuristic function: current fit
- So pick best fit at each split
- where "Best" is defined to be LS for example
Pruning trees
- As a forester (3rd generation) I know pruning trees is important
- Easy to consider lots of comparisons
- Allows separation of searching from pruning
- All one needs is a good fitting function
- See for example: Variable Length Markov Chains, P. Buhlmann and
  A. J. Wyner, The Annals of Statistics, Vol. 27, No. 2, pp. 480-513,
  1999.
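- One standard choice of "good fitting function" for pruning is
  CART's cost-complexity criterion (notation mine):

      R_\alpha(T) = \sum_{\ell \in \mathrm{leaves}(T)} \mathrm{SSE}(\ell) + \alpha\,|\mathrm{leaves}(T)|

  Roughly, collapse any split whose removal does not increase
  R_\alpha; varying \alpha gives a nested sequence of pruned trees to
  compare.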
Random forests
- Suppose truth is additive model in k variables
- Requires k parameters in usual version of regression
- Requires 2^k parameters (each based on only about n/2^k data
  points) to fit a tree
- Our solution: drop a bunch of trees into a regression
- Existing solution: average over a bunch of trees, which is called a
  random forest
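- A minimal sketch of the averaging idea (a hand-rolled bagging loop
  around scikit-learn's DecisionTreeRegressor; the parameter choices
  are mine, and this shows only the standard averaging, not the
  trees-in-a-regression idea above):

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      def random_forest_forecast(X, y, X_new, n_trees=100, seed=0):
          """Fit each tree on a bootstrap sample, restricting each split
          to a random subset of variables, then average the forecasts."""
          rng = np.random.default_rng(seed)
          n = len(y)
          forecasts = []
          for _ in range(n_trees):
              rows = rng.integers(0, n, n)      # bootstrap sample of the rows
              tree = DecisionTreeRegressor(
                  max_features="sqrt",          # random subset of variables per split
                  random_state=int(rng.integers(1 << 31)))
              tree.fit(X[rows], y[rows])
              forecasts.append(tree.predict(X_new))
          return np.mean(forecasts, axis=0)     # average over the bunch of trees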