Last modified: Thu Nov 10 14:41:33 EST 2005 by Dean Foster

Statistical Data mining: RKHS

Admistrivia

Jon will finish his lecture next Tuesday.
I'll talk about RKHS today that is useful in both SVM and regression

Hilbert spaces, what are they good for?

My definition of datamining: "p >> n"
But with sufficient data, all problems are classical statistics
BUT, if p = infinity, then we are always in the domain of data mining!
An infinite dimension space is called a Hilbert space
We need Hilbert instead of Banach since we need a projection operator (i.e. right angles and hence inner products)

Building a Hilbert space up by hand

Need an inifintely long feature vector
Using coordinates will take a long time to describe such an vector
Use abstract function phi(x).
x is in R^p
phi(x) is in H = R^infinity.
If phi() is linear, not very interesting
We can do all of regression if we can compute < phi(x),phi(w) >
For convienence let K(x,w) = < phi(x),phi(w) >

Example 1: quadradic kernel

Draw picture of donut
Can't get a good fit using either SVM or regression
But if we used x² and w² we could do it as long as it were a circle.
We could do elipses if we added sqrt(2) x*w.
K(x,w) = ... = (x - w)^4 = < x,w > ²
Ok, very cool, but still finite

Example 2: radial basis functions

Draw picture of 3-sphere with curve on it
Imagine an infinite sphere which locally looks like this
What does the inner product of two points on the curve look like?
If close, K(x,w) = 1. If far away K(x,w) --> 0.
How about: K(x,w) = exp(-|x-w|^2)? Does that work? If so, what does phi(a) look like?

Difficult to describe phi, easy to describe K

Suppose we don't want to be explicite about phi. Can we say it exists without exhibiting it?
If K() is nice we can do this.
Using functional analysis: phi(x) = sum < x,b_i > b_i for basis b
It is sufficient to consider phi(x) = K(x,.) a function in space R^p
It might have redundent basis elements--but we can Gram-Smidt them away.
It might miss some basis elements--but then they never appear so are not needed. We can live on a smaller Hilbert space.

Which K work?

K(x,w) must be:
1. continuous.
2. symetric. (I.e. K(x,w) = K(w,x))
3. positive definite. (i.e. acts like an inner product for finite number of x's.)
4. Called a Mercer kernel.

Theorem: If K is a Mercer kernel, then there exists a phi and a H such that K(x,w) = < phi(x),phi(w) > .

dean@foster.net