Last modified: Thu Nov 10 14:41:33 EST 2005
by Dean Foster
Statistical Data mining: RKHS
Admistrivia
- Jon will finish his lecture next Tuesday.
- I'll talk about RKHS today that is useful in both SVM and regression
Hilbert spaces, what are they good for?
- My definition of datamining: "p >> n"
- But with sufficient data, all problems are classical
statistics
- BUT, if p = infinity, then we are always in the domain of data
mining!
- An infinite dimension space is called a Hilbert space
- We need Hilbert instead of Banach since we need a projection
operator (i.e. right angles and hence inner products)
Building a Hilbert space up by hand
- Need an inifintely long feature vector
- Using coordinates will take a long time to describe such an
vector
- Use abstract function phi(x).
- x is in Rp
- phi(x) is in H = Rinfinity.
- If phi() is linear, not very interesting
- We can do all of regression if we can compute
< phi(x),phi(w) >
- For convienence let K(x,w) = < phi(x),phi(w) >
Example 1: quadradic kernel
- Draw picture of donut
- Can't get a good fit using either SVM or regression
- But if we used x2 and w2 we could do
it as long as it were a circle.
- We could do elipses if we added sqrt(2) x*w.
- K(x,w) = ... = (x - w)^4 = < x,w > 2
- Ok, very cool, but still finite
Example 2: radial basis functions
- Draw picture of 3-sphere with curve on it
- Imagine an infinite sphere which locally looks like this
- What does the inner product of two points on the curve look
like?
- If close, K(x,w) = 1. If far away K(x,w) --> 0.
- How about: K(x,w) = exp(-|x-w|^2)? Does that work? If so,
what does phi(a) look like?
Difficult to describe phi, easy to describe K
- Suppose we don't want to be explicite about phi. Can we say
it exists without exhibiting it?
- If K() is nice we can do this.
- Using functional analysis: phi(x) = sum
< x,bi > bi for basis b
- It is sufficient to consider phi(x) = K(x,.) a function in
space Rp
- It might have redundent basis elements--but we can Gram-Smidt
them away.
- It might miss some basis elements--but then they never appear
so are not needed. We can live on a smaller Hilbert space.
Which K work?
- K(x,w) must be:
- continuous.
- symetric. (I.e. K(x,w) = K(w,x))
- positive definite. (i.e. acts like an inner product for
finite number of x's.)
- Called a Mercer kernel.
Theorem: If K is a Mercer kernel, then there exists a phi and a H such
that K(x,w) = < phi(x),phi(w) > .
dean@foster.net