Last modified: Thu Nov 10 14:41:33 EST 2005 by Dean Foster

# Statistical Data mining: RKHS

• Jon will finish his lecture next Tuesday.
• I'll talk about RKHS today that is useful in both SVM and regression

## Hilbert spaces, what are they good for?

• My definition of datamining: "p >> n"
• But with sufficient data, all problems are classical statistics
• BUT, if p = infinity, then we are always in the domain of data mining!
• An infinite dimension space is called a Hilbert space
• We need Hilbert instead of Banach since we need a projection operator (i.e. right angles and hence inner products)

## Building a Hilbert space up by hand

• Need an inifintely long feature vector
• Using coordinates will take a long time to describe such an vector
• Use abstract function phi(x).
• x is in Rp
• phi(x) is in H = Rinfinity.
• If phi() is linear, not very interesting
• We can do all of regression if we can compute < phi(x),phi(w) >
• For convienence let K(x,w) = < phi(x),phi(w) >

• Draw picture of donut
• Can't get a good fit using either SVM or regression
• But if we used x2 and w2 we could do it as long as it were a circle.
• We could do elipses if we added sqrt(2) x*w.
• K(x,w) = ... = (x - w)^4 = < x,w > 2
• Ok, very cool, but still finite

## Example 2: radial basis functions

• Draw picture of 3-sphere with curve on it
• Imagine an infinite sphere which locally looks like this
• What does the inner product of two points on the curve look like?
• If close, K(x,w) = 1. If far away K(x,w) --> 0.
• How about: K(x,w) = exp(-|x-w|^2)? Does that work? If so, what does phi(a) look like?

## Difficult to describe phi, easy to describe K

• Suppose we don't want to be explicite about phi. Can we say it exists without exhibiting it?
• If K() is nice we can do this.
• Using functional analysis: phi(x) = sum < x,bi > bi for basis b
• It is sufficient to consider phi(x) = K(x,.) a function in space Rp
• It might have redundent basis elements--but we can Gram-Smidt them away.
• It might miss some basis elements--but then they never appear so are not needed. We can live on a smaller Hilbert space.

## Which K work?

• K(x,w) must be:
1. continuous.
2. symetric. (I.e. K(x,w) = K(w,x))
3. positive definite. (i.e. acts like an inner product for finite number of x's.)
4. Called a Mercer kernel.
Theorem: If K is a Mercer kernel, then there exists a phi and a H such that K(x,w) = < phi(x),phi(w) > .
dean@foster.net