Last modified: Thu Nov 10 14:41:33 EST 2005
by Dean Foster

# Statistical Data mining: RKHS

## Admistrivia

- Jon will finish his lecture next Tuesday.
- I'll talk about RKHS today that is useful in both SVM and regression

## Hilbert spaces, what are they good for?

- My definition of datamining: "p >> n"
- But with sufficient data, all problems are classical
statistics
- BUT, if p = infinity, then we are always in the domain of data
mining!
- An infinite dimension space is called a Hilbert space
- We need Hilbert instead of Banach since we need a projection
operator (i.e. right angles and hence inner products)

## Building a Hilbert space up by hand

- Need an inifintely long feature vector
- Using coordinates will take a long time to describe such an
vector
- Use abstract function phi(x).
- x is in R
^{p}
- phi(x) is in H = R
^{infinity}.
- If phi() is linear, not very interesting
- We can do all of regression if we can compute
< phi(x),phi(w) >
- For convienence let K(x,w) = < phi(x),phi(w) >

## Example 1: quadradic kernel

- Draw picture of donut
- Can't get a good fit using either SVM or regression
- But if we used x
^{2} and w^{2} we could do
it as long as it were a circle.
- We could do elipses if we added sqrt(2) x*w.
- K(x,w) = ... = (x - w)^4 = < x,w >
^{2}
- Ok, very cool, but still finite

## Example 2: radial basis functions

- Draw picture of 3-sphere with curve on it
- Imagine an infinite sphere which locally looks like this
- What does the inner product of two points on the curve look
like?
- If close, K(x,w) = 1. If far away K(x,w) --> 0.
- How about: K(x,w) = exp(-|x-w|^2)? Does that work? If so,
what does phi(a) look like?

## Difficult to describe phi, easy to describe K

- Suppose we don't want to be explicite about phi. Can we say
it exists without exhibiting it?
- If K() is nice we can do this.
- Using functional analysis: phi(x) = sum
< x,b
_{i} > b_{i} for basis b
- It is sufficient to consider phi(x) = K(x,.) a function in
space R
^{p}
- It might have redundent basis elements--but we can Gram-Smidt
them away.
- It might miss some basis elements--but then they never appear
so are not needed. We can live on a smaller Hilbert space.

## Which K work?

- K(x,w) must be:
- continuous.
- symetric. (I.e. K(x,w) = K(w,x))
- positive definite. (i.e. acts like an inner product for
finite number of x's.)
- Called a Mercer kernel.

Theorem: If K is a Mercer kernel, then there exists a phi and a H such
that K(x,w) = < phi(x),phi(w) > .

dean@foster.net