Last modified: Thu Dec 8 14:53:17 EST 2005
by Dean Foster
Administrivia
- Homework 5 is online. Due Dec 19th.
- None of the problems are supposed to be killers. So if you are having
problems, send me an email. I probably phrased the question wrong or
thought it was much easier than it really is. So I'll rewrite it in
either case.
- As usual: any questions on the homework? Not everyone is done.
Summary and open problems
Review: What we have learned this semester
- High-dimensional data takes its own way of thinking
- Get to study old things all over again
- For example: how would design of experiments look in 1 million
dimensions?
- Minor philosophy issues go away: p-values vs. posteriors are
irrelevant at Bonferroni scale
- Good statistical reasoning is still the foundation
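To put a number on "Bonferroni scale", here is a minimal sketch using only the Python standard library. The function name bonferroni_z, the family-wise level of 0.05, and the two-sided test are assumptions chosen for illustration; they are not taken from the course material.

```python
import math
from statistics import NormalDist


def bonferroni_z(m, alpha=0.05):
    """Two-sided z cutoff when testing m hypotheses at family-wise level alpha."""
    per_test = alpha / m  # Bonferroni: split alpha evenly across the m tests
    # Work in the lower tail and negate, so we never compute 1 - (tiny number).
    return -NormalDist().inv_cdf(per_test / 2)


m = 1_000_000  # illustrative: one test per dimension
z = bonferroni_z(m)
rule_of_thumb = math.sqrt(2 * math.log(m))  # common large-m approximation

print(f"per-test level: {0.05 / m:.1e}")
print(f"exact z cutoff: {z:.2f}")
print(f"sqrt(2 log m):  {rule_of_thumb:.2f}")
```

With a million tests the cutoff lands a little above 5 standard errors. One way to read the bullet above: evidence that clears a bar this far out in the tail tends to look convincing whether it is summarized as a p-value or as a posterior.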
Why am I interested in data mining?
- I trained in AI before it did statistics
- They solved noise by fuzzy logic
- Tried to cover ALL exceptions in definitions
- Generally an abysmal failure
- I flunked out since I kept saying: "Do statistics"
- It's payback time!
- Big picture: Old school AI with actual answers
- Small picture: game of "Go"
- What keeps me doing it:
- Relevant theory!
- Exciting applications!
- Get me drunk enough, and I'll even tell you about the
singularity. In fact, you won't be able to stop me from telling you
about it!
So you want to do a PhD in data mining?
Why: Easy advances are possible
- Since it is a new area, a little theory and a little
application go a long way
- Become an expert who gets respect. (Everyone thinks they already know
as much about stochastic processes as they want to.)
We have some of the best people working on it
- Adi does the information theory and Boosting
- Andreas does high-dimensional visualization
- Abba does applications
- Bob, Lyle and I have a research group with several ongoing
projects.
Applications
- Marketing
- Biology
- Genes
- Proteins
- Function between these
- Doctors / medicine
- Astronomy
- Weather
- Linguistics
- text mining
- search (all of google)
- text understanding
- vision
Theory
- Compromises between classification and forecasting
- Finding new features
- Functional analysis stuff
- Bayes nets
- trees
- Pick any two above and combine
Connection to machine learning
- Recursive structure: Ontological leaps
- Different kind of models: connection between many things
- Very different approaches: same goals
- An advantage of doing data mining is that you can publish 3 times:
- in an ML conference (turnaround 3 months)
- in the domain area (turnaround 1 year)
- in statistics (turnaround 3 years)
- We have a strong group on campus (raided Bell Labs a few years
back when they were busy folding.)
dean@foster.net