Statistical Data Mining

Schedule (T Th 3:00-4:30 in G92)

General information

Some estimate that 4 exabytes of data are now being produced each year. This is a different world from the one Fisher pioneered: he developed a theory that can deal with a 2x2 contingency table, which might hold a total of 4 bytes of data. This 10^18-fold increase in data is changing the world of statistics. The goal of this course is to follow this change.
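For scale, here is a minimal sketch of the kind of 2x2 calculation Fisher's theory handles: an exact two-sided test on a contingency table with fixed margins. It assumes the common convention of summing the hypergeometric probabilities of every table that is no more likely than the observed one. The function names and the example table are illustrative only. It also checks the 10^18 ratio, assuming decimal exabytes (1 EB = 10^18 bytes).

```python
from fractions import Fraction
from math import comb

# 4 exabytes per year vs. a 4-byte 2x2 table (assumes 1 EB = 10**18 bytes).
GROWTH = (4 * 10**18) // 4


def table_prob(a, row1, row2, col1):
    """Hypergeometric probability of a 2x2 table whose top-left cell is a,
    given fixed row sums (row1, row2) and first-column sum col1."""
    n = row1 + row2
    return Fraction(comb(row1, a) * comb(row2, col1 - a), comb(n, col1))


def fisher_exact_two_sided(table):
    """Exact two-sided p-value for [[a, b], [c, d]]: sum the probabilities of
    all tables with the same margins that are no more likely than the observed
    one. Fractions keep the comparison exact (no floating-point ties)."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    observed = table_prob(a, row1, row2, col1)
    # Feasible values of the top-left cell given the margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = (table_prob(x, row1, row2, col1) for x in range(lo, hi + 1))
    return sum(p for p in probs if p <= observed)


if __name__ == "__main__":
    print(GROWTH == 10**18)
    print(fisher_exact_two_sided([[3, 1], [1, 3]]))
```

Run as a script, this prints `True` and then `17/35` (about 0.486) for the table [[3, 1], [1, 3]]: the whole analysis fits in a few lines of exact arithmetic, which is the world the data-mining setting has left behind.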

Exactly what data mining is depends on whom you talk to. For example, Andrew Moore takes a very broad view of data mining: he includes lovely topics ranging from economics (e.g., game theory) to classical AI (e.g., the A* algorithm). This contrasts with the approach I will take. I'll focus much more heavily on statistical regression.

I've written a crude outline of what the course will cover.


This course is targeted at PhD students. Some mathematical sophistication will be assumed. You will be expected to carefully read research papers. The primary statistics tool will be regression, so at least a few weeks of background on that would be desirable. If you are unsure, send me an email and we can chat.

Last modified: Thu Sep 25 13:00:07 EDT 2014