# Statistical Data Mining: Alternatives to Proper Scoring Rules

• OPIM is running a data mining seminar (Tuesdays at noon, in G50). You should all go. This Tuesday my friend and long-term coauthor, a generally great speaker, is talking (about calibration).

## 2 x 2 table of outcomes

• action taken x outcome
• 4 numbers: TP, FP, TN, FN (P = test result is positive)
• Marginals will be called: P, N, S = sick, H = healthy
• How much time can we spend analyzing them?
• Note: the action might be something like I(p > .5). The p has been lost.
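The reduction of probabilities to a 2 x 2 table can be sketched as follows (a minimal sketch; the threshold and data are illustrative, not from the notes):

```python
# Reduce probabilistic predictions to the 2x2 table via the action
# I(p > threshold); note the claimed probability p is lost afterwards.

def two_by_two(probs, sick, threshold=0.5):
    """Count TP, FP, TN, FN for the action I(p > threshold)."""
    tp = sum(1 for p, s in zip(probs, sick) if p > threshold and s)
    fp = sum(1 for p, s in zip(probs, sick) if p > threshold and not s)
    tn = sum(1 for p, s in zip(probs, sick) if p <= threshold and not s)
    fn = sum(1 for p, s in zip(probs, sick) if p <= threshold and s)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Illustrative data: claimed probabilities and true sick/healthy labels.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
sick = [1, 1, 0, 1, 0, 0]
table = two_by_two(probs, sick)
```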

## Alternatives to proper scoring rules

• ROC curves
• Type I, type II error graph: easy to understand: FN/S, FP/H
  • alpha = FP/H
  • beta = FN/S
  • power = sensitivity = TP/S
• ROC is the transposed version of this: specificity = TN/H, sensitivity = TP/S (the opposite of the error rates)
  • sensitivity = fraction of the sick correctly called positive
  • specificity = fraction of the healthy correctly called negative
  • As the threshold decreases, sensitivity increases while specificity decreases (both the TP and FP rates rise)
• Any notion of the claimed probability is lost
• Precision / recall
  • recall = TP/S (same as sensitivity)
  • precision = TP/P
  • F measure = 2 * precision * recall / (precision + recall), the harmonic mean of the two
• Area under the curve (ROC typically)
• Lift charts
  • plot Y = TP/P (precision) against P, the number called positive, as the threshold varies
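The metrics above can be sketched from the 2 x 2 counts, plus AUC computed directly from pairwise score comparisons (a sketch; function and variable names are mine, not from the notes):

```python
# Metrics from the 2x2 counts, using the marginal names from the notes:
# S = sick, H = healthy, P = called positive.

def metrics(tp, fp, tn, fn):
    S, H, P = tp + fn, fp + tn, tp + fp
    sensitivity = tp / S          # = recall = power
    specificity = tn / H
    precision = tp / P
    f = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f

def auc(probs, sick):
    """Area under the ROC curve = P(score of a sick case > score of a
    healthy case), computed over every sick/healthy pair (ties = 1/2)."""
    pos = [p for p, s in zip(probs, sick) if s]
    neg = [p for p, s in zip(probs, sick) if not s]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Note that both functions see only ranks or counts: consistent with the notes, the claimed probabilities enter only through their ordering.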

## Look at error graph

• Interpret the slope
• Note: it is a parametric curve (parameterized by the threshold)
• If the slope equals the parameter, the prediction is called calibrated
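The parametric-curve remark can be made explicit (my reading of the notes; the notation below is mine, not from the handout):

```latex
% The ROC curve traced by the threshold t, with the notes' marginals
% S (sick) and H (healthy):
\[
\bigl(x(t),\, y(t)\bigr) = \Bigl(\tfrac{FP(t)}{H},\; \tfrac{TP(t)}{S}\Bigr),
\qquad
\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{f_S(t)}{f_H(t)},
\]
% where $f_S$ and $f_H$ are the densities of the claimed probability
% among the sick and the healthy. The slope at a point is thus a
% likelihood ratio, which is what calibration ties to the parameter t.
```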

## Calibration

• Plot observed frequency vs claimed probability
• See handout from talk
• Assuming calibration: AUC, F measure and the others start making sense
• Without calibration: you are getting things truly wrong
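The frequency-vs-claimed-probability plot can be sketched as a binned reliability table (a minimal sketch; the bin count and data are illustrative assumptions):

```python
# Bin the claimed probabilities and compare each bin's average claim
# with the observed frequency of sickness; a calibrated predictor has
# claimed ~= observed in every bin.

def reliability(probs, sick, n_bins=10):
    """Return (mean claimed prob, observed frequency, count) per nonempty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, s in zip(probs, sick):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p = 1.0 into the top bin
        bins[i].append((p, s))
    out = []
    for b in bins:
        if b:
            claimed = sum(p for p, _ in b) / len(b)
            observed = sum(s for _, s in b) / len(b)
            out.append((claimed, observed, len(b)))
    return out
```

Plotting observed against claimed and comparing to the diagonal gives the calibration plot described above.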

dean@foster.net