# Statistical Data Mining: Alternatives to Proper Scoring Rules

• OPIM is running a data mining seminar (Tuesdays at noon, in G50). You should all go. This Tuesday my friend and long-term coauthor, a generally great speaker, is talking (about calibration).

## 2 x 2 table of outcomes

• action taken x outcome
• 4 numbers: TP, FP, TN, FN (P = test result is positive)
• Marginals will be called: P, N, S = sick, H = healthy
• How much time can we spend analyzing them?
• Note: the action might be something like I(p > .5). The p has been lost.
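The reduction of probabilities to a 2 x 2 table can be sketched as follows (a minimal sketch; the threshold and data are illustrative, not from the notes):

```python
# Reduce probabilistic predictions to the 2x2 table via the action
# I(p > threshold); note the claimed probability p is lost afterwards.

def two_by_two(probs, sick, threshold=0.5):
    """Count TP, FP, TN, FN for the action I(p > threshold)."""
    tp = sum(1 for p, s in zip(probs, sick) if p > threshold and s)
    fp = sum(1 for p, s in zip(probs, sick) if p > threshold and not s)
    tn = sum(1 for p, s in zip(probs, sick) if p <= threshold and not s)
    fn = sum(1 for p, s in zip(probs, sick) if p <= threshold and s)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Illustrative data: claimed probabilities and true sick/healthy labels.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
sick = [1, 1, 0, 1, 0, 0]
table = two_by_two(probs, sick)
```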

## Alternatives to proper scoring rules

• ROC curves
• Type I, type II error graph: easy to understand: FN/S, FP/H
  • alpha = FP/H
  • beta = FN/S
  • power = sensitivity = TP/S
• ROC is the transposed version of this: specificity = TN/H, sensitivity = TP/S (the opposite of the error rates)
  • sensitivity = fraction of the sick correctly called positive
  • specificity = fraction of the healthy correctly called negative
  • As the threshold decreases, sensitivity increases while specificity decreases (both the TP and FP rates rise)
• Any notion of the claimed probability is lost
• Precision / recall
  • recall = TP/S (same as sensitivity)
  • precision = TP/P
  • F measure = 2 * precision * recall / (precision + recall), the harmonic mean of the two
• Area under the curve (ROC typically)
• Lift charts
  • plot Y = TP/P (precision) against P, the number called positive, as the threshold varies
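The metrics above can be sketched from the 2 x 2 counts, plus AUC computed directly from pairwise score comparisons (a sketch; function and variable names are mine, not from the notes):

```python
# Metrics from the 2x2 counts, using the marginal names from the notes:
# S = sick, H = healthy, P = called positive.

def metrics(tp, fp, tn, fn):
    S, H, P = tp + fn, fp + tn, tp + fp
    sensitivity = tp / S          # = recall = power
    specificity = tn / H
    precision = tp / P
    f = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, f

def auc(probs, sick):
    """Area under the ROC curve = P(score of a sick case > score of a
    healthy case), computed over every sick/healthy pair (ties = 1/2)."""
    pos = [p for p, s in zip(probs, sick) if s]
    neg = [p for p, s in zip(probs, sick) if not s]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Note that both functions see only ranks or counts: consistent with the notes, the claimed probabilities enter only through their ordering.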

## Look at error graph

• Interpret the slope
• Note: it is a parametric curve (parameterized by the threshold)
• If the slope equals the parameter, the prediction is called calibrated
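The parametric-curve remark can be made explicit (my reading of the notes; the notation below is mine, not from the handout):

```latex
% The ROC curve traced by the threshold t, with the notes' marginals
% S (sick) and H (healthy):
\[
\bigl(x(t),\, y(t)\bigr) = \Bigl(\tfrac{FP(t)}{H},\; \tfrac{TP(t)}{S}\Bigr),
\qquad
\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{f_S(t)}{f_H(t)},
\]
% where $f_S$ and $f_H$ are the densities of the claimed probability
% among the sick and the healthy. The slope at a point is thus a
% likelihood ratio, which is what calibration ties to the parameter t.
```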

## Calibration

• Plot observed frequency vs claimed probability
• See handout from talk
• Assuming calibration: AUC, F measure and the others start making sense
• Without calibration: you are getting things truly wrong
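The frequency-vs-claimed-probability plot can be sketched as a binned reliability table (a minimal sketch; the bin count and data are illustrative assumptions):

```python
# Bin the claimed probabilities and compare each bin's average claim
# with the observed frequency of sickness; a calibrated predictor has
# claimed ~= observed in every bin.

def reliability(probs, sick, n_bins=10):
    """Return (mean claimed prob, observed frequency, count) per nonempty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, s in zip(probs, sick):
        i = min(int(p * n_bins), n_bins - 1)  # clamp p = 1.0 into the top bin
        bins[i].append((p, s))
    out = []
    for b in bins:
        if b:
            claimed = sum(p for p, _ in b) / len(b)
            observed = sum(s for _, s in b) / len(b)
            out.append((claimed, observed, len(b)))
    return out
```

Plotting observed against claimed and comparing to the diagonal gives the calibration plot described above.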

dean@foster.net