Statistics 701: Large data sets
Announcements
Discuss overfitting in last homework (Why did some people achieve Bonferroni bounds?)
Mention group project (email me your list of members). Time slots are assigned first come, first served. There will be a bonus for going on the first day.
Time series is due next week.
I'll post HW 5 in the next week or so.
Cross-validation
(See handout)
What is cross-validation good for?
First question: What is the RMSE?
Use 6 points to get the RMSE to within a factor of 2.
Use 13 points to get the RMSE to within 50%.
Use 35 points to get the RMSE to within 25%.
Use 50 points to get the RMSE to within 20%.
Use 137 points to get the RMSE to within 10%.
Use 13,700 points to get the RMSE to within 1%.
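The pattern in these numbers (100 times the data buys 10 times the accuracy) can be checked with a small simulation. This is only a sketch under assumptions not stated in the notes: the residuals are taken to be independent Gaussian, and "accuracy" is measured as the standard deviation of RMSE/sigma. The confidence level behind the figures above is not given, so the sketch is not expected to reproduce them exactly. The function names are made up for illustration.

```python
import math
import random


def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def relative_spread(n, trials=2000, sigma=1.0, seed=0):
    """Simulated standard deviation of RMSE/sigma when the RMSE
    is estimated from n Gaussian residuals (an assumption)."""
    rng = random.Random(seed)
    ratios = [
        rmse([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma
        for _ in range(trials)
    ]
    mean = sum(ratios) / trials
    return math.sqrt(sum((r - mean) ** 2 for r in ratios) / (trials - 1))


def approx_relative_se(n):
    """Large-sample approximation for Gaussian residuals:
    SE(RMSE) / sigma is roughly 1 / sqrt(2 n)."""
    return 1.0 / math.sqrt(2.0 * n)


if __name__ == "__main__":
    # Columns: n, simulated relative spread, 1/sqrt(2n) approximation.
    for n in (6, 13, 35, 50, 137):
        print(n, round(relative_spread(n), 3), round(approx_relative_se(n), 3))
```

The relative error shrinks like 1/sqrt(n), which is why each extra digit of accuracy in the RMSE costs about 100 times as many held-out points.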
Second question: Which is better: CART, MARS, or neural nets?
Save half of the data for "out of sample" testing.
Do 10-fold cross-validation (which holds out 10% of the data, but does it 10 times).
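A minimal sketch of the 10-fold idea follows. Everything in it is an assumption made for illustration: the synthetic data, the helper names, and the two toy models (a mean-only predictor and a least-squares line), which stand in for CART, MARS, and neural nets so the example stays self-contained. It is not the procedure from the handout.

```python
import math
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Toy model 1: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Toy model 2: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_rmse(fit, xs, ys, k=10, seed=0):
    """Out-of-sample RMSE pooled over k folds.
    Each point is held out exactly once and predicted by a model
    fit to the other folds."""
    sq_err = 0.0
    for fold in k_fold_indices(len(xs), k, seed):
        held = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        model = fit(train_x, train_y)
        sq_err += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
    return math.sqrt(sq_err / len(xs))


def make_data(n=200, seed=1):
    """Synthetic data (hypothetical): y = 2x + Gaussian noise with sd 1."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data()
    print("mean-only:", round(cv_rmse(fit_mean, xs, ys), 3))
    print("line:     ", round(cv_rmse(fit_line, xs, ys), 3))
```

Using the same folds for every model (same seed) keeps the comparison paired: each model is scored on identical held-out points. Compared with saving half the data, every point gets used for testing once, and each fit still sees 90% of the data.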
Trees
(see handout)
Last modified: Thu Nov 13 16:30:07 EST 2003